Migrating the OpenOffice.org forums to a common code-base

Background

The OpenOffice.org (“OOo”) forums (“Community Forums”) started life in November 2007 with the creation of the English forum, running on phpBB 3.0.0 on a Sun Coolstack platform with PostgreSQL 8.1 on Solaris x86 infrastructure in the Sun Munich facility. We subsequently upgraded to phpBB 3.0.1 and PostgreSQL 8.2 and added the Hungarian, French, Spanish, Japanese, Vietnamese and Chinese forums. Each forum had its own branding tweaks.

However, given the proliferation of forums, the general growth in transactional volumes and the fact that the phpBB forum application is typically upgraded every six months, the administration overhead was becoming just too heavy. We therefore decided to upgrade the hardware infrastructure and install a clean re-baseline for all forums. This note describes the broad approach that I adopted for migrating the forums from their current platform to the new installation.

The starting point was therefore a set of Community Forums, each running on its own separate phpBB instance, all at phpBB 3.0.1 but with disjoint code hierarchies, over a common PostgreSQL 8.2 database on a Solaris/Coolstack platform. The content of the Community Forums is largely maintained in the central database, though attachment and image data are maintained in separate file hierarchies, and all such content had to be preserved on migration.

The new hardware platform runs Solaris 10 plus Coolstack 1.3.1, including the optional CSKPython package, and the databases were to be migrated to the 32-bit MySQL 5.0 that is bundled with CSK.

The key migration goal was to move all the Community Forums to a single shared code-base, with each language-specific forum (an “NL Forum”) retaining its own set of phpBB tables within MySQL. This lets each NL Forum keep its own designated lead “NL Administrator”, who has final responsibility for configuring the forum through the administration functions provided by phpBB (this configuration is held within the set of tables for that NL instance).

Whilst the phpBB installation is largely “out of the box”, I had to maintain some custom functionality (such as OOo branding) that has been developed to meet specific requirements. There are also a large number of supported languages (typically one per NL Forum, but some have two or three). Whilst all supported languages are available to all NL Forums, it is up to the lead NL Administrators to decide which languages are enabled for their forums, and which is the default language.

Directory and File Organisation

I decided to follow the Debian convention (sorry Sun) of maintaining the common code-base under a directory hierarchy in /var/lib/phpBB, with each version having its own comXYY sub-directory, so phpBB version 3.0.4 is maintained under /var/lib/phpBB/com304.
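The naming convention can be captured in a few lines. This is only a sketch: the helper name common_dir is hypothetical, and it assumes that each of the three version components is a single digit, which is what the comXYY pattern implies.

```python
from pathlib import PurePosixPath

# Root of the shared code-base, as described in the text.
COMMON_ROOT = PurePosixPath("/var/lib/phpBB")


def common_dir(version):
    """Map a phpBB version string such as '3.0.4' to its comXYY directory.

    Hypothetical helper: assumes three single-digit components (X.Y.Y).
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(len(p) == 1 and p.isdigit() for p in parts):
        raise ValueError("expected a version of the form X.Y.Y, got %r" % version)
    return COMMON_ROOT / ("com" + "".join(parts))


print(common_dir("3.0.4"))  # /var/lib/phpBB/com304
```

A version with a two-digit component would need a different naming rule, so the sketch rejects it rather than guessing.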

Each instance is maintained under /var/www/NN/forum, where NN is the two-letter language identifier (e.g. “en” for English), with sym-links back to the relevant comXYY file or directory. Unfortunately, any phpBB script that is executed directly through a URI establishes the forum root by examining its current directory. Hence these URI-referenced scripts must be executed in the correct /var/www/NN sub-directory. They live in the forum root and in the adm and download sub-directories.

Each forum has private copies of these sub-directories (the forum root, adm, download) along with the other sub-directories that contain forum-specific files (cache, files, avatars-upload and store). The remaining sub-directories can be sym-linked to the corresponding directory in the common repository. Note that whilst listforums.php must be executed from /var/www/NN/forum/listforums.php for the paths to work, this can itself be a sym-link to the corresponding listforums.php in the /var/lib/phpBB/comXYY directory.

In general the forum sub-directories are cleanly split into those holding shared data and those holding instance-specific data. The one exception, which is a little messy, is the images sub-directory. This contains both shared content, such as the avatar image galleries, and instance-specific data, such as uploaded avatars. However, the location of this instance-specific data is configurable, and I have therefore tidied this up by moving it to a separate sub-directory so that images can be shared:

Config item          Type      Current Default         New Value
avatar_path          Specific  images/avatars/upload   avatars-upload
avatar_gallery_path  Shared    images/avatars/gallery  images/avatars/gallery
upload_icons_path    Shared    images/upload_icons     images/upload_icons
ranks_path           Shared    images/ranks            images/ranks
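As an illustration of what this change amounts to per forum, the sketch below generates the SQL needed for the values in the table above. It rests on assumptions: that each NL Forum's settings live in a table named <prefix>config with config_name and config_value columns (my understanding of the standard phpBB 3.0 schema), and the prefix "en_" is purely hypothetical. The same change can be made through the phpBB administration panel, and phpBB may need its own cache cleared after a direct database edit.

```python
# Values copied from the table above: config item -> (current default, new value).
PATH_SETTINGS = {
    "avatar_path": ("images/avatars/upload", "avatars-upload"),
    "avatar_gallery_path": ("images/avatars/gallery", "images/avatars/gallery"),
    "upload_icons_path": ("images/upload_icons", "images/upload_icons"),
    "ranks_path": ("images/ranks", "images/ranks"),
}


def config_updates(table_prefix):
    """Return UPDATE statements for the settings whose value actually changes.

    Assumes a <prefix>config table with config_name/config_value columns.
    """
    statements = []
    for name, (current, new) in PATH_SETTINGS.items():
        if current != new:  # shared paths keep their default, so skip them
            statements.append(
                "UPDATE %sconfig SET config_value = '%s' WHERE config_name = '%s';"
                % (table_prefix, new, name)
            )
    return statements


for statement in config_updates("en_"):  # "en_" is a hypothetical table prefix
    print(statement)
```

Only avatar_path produces a statement, since the three shared paths keep their defaults.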


So the final mapping looks like this:

forum:
  adm
  avatars-upload
  cache
  download
  files
  store
  common.php -> /var/lib/phpBB/comXYY/common.php
  config.php -> /var/lib/phpBB/comXYY/config.php
  cron.php -> /var/lib/phpBB/comXYY/cron.php
  docs -> /var/lib/phpBB/comXYY/docs
  faq.php -> /var/lib/phpBB/comXYY/faq.php
  images -> /var/lib/phpBB/comXYY/images
  includes -> /var/lib/phpBB/comXYY/includes
  index.php -> /var/lib/phpBB/comXYY/index.php
  language -> /var/lib/phpBB/comXYY/language
  mcp.php -> /var/lib/phpBB/comXYY/mcp.php
  memberlist.php -> /var/lib/phpBB/comXYY/memberlist.php
  posting.php -> /var/lib/phpBB/comXYY/posting.php
  report.php -> /var/lib/phpBB/comXYY/report.php
  search.php -> /var/lib/phpBB/comXYY/search.php
  style.php -> /var/lib/phpBB/comXYY/style.php
  styles -> /var/lib/phpBB/comXYY/styles
  ucp.php -> /var/lib/phpBB/comXYY/ucp.php
  viewforum.php -> /var/lib/phpBB/comXYY/viewforum.php
  viewonline.php -> /var/lib/phpBB/comXYY/viewonline.php
  viewtopic.php -> /var/lib/phpBB/comXYY/viewtopic.php
forum/adm:
  images -> /var/lib/phpBB/comXYY/adm/images
  index.php -> /var/lib/phpBB/comXYY/adm/index.php
  style -> /var/lib/phpBB/comXYY/adm/style
  swatch.php -> /var/lib/phpBB/comXYY/adm/swatch.php
forum/avatars-upload:
  <instance specific uploaded avatars go here>
forum/cache:
  index.htm -> /var/lib/phpBB/comXYY/cache/index.htm
  <instance specific generated cache files go here>
forum/download:
  file.php -> /var/lib/phpBB/comXYY/download/file.php
  index.htm -> /var/lib/phpBB/comXYY/download/index.htm
forum/files:
  index.htm -> /var/lib/phpBB/comXYY/files/index.htm
  <instance specific uploaded attachment files go here>
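
A layout like this is tedious to build by hand for every NL Forum, so it lends itself to scripting. The following is a minimal sketch of how the mapping above could be created, not the script actually used for the migration. The function name build_instance is hypothetical, and the entry lists are copied from the listing.

```python
import tempfile
from pathlib import Path

# Directories that hold instance-specific data and stay private to each forum.
PRIVATE_DIRS = ["adm", "avatars-upload", "cache", "download", "files", "store"]

# Entries in the forum root that are sym-linked to the common code-base.
ROOT_LINKS = [
    "common.php", "config.php", "cron.php", "docs", "faq.php", "images",
    "includes", "index.php", "language", "mcp.php", "memberlist.php",
    "posting.php", "report.php", "search.php", "style.php", "styles",
    "ucp.php", "viewforum.php", "viewonline.php", "viewtopic.php",
]

# Entries inside private directories that still point at the common copy.
NESTED_LINKS = {
    "adm": ["images", "index.php", "style", "swatch.php"],
    "cache": ["index.htm"],
    "download": ["file.php", "index.htm"],
    "files": ["index.htm"],
}


def build_instance(common, forum):
    """Create the private directories under forum, then add the sym-links."""
    common, forum = Path(common), Path(forum)
    for name in PRIVATE_DIRS:
        (forum / name).mkdir(parents=True, exist_ok=True)
    relative = list(ROOT_LINKS)
    for sub, names in NESTED_LINKS.items():
        relative += [sub + "/" + name for name in names]
    for rel in relative:
        link = forum / rel
        if not link.is_symlink():  # leave existing links alone so reruns are safe
            link.symlink_to(common / rel)


if __name__ == "__main__":
    # Demonstrate in a scratch directory rather than touching /var.
    with tempfile.TemporaryDirectory() as tmp:
        common = Path(tmp, "var/lib/phpBB/comXYY")
        common.mkdir(parents=True)
        (common / "index.php").write_text("<?php // stand-in\n")
        forum = Path(tmp, "var/www/en/forum")
        build_instance(common, forum)
        for entry in sorted(forum.iterdir()):
            print(entry.name, "(link)" if entry.is_symlink() else "(private)")
```

Keeping the link targets in plain lists means that moving a forum to a new comXYY directory is a matter of re-pointing the same set of links.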

Caching Setup and Policy

Whilst the above may all seem convoluted, it works well with caching. I have also enabled the APC (Alternative PHP Cache) opcode cache for the site, using these settings:

apc.enabled=1
apc.shm_segments=1
apc.cache_by_default=1
apc.optimization=0
apc.shm_size=64
apc.ttl=240
apc.user_ttl=240
apc.gc_ttl=120
apc.stat=1
apc.num_files_hint=1024
apc.user_entries_hint=200
apc.mmap_file_mask=/dev/zero
apc.enable_cli=1

One major benefit of this sym-linking is that APC resolves any sym-links to the true file path. It therefore treats all NL instances as a single common set for caching. Hence all NL instance code-bases map onto a single cached byte-code copy within APC, except for the truly instance-specific cache directories. The net result is that a 64 MB APC code cache can happily cache all forums (which it couldn’t do previously). This results in zero cache churn: in other words, our live system does no PHP code compilation.
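The effect of resolving sym-links before caching can be shown with a toy model. This is not APC's implementation, only a sketch of the principle described above: if the cache key is the resolved path, several forums requesting the same linked script share one entry. The function names are hypothetical.

```python
import os
import tempfile


def cache_key(script_path):
    """Key a script by its resolved path, following any sym-links."""
    return os.path.realpath(script_path)


def distinct_entries(script_paths):
    """Return the set of cache keys needed to hold all the given scripts."""
    return {cache_key(p) for p in script_paths}


def demo():
    """Link one shared script into three forums and count the cache entries."""
    with tempfile.TemporaryDirectory() as tmp:
        common = os.path.join(tmp, "comXYY")
        os.makedirs(common)
        shared = os.path.join(common, "viewtopic.php")
        with open(shared, "w") as fh:
            fh.write("<?php // stand-in for the shared script\n")

        requested = []
        for lang in ("en", "fr", "es"):
            forum = os.path.join(tmp, lang, "forum")
            os.makedirs(forum)
            link = os.path.join(forum, "viewtopic.php")
            os.symlink(shared, link)
            requested.append(link)

        return len(requested), len(distinct_entries(requested))


print(demo())  # (3, 1): three requested paths, one cache entry
```

With separate code hierarchies per forum the second number would equal the first, which is why the previous set-up needed several times the cache space.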

SQL Query Caching

As I will describe in another note (A PHP Cache Overview and Cache Tuning), phpBB implements SQL query caching at the application level; that is, it caches some costly but low-volatility queries (typically forum-specific metadata). Out of the box, phpBB uses a standard file-based cache, but this was proving to be quite a performance hit on Solaris for our transactional loads, so I explored some alternatives. The phpBB application provides hooks to allow the implementation of alternative cache strategies, and the phpBB code repository includes an example based on APC data caching. Unfortunately, a bug in our version of Coolstack means that using APC data caching can hang Apache, so I used this APC cache example as a template to implement my own MySQL-based query cache. Caching SQL queries in a SQL table may seem like a contradiction, but what this achieves in practice is to replace a complex execution plan with a simple query of the form:

SELECT cached_results, ttl FROM cache_table WHERE query_hash='<some MD5>';

This type of query is exactly what MySQL itself caches in its own in-memory query cache. Hence, even though this approach involves calling MySQL, the resulting query is served from memory with a high hit rate.
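The idea can be sketched outside phpBB. The example below is not the PHP cache module described above. It is a minimal Python illustration that uses SQLite as a stand-in for MySQL so that it runs anywhere. The table and column names come from the query shown earlier, while the QueryCache class, the JSON serialisation, and the reading of ttl as an absolute expiry time are all assumptions.

```python
import hashlib
import json
import sqlite3
import time

# ttl is treated here as an absolute expiry time in seconds since the epoch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cache_table (
    query_hash     TEXT PRIMARY KEY,
    cached_results TEXT NOT NULL,
    ttl            INTEGER NOT NULL
)
"""


class QueryCache:
    """Cache the results of costly queries in a table keyed by an MD5 hash."""

    def __init__(self, conn, run_query, lifetime=240):
        self.conn = conn            # connection holding cache_table
        self.run_query = run_query  # callable that executes the costly query
        self.lifetime = lifetime    # seconds a cached result stays valid
        conn.execute(SCHEMA)

    def fetch(self, sql, now=None):
        now = int(time.time()) if now is None else now
        key = hashlib.md5(sql.encode("utf-8")).hexdigest()
        row = self.conn.execute(
            "SELECT cached_results, ttl FROM cache_table WHERE query_hash = ?",
            (key,),
        ).fetchone()
        if row is not None and row[1] > now:
            return json.loads(row[0])  # cache hit: one primary-key lookup
        rows = self.run_query(sql)     # miss or expired: run the real query
        self.conn.execute(
            "INSERT OR REPLACE INTO cache_table (query_hash, cached_results, ttl) "
            "VALUES (?, ?, ?)",
            (key, json.dumps(rows), now + self.lifetime),
        )
        self.conn.commit()
        return rows


if __name__ == "__main__":
    calls = []

    def costly(sql):
        calls.append(sql)  # count how often the real query runs
        return [[1, "General"], [2, "Writer"]]

    cache = QueryCache(sqlite3.connect(":memory:"), costly)
    cache.fetch("SELECT forum_id, forum_name FROM forums")
    cache.fetch("SELECT forum_id, forum_name FROM forums")
    print(len(calls))  # 1: the second call is served from cache_table
```

Every cached lookup has the same shape and differs only in the hash value, which is what makes it a good candidate for the database's own in-memory caching.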

Conclusions

The net result of this migration is that I’ve now moved all NL Forums onto a common code-base that is highly cached. We avoid all PHP code compilation, and a high percentage of the disk I/O (at least 95% of pre-migration I/O) is avoided through file or MySQL caching. Whilst CPU utilisation on the previous system was often in excess of 40% and the forums were poorly responsive at peak periods, the current system now runs at 1-3% utilisation and response is always good.

As I will describe in a later note, this overall approach also materially simplifies forum upgrade.