phpBB Performance – Reducing the data cache overhead

As I described in a previous article, you can use Linux utility strace to instrument a web request simulation and suPHP model.

sudo -u www-data strace -tt -o /tmp/strace.log php -r '$_REQUEST["f"]="2"; include( "viewforum.php" );' > /dev/null
egrep "( exec| open| exit)" /tmp/strace.log | vi -

Using $_GET["f"]="2"; and include() are a standard trick to make the script act as if a request parameter f=2 has been included (but because of the way that the phpBB argument parsing function work, you have to set $_REQUEST instead). Suppressing the output prevents console overheads contaminating the result. Filtering this logfile for open statements lets me to track which files are being loaded in the process. By removing the image activation related opens, I get the summary shown in Table 1 below. The PHP environment activation takes some 95 mS and the script execution takes another 110mS. (This is with my filesystem cache fully primed.) The application has to load to execute this request:

  • 18 application script files comprising some 18.5K lines (550Kbytes) of source and
  • 14 data cache files comprising some 288 lines (77Kbytes) of data values. (I’ve indented these in the listing to stand out.)

I know from previous benchmarking that compiling the 18K lines of source should take maybe 40 mS, so this corresponding figure of 100 mS is slightly high, but not unbelievably so. However more to the point, running this script (which simply displays a forum page) involves opening and reading these 32 files. The system overhead for doing this is of the same order of time as the script compilation, but the killer in response time for the end-user is going to be any physical I/O associated with reading data from disk if the content is not already in the filesystem cache. This number of seeks could easily add up a second or two of extra wait time, and up to an order of magnitude greater if there is contention and queuing on the the server’s physical disk subsystem.

Whilst the overheads of compilation and the execution of the application PHP opcodes to deliver the viewforum function are reasonably inflexible; it is also clear to me that these are relatively small percentage of the total response time in a shared service. So as with the techniques that I’ve adopted in my blog, can I make some simple changes to the application to reduce these file loads by a factor of ten say? My answer to this is yes with two strategies:

  • Add a new acm_file caching algorithm which uses a single composite for all cached datasets to replace the existing file-per-dataset cache. I want to spend the rest of this article describing this implementation.
  • Use a one-time code mashelling strategy as I has adopted in my blog to replace all the includes with a single marshalled file load. Clearly this strategy is only optimal in a suPHP-style evironment. It would be a performance killer in an mod_php + APC or Xcache environment. I want to leave this to a subsequent article.

How data caching is used and what changes are required to implement a composite

The data cache is primarily hooked into the Database Abstraction Layer, and largely to address an issue that the application must execute some reasonably complex SQL statements to render the forum pages. These can take quite of execution overhead on each request in the forum database engine, but the actual results are pretty of low volatility, so it makes sense to cache these results sets. An example of this type of query is this SQL join across four tables:

SELECT s.style_id, t.template_storedb, t.template_path, t.template_id, t.bbcode_bitfield, c.theme_path, 
   c.theme_name, c.theme_storedb, c.theme_id, i.imageset_path, i.imageset_id, i.imageset_name 
FROM   styles s, styles_template t,styles_theme c, styles_imageset i 
WHERE  s.style_id = $style 
AND    t.template_id = s.template_id 
AND    c.theme_id = s theme_id 
AND    i.imageset_id = s.imageset_id";

You really don’t want to do this on every request when the result set would be lucky to change one per day.

Unlike the DAL, the phpBB wiki doesn’t as yet provide and programmer documentation on the caching system, so I have to fall back to the started Open-Source method for this one (retro-engineering the virtual documentation from the code). The class acm is used as a common interface which is defined by the various cache implementations such as the standard file cache, acm/acm_file.php, and this must define methods for:

  • creation, loading some common objects, and unloading all objects,
  • tidy to purge stale objects,
  • destroy and purge to delete single, groups and all objects,
  • get and put access methods for single variables,
  • and a slew of SQL query related methods to plugging into the DAL: load, save, exists, fetchrow, fetchfield, rowseek and freeresult.

The objects are created or loaded on demand. Each has a creation timestamp and a time-to-live. If a subsequent request is made within the timestamp + time-to-live then the cached value is used, if not then the object is recreated from the SQL query or whatever, and a periodic task flushes expired objects from the cache.

For boards of moderate complexity, the total number of cache objects is only a few factors greater than that accessed in a single query, for example my forum has a couple of dozen compared to the 14 used in this query. So my putative approach is very simple:

  • maintain all objects in a single serialised structure,
  • zlib compress this,
  • only write back to compressed file the content is dirty – that is has been updated.

OK, so why compress the file? Simple: because the trade-off between runtime and I/O is compelling. A 300K serialised text stream typically compresses by 7x or so. On my laptop it takes a couple of milliseconds to expand this 45 Kbytes. However whilst there is a good chance that a 45 Kbyte file will be read with a physical read, there is some chance (on Ext3 file systems) that a 300K file might be fragment and require multiple reads. The minimal overheads of compression are worth the gain, so my next step is to take the current acm/acm_file.php module and hack this down to create a new acm/acm_globfile.php version.

< 4 hours of code hacking and testing intervenes >

How does the new cache perform?

OK, I now have my working version of what I call the globfile ACM module, I’ve done some testing and it seems to work well. I’ve probably not covered all paths, but it seems stable. At the moment the cache now contains a single file containing all the cache objects. The code to load it into the cache property data, is as follows:

function load()
{
    // grab the global cache
    $cache_contents = @file_get_contents( "$this->cache_dir/cache_data.gz" );        
    $this->data = empty( $cache_contents ) ? array() : unserialize ( gzinflate( $cache_contents ) );
    // now delete any time-expired entries before any access methods are called
    reset( $this->data );
    while ( list( $var_name, $cache_entry) = each ($this->data) ) 
    {
        if ($cache_entry[1] < $this->time_now) 
        {
            $this->$number_aged++;
            unset( $this->data[$var_name] );
        }
    }
    // now initialise any other private variables
    $this->time_now  = time();
    $this->is_modified = false;
    $this->number_aged = 0;
    $this->sql_rowset = array();
    $this->sql_row_pointer = array();
}

I have conformed to the phpBB coding standards which I find rather verbose, as well as using the rather yukky PHP4-compliant object model as phpBB still offers PHP 4.3 support. Still, the new acm_globfile module is 350 lines – roughly half the size of the size of acm_file module. I have given the corresponding strace at Table 2 below. The runtime (with all files cached) is largely unchanged (on this sample the new version is actually slightly smaller). However, the main benefit is that the 14 cache data files read (of some 75 Kbytes) have been replaced by a single file that is considerably smaller. Yes, it is compressed but the runtime overhead of inflating this is minimal. This all helps to reduce the “first request” overhead.

This plugin is compatible with phpBB 3.0.x versions, though it does require the Zlib module to be supported. (This is optional for phhBB itself, but recommended for most installations). I am not concerned about this because nearly all PHP service offerings include Zlib support. For the small minority of shared services that don’t, the sysadmin can still fall back to using the acm_file module.

One of the characteristics of this solution is that it will suffer a scaling fall-off for more active boards with larger numbers of forums. The reason for this is that these transaction rates drive up the number of SQL queries that are cached, and therefore the per-request overhead. However, this isn’t the sweet-spot for this option; in such scenarios the board should be configured for one of the memory based caches, and the admin can still fall back to using the acm_file module. So this change seems to have achieved all my goals here.

I set up a cron job to pull down a forum listing every half hour over a 24-hr window from the Webfusion based forum, first with the default file cache (mean 1.77s, std dev 0.69s) and then with the globfile cache (mean 1.34s, std dev 0.39s). These were consecutive weekdays, so the general load on the Webfusion service was comparable. So the globfile is resulting in at 25% faster response on average, but the long tail is a lot better: the timings are just more consist, so I think that this exercise can be deemed a success.

The next step is to investigate code marshalling techniques and that will be the subject of my next article. However, this one will involve a couple of days development effort that need to be done first.

Just a bit of fun doing the path analysis

What do I mean by path coverage? If you want to know more then read this Wikipedia article, but the essence is that when you make a change to a piece of software, then the presumption is that you may have made logic or coding errors so that you should make sure that you’ve covered all of the possible code paths in the program. OK, if lives depend on this software (like the control system of a passenger plane), then you really need to do this well, but in this case I was just doing a sanity check to ensure that all of the methods were being called in the test requests that I was making of test forum with this new acm_globfile module installed.

The one method that I hadn’t knowing checked was the destroy() method, especially in the case of its purge of SQL query results relating to single and multiple table deletions. So I did a quick grep -R ">destroy(" * of the code tree, and discovered that the multiple table case isn’t actually used, but a good case to test the simple call was to ban a user. (It was my test instance so he’d never know. HeHe.) . It was just a case of monitoring the cache before and after with this bit of magic:

php -r '$x=file_get_contents( "php://filter/zlib.inflate/resource=cache/cache_data.gz"); 
        foreach(array_values(unserialize($x)) as $v) if(sizeof($v)==3) echo date(DATE_ATOM,$v[1])."\t$v[2]\n";'

And bang! Found an error. Oops! Well two actually. I’d replaced an unnecessary loop in the original by using a preg_match(), so instead of looping around for each table (say table_a and table_b) , I’d just use the pattern /(?:\Wtable_a\W|\Wtable_b\W)/. Unfortunately I’d forgotten the closing bracket in the line that code that builds this:

$regexp = '/(?:\W' . implode ( '\W|\W', $table ) . '\W)/';

and (I am ashamed to say) I’d swapped the order of the subject and matches argument in the preg_match() itself. The first of these failed with an error to the Apache error log (which I routinely check for reported errors) and the second failed silently, so only explicit testing picked it up. Still the module is now working well.

Table 1 – Summary of Strace on viewforum.php&f=2

Time (mS)    System Call
  0  execve("/usr/bin/php", ["php", "-r", "$_GET[\"f\"]=\"2\"; include( \"viewfo"...], [/* 17 vars */]) = 0
     ...
 95  open("/var/www/forum/viewforum.php", O_RDONLY) = 3
100  open("/var/www/forum/common.php", O_RDONLY) = 3
103  open("/var/www/forum/config.php", O_RDONLY) = 3
105  open("/var/www/forum/includes/acm/acm_file.php", O_RDONLY) = 3
110  open("/var/www/forum/includes/cache.php", O_RDONLY) = 3
114  open("/var/www/forum/includes/template.php", O_RDONLY) = 3
119  open("/var/www/forum/includes/session.php", O_RDONLY) = 3
135  open("/var/www/forum/includes/auth.php", O_RDONLY) = 3
138  open("/var/www/forum/includes/functions.php", O_RDONLY) = 3
147  open("/var/www/forum/includes/functions_content.php", O_RDONLY) = 3
149  open("/var/www/forum/includes/constants.php", O_RDONLY) = 3
152  open("/var/www/forum/includes/db/mysqli.php", O_RDONLY) = 3
153  open("/var/www/forum/includes/db/dbal.php", O_RDONLY) = 3
155  open("/var/www/forum/includes/utf/utf_tools.php", O_RDONLY) = 3
160  open("/usr/share/mysql/charsets/Index.xml", O_RDONLY) = 4
162     open("/var/www/forum/cache/data_global.php", O_RDONLY) = 4
164  open("/var/www/forum/includes/hooks/index.php", O_RDONLY) = 4
165     open("/var/www/forum/cache/data_hooks.php", O_RDONLY) = 4
166  open("/var/www/forum/includes/functions_display.php", O_RDONLY) = 4
170     open("/var/www/forum/cache/data_bots.php", O_RDONLY) = 4
170  open("/var/www/forum/includes/auth/auth_db.php", O_RDONLY) = 4
172     open("/var/www/forum/cache/sql_2da46b40e41140c377da98b55570311c.php", O_RDONLY) = 4
175     open("/var/www/forum/cache/data_acl_options.php", O_RDONLY) = 4
176     open("/var/www/forum/cache/data_acl_options.php", O_RDONLY) = 4
177  open("/var/www/forum/language/en/common.php", O_RDONLY) = 4
179     open("/var/www/forum/cache/sql_fa08afe4b73d228fd0c41ccc07b840c7.php", O_RDONLY) = 4
180     open("/var/www/forum/cache/data_cfg_theme_prosilver.php", O_RDONLY) = 4
181     open("/var/www/forum/cache/data_cfg_template_prosilver.php", O_RDONLY) = 4
181     open("/var/www/forum/cache/data_cfg_imageset_prosilver.php", O_RDONLY) = 4
183     open("/var/www/forum/cache/sql_327391a14f23d6a19a541f98daaafa19.php", O_RDONLY) = 4
187     open("/var/www/forum/cache/tpl_prosilver_message_body.html.php", O_RDONLY) = 4
188     open("/var/www/forum/cache/tpl_prosilver_overall_header.html.php", O_RDONLY) = 4
190     open("/var/www/forum/cache/tpl_prosilver_overall_footer.html.php", O_RDONLY) = 4
202  exit_group(0)           = ?

Table 2 – Summary of Strace on viewforum.php&f=2 with glob cache

Time (mS)    System Call
  0  execve("/usr/bin/php", ["php", "-r", "$_GET[\"f\"]=\"2\"; include( \"viewfo"...], [/* 17 vars */]) = 0
101  open("/var/www/forum/viewforum.php", O_RDONLY) = 3
105  open("/var/www/forum/common.php", O_RDONLY) = 3
106  open("/var/www/forum/config.php", O_RDONLY) = 3
107  open("/var/www/forum/includes/acm/acm_globfile.php", O_RDONLY) = 3
109  open("/var/www/forum/includes/cache.php", O_RDONLY) = 3
111  open("/var/www/forum/includes/template.php", O_RDONLY) = 3
114  open("/var/www/forum/includes/session.php", O_RDONLY) = 3
122  open("/var/www/forum/includes/auth.php", O_RDONLY) = 3
126  open("/var/www/forum/includes/functions.php", O_RDONLY) = 3
139  open("/var/www/forum/includes/functions_content.php", O_RDONLY) = 3
142  open("/var/www/forum/includes/constants.php", O_RDONLY) = 3
144  open("/var/www/forum/includes/db/mysqli.php", O_RDONLY) = 3
145  open("/var/www/forum/includes/db/dbal.php", O_RDONLY) = 3
147  open("/var/www/forum/includes/utf/utf_tools.php", O_RDONLY) = 3
151    open("/var/www/forum/cache/cache_data.gz", O_RDONLY) = 3
157  open("/usr/share/mysql/charsets/Index.xml", O_RDONLY) = 4
160  open("/var/www/forum/includes/hooks/index.php", O_RDONLY) = 4
161  open("/var/www/forum/includes/functions_display.php", O_RDONLY) = 4
165  open("/var/www/forum/includes/auth/auth_db.php", O_RDONLY) = 4
168  open("/var/www/forum/language/en/common.php", O_RDONLY) = 4
174  open("/var/www/forum/cache/tpl_prosilver_message_body.html.php", O_RDONLY) = 4
175  open("/var/www/forum/cache/tpl_prosilver_overall_header.html.php", O_RDONLY) = 4
177  open("/var/www/forum/cache/tpl_prosilver_overall_footer.html.php", O_RDONLY) = 4
192  exit_group(0)           = ?

Leave a Reply