The Anatomy and Timing of a Web Request – Part II

In Part I of this analysis, I looked at the overall timeline of viewing a webpage, and my main recommendations were:

  • Correct webserver configuration is essential, so that files are cached at the client’s browser and compressed before transfer; this keeps network transfer times to an absolute minimum when content is delivered over the internet.
  • The application should be written to minimise the number of supporting files needed to render the page by aggregating content where practical.  It should also ensure that when the client browser revalidates scripted content, the script processes the request correctly and issues a 304 (Not Modified) response when appropriate, as sketched below this list.
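
To make the second point concrete, here is a minimal sketch of the sort of revalidation handling a script can do.  The $lastModified value is just a stand-in for whatever timestamp the application tracks for the generated content, so treat this as illustrative rather than as phpBB’s actual code.

<?php
// Illustrative conditional GET handling.  $lastModified stands in for whatever
// timestamp the application associates with the generated content.
$lastModified = filemtime(__FILE__);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

$ifModifiedSince = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;

if ($ifModifiedSince !== false && $ifModifiedSince >= $lastModified) {
    // The client's copy is still current, so answer with a 304 and no body.
    header('HTTP/1.1 304 Not Modified');
    exit;
}

// ... otherwise generate and send the full response as normal.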

Whilst the application changes are beyond the scope of most application installers, getting the webserver configuration right, through properly configured .htaccess files, can easily improve response times by a factor of 3 or more.  Having done this, though, the application response for the delivery of the main document content becomes the main performance constraint, and I want to use phpBB to explore the factors which drive this response time.  Whilst I realise that this article is in many ways a reprise of two earlier articles, it is clear from my dialogue on the phpBB developers’ forum that we continue to talk at cross purposes, so I wanted to drill down and parameterise some of these performance factors to put quantitative numbers on this responsiveness.

To do this, I wrote a small script (see Listing 1) which I ran on a local server and used to “hit” the harness (see Listing 2) at my Webfusion-based forum (pointed to by $site), with the output logged to a file.  Running the three timings serially like this is safe because the harness itself is in the target directory, so all of the path inodes are resolved outside the timing windows, and the three variants load disjoint versions of the phpBB common.php include hierarchy:

  • the ‘out-of-the-box’ load, which hierarchically loads 14 source files comprising some 15.5K lines of source;
  • a compacted version of these 14 files, which comprises 1 source file with whitespace collapsed and comments removed, reducing it to some 11K lines of source;
  • a variant where this single source file is gzip-compressed and included through a PHP zlib stream filter (see the sketch after this list for how the last two variants can be produced).
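
For reference, the compacted and compressed variants can be produced along the following lines.  This is only a sketch of the approach rather than the exact build step: the file list is abbreviated, the output names simply match the harness in Listing 2, and gzdeflate() is used because the zlib.inflate stream filter in Listing 2 reads a raw deflate stream.

<?php
// Sketch: build a single compacted include from a list of phpBB source files,
// then a deflate-compressed copy for inclusion via the zlib.inflate filter.
// The file list here is abbreviated and purely illustrative.
$sources = array('includes/functions.php', 'includes/functions_content.php' /* , ... */);

$compacted = "<?php\n";
foreach ($sources as $file) {
    // php_strip_whitespace() returns the source with comments removed and
    // whitespace collapsed; strip the per-file open/close tags before appending.
    $code = php_strip_whitespace($file);
    $code = preg_replace('/^<\?php\s*/', '', $code);
    $code = preg_replace('/\?>\s*$/', '', $code);
    $compacted .= $code . "\n";
}
file_put_contents('common.php', $compacted);

// gzdeflate() writes a raw deflate stream, which is what zlib.inflate expects,
// even though the harness file name uses a .gz extension.
file_put_contents('common.php.gz', gzdeflate($compacted, 9));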

I let this run for 36 hrs and dropped the results into a spreadsheet for analysis.  Figure 1 shows a scatter plot and linear regression fits for the three cases, with the delay between queries in minutes as the X axis and the time taken in mSec on a logarithmic Y axis (hence the linear trends looking bendy).  The formulae give the time in milliseconds, with N the number of minutes delay:

  • 14 files: 77 + 15 N (Blue X’s and trend line)
  • 1 file: 71 + 0.1 N (Red Squares and trend line)
  • 1 gzip file: 77 + 0.4 N (Yellow Diamonds and trend line)

Figure 2 shows the error histograms around these three linear fits.  The 14 file case has a standard deviation of 211 mSec and a typical log-normal skew, so the top 10% are more than 250 mSec above the trend.  The variance on the other two cases is bugger-all (3 mSec for the 1 file case).  In other words, the response for the 1 file compacted case is a solid 71-75 mSec, whilst the 14 file load is double that at a 5 minute interval (77 + 15×5 ≈ 152 mSec), treble at a 10 minute interval (≈ 227 mSec), and the actual times are all over the place, with the top 10% about 6× the single file version.  The 71 mSec pretty much represents a base load, as this inclusion is roughly 75% of the code that needs to be compiled.

To put this in context, the complete time to execute the PHP script for, say, viewforum.php is as follows.  I have given the ballpark times in mSec for two cases: (a) where phpBB is running on a dedicated server or VM with the system, Apache, PHP and MySQL tuned to memory-cache hot code, tables, and files so as to avoid unnecessary physical I/O, etc, and (b) on a shared service such as Webfusion’s.

Dedicated (mSec)   Shared (mSec)    Description of Component
        0               80          PHP image activation.  Not needed on Dedicated as PHP is permanently mapped into Apache through mod_php.
        0               70          Compile the ~20,000 lines of PHP needed to run the viewtopic function.  Not needed on Dedicated as the PHP code is cached in the opcode cache.
        0            10 – 50        Fetch cached variables from the ACM file cache.  Not needed on Dedicated as variables are cached in the memory ACM cache.
       15               15          Execute the PHP opcodes needed to run the viewtopic function.
       15               30          Execute the SQL queries needed to run the viewtopic function.  Significantly less on Dedicated as (i) the connection to the MySQL instance will be by local socket rather than TCP to a database server (see the connection sketch below the table); (ii) the tables will be memory cached on a dedicated D/B.
        0            50 – 500       Delays in loading data from NFS-served files.  On Dedicated, these will be fully cached in the VFC.
       30               30          Apache handling overhead and stream compression.
        0            0 – 1500       Download any uncached images if Apache caching is not properly defined in the .htaccess files.  This will usually be correctly set up on Dedicated.
       60           285 – 2275      Total
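
As an illustration of point (i) in the SQL row above, the difference between the two connection paths comes down to the connect call; the host name and credentials below are placeholders.

<?php
// Illustrative only: 'localhost' makes mysqli use the local Unix socket on a
// dedicated box, whereas a shared service typically connects by TCP to a
// separate database server, adding network round trips to every query.
$dedicated = mysqli_connect('localhost', 'user', 'password', 'phpbb');
$shared    = mysqli_connect('db.example.net', 'user', 'password', 'phpbb');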

The reason that I’ve said “ballpark” about these timings is that I’ve collected them in separate timing tests.  One thing that I want to add to phpBB is a DEBUG_TIMING option which can be turned on in config.php, so that these stats can be properly logged to the error log.  Nonetheless, the overall pattern is correct: phpBB is a fairly lightweight application, and the per-request load on a dedicated VM or system should be excellent.  However, the timings for running it on a shared server are quite different: the service time is roughly 0.3 – 2.5 seconds, depending mainly on how well the forum administrator has configured the service (through the .htaccess file) to maximise caching.
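
The sort of thing that I have in mind is sketched below.  The DEBUG_TIMING constant and the helper functions are hypothetical names rather than existing phpBB code.

<?php
// Hypothetical DEBUG_TIMING instrumentation: these names do not exist in phpBB
// today; this only sketches how per-request stages might be logged.
// In config.php:  define('DEBUG_TIMING', true);

$GLOBALS['timing_marks'] = array('start' => microtime(true));

function timing_mark($label)
{
    if (defined('DEBUG_TIMING') && DEBUG_TIMING)
    {
        $GLOBALS['timing_marks'][$label] = microtime(true);
    }
}

function timing_report()
{
    if (!defined('DEBUG_TIMING') || !DEBUG_TIMING)
    {
        return;
    }
    $prev = $GLOBALS['timing_marks']['start'];
    foreach ($GLOBALS['timing_marks'] as $label => $t)
    {
        // Log the elapsed time of each stage (in mSec) to the error log.
        error_log(sprintf('%-20s %8.2f mSec', $label, ($t - $prev) * 1000));
        $prev = $t;
    }
}

// Typical use: call timing_mark('includes_loaded'), timing_mark('sql_done') etc.
// at the interesting points and register_shutdown_function('timing_report');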

File access is the main performance killer on shared services, since the Virtual Filesystem Cache (VFC) is rapidly flushed by contention with the other applications.  However, the data storage architecture on Linux services mitigates this because the VFC acts as a first level cache, with the file system then accessing the shared NAS using an RPC-tunnelled NFS protocol.  The NAS itself has a large RAM cache, so active files are normally in cache.  As a consequence, the second order file I/O penalty is that of the RPC transport itself, which is again mitigated because the network drivers, including the TCP stack, run in kernel mode, the Gbit Ethernet switched IP infrastructure uses jumbo frames, and the NFS mounts (at least on the Webfusion services) are configured with a 32 Kbyte blocksize.  Hence the overhead for a VFC cache miss is an RPC to the NAS server, and this is only a few milliseconds per I/O if the data content is itself in the NAS cache.

However, because phpBB needs to read some 30+ code and data files per web request in its default configuration, as the gap between transactions increases a higher percentage of these I/Os incur this RPC overhead, and these overheads rapidly aggregate to become the major component of the request response time.
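
A rough way of checking this count on a given installation is to dump PHP’s get_included_files() at the end of a request (note that this only covers PHP includes, not the template and cache files that are read separately).  Again, this is just a sketch rather than phpBB code.

<?php
// Sketch (not phpBB code): log how many source files a request pulled in.
// It could be attached with auto_prepend_file so the application is untouched.
function log_include_count()
{
    $files = get_included_files();
    error_log(sprintf('%s: %d files included', $_SERVER['REQUEST_URI'], count($files)));
}
register_shutdown_function('log_include_count');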

Lastly, note that all of these times relate to the service time for the request.  As soon as the server starts to become resource limited, queuing will add multiplier factors to these delays.


Listing 1

# Driver script: 240 cycles, each separated by a random delay of 1-900 seconds,
# hitting the three harness variants in turn.  $site, $php and $wget are
# assumed to be set before this runs, and the output is redirected to a log file.
file="$site/common_comp.php"
for ((i = 240; i > 0; i--)); do
  sleep $($php -r 'echo rand(1,900);')
  date "+%d %H:%M:%S"
  $wget -o /dev/null -O - "$file?v=0"
  $wget -o /dev/null -O - "$file?v=1"
  $wget -o /dev/null -O - "$file?v=2"
done

Listing 2

<?php
//the float version of microtime truncates the LSDs 
function now() {$x = explode(' ', microtime()); return ( ($x[1] % 1000) + $x[0] ); }  
$path=realpath('.');
// the three common.php variants described above, selected by ?v=0, 1 or 2
$common=array('./_debug_common.php', './common.php', "php://filter/zlib.inflate/resource=common.php.gz");
$v = (int) $_GET['v'];
$start = now();
define('IN_PHPBB', true);
$phpbb_root_path = (defined('PHPBB_ROOT_PATH')) ? PHPBB_ROOT_PATH : './';
$phpEx = substr(strrchr(__FILE__, '.'), 1);
header("Content-Type: text/plain");
//echo "including $common[$v]\n"; 
$status = require( $common[$v] );
$elapsed = now() - $start;
printf("%-50.50s %9.6F (require status=%u)\n", $common[$v], $elapsed, $status);
