My academic background was mathematics, specialising in operations research and statistics. I put this to good use when I first started my career in an IT consultancy, working on the development and use of detailed simulations of large-scale Army Command and Control Systems. You might wonder what on earth modelling large voice communications networks over thirty years ago has to do with using modern web services. In fact the conceptualisation and analytic techniques are very similar (though the time constants involved have shrunk from seconds to milliseconds). This foundation in modelling and analysing large communicating sequential systems has proved invaluable in my systems optimisation work throughout my career, and has influenced my approach to systems engineering.
What I show in this article is that:
- Your webserver configuration should be optimised so that files are correctly cached in the client's browser and compressed. This keeps network transfer times to an absolute minimum when content is transferred over the internet.
- The application should be written to minimise the number of supporting files needed to render the page, by aggregating content where practical. When the client browser revalidates scripted content, the script should also process the request correctly and return a 304 (Not Modified) status when appropriate.
Using the Google Chrome network analyser
I now want to explore a typical phpBB query in depth, taking one specific case: displaying the phpBB community forum's board index. To understand what happens between a user clicking the board index link and the complete page being assembled for viewing, you need a suitable tool to instrument this process. I recommend Google Chrome, because the developer tools that you need are part of the standard download (though most modern browsers have an add-on pack which provides similar functionality). The main view that I will use for this instrumentation is the Network panel. You can open it when visiting any website with Chrome by pressing Ctrl+Shift+I and selecting the Network tab.
I first want to discuss the base case, when the browser's local cache is empty. The Chrome Network panel is interactive, so you can click on different options and objects to drill down for more information. Alternatively, you can right-click and select “Export all to HAR”. This copies the analysis content to the clipboard in JSON format, which you can then paste into a text editor and save to a file. I have written a simple PHP filter (which you can download from here) to convert this HAR format into tab-separated values (TSV) for loading into Calc or Excel for further analysis, which is what I do. To give you some idea of what this tool does, the screen snapshot to the right shows the display for this index page.
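The PHP filter itself is not reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch that flattens a HAR export into TSV. It assumes the standard HAR 1.2 layout (`log.entries[]`, each with `startedDateTime`, `time`, `request` and `response` objects). The column selection and the name `har_to_tsv` are my own choices and are not the original tool's output format.

```python
import json
import sys


def har_to_tsv(har: dict) -> str:
    """Flatten the entries of a HAR log into tab-separated rows.

    Columns (an illustrative choice): start time, elapsed ms, method,
    status, MIME type, body bytes, URL.
    """
    header = ["started", "time_ms", "method", "status",
              "mime_type", "body_bytes", "url"]
    rows = ["\t".join(header)]
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        rows.append("\t".join([
            entry["startedDateTime"],
            f"{entry['time']:.0f}",               # total elapsed time in ms
            request["method"],
            str(response["status"]),
            response["content"].get("mimeType", ""),
            str(response.get("bodySize", -1)),    # -1 means "not known" in HAR
            request["url"],
        ]))
    return "\n".join(rows) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python har2tsv.py board_index.har > board_index.tsv
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(har_to_tsv(json.load(f)))
```

The resulting file loads directly into Calc or Excel as tab-separated data, with one row per request.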
If you look at the zoomed figure, then you will see that it took my browser some 5.02 seconds to download the 55 files comprising some 314 Kbytes needed to render the page. Revisiting this page later in the same browsing session took some 1.06 seconds and this time nearly all files were already cached locally on the PC with only 2 files comprising 9 Kbytes needing to be downloaded. If I did a “page refresh” then the time increased to 3.51 seconds but with still only 2 files comprising 9 Kbytes needing to be downloaded.
The user experience at these extremes is very different. On the first view the page is terribly sluggish and stutters during loading; on the second it is almost a case of click-2-3-bang and the page is displayed. Note that whilst the exact timings will vary from request to request, the general timing pattern will be similar. I want to look a little more into the loading timeline and consider why these are so different. In HTTP/1.1, the individual requests are queued over a small number of shared, persistent TCP/IP connections. The specification (RFC 2616) recommends that a browser open at most two such connections per host referenced. Each request can be thought of as a Remote Procedure Call (RPC) from the browser: the input parameters are a set of request headers, and the output is a set of response headers with optional content.
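To make the queueing effect concrete, here is a minimal Python sketch of the RPC view above. Each request waits for a free connection and then occupies it for its round trip plus transfer time. It is a deliberately crude model that ignores dependencies between requests, bandwidth sharing and TCP slow start. The function name and the timings in the example are hypothetical.

```python
import heapq
from typing import List, Tuple


def simulate_page_load(requests: List[Tuple[float, float]],
                       connections: int = 2) -> float:
    """Return the time (ms) at which the last request completes.

    Each request is (ready_ms, duration_ms): the moment the browser
    discovers it needs the file, and how long the round trip plus
    transfer takes once it owns a connection. Requests are served in
    ready order on whichever connection becomes free first.
    """
    free_at = [0.0] * connections   # time at which each connection is idle
    heapq.heapify(free_at)
    finish = 0.0
    for ready, duration in sorted(requests):
        start = max(ready, heapq.heappop(free_at))
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

For example, 40 images of ~50 ms each, all discovered at t = 0, need 1,000 ms over two connections but only 350 ms over six. This is why the number of requests and connections matters as much as the size of each file.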
The timeline for the phpBB community website board index (times in ms):

| Empty cache | Primed cache | Page refresh | Activity |
|---|---|---|---|
| 0 – 873 | 0 – 975 | 0 – 898 | |
| 598 – 2,132 | 667 – 706 | 643 – 1,498 | |
| 1,903 – 3,204 | 708 – 732 | 1,255 – 1,986 | The 9 CSS files linked in the header are downloaded. These total 85 Kbytes, and the static CSS files are uncompressed. Once these are downloaded, the browser has enough information to start rendering the page, which takes a few tens of ms. If the image tags have defined dimensions, the browser can allocate frame-space to hold each image ahead of loading it, and so avoid reflowing the page as images download. |
| 3,246 – 5,018 | 733 – 1,024 | 2,043 – 3,658 | The 40 files linked in the CSS and the HTML body are downloaded. These total 124 Kbytes. Image files are already in a compressed format, so no further compression is necessary. As each file is downloaded, the browser adds it to the page. |
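The table notes that the static CSS files are served uncompressed. The server-side decision to compress is simple: if the request's Accept-Encoding header lists gzip, send a gzipped body marked with Content-Encoding: gzip. The following Python sketch shows that negotiation; the name `encode_response` is hypothetical. On Apache you would normally let mod_deflate do this rather than write it yourself. For brevity the sketch ignores q-values, so a client sending `gzip;q=0` would still get gzip.

```python
import gzip
from typing import Dict, Tuple


def encode_response(body: bytes,
                    accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the body when the client's Accept-Encoding lists gzip.

    Simplification: q-values are ignored, so "gzip;q=0" still counts.
    """
    codings = [c.split(";")[0].strip().lower()
               for c in accept_encoding.split(",")]
    # Vary tells shared caches (e.g. Varnish) to keep one copy per encoding.
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in codings:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers
```

Text formats such as CSS, JavaScript and HTML typically shrink severalfold under gzip. Images, as the table says, gain little.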
There is a big difference for the user between a 1 second and a 5 second delay in rendering a web page. These are typical timings when the phpBB site was relatively lightly loaded; the variance in response means that the worst quartile will take a lot longer. The phpBB admins have a reasonable set of Apache configuration settings and have also configured a Varnish caching accelerator to boost response, but these settings still fall short of optimum. I find the refresh time of ~3.5 seconds quite worrying. A typical phpBB installation on a shared hosting service with default Apache settings would take this 5 seconds or longer. This is quite unnecessary, as .htaccess files can be used to achieve most of the required performance gains.
I have also put the detailed figures in an ODS spreadsheet (here) for those interested. But the main points that I want to emphasise are:
- Loading a single web page results in a cascade of secondary requests needed to view the complete page.
- Most of these (the static elements) can be held in the user's local browser cache. Wherever possible, you need to configure your web server to tell the client browser that they can be cached: in the <VirtualHost> section or .htaccess files on Apache, or the web.config file on IIS.
- You need to be aware that many users configure their browsers to revalidate local cache entries once per session or even on every page. You therefore need to configure the ETag or Last-Modified response headers so that the browser can bypass the download. Scripted content can also make use of cache negotiation, but again the script author must let the browser know that it can negotiate by setting these headers. I cover this point in further detail below.
- If the content is being generated by script, remember that users will typically view your site in a burst sequence, visiting maybe 4-6 pages in one session. Script authors therefore need to ask themselves: “Is this information likely to change on every view, once per session, or only very occasionally?” They should then set the caching parameters accordingly.
- I have configured my OpenOffice.org phpBB forums to maximise possible caching, but the split is still roughly 75% vs. 25%. In 75% of user page requests, all of this “content furniture” is already cached locally; the other 25% of requests either download or renegotiate these additional files. This 25% miss rate may seem a good ‘batting average’, but remember that each miss triggers 30+ further requests to the webserver.
- Whenever possible, configure the web server (or PHP, in the case of scripts) to compress the output. This has minimal compute overhead on the server, but can make a dramatic impact on perceived user response.
- Application developers should minimise the number of files needed to display a given page, because each separate file brings with it network overheads and per-request load on the server. (For example, the Google homepage needs 12 requests, my blog needs 7 requests and the phpBB community board index needs 55 requests.) This is done by aggregating content whenever possible:
- Images used for enriching the page can be grouped together using a technique known as CSS sprites. See “So why CSS Sprites?” for a discussion of how I did this on my blog.
- The per-request overhead also depends on the number of cookies defined for that site, as the browser passes a copy of all cookies for a given site back to the server with every request. In the case of phpBB, this adds an extra 0.5 Kbytes of upload to 54 of the requests. For this reason, many sites use a resource-only shadow site to serve the static furniture. As this is a second (cookie-less) site, these requests don't carry this overhead. Most browsers will also open parallel request streams to separate sites. So most high-performance sites use this trick (Google uses ssl.gstatic.com, Wikipedia uses bits.wikimedia.org).
- Application developers also need to be aware of the inter-dependencies between files needed to render the webpage. These can serialise download activities and extend the total time needed to render the page. Examples here include:
- Load the CSS needed to render the page ASAP. This means putting all CSS definitions and links in the HTML header. Don't nest CSS style sheets using the @import directive.
These guidelines are discussed in detail on the Google Web Performance Best Practices page.
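The “every view, once per session, or only very occasionally” question maps naturally onto response lifetimes. Here is a minimal Python sketch of how a script might emit caching headers for each case. The volatility classes and their lifetimes (30 minutes for a browsing session, a week for static furniture) are hypothetical values chosen for illustration, as is the function name.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional

# Hypothetical volatility classes and lifetimes in seconds, chosen for
# illustration only; tune them to how often your content really changes.
MAX_AGE = {
    "every_view": 0,                  # regenerated on every request
    "per_session": 30 * 60,           # covers a burst of 4-6 page views
    "occasional": 7 * 24 * 60 * 60,   # static CSS, JavaScript and images
}


def cache_headers(volatility: str,
                  now: Optional[float] = None) -> Dict[str, str]:
    """Return Cache-Control (plus Expires for HTTP/1.0 caches)."""
    if now is None:
        now = time.time()
    max_age = MAX_AGE[volatility]
    if max_age == 0:
        # Storable, but the browser must revalidate before each reuse.
        return {"Cache-Control": "no-cache"}
    return {
        "Cache-Control": f"max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Note that `no-cache` does not forbid storing the response. It forces the browser to revalidate before each use, which is exactly where ETag and Last-Modified negotiation pays off.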
Specific comments on the phpBB website
Just looking at this page, I can make the following observations about phpBB and this site. Given the expertise of the phpBB team, the fact that I can make them about this site shows how easily these issues can degrade site performance if you aren't careful. This is more an issue of website configuration than of application coding: my OOo community forums, which also run phpBB, are over 2x faster than the phpBB ones (1.78s vs 5.02s for a clear-cache load and 0.47s vs 1.06s for a primed-cache load on the trial that I did).
- [Defect] The phpBB style.php script neither defines nor processes the ETag or Last-Modified headers, so the entire stylesheet can be unnecessarily downloaded on page refresh.
- [Enhancement] The same aggregator callback approach could be adopted for style sheets to reduce the number and size of files to be downloaded. In this case it could cut them from 9 files comprising ~150 Kbytes to 1 file of 40 Kbytes.
- [Enhancement] The number of images could be significantly reduced using a CSS sprite implementation. Sprites can't be conveniently used for all images, but nonetheless the number of images to be downloaded could easily be reduced from 40 to 20.
- [Enhancement] Image tags should have their explicit dimensions set in the document HTML.
- [Enhancement] Allow static files to be downloaded from a configurable second site, e.g. static.phpBB.com (which might just be a DNS synonym for the forum site). This has two benefits: (i) it avoids uploading the forum cookies on every static reference, and (ii) the browser will open extra TCP channels to this site, improving download concurrency.
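The aggregation and ETag points above can be combined. A script that concatenates the stylesheets in link order can derive a strong ETag from a hash of the result, so the same ETag is re-issued until any sheet changes. Here is a minimal Python sketch; the function name and in-memory inputs are hypothetical, and it is not how phpBB's style.php works.

```python
import hashlib
from typing import List, Tuple


def aggregate_stylesheets(sheets: List[Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Concatenate stylesheets in link order and derive a strong ETag.

    Order matters: later rules override earlier ones of equal
    specificity, so sheets must be joined in the order the original
    <link> tags appeared.
    """
    parts = []
    for name, css in sheets:
        # A comment marker per sheet keeps the combined file debuggable.
        parts.append(b"/* " + name.encode("utf-8") + b" */\n" + css + b"\n")
    body = b"".join(parts)
    # Same bytes in, same ETag out; any edit to any sheet changes it.
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    return body, etag
```

One caveat: relative url() references must still resolve once the sheets are served from a single URL. Also, @import rules are only honoured at the top of a stylesheet, so any nested imports need inlining first.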
Programmatic cache negotiation
There are three aspects to ensuring that PHP scripts carry out optimum negotiation with the client browser:
- Ensure that the script emits the correct Expires and Cache-Control directives to let the browser know when it can safely cache content.
- Generate ETag or Last-Modified headers to facilitate negotiation when the browser needs to revalidate content.
- If the browser has previously cached content with either or both of these headers, it will attempt to revalidate that content by supplying If-None-Match or If-Modified-Since request fields. Unfortunately the CGI 1.1 specification does not guarantee that these will be passed to the script as variables. However, if HTTP_IF_NONE_MATCH and HTTP_IF_MODIFIED_SINCE are supplied, then the script can use them to bypass unnecessary processing and return a 304 (Not Modified) status code.
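The decision logic itself is small. Here is a minimal Python sketch that works on a CGI-style environment mapping, using the two variable names above; the function name is hypothetical. In a PHP script the same values would come from `$_SERVER`. The sketch follows the HTTP rule that If-None-Match takes precedence when both fields are present. It treats a missing or unparseable value as “send the full response”.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping


def is_not_modified(environ: Mapping[str, str], etag: str,
                    last_modified: datetime) -> bool:
    """Decide whether a 304 Not Modified response can be sent.

    environ holds CGI-style variables; either may be absent, since the
    server is not obliged to pass them through. last_modified must be a
    timezone-aware datetime.
    """
    if_none_match = environ.get("HTTP_IF_NONE_MATCH")
    if if_none_match is not None:
        # If-None-Match wins; weak comparison also accepts W/"..." tags.
        tags = [t.strip() for t in if_none_match.split(",")]
        return "*" in tags or etag in tags or ("W/" + etag) in tags
    if_modified_since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False    # unparseable date: send the full response
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since
    return False
```

When this returns True, the script should send status 304 with no body (repeating the ETag) and skip generating the page entirely. Otherwise it generates the page as normal and emits the ETag and Last-Modified headers for next time.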
What still constrains performance is the server-side internal overheads of responding to this request, and this will be the subject of my next article.