Performance in a Webfusion shared service and guidelines for optimising PHP applications

I manage a few websites on a hobby / pro-bono basis, with this personal site being the smallest and the OpenOffice.org user forums being the largest (nicknamed usOOo after the initials of the domain name, it is also ranked in the world top 2,000 according to BizInformation, so even if you take this statistic with a pinch of salt, it’s reasonable to call usOOo large).  When I first started building and administering the usOOo site and forum applications, they took over 50% CPU at peak times on a dedicated quad-core Solaris box.  Since then, I’ve tuned the site and its Apache/MySQL/phpBB stack to the extent that it now runs at <5% CPU at peak times with perhaps 30% more transactions, so I think I can say that I am reasonably knowledgeable about Apache/PHP tuning.

In the following discussion, I am assuming that you have read my related article, Using PHP applications on a Webfusion hosted service (Linux), which sets the context.  In terms of end-to-end throughput, there are six major components which hit overall performance:

  • The overheads of the Apache system and the hosting OS.  In the case of a tuned application-dedicated service such as usOOo, everything else is so tuned that these overheads do start to become a significant factor in system throughput.  However, in the case of a Webfusion shared service, (a) you have no control over the configuration or the overheads here, and (b) in percentage terms these represent a tiny fraction of the end-to-end load, so my advice is to ignore this aspect, and especially any cautions about the overheads of using the mod_rewrite engine: these are but a flea on the back of an elephant.
  • Any per-request overheads associated with image activation and termination.  UsOOo uses mod_php to avoid such overheads.  However, Webfusion has to host thousands of independent domains on its shared infrastructure.  It must guarantee that all hosted domains (and their data) are protected if any one hosted domain attempts hostile or accidental access outside its scope.  It can place no trust in any individual user.  The only effective way it can do this is to run all user applications and scripts under a user-specific UID so that the Linux OS can enforce UID-based access control.  It uses suPHP as the CGI execution layer to achieve this, as this is slightly more efficient than using bare CGI and it also provides a slightly cleaner scripting interface.  Unfortunately the price of this separation is a performance hit, incurred because suPHP must use a separate process to run each web request which invokes a scripting engine such as PHP — and this carries the overhead of process creation and termination.  (The more efficient alternative, FastCGI, avoids this overhead by maintaining per-user persistent scripting processes; unfortunately, whilst this scales well to tens of domains per server, it is unworkable at the concurrency scales that apply to this service offering.)
  • Any per-request overheads to compile any scripts that you execute.  Unlike scripting environments such as Python which support incremental compilation as part of the fundamental architecture, PHP is essentially a compile-and-go environment.  There are PHP accelerators, such as APC or Xcache, which use various bytecode caching strategies to avoid compilation, but again these don’t work with suPHP at these concurrency scales.  You must therefore assume that the PHP interpreter will compile any scripts that you require in your application — whether or not you ultimately execute these lines of code.  Luckily modern processors can compile hundreds of lines of code in a millisecond, but this will still be an issue if you include tens of thousands of lines in your script headers.
  • File-system delays associated with accessing programs and data on disk.  The file-system cache and the MySQL database cache are the two system caches that might help your performance.  These are large on this type of server, and will give reasonable cache hit ratios for a burst of activity, but also remember that with the huge number of other web services that coexist on the server, there will be a lot of contention for these caches, and stale data and metadata will usually be flushed within minutes.  So for infrequent web requests, the source files or data needed to run your script will not be in the file-system cache.  The main performance constraint here is not so much the size of the script files but rather the number of files to be read, because each uncached file read will usually require at least one physical disk access, and this requires moving metal.  Your script process will be placed into a wait-state for each such access and it will then go to the back of the processing and I/O queue.  If you are loading a dozen files, then these delays aggregate to a material delay in response.
  • The application processing itself.  If you are using a third-party application, such as the Webfusion one-click applications, then you will have little control over this application processing, but some applications such as Mediawiki do a lot of processing per request, and therefore you must accept that you will often get poor response if you choose to run them within a shared service.  In general, lighter applications are more suited to this type of offering.
  • Network and client delays.  These are largely external to the Webfusion data-centre and therefore outside your control, except in two very important respects: (a) All modern browsers support local caching of images and content, so you should ensure that client browsers know when requests to your application can be cached locally.  Such local caching on the end-user’s PC avoids the delays associated with network transfers, and also reduces processing load at the server end.  (b) All modern browsers support on-the-fly compression of most content, so HTML, CSS, and any scripts should be marked for compression.  This does incur a small server overhead, but the typical 75% reduction in data volumes makes this a very important performance option.  You can achieve both through .htaccess configuration.

The usOOo website and my personal web-service sit at opposite extremes of the span of performance characteristics that users might expect to experience.

  • With the first, I have full root access to the usOOo server and I have tuned it to run one dedicated application, so that it sits there 24×7, processing hundreds of thousands of page-views per day; everything that can be cached is cached, so no script compilation occurs and very little physical disk I/O takes place.
  • With the second, my personal web-service shares Webfusion infrastructure with perhaps thousands of other domains.  It might receive zero or only a few hits on some days, and even at peak use I might only have a dozen or so accesses in a five-minute window.  This pattern is typical of most domains hosted on the Webfusion shared infrastructure.  What I expect is reasonable performance from the Webfusion service at these sorts of load patterns and also full data protection from other domains hosted on the service, and it is this type of service that the Webfusion infrastructure has been configured and tuned to match.

If I want higher performance from my service, the next step would be to use a hosted VM service, such as Webfusion’s VPS range of offerings, but for now the Fusion Professional service meets my needs.  (I actually run a VM on my laptop configured with the same LAMP stack as the Fusion Professional service for testing and development.)  What I want to describe in the second half of this article is some guidelines for developing applications that are optimised to run on this service.

Guidelines for optimising PHP applications

Most of the following rules are general good practice, but they are especially important on a shared service, where process creation/termination overheads apply and where little useful server-side caching will be done for you.  Unfortunately, their applicability depends on the level of involvement that you have with the development of the applications that you are hosting.  At one extreme, you might have no involvement other than some ability to configure the application and tweak the .htaccess files.  At the other, you are developing your own application and can therefore adopt them fully.

Rules 1-4 are extremely important if you want to ensure good response.

  1. Only use scripts when you need to, and use fixed-file alternatives when practical, as these will be served directly by Apache with minimal overheads.  Any web request that involves a script will result in process creation/termination and the need to read and compile the script source.  There is a runtime overhead for all this.  HTML, SHTML, CSS and JS files are all handled directly by the persistent Apache processes far more efficiently.  In third-party applications, avoid configuration options which introduce such script-based external references.
  2. Avoid decomposing your code into small modules, and then putting blanket include headers in your scripts.  This technique is convenient for the programmer, but always including a dozen or so PHP modules means that if this is the first page reference in more than a minute or so then any previously cached copies of file inodes and content will probably have been flushed from the file-system cache.  So the server could end up scheduling a few dozen physical I/Os to load these in and a similar number of context switches, as well as the interpreter having to compile thousands of lines of unused code.  (Note that this isn’t the case for systems with PHP running under mod_php and with a decent PHP accelerator; here such decomposition is good practice, but on a shared service it’s a performance killer.)
  3. Understand the usage patterns in your code and optimise loading for the main-path usage.  In the case of my blog, over 90% of accesses are for the overall home page or an individual article, so I have made this code as lean as possible.  For example, loading the home page requires only three PHP files of less than 600 lines to be loaded.  Other functions such as searching, administration and adding comments to posts are done relatively infrequently, so I take a more KISS (Keep It Simple, Stupid) approach here, keeping the code easy to understand.  Even if this involves a 2× speed hit on these pages, it has minimal impact on overall site performance.
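    As a rough illustration of this principle (the file names and routine names below are hypothetical, not those of my actual blog engine), a minimal front controller can keep the dominant request paths down to a couple of small includes and defer everything else:
    <?php
    // index.php: illustrative sketch only, keep the hot paths lean.
    require 'includes/core.php';                      // small: config, DB connection, common helpers

    $page = isset( $_GET['page'] ) ? $_GET['page'] : 'home';

    if( $page === 'home' || $page === 'article' ) {   // >90% of requests land here,
      require 'includes/render.php';                  // so this path loads only two lean files
      renderPage( $page );
    } elseif( in_array( $page, array( 'search', 'admin', 'comment' ), true ) ) {
      require "includes/$page.php";                   // rarer functions are loaded only on demand
      dispatchPage( $page );
    } else {
      header( 'HTTP/1.1 404 Not Found' );
    }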
  4. If you have additional code that might be required to handle error conditions and supplementary options then consider using a “Just in Time” load strategy.  One way to do this (if you are comfortable with using object oriented programming in PHP) is to use autoload classes.  I use an alternative method for simple applications which don’t merit the complexity of OOP, and that is to use a simple dynamic loader.  This assumes a naming convention where you name all externally referenced routines in module xyz with the prefix xyz, and you can then call the routine using something like “$rtn = callFunction( 'xyzProcessError', $arg1 );”.  Anyway here is a copy of this routine.
    /*
    * JiT autoloader.  Assumes mixed-case naming with the prefix being the module name, so aboutPage() is in the module about, etc.
    * This app is too simple to justify classes and the use of autoload classes.  Note that dieCleanly() is my error handler.
    */
    function callFunction( $function, $opt_arg = NULL ) {
      global $blog_root_dir;
      if( !function_exists( $function ) ) {
        preg_match( '/^ ([a-z]+)/x', $function, $part ) || dieCleanly ( "Invalid function call $function");
        require "$blog_root_dir/_include/{$part[1]}.php";
      }
      if( is_array( $opt_arg ) ) return call_user_func_array( $function, $opt_arg );
      elseif( is_null( $opt_arg ) ) return call_user_func( $function );
      else return call_user_func( $function, $opt_arg );
    }
  5. Consider using file-based application caches to minimise the need to include PHP-scripted CSS stylesheets and to hoist overhead code into one-time execution.  In one application, I need to make a limited number of tweaks to the javascript bundle used to load the client-side TinyMCE editor.  To do this, I have an assembly routine which computes an MD5 fingerprint for the tweaked bundle and checks an includes directory to see whether a file with that name already exists.  If not, it creates it.  It then returns this MD5-fingerprinted file name to be included in the HTML source.  Yes, I could have done this with a second PHP script, but that would double the image activation overheads.  Another good example of this is my templating system, which I discuss in a separate article, My blog’s templating engine.
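    As a simplified sketch of that assembly routine (the directory layout and the tweakTinyMCEConfig() helper here are assumptions for illustration, not my production code), the idea is to fingerprint the generated content and only write the cache file the first time that fingerprint is seen:
    <?php
    // Sketch: cache a tweaked JS bundle under an MD5-fingerprinted name, so that
    // subsequent requests reference a static file served directly by Apache.
    function getCachedEditorBundle( $baseDir, $baseUrl ) {
      $content  = file_get_contents( "$baseDir/tiny_mce/tiny_mce.js" );
      $content .= tweakTinyMCEConfig();                 // hypothetical: the site-specific tweaks
      $name     = 'editor-' . md5( $content ) . '.js';  // the fingerprint names the cached copy
      $file     = "$baseDir/_cache/$name";
      if( !file_exists( $file ) ) {                     // only write it the first time round
        file_put_contents( $file, $content );
      }
      return "$baseUrl/_cache/$name";                   // embed this URL in the generated HTML
    }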
  6. If you want to use server-side includes, then you can do so but remember that you must include the following directive in a .htaccess file in the (parent) directory.  Also note that Options IncludesNOEXEC has been set in the Apache Configuration, so you can’t invoke executable scripts within your SHTML files.
    Options +Includes
  7. Avoid unnecessary downloading of any static HTML, CSS and JS files by letting the client browser know that these can be cached.  You achieve this by including the following lines in the .htaccess file in your public_html directory.  If the browser later decides to revalidate the file (this will depend on the user’s preferences), it sends a special request including an If-None-Match header.  The server can respond to this with a 304 status code, which tells the browser that the file hasn’t been modified and the locally cached copy can safely be used.  This is a short packet with minimal overhead in an HTTP 1.1 connection.
    ExpiresActive On
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType text/javascript "access plus 1 month"
    ExpiresByType text/plain "access plus 1 month"
  8. If you have static HTML or XML, then you should consider adding the corresponding directives for text/html and text/xml.  Also note that the Apache core directive FileETag can be used to ETag all static files handled by the core, and the default settings (FileETag INode MTime Size) already achieve what you need here.
  9. If your script is generating non-volatile or piece-wise constant output, then also consider issuing the equivalent caching headers in your code.  (Note that you must use the PHP header() function for this, as the HTTP module isn’t loaded.)  You can use ETag, Expires, and Last-Modified response headers to achieve this.  Various examples are discussed in the PHP documentation for the header function.
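    As a minimal sketch of this approach (generatePageContent() is a stand-in for whatever builds your stable output, and the exact headers you choose will depend on your application), a script can compute an ETag for its output, compare it against the If-None-Match request header, and reply with a bare 304 when they match:
    <?php
    // Sketch: let the browser revalidate piece-wise constant script output cheaply.
    $content = generatePageContent();          // hypothetical: builds the (stable) response body
    $etag    = '"' . md5( $content ) . '"';

    header( "ETag: $etag" );
    header( 'Cache-Control: max-age=3600' );   // allow an hour of local caching before revalidation

    if( isset( $_SERVER['HTTP_IF_NONE_MATCH'] ) &&
        trim( $_SERVER['HTTP_IF_NONE_MATCH'] ) === $etag ) {
      header( 'HTTP/1.1 304 Not Modified' );   // short response: the browser reuses its cached copy
      exit;
    }
    echo $content;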
  10. Compress responses wherever practical.  If the user’s browser supports compression, then it will include an Accept-Encoding: gzip, deflate header in its HTTP requests.  The current Apache configuration already uses mod_deflate and enables compression for the XHTML, HTML, XML and PLAIN types.  It doesn’t compress CSS or JS because MSIE6 has a bug and mishandles compressed CSS and JS, but this shouldn’t be a major issue if you have enabled the caching settings as above.  You could also consider ignoring this MSIE6 issue, as its usage is <5% and falling (mainly some corporate users).  Some other legacy browsers can get confused, but the configuration already includes the necessary BrowserMatch directives to handle these.
  11. Where feasible, place HTML-embedded scripts and script references at the bottom of the body and not in the head.  If you have enabled caching, then include them via an external file reference rather than placing them inline.
  12. Try to share components such as CSS stylesheets and Javascript modules across your pages.  That way, if you have caching enabled then they will be preloaded for subsequent pages.

Wrapping this up with some real-world examples

I have completely rewritten my blog engine, partly as a nice little programming project, but also because this engine is designed to run on this type of shared service, and I have optimised it for this type of use.  I will be documenting how I have achieved this in a series of subsequent posts.

I also have a phpBB application running on my shared service.  I will also document how I have tweaked this to run more efficiently on the shared service.

So watch this space.
