Performance in a Webfusion shared service and guidelines for optimising PHP applications

I manage a few websites on a hobby/pro-bono basis. This personal site is the smallest, and the OpenOffice.org user forums are the largest. The forum site is nicknamed usOOo after the initials of its domain name, and BizInformation ranks it in the world top 2,000, so even if you take this statistic with a pinch of salt, it’s reasonable to call usOOo large. When I first started building and administering the usOOo site and forum applications, they took over 50% CPU at peak times on a dedicated quad-core Solaris box. Since then, I’ve tuned the site and its Apache/MySQL/phpBB stack to the point that it now runs at <5% CPU at peak times while handling perhaps 30% more transactions, so I think I can claim to be reasonably knowledgeable about Apache/PHP tuning.

In the following discussion, I assume that you have read my related article, Using PHP applications on a Webfusion hosted service (Linux), which sets the context. In terms of throughput, six major components determine overall performance:

The usOOo website and my personal web service sit at the two extremes of the range of performance that users can expect to experience.

If I wanted higher performance from my service, the next step would be a hosted VM service, such as Webfusion’s VPS range of offerings, but for now the Fusion Professional service meets my needs. (I actually run a VM on my laptop, configured with the same LAMP stack as the Fusion Professional service, for testing and development.) What I want to describe in the second half of this article are some guidelines for developing applications that are optimised to run on this service.

Guidelines for optimising PHP applications

Most of the following rules are general good practice, but they matter especially on a shared service, where process creation/termination overheads hit every script request and the server does little useful caching for you. Unfortunately, how far you can apply them depends on your level of involvement in developing the applications that you host. At one extreme, you might have no involvement beyond some ability to configure the application and tweak its .htaccess files. At the other, you are developing your own application and can therefore adopt them fully.

Rules 1-4 are extremely important if you want to ensure good response times.

  1. Only use scripts when you need to, and use fixed-file alternatives where practical, as these are served directly by Apache with minimal overhead. Any web request that involves a script results in process creation/termination and the need to read and compile the script source, and all of this carries a runtime overhead. HTML, SHTML, CSS and JS files are all handled far more efficiently, directly by the persistent Apache processes. In third-party applications, avoid configuration options that introduce such script-based external references.

  2. Avoid decomposing your code into small modules, and then putting blanket include headers in your scripts.  This technique is convenient for the programmer, but always including a dozen or so PHP modules means that if this is the first page reference in more than a minute or so then any previously cached copies of file inodes and content will probably have been flushed from the file-system cache.  So the server could end up scheduling a few dozen physical I/Os to load these in and a similar number of context switches, as well as the interpreter having to compile thousands of lines of unused code.  (Note that this isn’t the case for systems with PHP running under mod_php and with a decent PHP accelerator; here such decomposition is good practice, but on a shared service it’s a performance killer.)

  3. Understand the usage patterns in your code and optimise loading for the main path. In the case of my blog, over 90% of accesses are for the home page or an individual article, so I have made this code as lean as possible. For example, loading the home page requires only three PHP files of fewer than 600 lines. Other functions such as searching, administration and adding comments to posts are used relatively infrequently, so here I take a more KISS (Keep It Simple, Stupid) approach and keep the code easy to understand. Even if this costs a 2x speed hit on these pages, the impact on overall site performance is minimal.

  4. If you have additional code that is only needed to handle error conditions and supplementary options, then consider using a “Just in Time” load strategy. One way to do this (if you are comfortable with object-oriented programming in PHP) is to use class autoloading. For simple applications that don’t merit the complexity of OOP, I use an alternative: a simple dynamic loader. This assumes a naming convention in which all externally referenced routines in module xyz carry the prefix xyz, so that you can call a routine with something like “$rtn = callFunction( 'xyzProcessError', $arg1 );”. Here is a copy of this routine.

    /*
    * JiT loader. Assumes camelCase names whose lowercase prefix is the module name,
    * so aboutPage() lives in the module _include/about.php, etc.
    * This app is too simple to justify classes and class autoloading. Note dieCleanly() is my error handler.
    */
    function callFunction( $function, $opt_arg = NULL ) {
      global $blog_root_dir;
      if( !function_exists( $function ) ) {
        preg_match( '/^([a-z]+)/', $function, $part ) || dieCleanly( "Invalid function call $function" );
        // require_once: the module may already be loaded even though this particular function is missing
        require_once "$blog_root_dir/_include/{$part[1]}.php";
        function_exists( $function ) || dieCleanly( "Function $function not found in module {$part[1]}" );
      }
      // An array is spread as the argument list, so pass a single array argument as array( $arr )
      if( is_array( $opt_arg ) ) return call_user_func_array( $function, $opt_arg );
      elseif( is_null( $opt_arg ) ) return call_user_func( $function );
      else return call_user_func( $function, $opt_arg );
    }

  5. Consider using file-based application caches to reduce the need for PHP-scripted CSS stylesheets and to hoist overhead code into one-time execution. In one application, I need to make a limited number of tweaks to the JavaScript bundle used to load the client-side TinyMCE editor. To do this, I have an assembly routine which computes an MD5 fingerprint for the tweaked bundle and checks an includes directory to see whether a file with that name exists. If not, it creates it. It then returns this MD5-fingerprinted file name for inclusion in the HTML source. Yes, I could have done this with a second PHP script, but this would double the image activation overheads. Another good example is my templating system, which I discuss in a separate article, My blog’s templating engine.

  6. If you want to use server-side includes, then you can do so but remember that you must include the following directive in a .htaccess file in the (parent) directory.  Also note that Options IncludesNOEXEC has been set in the Apache Configuration, so you can’t invoke executable scripts within your SHTML files.

    Options +Includes
  7. Avoid unnecessary downloads of static HTML, CSS and JS files by letting the client browser know that these can be cached. You achieve this by including the following lines in the .htaccess file in your public_html directory. If the browser later decides to revalidate a file (this depends on the user’s preferences), it sends a conditional request including an If-None-Match header. The server can respond with a 304 status code, which tells the browser that the file hasn’t been modified and that the locally cached copy can safely be used. This is a short response with minimal overhead over an HTTP 1.1 connection.

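    ExpiresActive On
    ExpiresByType text/css "access plus 1 month"
    # Apache's default mime.types may serve .js as application/javascript, so cover both types
    ExpiresByType text/javascript "access plus 1 month"
    ExpiresByType application/javascript "access plus 1 month"
    ExpiresByType text/plain "access plus 1 month"
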
  8. If you have static HTML or XML files, then you should consider adding the corresponding directives for text/html and text/xml. Also note that the Apache core directive FileETag controls the ETags generated for all static files handled by the core, and the default setting (FileETag INode MTime Size) already achieves what you need here.

  9. If your script generates non-volatile or piece-wise constant output, then also consider issuing equivalent headers in your code. (Note that you must use the PHP header() function for this, as the HTTP module isn’t loaded.) You can use the ETag, Expires and Last-Modified response headers to achieve this. Various examples are discussed in the PHP documentation for the header() function.

  10. Compress responses wherever practical. If the user’s browser supports compression, it sends an Accept-Encoding: gzip, deflate request header. The current Apache configuration already uses mod_deflate and enables compression for the XHTML, HTML, XML and plain-text types. It doesn’t compress CSS or JS because MSIE6 has a bug that mishandles compressed CSS and JS, but this shouldn’t be a major issue if you have enabled the caching settings above. You could also consider ignoring this MSIE6 issue, as its usage is <5% and falling (mainly some corporate users). Some other legacy browsers can get confused, but the configuration already includes the necessary BrowserMatch directives to handle these.

  11. Where feasible, place HTML-embedded scripts and script references at the bottom of the body rather than in the head. If you have enabled caching, then include them through an external src reference rather than placing them inline.

  12. Try to share components such as CSS stylesheets and JavaScript modules across your pages. That way, if you have caching enabled, they will already be cached for subsequent pages.
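For comparison with the callFunction() loader in rule 4, here is a minimal sketch of the object-oriented alternative that rule mentions: PHP class autoloading via spl_autoload_register(). It assumes a hypothetical one-class-per-file layout in which class ErrorReporter lives in ErrorReporter.php; registerJitAutoloader() and the demo class are illustrative names, not part of the original code.

```php
<?php
// Sketch only: assumes one class per file, named after the class
// (e.g. class ErrorReporter lives in <dir>/ErrorReporter.php), a hypothetical layout.
function registerJitAutoloader(string $includeDir): void {
    spl_autoload_register(function (string $class) use ($includeDir): void {
        // Accept only plain identifiers, so a crafted name cannot escape the directory
        // (namespaced names are simply ignored by this sketch).
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class)) {
            return;
        }
        $file = "$includeDir/$class.php";
        if (is_file($file)) {
            require_once $file;  // read and compiled only on first use of the class
        }
    });
}

// Demonstration: write a class file to a temporary directory, then use it.
$dir = sys_get_temp_dir() . '/jit_demo_' . getmypid();
@mkdir($dir);
file_put_contents("$dir/ErrorReporter.php",
    '<?php class ErrorReporter { public static function format(string $m): string { return "Error: $m"; } }');

registerJitAutoloader($dir);
$loadedBefore = class_exists('ErrorReporter', false);  // false: file not loaded yet
$message = ErrorReporter::format('disk full');         // first reference triggers the autoloader
$loadedAfter = class_exists('ErrorReporter', false);   // true: now loaded
echo $message, PHP_EOL;
```

As with callFunction(), nothing is read or compiled until the first reference, so code that only handles errors costs nothing on the main path.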
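The fingerprinted file cache described in rule 5 could look something like the following minimal sketch. It is not the assembly routine itself, which isn’t shown here; cachedBundle(), the search/replace form of the tweaks and the cache layout are all assumptions. One design choice differs deliberately: the fingerprint is computed from the inputs (source paths, their modification times and the tweaks) rather than from the assembled output, so on the common path the routine only stats the sources and checks for one file.

```php
<?php
// Hypothetical helper, not the original routine: the file name is an MD5 of the inputs,
// so the expensive assembly runs only when a source file or a tweak changes.
function cachedBundle(array $sources, array $tweaks, string $cacheDir): string {
    $key = '';
    foreach ($sources as $src) {
        $key .= $src . ':' . filemtime($src) . ';';
    }
    $name = md5($key . serialize($tweaks)) . '.js';
    $path = "$cacheDir/$name";

    if (!is_file($path)) {
        // One-time assembly: concatenate the sources and apply the search/replace tweaks.
        $bundle = '';
        foreach ($sources as $src) {
            $bundle .= file_get_contents($src) . "\n";
        }
        $bundle = str_replace(array_keys($tweaks), array_values($tweaks), $bundle);

        // Write to a temporary file and rename it into place, so a concurrent
        // request never sees a half-written bundle.
        $tmp = tempnam($cacheDir, 'bundle');
        file_put_contents($tmp, $bundle);
        chmod($tmp, 0644);  // tempnam() creates 0600 files; Apache must be able to read this one
        rename($tmp, $path);
    }
    return $name;  // emit in the page, e.g. <script src="/cache/NAME"></script>
}

// Demonstration with a throwaway source file.
$dir = sys_get_temp_dir() . '/bundle_demo_' . getmypid();
@mkdir($dir);
file_put_contents("$dir/editor.js", "var theme = 'simple';");
$first  = cachedBundle(["$dir/editor.js"], ["'simple'" => "'advanced'"], $dir);
$second = cachedBundle(["$dir/editor.js"], ["'simple'" => "'advanced'"], $dir);  // cache hit
echo file_get_contents("$dir/$first");
```

Because the file name changes whenever an input changes, such a bundle can safely be given a long Expires lifetime under rule 7.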
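For rule 9, the following minimal sketch issues validators from a script with header() and answers conditional requests with a 304. sendCacheHeaders() is a hypothetical helper; it assumes the caller can cheaply derive a last-modified timestamp for the page (for a blog, perhaps the time of the newest post) and a string identifying the page variant.

```php
<?php
// Hypothetical helper: send ETag/Last-Modified/Expires headers and report whether
// the client's cached copy is still valid (in which case a 304 status is set).
function sendCacheHeaders(int $lastModified, string $contentKey, int $maxAge = 3600): bool {
    $etag = '"' . md5($contentKey . $lastModified) . '"';
    header('ETag: ' . $etag);
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    header('Cache-Control: max-age=' . $maxAge);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $maxAge) . ' GMT');

    $ifNoneMatch     = $_SERVER['HTTP_IF_NONE_MATCH'] ?? null;
    $ifModifiedSince = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;

    // When both are present, If-None-Match takes precedence over If-Modified-Since.
    if ($ifNoneMatch !== null) {
        $tags = array_map('trim', explode(',', $ifNoneMatch));
        $notModified = in_array($etag, $tags, true) || in_array('*', $tags, true);
    } elseif ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        $notModified = $since !== false && $lastModified <= $since;
    } else {
        $notModified = false;
    }

    if ($notModified) {
        http_response_code(304);  // the caller must then send no body
    }
    return $notModified;
}
```

In a page script you would call it before producing any output and stop if it returns true, e.g. `if (sendCacheHeaders($newestPostTime, 'home')) exit;`. A revalidation then costs only the script start-up and the timestamp lookup, not the page rendering.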

Wrapping this up with some real-world examples

I have completely rewritten my blog engine, partly as a nice little programming project, but also because it is designed to run on this type of shared service, so I have optimised it for this use. I will document how I achieved this in a series of subsequent posts.

I also have a phpBB application running on my shared service, and I will likewise document how I have tweaked it to run more efficiently there.

So watch this space.