
Putting it all together: my blog engine architecture

This blog has largely turned into a self-referential exercise, as a major theme in my articles is the development and performance of the blog engine itself.  This started last October, when I decided to take an active interest in my blog again but disliked the creative experience offered by the existing blog engine, so I decided to rewrite the engine from scratch as a project.  This rewrite is largely complete, so I have decided to write up my architecture and make a copy of the code publicly available.  This article describes that architecture.

Perhaps the five major design criteria for the engine were the following:

I have now reached the point where I have achieved all five and the engine is quite stable, though I still maintain a “To-Do list” of minor fixes and enhancements which I am working through.  But let us move on to the overall architecture.

Use of HTACCESS dispatch

As with all applications targeted at a shared service environment, I have no access to the Apache Web Server layer as the developer and therefore must rely on the appropriate .htaccess file for such configuration.  As I have described in previous articles (“Using .htaccess files on a Webfusion shared service” and “A lightweight HTML cache for a Webfusion shared service”), I have configured the system:

Script architecture

All application requests come through a common entry script, which includes some common modules for configuration data, a database access layer, common utilities, and the templating engine.  It then dispatches to a dynamically loaded module which handles all of the page-specific processing for that page, so the module search.php is used to process all search requests, etc.  These modules are loaded from a private subdirectory _include.  I have separate modules for

Each module will typically use the templating engine to prepare the output page by making $page->assign() calls to bind any output fields and then a $page->output() call to render the page itself and to create an HTML cache copy if necessary.  The use of the templating engine and the database abstraction layer helps to keep this module code short (typically 100-200 lines), and my source for the entire application is less than 3,000 lines.  (The standard TinyMCE code is on top of this.)
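To make the assign-then-output pattern concrete, here is a minimal sketch of what such a page object might look like.  Only the method names assign() and output() come from the description above; the class name SketchPage, the {name} placeholder syntax and the optional cache-file argument are assumptions made for illustration, not the engine's actual implementation.

```php
<?php
// Hypothetical stand-in for the templating engine.  Only assign() and
// output() are named in the article; everything else is an assumption.
class SketchPage {
   private $template;
   private $fields = array();

   public function __construct( $template ) {
      $this->template = $template;
   }

   // Bind a value to a named output field.
   public function assign( $name, $value ) {
      $this->fields[$name] = $value;
   }

   // Render the template, replacing each {name} placeholder with its
   // HTML-escaped value, and optionally write a copy to a cache file.
   public function output( $cacheFile = null ) {
      $pairs = array();
      foreach( $this->fields as $name => $value ) {
         $pairs['{' . $name . '}'] = htmlspecialchars( (string) $value, ENT_QUOTES, 'UTF-8' );
      }
      $html = strtr( $this->template, $pairs );
      if( $cacheFile !== null ) {
         file_put_contents( $cacheFile, $html );
      }
      return $html;
   }
}

// A page module then reduces to a few assign() calls and one output().
$page = new SketchPage( "<h1>{title}</h1><p>{hits} results for {query}</p>" );
$page->assign( 'title', 'Search' );
$page->assign( 'query', 'cache & speed' );
$page->assign( 'hits', 3 );
$html = $page->output();
echo $html, "\n";
```

Keeping rendering and cache writing behind a single output() call is what lets each module stay short: the module only decides which fields to bind.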

As I discussed in previous performance articles, the main factor in keeping the response latency short on this type of service platform is to minimise the number of script files that have to be loaded and parsed to execute any given request.  However, I also need to balance this against sensible modularisation practices for maintainability.  To square this circle, I have written a simple PHP script marshalling utility, stripAssemble.php.  (The code is an appendix to this article.)  I maintain the master unmarshalled version of the dispatching index.php as _debug_index.php.  (The leading underscore prevents it being directly accessed by a web request.)  When I am debugging I just overwrite index.php with this, and when I have finished I run stripAssemble to rebuild a new index.php.  This scans the debug version, replacing any require() or include() statements starting in column 1 with the included content inline, and then removes all comments to create a single consolidated production index.php.  The preamble in this debug file is currently as follows:

ini_set( "arg_separator.output", "&amp;" );
ini_set( 'error_reporting', E_ALL  |  E_STRICT );
ini_set( 'display_errors', False );
ini_set( 'include_path', './_include' . PATH_SEPARATOR . './_cache' );
# Get the standard context, then preload the PHP module if necessary and despatch to the requested page

Hence this standard aggregated index.php module includes code to handle the home page and normal article views within a single file load.  OK, this script is some 700 lines long, but this still only takes a few milliseconds to compile on a modern server, and this is a couple of orders of magnitude less than the I/O delays in reading in the separate half dozen or so files if not already in the system file cache.  Processing the other less frequent functions will typically require an extra PHP module load and a template load (except of course in the 90% plus cases where the request is already cached in the HTML cache and no script execution is required at all.)
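The split between pages handled inside the assembled index.php and pages that need an extra module load can be sketched roughly as below.  This is an illustration only: the function resolvePage(), the two page lists and the page names other than search are hypothetical, and the real entry script may decide this quite differently.

```php
<?php
// Sketch only: the function name and the page lists are assumptions,
// not taken from the engine's source.

// Pages whose handlers are assembled into index.php itself.
$inlinePages = array( 'index', 'article' );

// Pages served by a separately loaded module in _include.
$modulePages = array( 'search', 'archive', 'admin' );

// Decide how a request is served.  'inline' means no extra file load;
// otherwise the second element is the module file to load.  Unknown
// pages fall back to the home page, so the raw request string is never
// passed to require() directly.
function resolvePage( $requested, array $inlinePages, array $modulePages ) {
   if( in_array( $requested, $inlinePages, true ) ) {
      return array( $requested, 'inline' );
   }
   if( in_array( $requested, $modulePages, true ) ) {
      return array( $requested, $requested . '.php' );
   }
   return array( 'index', 'inline' );
}

list( $page, $source ) = resolvePage( 'search', $inlinePages, $modulePages );
echo "$page -> $source\n";
// The entry script would load the module at this point, for example:
//    if( $source !== 'inline' ) require $source;   // found via include_path
```

Checking the requested name against a fixed list keeps the common cases on the single-file path and also means a crafted request cannot steer the loader to an arbitrary file.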

I wrap this stripAssemble call in a script which first clears the template cache, replaces the root index.php with the debug version and then uses wget to prime the template cache for the EN versions of the index and article templates, before calling stripAssemble to assemble the production version.

Appendix: stripAssemble.php


#!/usr/bin/php
<?php
$outFile = $argv[1];
chdir( dirname( $outFile ) );
$inFile = file_get_contents( '_debug_' . basename( $outFile ) ); 
if( preg_match( '/^ \s* ini_set \s*\(\s* ["\']include_path["\'].*/xm', $inFile, $m ) )
    eval ($m[0]);
# scan input file replacing any include or require statements starting in column 1
$source = preg_replace_callback(
   '/^ (require|include) \s*\(\s* (\'|") (.*?) \\2   \s*\)\s* ;/xm',
   function( $inc ) {
      $contents = @file_get_contents("$inc[3]", FILE_USE_INCLUDE_PATH);
      if( $contents === false ) $contents = "";
      echo ($contents ? "Including " : "Missing "), $inc[3], "\n";
      return preg_replace( array( "/^<\?php/", "/\?>$/"), array("",""), $contents, 1 );    
   }, $inFile);
# Now pack the source code.  This is based on the Tokenizer example in the PHP Documentation
$tokens = token_get_all($source);
$output = "";
foreach( $tokens as $token ) {
   if(is_string( $token ) ) {
      // simple 1-character token
      $output .= $token;
   } else {
      // token array
      list( $id, $text ) = $token;
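      switch( $id ) {
         case T_COMMENT:
         case T_DOC_COMMENT:
            // drop comments entirely
            break;
         case T_WHITESPACE:
            // collapse each whitespace run to a single space or newline
            $output .= ( ( strpos( $text, "\n" ) === false ) ? ' ' : "\n" );
            break;
         default:
            $output .= $text;   // anything else -> output "as is"
            break;
      }
   }
}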
file_put_contents( $outFile, $output );
echo "$outFile processed.  Total input = ",strlen($source),", output = ",strlen($output),"\n";
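
The column-1 rule is what lets me choose which includes are inlined and which stay as run-time loads.  The following small check applies the same pattern as stripAssemble.php to an in-memory string.  The $fakeFiles lookup table is a stand-in for files on the include path and, like the file names and the hello() function, is purely illustrative.

```php
<?php
// Same pattern as in stripAssemble.php; the lookup table stands in for
// files on the include path and is an assumption for this demonstration.
$fakeFiles = array( 'util.php' => "<?php\nfunction hello() { return 'hi'; }\n" );

$debugSource = "<?php\n"
   . "require( 'util.php' );\n"          // column 1: will be inlined
   . "if( \$lazy ) {\n"
   . "   require( 'search.php' );\n"     // indented: left as a run-time load
   . "}\n";

$assembled = preg_replace_callback(
   '/^ (require|include) \s*\(\s* (\'|") (.*?) \\2   \s*\)\s* ;/xm',
   function( $inc ) use ( $fakeFiles ) {
      $contents = isset( $fakeFiles[$inc[3]] ) ? $fakeFiles[$inc[3]] : '';
      // Drop the leading open tag so the text can sit inside the parent script.
      return preg_replace( '/^<\?php/', '', $contents, 1 );
   },
   $debugSource );

echo $assembled;
```

The statement in column 1 is replaced by the body of util.php, while the indented require inside the if block is left untouched and is still resolved at run time.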