<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>Terry Ellison's Blog</title>
	<atom:link href="http://blog.ellisons.org.uk/rss-blog" rel="self" type="application/rss+xml" />
	<link>http://blog.ellisons.org.uk/</link>
	<description>Terry Ellison's Blog</description>
	<lastBuildDate>Tue, 07 Feb 2012 19:17:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>daily</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
    <copyright><![CDATA[© Terry Ellison 2012 under the CCA 3.0 licence (http://creativecommons.org/licenses/by/3.0/).]]></copyright>
	<generator>http://blog.ellison6.home/search-blogEngine</generator>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<item>
		<title><![CDATA[A slight case of ME/CFS – four years on]]></title>
		<description><![CDATA[
<p>I&rsquo;ve just finished reading <a
href="http://blog.ellisons.org.uk/article-41">my last update</a>, written just over
a year ago.&nbsp; From the ME/CFS perspective, I guess that things are pretty much on a plateau
now.&nbsp; For example, my walking range hasn&rsquo;t improved much over this last year and I
still have to be very careful about what I eat, as my food intolerances can punish me if I am
lax in this department.&nbsp; The first half of 2011 was constrained by my knee injury, surgery
to repair the cartilage (or more accurately cut out the wrecked bits).&nbsp; <a
href="http://en.wikipedia.org/wiki/Arthroscopy#Knee_arthroscopy">Knee arthroscopy</a> is now
a routine operation, but even so it took about 6 months to recover strength and flexibility in
my knee.&nbsp; No doubt this impacted my general mobility and exercise levels.&nbsp; I am still
sometimes troubled by knee-pain when sitting or in bed.</p> 
<p>So one decision that I did make was to set myself the target of reducing my body weight to
what it was when I was 30 years of age.&nbsp; Like most people moving into late middle-age, my
weight has been slowly and steadily creeping up over the years; nothing sudden, just the odd
pound or so each year.&nbsp; Achieving this target involved a conscious dietary change.&nbsp;
Given that I already avoid nearly all processed foods and the wheat family, you might think that
there is little more that I could do.&nbsp;However, in essence I have materially eliminated carbohydrates
and potatoes from my diet, and increased the amount of vegetables and fruit to compensate. &nbsp;
Within six months, I&rsquo;d lost maybe 25lb, and I&rsquo;ve lost a few more since.&nbsp; This
is largely what Dr Myhill calls a <a
href="http://www.drmyhill.co.uk/wiki/Stone_Age_Diet_-_this_is_a_diet_which_we_all_should_follo">stone
age diet</a>, and it&rsquo;s very similar to one that <a
href="http://www.terrywahls.com/about-Terry-Wahls">Prof Terry Wahls</a> described in her very
thought provoking TEDTalk: <a href="http://www.youtube.com/watch?v=KLjgBLwH3Wc">Minding Your
Mitochondria</a>.&nbsp; An interesting side-effect of moving onto this diet is that my bowels
have really settled down, and I now seem to have no sign of &ldquo;irritable bowel&rdquo; symptoms.</p> 
<p>It has still been a long slow road to recovery, but I regard myself as lucky because none
of the ME/CFS sufferers that I know have made anywhere near the improvements that I have, and
my life is a world away from the year that I spent exhausted and bedridden four years ago.</p> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-66</link>
		<pubDate>Tue, 07 Feb 2012 15:07:00 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-66</guid>
		<content:encoded><![CDATA[
<p>I&rsquo;ve just finished reading <a
href="http://blog.ellisons.org.uk/article-41">my last update</a>, written just over
a year ago.&nbsp; From the ME/CFS perspective, I guess that things are pretty much on a plateau
now.&nbsp; For example, my walking range hasn&rsquo;t improved much over this last year and I
still have to be very careful about what I eat, as my food intolerances can punish me if I am
lax in this department.&nbsp; The first half of 2011 was constrained by my knee injury, surgery
to repair the cartilage (or more accurately cut out the wrecked bits).&nbsp; <a
href="http://en.wikipedia.org/wiki/Arthroscopy#Knee_arthroscopy">Knee arthroscopy</a> is now
a routine operation, but even so it took about 6 months to recover strength and flexibility in
my knee.&nbsp; No doubt this impacted my general mobility and exercise levels.&nbsp; I am still
sometimes troubled by knee-pain when sitting or in bed.</p> 
<p>So one decision that I did make was to set myself the target of reducing my body weight to
what it was when I was 30 years of age.&nbsp; Like most people moving into late middle-age, my
weight has been slowly and steadily creeping up over the years; nothing sudden, just the odd
pound or so each year.&nbsp; Achieving this target involved a conscious dietary change.&nbsp;
Given that I already avoid nearly all processed foods and the wheat family, you might think that
there is little more that I could do.&nbsp;However, in essence I have materially eliminated carbohydrates
and potatoes from my diet, and increased the amount of vegetables and fruit to compensate. &nbsp;
Within six months, I&rsquo;d lost maybe 25lb, and I&rsquo;ve lost a few more since.&nbsp; This
is largely what Dr Myhill calls a <a
href="http://www.drmyhill.co.uk/wiki/Stone_Age_Diet_-_this_is_a_diet_which_we_all_should_follo">stone
age diet</a>, and it&rsquo;s very similar to one that <a
href="http://www.terrywahls.com/about-Terry-Wahls">Prof Terry Wahls</a> described in her very
thought provoking TEDTalk: <a href="http://www.youtube.com/watch?v=KLjgBLwH3Wc">Minding Your
Mitochondria</a>.&nbsp; An interesting side-effect of moving onto this diet is that my bowels
have really settled down, and I now seem to have no sign of &ldquo;irritable bowel&rdquo; symptoms.</p> 
<p>It has still been a long slow road to recovery, but I regard myself as lucky because none
of the ME/CFS sufferers that I know have made anywhere near the improvements that I have, and
my life is a world away from the year that I spent exhausted and bedridden four years ago.</p> 
<p><a name="endtaster"></a></p>
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[A quick point about timing]]></title>
		<description><![CDATA[
<p>I've just deployed my V3 blog engine to live.&nbsp; In doing so, the time per query jumped
from &lt;10 mSec per query to ~2.5s&nbsp;&mdash; which meant that the typical page response time
for non-cached pages increased from an average 0.25s to nearly 3s.&nbsp; (Cached pages are loaded
from a static copy, so the render time is typically under 0.2s.)&nbsp; This was odd as the timing
for V3 remained much the same as the V2 engine (&lt; 10 mSec) for my development and test versions.&nbsp;
After a quick 'binary chop' through the code, looking at the micro-timing (I have debug routines
to do this), I found that the entire increase was down to a single MySQL query which I used in the
initialisation of my extension to the <b>mysqli</b> class:</p> 
<pre>SELECT TABLE_NAME AS name
FROM information_schema.tables
WHERE TABLE_SCHEMA = '&lt;DBname&gt;'
AND TABLE_NAME LIKE '&lt;TablePrefix&gt;%'</pre> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-65</link>
		<pubDate>Wed, 01 Feb 2012 18:12:00 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-65</guid>
		<content:encoded><![CDATA[
<p>I've just deployed my V3 blog engine to live.&nbsp; In doing so, the time per query jumped
from &lt;10 mSec per query to ~2.5s&nbsp;&mdash; which meant that the typical page response time
for non-cached pages increased from an average 0.25s to nearly 3s.&nbsp; (Cached pages are loaded
from a static copy, so the render time is typically under 0.2s.)&nbsp; This was odd as the timing
for V3 remained much the same as the V2 engine (&lt; 10 mSec) for my development and test versions.&nbsp;
After a quick 'binary chop' through the code, looking at the micro-timing (I have debug routines
to do this), I found that the entire increase was down to a single MySQL query which I used in the
initialisation of my extension to the <b>mysqli</b> class:</p> 
<pre>SELECT TABLE_NAME AS name
FROM information_schema.tables
WHERE TABLE_SCHEMA = '&lt;DBname&gt;'
AND TABLE_NAME LIKE '&lt;TablePrefix&gt;%'</pre> 
<p><a name="endtaster"></a>and this was taking 2.3 &ndash; 2.4 seconds to run, so I replaced
this by a functional equivalent which ran in &lt; 2 mSec:</p> 
<pre>SHOW TABLES LIKE '&lt;TablePrefix&gt;%'</pre> 
<p>I picked the former because I wanted to explicitly name the column in the result set.&nbsp;
However, coding around this with the second query only adds an extra line of code.&nbsp; For this
extra line, my script time drops by 100x and the user response time at the browser decreases
by 10x.&nbsp; This all goes to show that you can spend ages worrying about whether one particular
way of coding is faster than another (say select case vs. if elseif chains), which only changes runtimes by &micro;Sec,
but if the runtime suddenly increases in a bizarre manner, then it's usually some quirk
that you hadn't anticipated, and you should do a timing drill-down to work out why.</p>
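<p>The same kind of timing drill-down can be tried from the command line.&nbsp; The following is
only a minimal sketch, assuming bash and GNU <tt>date</tt> (for the <tt>%N</tt> nanosecond format);
<tt>time_ms</tt> is a hypothetical helper and not one of the debug routines mentioned above:</p>
<pre># Hypothetical helper: run a command and leave its wall-clock time
# in milliseconds in the variable ELAPSED_MS (assumes GNU date for %N).
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@"
  end=$(date +%s%N)
  ELAPSED_MS=$(( (end - start) / 1000000 ))
}</pre>
<p>For example, <tt>time_ms mysql -e "SHOW TABLES LIKE 'prefix%'" dbname; echo "$ELAPSED_MS"</tt> would
time the second query, where <tt>prefix</tt> and <tt>dbname</tt> are placeholders and the MySQL credentials
are assumed to come from the client's own configuration.</p>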
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[More fun with the Webfusion configuration]]></title>
		<description><![CDATA[
<p>This is a follow-up to my article on building a <a
href="http://blog.ellisons.org.uk/article-61">Webfusion test VM</a>.&nbsp;
My aim is to create a test environment which reflects my Webfusion environment at a file system
organisation and PHP programmatic level.&nbsp; The Webfusion service itself uses name-based <a
href="http://httpd.apache.org/docs/2.2/vhosts/mass.html">dynamically configured mass virtual
hosting</a> that is more complex than the examples in this Apache technical note, and its implementation
requires patches to the Apache build.&nbsp; (See <a
href="http://blog.ellisons.org.uk/62#Webfusion">below</a> for more details).&nbsp;
In my case, it is a lot simpler to stick to a standard Debian Apache configuration, with a virtual
host for each site to be tested in the <tt>/etc/apache2/sites-available</tt> directory as in
<a
href="http://blog.ellisons.org.uk/62#Listing1">listing 1</a> below, as this removes the need to add rewrite rules in the
<tt>&lt;VirtualHost&gt;</tt> section and works with an &ldquo;out of the box&rdquo; Apache install.&nbsp;
Note that <tt>192.168.1.245</tt> is my static IP address for my VM and this would need updating
for another installation.&nbsp; With this configuration, I now have full access to the error
and rewrite logs and can now fully debug my applications locally.</p> 
<p>The last thing that I need to do is to set up my D/B account (either in phpMyAdmin or by
executing the following from the MySQL root account, with the database name, username and password
as appropriate):</p> 
<pre>   CREATE DATABASE dddddddddddddd   CHARACTER SET utf8;
   CREATE USER    'uuuuuuuuuuuuuu'@'localhost' IDENTIFIED BY 'pppppppppppppp';
   GRANT  ALL ON   dddddddddddddd.* TO 'uuuuuuuuuuuuuu'@'localhost';</pre> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-62</link>
		<pubDate>Mon, 05 Dec 2011 01:20:00 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-62</guid>
		<content:encoded><![CDATA[
<p>This is a follow-up to my article on building a <a
href="http://blog.ellisons.org.uk/article-61">Webfusion test VM</a>.&nbsp;
My aim is to create a test environment which reflects my Webfusion environment at a file system
organisation and PHP programmatic level.&nbsp; The Webfusion service itself uses name-based <a
href="http://httpd.apache.org/docs/2.2/vhosts/mass.html">dynamically configured mass virtual
hosting</a> that is more complex than the examples in this Apache technical note, and its implementation
requires patches to the Apache build.&nbsp; (See <a
href="http://blog.ellisons.org.uk/62#Webfusion">below</a> for more details).&nbsp;
In my case, it is a lot simpler to stick to a standard Debian Apache configuration, with a virtual
host for each site to be tested in the <tt>/etc/apache2/sites-available</tt> directory as in
<a
href="http://blog.ellisons.org.uk/62#Listing1">listing 1</a> below, as this removes the need to add rewrite rules in the
<tt>&lt;VirtualHost&gt;</tt> section and works with an &ldquo;out of the box&rdquo; Apache install.&nbsp;
Note that <tt>192.168.1.245</tt> is my static IP address for my VM and this would need updating
for another installation.&nbsp; With this configuration, I now have full access to the error
and rewrite logs and can now fully debug my applications locally.</p> 
<p>The last thing that I need to do is to set up my D/B account (either in phpMyAdmin or by
executing the following from the MySQL root account, with the database name, username and password
as appropriate):</p> 
<pre>   CREATE DATABASE dddddddddddddd   CHARACTER SET utf8;
   CREATE USER    'uuuuuuuuuuuuuu'@'localhost' IDENTIFIED BY 'pppppppppppppp';
   GRANT  ALL ON   dddddddddddddd.* TO 'uuuuuuuuuuuuuu'@'localhost';</pre> 
<p><a name="endtaster"></a>I also need to add the D/B server name (in my case <tt>cust_mysql01</tt>) as
an alias for localhost in <tt>/etc/hosts</tt>.</p> 
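<p>A minimal sketch of that last step, assuming bash; <tt>add_host_alias</tt> is a hypothetical helper,
and the alias is mapped to <tt>127.0.0.1</tt> on the assumption that this is what localhost resolves
to on the VM:</p>
<pre># Hypothetical helper: add ALIAS_NAME to HOSTS_FILE unless it is already there.
add_host_alias() {
  local hosts_file=$1 alias_name=$2
  # Do nothing if the name already appears as a whole word in the file
  if grep -qFw -- "$alias_name" "$hosts_file"; then
    return 0
  fi
  # tee -a appends the new line to the file (and echoes it to stdout)
  printf '127.0.0.1\t%s\n' "$alias_name" | tee -a "$hosts_file"
}</pre>
<p>Running <tt>add_host_alias /etc/hosts cust_mysql01</tt> as root would then add the alias once, however
many times the command is repeated.</p>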
<h2><a name="Listing1"></a>Listing 1 &ndash; Apache configuration changes to mirror the Webfusion
service</h2> 
<pre># Change the StartServers,MinSpareServers,MaxSpareServers,MaxClients to 2,1,3,50
sed -i -f - /etc/apache2/apache2.conf &lt;&lt;END
/mpm_prefork_module/,+6 {
  s/\(StartServers\W*\)[0-9]*/\12/
  s/\(MinSpareServers\W*\)[0-9]*/\11/
  s/\(MaxSpareServers\W*\)[0-9]*/\13/
  s/\(MaxClients\W*\)[0-9]*/\150/
}
/ports.conf/s/^/#/
END

# Create VirtualHost configuration for test hosts
cat &gt; /etc/apache2/sites-available/webfusion &lt;&lt;END
Listen 192.168.1.245:80
NameVirtualHost 192.168.1.245:80
&lt;VirtualHost 192.168.1.245:80&gt;
    ServerSignature      On
    UseCanonicalName     Off
    UseCanonicalPhysicalPort Off

    ServerName  ellisons.org.home
    ServerAlias blog.ellisons.org.home test.blog.ellisons.org.home

    AddDefaultCharset ISO-8859-1

    suPHP_Engine on

    DocumentRoot /websites/LinuxPackage02/el/li/so/ellisons.org.uk/public_html

    LogLevel  debug
    LogFormat "02 %{VHOST}e %V %h %l %u %t \"%r\" %&gt;s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost
    CustomLog "/var/log/apache2/access.log" vhost

    DirectoryIndex index.html index.php index.htm

    RewriteLog "/var/log/apache2/rewrite.log"
    RewriteLogLevel 6

    Setenv DOCUMENT_ROOT_REAL /websites/LinuxPackage02/el/li/so/ellisons.org.uk/public_html
    RewriteEngine On

    &lt;Directory /websites/LinuxPackage??/??/??/??/*/public_html/&gt;
      AllowOverride All
      Options       SymLinksIfOwnerMatch IncludesNOEXEC Multiviews -Indexes
    &lt;/Directory&gt;

    &lt;Files php.ini&gt;
        Order deny,allow
        Deny from all
    &lt;/Files&gt;

    RLimitCPU 15
    RLimitNPROC 7
    RLimitMEM 95000000
&lt;/VirtualHost&gt;
END

rm /etc/apache2/sites-enabled/000-default
ln -s ../sites-available/webfusion /etc/apache2/sites-enabled/000-webfusion</pre> 
<h2><a name="Webfusion"></a>Some details on the Webfusion implementation</h2> 
<p>I have merged in the relevant content of the Webfusion Apache configuration into my VM&rsquo;s
Debian-type Apache configuration given above.&nbsp; The rewrite directives in the Webfusion configuration
are perhaps worth a little further discussion.&nbsp; These were:</p> 
<pre>     RewriteMap  wfvhost      prg:mod_rewrite_scripts/wf-rewrite-vhost.pl
     RewriteMap  wfgetbase    prg:mod_rewrite_scripts/wf-rewrite-base.pl

(1A) RewriteCond %{ENV:SSL}   ^1$
(1)  RewriteRule .*   -                 [skip=3]
(2)  RewriteRule ^(/websites/(LinuxPackage\d+|symlinks)/../../../([^/]+)/.*)$ \
                      $1                [E=VHOST:${wfvhost:$3},skip=5]
(3)  RewriteRule ^/websites/(LinuxPackage\d+|symlinks) \
                      x                 [F]
(4)  RewriteRule ^/~([^/_@]+[_@])?([^/]+)(/.*)?$ \
                      ${wfgetbase:$2}$3 [E=VHOST:${wfvhost:$2},skip=3]
(5A) RewriteCond %{HTTP_HOST}  ^$
(5)  RewriteRule .*   -                 [F]
(6)  RewriteRule .*   -                 [E=VHOST:${wfvhost:%{HTTP_HOST}}]
(7)  RewriteRule (.*) ${wfgetbase:%{ENV:VHOST}}$1
(8)  RewriteRule .*   -                 [E=DOCUMENT_ROOT_REAL:${wfgetbase:%{ENV:VHOST}}]</pre> 
<p>where I&rsquo;ve removed the comment text and numbered the RewriteRules to make skip calculations
easier.&nbsp; As far as I can see the Webfusion guys have modified the Apache code to use the
environment variable <tt>DOCUMENT_ROOT_REAL</tt> as a way of dynamically initialising <tt>DOCUMENT_ROOT</tt>.&nbsp;
However, this implementation is unnecessarily inefficient as each cycle can call up to 3 rewrite
map functions.&nbsp; Also not only can the user <tt>.htaccess</tt> rewrites generate additional
rewrite cycles, but also rules (2), (4) and (7) can all trigger another rewrite cycle likewise.&nbsp;
Hence on my initial system test (which was a close copy of the Webfusion setup), my blog rules
were generating 18-21 map calls per request.&nbsp;</p> 
<p>Apache implements each prg: map by forking off a child process when the Apache server is
started, and the rewriting engine then calls this as an out-of-process function: that is on each
map-function lookup Apache sends the input key as a string on a socket connected to the process&rsquo;s
<tt>stdin</tt> and then reads its returned looked-up value as a string from its <tt>stdout</tt>.&nbsp;
Hence 20 map requests will generate 40 process context-switches.&nbsp; However, the two perl
scripts referenced in the maps cache previous lookups in a hash, so even though a database query
is still required for any cache miss, any subsequent lookups are quite cheap, with the main cost
being that these are out-of-process calls.&nbsp;</p> 
<p>However, there should be no need to do more than two calls per request, and most retries
can easily be avoided by using the persistent environment variables. (I did a few tweaks to these
rules and got this down to 4 map calls per request, but it&rsquo;s not my job to debug the Webfusion
installation.)</p>
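<p>To make the out-of-process mechanism concrete, here is a minimal sketch of a <tt>prg:</tt> map program
with a lookup cache, written for bash 4 or later.&nbsp; It is an illustration only and not the Webfusion
Perl scripts: <tt>lookup_docroot</tt> and the path that it returns are hypothetical stand-ins for
the database query.</p>
<pre>#!/bin/bash
# Hypothetical stand-in for the database query done on a cache miss.
lookup_docroot() {
  printf '/websites/example/%s/public_html' "$1"
}

# Read one key per line on stdin and write one value per line on stdout,
# which is the protocol that Apache uses for a prg: rewrite map.
serve_map() {
  local key
  declare -A cache
  while IFS= read -r key; do
    if [ -z "$key" ]; then
      echo NULL          # the map convention for "no match"
      continue
    fi
    if [ -z "${cache[$key]+hit}" ]; then
      # Cache miss: do the slow lookup once and remember the answer
      cache[$key]=$(lookup_docroot "$key")
    fi
    printf '%s\n' "${cache[$key]}"
  done
}</pre>
<p>A script used as a real map would end with a call to <tt>serve_map</tt>.&nbsp; Even with the cache,
every lookup still costs a write to and a read from the child process, which is why cutting the number
of map calls per request matters more than making each call faster.</p>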
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[Creating a test VM to mirror the Webfusion Shared Service]]></title>
		<description><![CDATA[
	
<p>In previous articles I have discussed how difficult it is for users of a <a
href="http://en.wikipedia.org/wiki/Shared_web_hosting_service">shared web hosting service</a> such
as <a href="http://www.webfusion.co.uk/web-hosting/">Webfusion's</a> to develop and to debug
applications on a service that they're using to provide services such as a blog or a forum to
a client community over the web.&nbsp; In my view, there are two main challenges to developing
on such 'production' environments:</p> <ul> 
<li>
<p>The supplier of the hosting service will typically optimise the service for normal access
– that is for live use – and this usually involves disabling all debug features and diagnostics.</p> </li>
<li>
<p>The development process can temporarily 'break' functional code or introduce bugs.&nbsp;
You will normally want to keep the existing service running in parallel to this development without
its failing.</p> </li></ul> 
<p>For these reasons, mature IT organisations invariably separate out development from live
'production' and use multiple environments for development teams, pre-production testing, user
trials, production itself and support.&nbsp; Not doing this is asking for trouble and is just
bad practice, so I use VMs hosted either on my laptop or on my home Ubuntu server to do all such
testing.&nbsp; I find that a standard Ubuntu LAMP stack (such as on my laptop) is good enough
for functional development.&nbsp; However this type of LAMP stack is different in subtle but
important ways from the Webfusion shared service architecture, so I have also set up a VM configuration
which far more closely mirrors the Webfusion set-up.&nbsp; </p> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-61</link>
		<pubDate>Tue, 29 Nov 2011 23:02:29 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-61</guid>
		<content:encoded><![CDATA[
	
<p>In previous articles I have discussed how difficult it is for users of a <a
href="http://en.wikipedia.org/wiki/Shared_web_hosting_service">shared web hosting service</a> such
as <a href="http://www.webfusion.co.uk/web-hosting/">Webfusion's</a> to develop and to debug
applications on a service that they're using to provide services such as a blog or a forum to
a client community over the web.&nbsp; In my view, there are two main challenges to developing
on such 'production' environments:</p> <ul> 
<li>
<p>The supplier of the hosting service will typically optimise the service for normal access
– that is for live use – and this usually involves disabling all debug features and diagnostics.</p> </li>
<li>
<p>The development process can temporarily 'break' functional code or introduce bugs.&nbsp;
You will normally want to keep the existing service running in parallel to this development without
its failing.</p> </li></ul> 
<p>For these reasons, mature IT organisations invariably separate out development from live
'production' and use multiple environments for development teams, pre-production testing, user
trials, production itself and support.&nbsp; Not doing this is asking for trouble and is just
bad practice, so I use VMs hosted either on my laptop or on my home Ubuntu server to do all such
testing.&nbsp; I find that a standard Ubuntu LAMP stack (such as on my laptop) is good enough
for functional development.&nbsp; However this type of LAMP stack is different in subtle but
important ways from the Webfusion shared service architecture, so I have also set up a VM configuration
which far more closely mirrors the Webfusion set-up.&nbsp; </p> 
<p><a name="endtaster"></a>Just in case others would like to do this, I have used this article
to explain how I did this in some detail.&nbsp; Unfortunately it has grown quite large as I’ve
documented this.&nbsp; I apologise for its length.&nbsp; To keep the article manageable, I have
split it into the following sections: <a
href="http://blog.ellisons.org.uk/61#requirements">deciding on your requirements</a>,
<a
href="http://blog.ellisons.org.uk/61#choosingVM">choosing the VM for you</a> and <a
href="http://blog.ellisons.org.uk/61#buildingVM">building and configuring
the VM</a>.&nbsp; I will also be deferring some content to a later article.</p> 
<h2><a name="requirements"></a>Deciding on your requirements</h2> 
<p>I would recommend at a minimum for any developer for a web application:</p> <ul> 
<li>
<p>Don't make untested changes to production.&nbsp; Use a development/test environment which
is quite separate from the production (and use more if it is necessary to work on multiple aspects
which you want to keep separate).&nbsp; Make sure that each such environment appropriately mirrors
the live environment. (I'll discuss this later.)</p> </li>
<li>
<p>Use some form of development cycle based around this / these environments:</p> <ul> 
<li>
<p>For each change decide what its scope is and how it needs to be tested.</p> </li>
<li>
<p>Develop and test the change on the development environment.</p> </li>
<li>
<p>When you are confident that the change is working successfully then release the change into
production, <i>making sure that you have some form of simple rollback strategy if things go
wrong</i>.</p> </li></ul> </li></ul> 
<p>Of course you might be able to buy extra environments from your service suppliers, but in
my view this is a waste of money and not as effective as having a local environment over which you
have administrative control.&nbsp; I will be discussing how to use a free VM product to configure
a VM to use as a test service for the LAMP-based Webfusion shared services, but I first want
to discuss a few issues.</p> <ul> 
<li>
<p><b>What is an 'appropriate' development environment?</b>&nbsp; At a minimum, you will need
to have somewhere to run a LAMP stack.&nbsp; I use Ubuntu for all of my home environments, so
my main laptop already supports a LAMP stack, and this is adequate for most purposes.&nbsp; I
have a copy of my blog engine here that I use to author articles before &quot;syncing&quot; them
to the public service.&nbsp; However if you are running Windows then running Apache/PHP/MySQL
over Windows (WAMP) is a pain to set up and is also different to LAMP in many ways, so IMHO,
it's just a lot easier to run a LAMP VM.&nbsp; A typical default LAMP stack is not the same as
that configured by Webfusion, and these differences can sometimes be significant.&nbsp; It is
a lot easier to tailor a dedicated LAMP VM to mirror the target production environment as I do
in this article.</p> </li>
<li>
<p><b>Release and Rollback</b>.&nbsp; By '<i>release</i>', I mean the process of moving a tested
set of changes onto your production service.&nbsp; There are many ways to do this, but unless
your application is absolutely huge, <i>the simplest method is probably the best</i>.&nbsp;
What I have is a <tt>_private/kits</tt> directory, and I copy any releases into this in the form
of a <tt><b>.tar.bz2</b></tt> file.&nbsp; (I could just as easily use a <b>zip</b> file, but
using a single compressed file makes the copy a lot easier under FTP or a WebDAV-based explorer.)&nbsp;
I also have a simple script for each application that takes the 'release ID' as a parameter and
explodes the corresponding kit into the target application directory hierarchy.&nbsp; Rollback
is easy: release the previous version using the same script.&nbsp; I use a remote command (as
I discuss in <a
href="http://blog.ellisons.org.uk/article-43">this article</a>) to execute this script, but you could just as
easily create a PHP variant which you invoke over the web and which takes the application, release
ID, and enabling password and runs the script as a child process.&nbsp; Remember to ensure that the release
script does not overwrite any content subdirectories, if the application hierarchy includes these
(e.g. using an <tt>--exclude</tt> option or equivalent in the script).</p> </li>
<li>
<p><b>Consider using Source Version Control</b>.&nbsp; I use Git in my development cycle, so
I typically just keep the current and previous versions on my production server, as I can remake
any historic version easily.&nbsp; However, setting up this sort of system can be quite complex,
so you might just prefer to keep a local archive of your release kits.</p> </li></ul> 
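<p>The release and rollback approach described above can be sketched as a small shell function.&nbsp;
This is an illustration under stated assumptions rather than the actual script: the name <tt>release_kit</tt>,
the argument order and the <tt>content</tt> directory name are all hypothetical, and it assumes GNU
<tt>tar</tt>.</p>
<pre># Hypothetical sketch: explode kit RELEASE_ID from KITS_DIR into APP_DIR.
release_kit() {
  local kits_dir=$1 app_dir=$2 release_id=$3
  local kit="$kits_dir/$release_id.tar.bz2"
  if [ ! -f "$kit" ]; then
    echo "release_kit: no such kit: $kit"
    return 1
  fi
  # GNU tar detects the bzip2 compression itself when extracting.
  # --exclude skips anything in the kit named "content", so the live
  # content directory in APP_DIR is never overwritten.
  tar --exclude='content' -xf "$kit" -C "$app_dir"
}</pre>
<p>With this sketch, <tt>release_kit _private/kits public_html r42</tt> would release kit <tt>r42</tt>, and
running it again with the previous release ID would roll back (the directory names and the ID here are
placeholders).</p>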
<h2><a name="choosingVM"></a>Choosing the VM for you</h2> 
<p>The first thing that you need to decide is which virtualisation product to use.&nbsp; There
are two clear alternatives in my opinion:</p> <ul> 
<li>
<p><a href="http://www.vmware.com/products/player/overview.html">VMware Player</a>.&nbsp; This
is the simplest end-user VMware product to use within a desktop / laptop environment, and the
one that I initially used at home because it's the market leader and I'd used VMware at work.&nbsp;
I would avoid the other free VMware products as they are either obsolescent or too complex for this
type of simple VM application.&nbsp; One advantage of using VMware is its <a
href="http://www.vmware.com/appliances/">Appliance Store</a> which provides simple pre-built
LAMP appliances such as the free <a
href="http://www.vmware.com/appliances/directory/82293">TurnKey LAMP Appliance</a> which I discuss
below.</p> </li>
<li>
<p><a href="https://www.virtualbox.org/">VirtualBox</a> (commonly abbreviated to VBox).&nbsp;
This is the main market alternative to VMware, and in my view is the easiest product to install
and to use on Windows, OS X and most major Linux variants.&nbsp; Unlike VMware Player, VBox includes
a fully featured VM development environment.&nbsp; The disadvantage for starters is the lack
of an integrated appliance store, but many of the store subscribers offer ISO downloads such
as <a href="http://www.turnkeylinux.org/download?file=turnkey-lamp-11.2-lucid-x86.iso">Turnkey's</a>.</p> </li></ul> 
<p>I switched to using VBox about five years ago, and I would still recommend it overall because
of its features, performance and wide platform support, so the remainder of this article assumes
that you will be using VBox as your virtualisation platform.</p> 
<p>The next step is to understand your vendor's hosting environment:</p> <ul> 
<li>
<p>Which versions of Apache / PHP / MySQL are you running?&nbsp; The starting point here is
the response from a <tt>phpinfo()</tt> request as I've discussed in my <a
href="http://blog.ellisons.org.uk/search-Webfusion">earlier Webfusion articles</a>, which tells me what I am running, but
I also ended up running <tt>httpd -v</tt>, <tt>php -v</tt> and <tt>mysql --version</tt> on the
server to get the versions (Apache/2.2.3, PHP 5.2.16 and MySQL 5.1.52).</p> </li>
<li>
<p>Which PHP and Apache modules are loaded and how they are configured.&nbsp; I exported a tarball
of <tt>/etc/httpd</tt> and <tt>/etc/php*</tt> to work this out as PHP runs as a cli child image
of mod_suphp rather than an in-process library within mod_php.</p> </li>
<li>
<p>You need to replicate this closely on your VM and be aware of any material differences (e.g.
some nice PHP features such as anonymous functions were only released with PHP 5.3 and don't
work under 5.2).&nbsp; However, the main issue here is to cover the features that you use, e.g.
you might decide not to bother with some modules that you don't use in your applications.</p> </li></ul> 
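<p>One way to check for this kind of version difference is to compare version numbers in a script.&nbsp;
A minimal sketch, assuming GNU <tt>sort</tt> (for the <tt>-V</tt> version sort); <tt>version_at_least</tt> is
a hypothetical helper name:</p>
<pre># True (exit status 0) if version $1 is at least version $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}</pre>
<p>For example, <tt>version_at_least "$(php -r 'echo PHP_VERSION;')" 5.3.0</tt> would succeed only on a
system where anonymous functions are available, and would fail for PHP 5.2.16.</p>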
<h2><a name="buildingVM"></a>Building and Configuring the VM</h2> <ul> 
<li>
<p><b>Step 1: Install the VM product</b></p> </li></ul> 
<p>See <a href="https://www.virtualbox.org/wiki/Downloads">VirtualBox Downloads</a>.&nbsp; In
the case of Windows and OS X this is a self-installing exe / package which you download and
run.&nbsp; In the case of Linux variants the Linux page tells you how to add the VirtualBox
repository to the sources so that you just use your standard package installer to do the initial
install and track any product updates.&nbsp; For this type of VM, VirtualBox is pretty much load
and go.</p> <ul> 
<li>
<p><b>Step 2: Initial install of the VM</b></p> </li></ul> 
<p>I have used TurnKey's <a href="http://www.turnkeylinux.org/lampstack">LAMP Appliance</a> over
VBox in this example.&nbsp; You will need to tailor this script if you use another medium (such
as the Ubuntu-supplied 10.04-LTS CD distribution or the <a
href="http://archive.ubuntu.com/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/mini.iso">Ubuntu
10.04 LTS netboot mini.iso</a>) or use VMware Player as your VM product.&nbsp; The sweet-spot
for this appliance is Amazon EC2, but it is still a good starting point for me to use here.&nbsp;
The advantages of this appliance are that it is a packaged minimal Ubuntu 10.04-LTS LAMP server
build (~ 210Mb download compared to 680Mb for the standard Ubuntu ISO), and has some nice
management tools preinstalled that novice sysadmins will find useful.&nbsp; There are two possible
approaches to using it: the first is to download the <a
href="http://www.turnkeylinux.org/download?file=turnkey-lamp-11.2-lucid-x86-ovf.zip">OVF version</a> of
the appliance, unzip it and use the VBox import function to convert / import into VirtualBox.&nbsp;
However, I find it easier just to use the <a
href="http://www.turnkeylinux.org/download?file=turnkey-lamp-11.2-lucid-x86.iso">ISO version</a> and
do a bare load.</p> 
<p>Use the <tt>VirtualBox-&gt;New Menu</tt> option to create a new VM:</p> <ul> 
<li>
<p>Linux/Ubuntu VM;</p> </li>
<li>
<p>System: 384 Mb;</p> </li>
<li>
<p>Boot:CD+HDD, no floppy;</p> </li>
<li>
<p>Storage: the IDE Controller is sufficient, so add a new 2Gb HDD to the IDE primary master
(this will leave about 1Gb for your Application + database, so increase this if you need more),
add the LAMP distribution ISO to the secondary master; delete the SATA controller;</p> </li>
<li>
<p>Network: enable a single adapter bridged;&nbsp;</p> </li>
<li>
<p>Audio and USB: disabled;</p> </li>
<li>
<p>Everything else is as per the defaults.</p> </li></ul> 
<p>Now start the machine and hit enter to boot into the Linux installation.&nbsp; Select &quot;Guided&nbsp;–
Use entire disk&quot; as the partitioning method and hit enter.&nbsp; Let the installer split
the disk (into the root and swap partitions), select &quot;Yes&quot; and hit enter to continue.&nbsp;
Select &quot;Yes&quot; to the question &quot;Install the GRUB boot loader to the master boot
record?&quot;&nbsp; The system will now install the LAMP stack and prompt you to remove the CD (which
you do by hitting Right-Ctrl, selecting Devices-&gt;CD-&gt;host drive and then hitting enter);
the system will now boot into Ubuntu.&nbsp; This all takes a couple of minutes.</p> 
<p>The Turnkey configuration menu first prompts for root and MySQL root passwords.&nbsp; This
is a test VM which will only be accessible locally from your PC, so I suggest using a common
password for all these as there's little point in high security (but <i>don't</i> do this for
live systems).&nbsp; Skip the Turnkey Backup and Migration feature, and skip daily security updates.&nbsp;
You will now be presented with the URIs for web access, WebShell, Webmin, PHPMyAdmin, and SSH/SFTP.&nbsp;
Now select Advanced-&gt;Shutdown to close down the server.&nbsp; When you restart, you now have
all of the features that you need to tailor your server to your specific development needs.&nbsp;</p> 
<p>Two hints here: (i) you don't need to shut down the VM when you've finished using it; you
can instead simply save it to disk by using the Hostkey (Right-Ctrl)-Q option in the VBox console
window. (ii) You don't need to have a VBox console window running (and keep capturing your mouse
if you accidentally select it) as you can do everything that you need to do with the WebShell,
Webmin, PHPMyAdmin, and SSH/SFTP tools provided; instead use the <tt><b>VBoxHeadless -s </b></tt><tt><i><b>{vmname}</b></i></tt><tt> </tt><tt><b>-v
off</b></tt> from the command line to start your VM and <tt><b>VBoxManage controlvm </b></tt><tt><i><b>{vmname}</b></i></tt><tt> </tt><tt><b>savestate</b></tt> to
save it.</p> <ul> 
<li>
<p><b>Step 3: Tailor the VM to mirror your production service</b></p> </li></ul> 
<p>At this point you have a good platform to do most of your LAMP development.&nbsp; However,
in my case the VM differs from the Webfusion service (WfS) in several ways.&nbsp; Some of these
I can accept, but others I want to mirror, so I still have more work to do:</p> <ul> 
<li>
<p><b>Apache</b>: (WfS: 2.2.3 vs. VM: 2.2.14).&nbsp; <b>Accept</b> as there are no material function
changes between these versions.&nbsp; WfS loads a richer set of authorisation modules but I don't
use this functionality.&nbsp; It also uses <tt>mod_suphp</tt> and <tt>mod_expires</tt> as opposed
to the VM which currently supports <tt>mod_cgi</tt> and <tt>mod_php</tt>, and these have different
runtime characteristics so I want to fix these up.</p> </li>
<li>
<p><b>MySQL</b>: (WfS: 5.1.52 vs. VM:5.1.41).&nbsp; <b>Accept</b> as there are no material function
changes between these versions.</p> </li>
<li>
<p><b>PHP</b>: (WfS: 5.2.16 vs. VM: 5.3.2).&nbsp; <b>Accept</b> as there's little that I can do
about it: obtaining a 5.2 version is a bit of a stretch.&nbsp; The main issue here is that I
need to avoid using the new PHP 5.3 functionality.</p> </li></ul> 
<p>So what this really boils down to is that I've got to swap out <tt>mod_php</tt> for <tt>suPHP</tt>
(this also requires dropping the PHP Xcache accelerator as this isn't supported under suPHP
and isn't used on WfS anyway).&nbsp; I also want to add some extra components to make debugging
easier and facilitate transfer from VM to the hosting machine and the production VM.&nbsp; My
goal is to have the VM look identical to my Webfusion environment at a PHP programming level.&nbsp;
I am more relaxed about other differences between the VM and the production service that are
not visible at a PHP coding level so I have decided not to mirror these (for example the VM only
runs one user service and uses an in-VM database and file storage, whereas Webfusion runs thousands
of users on a server farm with dedicated database servers and the user file storage
on <a href="http://en.wikipedia.org/wiki/Network-attached_storage">Network-attached storage
(NAS)</a>).&nbsp; The listings below give the command scripts that I used to tailor the system.&nbsp;</p> 
<p>I will discuss the <tt>/etc</tt> updates (for Apache, suPHP, &amp;c) in a following
article.&nbsp; You can just paste these listings into a WebShell window on your server to do this yourself.&nbsp;
(You will, of course, need to change them if you are using a service other than the Webfusion shared
LAMP service.)&nbsp; I am currently only using this to test PHP scripts so I still need to think about
what I do about Python support etc.</p> <ul> 
<li>
<p><b>Step 4: Install the application on the VM </b> </p> </li></ul> 
<p>You need to set up entries in your <tt>hosts</tt> file to map test versions of your domain
onto the VM's IP.&nbsp; I allocate all my VMs static IPs in the address range 192.168.1.128-250,
and so this is easy, e.g.</p> 
<pre>192.168.1.245   ellisons.org.home blog.ellisons.org.home files.ellisons.org.home ... </pre>
<p> You can now use SFTP, SSH, WebShell, etc. to access and set up the VM in very much the
same way as with the target service.&nbsp;</p> 
<h2>Listings — VM tailor script</h2> 
<p>The first step is to install some additional tools that I find useful for debugging:</p> 
<pre>apt-get update
apt-get install linux-image-virtual \
   libapache2-mod-suphp php5-cli php5-gd php5-imagick php5-curl \
   php5-xdebug vim zip unzip bzip2 strace
halt</pre>
<p> After restarting the VM and booting the virtual kernel (this will now be the default and
is labelled <tt>linux-image-generic-pae</tt> for some reason), I strip out the Turnkey components
that are really aimed at its Amazon EC2 offering:</p> 
<pre>apt-get remove linux-image-generic php5-xcache libapache2-mod-php5 lvm2 \
   tklbam tklbam-duplicity tklbam-python-boto webmin-tklbam 
apt-get clean
apt-get autoclean
apt-get autoremove
rm /var/cache/debconf/*old /var/cache/apt/*cache.bin
rm /etc/php5/conf.d/{imagick,xcache}.ini
rm -R /etc/tklbam /usr/lib/tklbam
rm /etc/cron.daily/tklbam-backup
DISABLE_MODS=&quot;authz_default authz_groupfile cgi perl python reqtimeout ssl&quot;
ENABLE_MODS=&quot;auth_digest authn_anon authz_dbm expires headers include rewrite unique_id&quot;
for m in $DISABLE_MODS; do rm /etc/apache2/mods-enabled/$m.*; done
for m in $ENABLE_MODS;  do ln -s ../mods-available/$m.load /etc/apache2/mods-enabled/$m.load; done</pre>
<p> Lastly I set up a shadow account so that my working UID / directory is in the same location
as on the Webfusion system.&nbsp; Here I just execute this script with parameters <tt>/websites/LinuxPackage02/el/li/so
ellisons.org.uk 9999999 terrye &quot;Terry Ellison&quot; </tt>:</p> 
<pre>NAMEROOT=&quot;$1&quot;
DOMAIN=&quot;$2&quot;
USERID=&quot;$3&quot;
USERNAME=&quot;$4&quot;
USERDESC=&quot;$5&quot;
DOCROOT=$NAMEROOT/$DOMAIN
addgroup --gid $USERID $USERNAME 
mkdir -p $DOCROOT/logfiles
chgrp $USERNAME $DOCROOT/logfiles
chmod 750 $DOCROOT/logfiles
adduser --uid $USERID --ingroup $USERNAME --home $DOCROOT/public_html --gecos &quot;$USERDESC&quot; $USERNAME</pre>
<p> <b>To be covered in the next article</b>: Apache config changes, MySQL account set up, ...</p> 
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[Oh, the embarrassment!]]></title>
		<description><![CDATA[
<p>I&rsquo;ve just discovered a bug in my blog: any comment posts get thrown into the bit-bucket
and are never passed to me for acceptance / publishing if the user making the comment is a guest
(that is not a blog author) and accessing the public instance on my <b>blog.ellisons.org.uk</b> domain
rather than my private and locally installed development system.&nbsp;</p> 
<p>My only plea in mitigation is that this combination made it easier to miss this bug in
my module and integration tests.&nbsp; I did a quick check on my access logs (I've got the last
18 months archived), and I've worked out how many comment posts were made in this period &ndash;
roughly 4 a week.&nbsp; If I ignore the 21 spammer probes that were 404'ed, there were a total
of 295 reader comments.&nbsp; The comments themselves are gone forever, but I have appended an article
league table for comments, with a cut at 5 comments.&nbsp; My sincere apologies to all these
blog readers, and I really regret losing this valuable feedback.</p> 
<p>There were a couple of factors that contributed to my missing this.&nbsp; The first is that I decided
to do a complete reimplementation of the blog engine based on some of the conclusions that I've
come to in these articles on PHP performance, and I've been working on that on my development
system.&nbsp; The second is that I had some heavy commitments in apache.org that took most of
my spare time for a couple of months. Both of these will be the subject of future articles.&nbsp;
However, I first needed to fix this bug before I could publish this article from my test system
to live &ndash; otherwise I would have withered with shame! [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-60</link>
		<pubDate>Fri, 25 Nov 2011 00:35:00 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-60</guid>
		<content:encoded><![CDATA[
<p>I&rsquo;ve just discovered a bug in my blog: any comment posts get thrown into the bit-bucket
and are never passed to me for acceptance / publishing if the user making the comment is a guest
(that is not a blog author) and accessing the public instance on my <b>blog.ellisons.org.uk</b> domain
rather than my private and locally installed development system.&nbsp;</p> 
<p>My only plea in mitigation is that this combination made it easier to miss this bug in
my module and integration tests.&nbsp; I did a quick check on my access logs (I've got the last
18 months archived), and I've worked out how many comment posts were made in this period &ndash;
roughly 4 a week.&nbsp; If I ignore the 21 spammer probes that were 404'ed, there were a total
of 295 reader comments.&nbsp; The comments themselves are gone forever, but I have appended an article
league table for comments, with a cut at 5 comments.&nbsp; My sincere apologies to all these
blog readers, and I really regret losing this valuable feedback.</p> 
<p>There were a couple of factors that contributed to my missing this.&nbsp; The first is that I decided
to do a complete reimplementation of the blog engine based on some of the conclusions that I've
come to in these articles on PHP performance, and I've been working on that on my development
system.&nbsp; The second is that I had some heavy commitments in apache.org that took most of
my spare time for a couple of months. Both of these will be the subject of future articles.&nbsp;
However, I first needed to fix this bug before I could publish this article from my test system
to live &ndash; otherwise I would have withered with shame!<a name="endtaster"></a></p> 
<h2>A postscript one month on</h2> 
<p>I continued to track the Apache logs and saw regular POSTs to articles but none were arriving
on my administration queue for approval, so I added a small diagnostic to dump any POSTs to the
article page to a debug file.&nbsp; Yup, lots of posts but all spam, which my "simple sum" validation
was defeating.&nbsp; Nobody loves me after all :-(</p> 
<hr/> 
<h2>League table of lost comments</h2> 
<table width="80%"> <col width="70%"/> <col width="10%"/> <tbody> 
<tr> <td> 
<p><b>Article</b></p> </td> <td> 
<p><b>No of posts</b></p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-9">VBA vs OOo Basic Performance II</a></p> </td> <td> 
<p>46</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-28">A PHP Cache Overview and Cache Tuning</a></p> </td> <td> 
<p>27</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-13">Building a VMware Appliance Playpen</a></p> </td> <td> 
<p>17</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-35">My blog’s templating engine</a></p> </td> <td> 
<p>16</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-52">phpBB Performance – Reducing the data cache overhead</a></p> </td> <td> 
<p>16</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-25">Migrating the OpenOffice.Org forums to a common code-base</a></p> </td> <td> 
<p>11</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-53">phpBB Performance – Reducing the script load overhead</a></p> </td> <td> 
<p>11</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-8">VBA vs OOo Basic Performance</a></p> </td> <td> 
<p>11</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-33">Using .htaccess files on a Webfusion shared service</a></p> </td> <td> 
<p>10</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-42">Access and Permissions on a Webfusion shared service</a></p> </td> <td> 
<p>9</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-1">Error Handling in VBScript - Part I</a></p> </td> <td> 
<p>8</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-27">Use cases for phpBB</a></p> </td> <td> 
<p>8</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-4">An example of how to use VBA/COM over Outlook</a></p> </td> <td> 
<p>7</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-48">So why CSS Sprites?</a></p> </td> <td> 
<p>7</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-10">Differences in Error Handling between VBA and OOo Basic</a></p> </td> <td> 
<p>6</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-34">Performance in a Webfusion shared service and guidelines for
optimising PHP applications</a></p> </td> <td> 
<p>6</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-44">More on optimising PHP applications in a Webfusion shared service</a></p> </td> <td> 
<p>6</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-41">A miserable case of ME/CFS – three years on</a></p> </td> <td> 
<p>5</p> </td> </tr> 
<tr> <td> 
<p><a
href="http://blog.ellisons.org.uk/article-47">Putting it all together: my blog engine architecture</a></p> </td> <td> 
<p>5</p> </td> </tr> </tbody> </table>
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[More on using Rewrite rules in .htaccess files]]></title>
		<description><![CDATA[
	
<p>This article is a further discussion of how to use rewrite rules on a shared hosting service
(SHS) such as the one supplied by Webfusion and that I use.&nbsp; It develops some earlier discussions
in the following blog articles:</p> <ul> 
<li>
<p><a
href="http://blog.ellisons.org.uk/article-33">Using .htaccess files on a Webfusion shared service</a></p> </li>
<li>
<p><a
href="http://blog.ellisons.org.uk/article-45">A lightweight HTML cache for a Webfusion shared service</a></p> </li></ul> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-59</link>
		<pubDate>Mon, 21 Nov 2011 16:58:00 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-59</guid>
		<content:encoded><![CDATA[
	
<p>This article is a further discussion of how to use rewrite rules on a shared hosting service
(SHS) such as the one supplied by Webfusion and that I use.&nbsp; It develops some earlier discussions
in the following blog articles:</p> <ul> 
<li>
<p><a
href="http://blog.ellisons.org.uk/article-33">Using .htaccess files on a Webfusion shared service</a></p> </li>
<li>
<p><a
href="http://blog.ellisons.org.uk/article-45">A lightweight HTML cache for a Webfusion shared service</a></p> </li></ul> 
<p><a name="endtaster"></a>The main documentation sources are the Apache HTTP server documentation's
<a href="http://httpd.apache.org/docs/2.2/howto/htaccess.html">.htaccess files</a> tutorial
and its documentation on <a href="http://httpd.apache.org/docs/2.2/rewrite">mod_rewrite</a>.&nbsp;
The former contains a lot of useful information on the use of <tt>.htaccess</tt> files, except for anything
related to rewrites.&nbsp; These are covered by the latter, which is now a complete section covering the
more detailed aspects in eleven separate sub-sections, three of which are essential reading.</p> <ul> 
<li>
<p><a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html"><b>Mod_rewrite reference
documentation</b></a>.&nbsp; This is the standard reference content on the module and its directive
definitions.</p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/intro.html"><b>Introduction to regular
expressions and mod_rewrite</b></a>.&nbsp; This describes what regular expressions are and
how they are used within mod_rewrite.&nbsp; A must-read for beginners.</p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/flags.html"><b>RewriteRule Flags</b></a>.&nbsp;
A good summary of what all the rewrite flags mean.&nbsp; Note that the DPI flag is only available
for Apache versions 2.2.12 and later.</p> </li></ul> 
<p>Of the remaining eight, two are useful in specific cases and the rest really only apply where
you have access to the base Apache configuration and therefore aren’t so relevant to SHS users.</p> <ul> 
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/remapping.html">Using mod_rewrite for redirection
and remapping of URLs</a>.&nbsp; This contains some standard patterns for common uses of mod_rewrite,
including detailed descriptions of how each works.</p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/access.html">Using mod_rewrite to control
access</a>.&nbsp; More patterns, this time based on the <tt>HTTP_REFERER</tt> variable, to deny
access or redirect to alternative sites.</p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/vhosts.html">Dynamic virtual hosts with
mod_rewrite</a></p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/proxy.html">Dynamic proxying with mod_rewrite</a></p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/rewritemap.html">Using RewriteMap</a></p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/advanced.html">Advanced techniques and
tricks</a></p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/avoid.html">When NOT to use mod_rewrite</a></p> </li>
<li>
<p><a href="http://httpd.apache.org/docs/2.2/rewrite/tech.html">Technical details</a></p> </li></ul> 
<p>Most of what you need to understand is covered somewhere in these chapters, but it's well
worth scanning for “<b>htaccess</b>” and “<b>per-directory</b>” when you go through
because this content is fragmented across them.&nbsp; What I then want to do here is to cover
the main points about the interaction of the rewrite functionality and <tt>.htaccess</tt> processing
in a single article.</p> 
<h2>The overall Rewrite architecture</h2> 
<p>Rewrites fall into two categories:</p> <ul> 
<li>
<p>A system administrator can put them into the Apache configuration (SHS providers will typically
do this to map the individual user domains onto the correct root directory) and these will apply
to the entire server. </p> </li>
<li>
<p>Non-privileged users have to use the directory-specific processing (referred to as “Perdir”
in the rewrite logs) that interprets the rewrite rules in the <tt>.htaccess</tt> files.</p> </li></ul> 
<p>The rewrite engine essentially does a loop:</p> 
<pre>do
  execute server rewrites (in the Apache Config)
  execute vhost rewrites (in the Apache Virtual Host Config)
  find the &quot;Perdir&quot; .htaccess file
  if found(.htaccess)
    execute .htaccess rewrites (in the user's directory)
while rewrite occurred</pre>
<p> Note that this loop only terminates after a pass where no rewriting has occurred and thus
if you aren’t careful this can result in a loop which terminates with an error.</p> 
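<p> As a minimal sketch of how this can go wrong (the <tt>blog/</tt> directory name is only an illustrative assumption, and the fragment assumes that it is the only rewrite logic in a root <tt>.htaccess</tt> with no deeper access file taking over), the first rule below matches its own output: each pass prepends another <tt>blog/</tt> until Apache gives up on the internal redirects and returns an error.&nbsp; The second version adds a guard condition so that the rule fires once only:</p> 
<pre># Unguarded: the rewritten URI matches again on the next pass, so this loops
RewriteEngine on
RewriteRule ^(.*)$ blog/$1 [last]

# Guarded alternative (use instead of the rule above): leave URIs already under /blog/ alone
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/blog/
RewriteRule ^(.*)$ blog/$1 [last]</pre>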
<h2>&quot;Perdir&quot; Rewrites</h2> 
<p>If you have access to the Apache config (for example when you are buying a VM service such
as Webfusion VPS or Amazon EC2) then best practice is to use the server and vhosts rewrites and
disable <tt>.htaccess</tt> use altogether.&nbsp; SHS providers can't do this as users will need
to have access to rewrite functionality through &quot;Perdir&quot; <tt>.htaccess</tt> files,
and it is this category that I want to discuss in detail in this article.</p> 
<h3>What access files get processed</h3> 
<p>Processing of <tt>.htaccess</tt> files can only be enabled within the Apache configuration,
and this is done as standard within SHS offerings.&nbsp; (Note that the filename of the access
files can be changed from the default <tt>.htaccess</tt> in the Apache configuration, but this
is rarely done.)&nbsp; When processing a request for any resource, Apache will try to open all
possible access files on the path.&nbsp; So in the case of an example, <tt>/blog/includes/tinymce/license.txt</tt>,
it tries to open the following access files:</p> 
<pre>DOCUMENT_ROOT/.htaccess
DOCUMENT_ROOT/blog/.htaccess
DOCUMENT_ROOT/blog/includes/.htaccess
DOCUMENT_ROOT/blog/includes/tinymce/.htaccess
DOCUMENT_ROOT/blog/includes/tinymce/license.txt/.htaccess</pre>
<p> where <tt>DOCUMENT_ROOT</tt> is the root directory of the user’s directory tree (in my
case this is <tt>/websites/LinuxPackage02/el/li/so/ellisons.org.uk/public_html</tt>).&nbsp;
If any of the above files exists then it is read and cached during the processing of the request.&nbsp;
Doing a putative open and then handling the error condition if the file doesn’t exist may seem
an odd implementation, but it is a cheap operation (in terms of runtime and system overhead)&nbsp;–
say 0.1 mS – so this is an efficient way of traversing all possible <tt>.htaccess</tt> files.&nbsp;
Even the last case might seem an odd one to try, but there is some logic in this: if <tt>license.txt</tt> <i>were</i> a
directory, then its <tt>.htaccess</tt> file would need to be processed; the error code in this
case is “<b>not a directory</b>” which therefore acts as a file / directory test for <tt>license.txt</tt>.</p> 
<p>Note that the <tt>.htaccess</tt> file will only take part in rewrite processing if it includes
a <tt>RewriteEngine on</tt> directive and <tt>Options FollowSymLinks</tt> has been set in the
Apache server configuration (which is always the case for SHS offerings).</p> 
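<p> As a sketch, a minimal root <tt>.htaccess</tt> that opts in to rewrite processing might start as follows.&nbsp; The commented-out <tt>Options</tt> line is an assumption for hosts that have not already set it, and it only works where the server configuration allows <tt>Options</tt> to be overridden in access files:</p> 
<pre># DOCUMENT_ROOT/.htaccess
# Options +FollowSymLinks
RewriteEngine on
RewriteBase /</pre>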
<p>My advice is to keep the number of access files to a minimum, and delete any that you don’t
need.&nbsp; You can do all the rewrite processing, access control, etc., from a single <tt>DOCUMENT_ROOT/.htaccess</tt> and
I prefer this approach.&nbsp; However, if you have multiple applications under your root (for
example I’ve got six including a blog, a test wiki and a forum) then a sensible compromise
is to add a second tier of access files with one per application at the next directory level,
so that application-specific rewrite processing can be split per application.&nbsp; This also
has the benefit that you can create a test directory (hierarchy) to develop changes to your rules,
and do this without causing the live applications to fail, but at the cost of an extra <tt>.htaccess</tt> file
open and Perdir processing cycle.</p> 
<h3>Some Misconceptions</h3> <ul> 
<li>
<p><tt><b>.htaccess</b></tt> <b>rewrites are inefficient</b>.&nbsp; The Apache documentation
uses phrases like “incredible”, “these are reached a very long time after the URLs have
been translated to filenames”, “mod_rewrite should be considered a last resort”, …&nbsp;
This is just silly: the Apache child process starts to process the <tt>.htaccess</tt> in <tt>DOCUMENT_ROOT</tt> less
than 0.2 mSec after reading the GET request, and the entire rewrite processing typically takes a
few mSec at most if rewrite logging is disabled (as is always the case on an SHS configuration)
and the <tt>.htaccess</tt> files are precached in the server’s Virtual Filesystem Cache (VFC).&nbsp;
To put this time overhead into context, PHP image activation typically takes 80 mSec, and
reading a <i>single file</i> from a network storage mounted directory (the typical implementation
for SHS server farm architectures), if not precached in the VFC or NAS cache, can take 200 mSec.&nbsp;
So the rewrite overheads&nbsp;are typically one to two orders of magnitude less than
PHP image activation and file access overheads. &nbsp; The only material overhead is in reading
in the <tt>.htaccess</tt> file(s), and since these are the ‘hottest’ files in an SHS application
they will typically have excellent cache-hit ratios.</p> </li>
<li>
<p><tt><b>.htaccess</b></tt> <b>rewrites are a minefield that is best avoided</b>.&nbsp; Unfortunately
SHS application developers have no viable alternative for many Apache functions, and so using
this functionality can’t be avoided. Yes, the <tt>.htaccess</tt> architecture and its implementation
are complicated as a result of its evolution, but this is also largely compounded by weaknesses
in the current documentation, which does a poor job of explaining how to use <tt>.htaccess</tt> files
to implement rewrite functionality.</p> </li></ul> 
<h3>How PerDir rewrite processing works</h3> <ul> 
<li>
<p><b>Processing is multi-pass</b>. Dropping through the last rewrite or setting a “last”
flag on a successful rewrite forces the end of a pass. If any processed rewrite rule has matched
<i>and</i> has changed the URI or the query string, then this could mean that another rule might
apply, so this is treated as an “INTERNAL REDIRECT” at the end of the pass, and processing
is restarted. This cycle is terminated when the last pass makes no change to the URI or the query
string.</p> </li>
<li>
<p><b>Each pass uses a single </b><tt><b>.htaccess</b></tt> <b>file</b>. The deepest <tt>.htaccess</tt> file
on the URI path with <tt>RewriteEngine on</tt> is used.&nbsp; For historic reasons, this processing
is convoluted.&nbsp; </p> </li>
<li>
<p>By this second stage the URI has been converted into a putative filename based on the path
relative to <tt>DOCUMENT_ROOT</tt>.&nbsp; This is then split into a “<b>Per Dir</b>” part
(which includes the trailing <b>/</b>) and a relative URI part (where the leading <b>/</b> is
missing).&nbsp; So using an example which is generated initially by a request to <tt>http://blog.ellisons.org.uk/includes/tinymce/license.txt</tt>:</p> <ul> 
<li>
<p><tt>HTTP_HOST</tt> is <tt>blog.ellisons.org.uk</tt></p> </li>
<li>
<p><b>Per Dir</b> is <tt>/websites/LinuxPackage02/el/li/so/ellisons.org.uk/public_html/</tt></p> </li>
<li>
<p>The bare URI is <tt>includes/tinymce/license.txt</tt> (<i>note the missing leading /</i>).</p> </li></ul> </li>
<li>
<p>Any “<b>?</b>” delimiter and request parameters are also stripped from the URI before
it is then used as the match string in any <tt>RewriteRule</tt> statements.&nbsp; Thus you can’t
examine request parameters in a <tt>RewriteRule</tt>.</p> </li></ul> <ul> 
<li>
<p>The rewrite logic consists of a sequence of <tt>RewriteRule</tt> statements.&nbsp; By default
these are executed sequentially until the last one or a <tt>[last]</tt> flag is set on a successful
rule.&nbsp; However the order of rules can be modified by use of flags such as <tt>[chain]</tt> and
<tt>[skip]</tt>.&nbsp; </p> </li>
<li>
<p>Any rule can be preceded by one or more <tt>RewriteCond</tt> statements.&nbsp; These are
only evaluated if the <tt>RewriteRule</tt> pattern matches.&nbsp; Each condition statement essentially
evaluates to a go/no-go: the first parameter is a test string which is interpolated, and which can therefore
include expanded variables, back-references, etc.&nbsp; (Hence request parameters can only be
accessed through using the <tt>%{QUERY_STRING}</tt> variable.)&nbsp; This is then matched against
the second parameter which is a condition pattern.&nbsp; <tt>RewriteCond</tt> statements are
used for two main purposes: to set match variables which can then be used as back references
in the associated <tt>RewriteRule</tt> replacement string or to provide a guard to stop an endless
loop substitution.&nbsp; Note that the execution order of the rules and conditions is actually
the reverse of the written order: the rule pattern is matched first and only then are the conditions evaluated.</p> </li>
<li>
<p>So in this example the root <tt>.htaccess</tt> applies and if it contained the following
the URI would be rewritten as <tt>blog/includes/tinymce/license.txt</tt> and the flag would
trigger internal redirection:</p> </li></ul> 
<pre>RewriteCond %{ENV:REDIRECT_STATUS} =&quot;&quot;
RewriteCond %{SCRIPT_FILENAME}  !-f
RewriteCond %{HTTP_HOST}        (blog|wiki|forum)\.ellisons\.org\.uk   [nocase]
RewriteRule  ^(.*)              %1/$1                                  [last]

# This is evaluated as follows. Note that the REDIRECT_STATUS environment variable is set by any
# redirect processing cycle, so this condition is a safety check to ensure that this rule is only
# applied on the first rewrite pass and not on any subsequent loops.
#
#If URI_pattern.match('^(.*)')   and
#    %{ENV:REDIRECT_STATUS} = &quot;&quot; and     # empty only on the first pass (see the note above)
#    %{SCRIPT_FILENAME}  !-f     and
#    %{HTTP_HOST}.match('(blog|wiki|forum)\.ellisons\.org\.uk') then
#    URI_pattern = &quot;%1/$1&quot;   # where %1 is blog, wiki or forum and $1 is the URI from the RewriteRule
#    Break processing
#Endif</pre> <ul> 
<li>
<p>A second rewrite pass using <tt>DOCUMENT_ROOT/blog/.htaccess</tt> now applies with:</p> </li></ul> <ul> <ul> 
<li>
<p><b>Per Dir</b> is <tt>/websites/LinuxPackage02/el/li/so/ellisons.org.uk/public_html/blog/</tt></p> </li>
<li>
<p>The <tt>REQUEST_URI</tt> match string is <tt>includes/tinymce/license.txt</tt></p> </li></ul> </ul> <ul> 
<li>
<p>So the rule patterns ($1 etc.) can be used in any condition string.&nbsp; Any condition patterns
(%1 etc.) can be used in the next condition string, or (in the case of the last condition pattern)
the rewrite rule string.&nbsp; </p> </li>
<li>
<p>Remember match patterns are on the left and strings (which can include variable expansion)
on the right in the <tt>RewriteRule</tt> statements, and this is <i>vice versa</i> in the case of <tt>RewriteCond</tt> statements:
strings (which can include variable expansion) are on the left and match patterns on the right.
</p> </li>
<li>
<p>Remember to specify a <tt>RewriteBase</tt> (usually <b>/</b> for your document root <tt>.htaccess</tt>).</p> </li>
<li>
<p>Following a successful rewrite the URI match string is replaced by the interpolated <tt>RewriteRule</tt> substitution
string for any following statements. </p> </li>
<li>
<p>The rewrite engine will treat any <tt>RewriteRule</tt> substitution string with leading <b>/</b> as
a (new) <b>absolute directory</b> discarding the current relative path.&nbsp; <i>How these are
processed varies according to the Apache version</i>, so the safest thing to do if you do want
to specify an absolute path is to terminate the current rewrite cycle with a <tt>[last]</tt> flag
and restart a new cycle (with possibly a new <tt>.htaccess</tt> file).</p> </li>
<li>
<p>The rewrite engine will treat any <tt>RewriteRule</tt> substitution string containing a “<tt><b>?</b></tt><b>”</b> as
setting a new <tt>QUERY_STRING</tt> which will replace the existing one unless the <tt>[qsappend]</tt> flag
is set in which case the old string is appended.</p> </li>
<li>
<p><span  style="text-decoration : none;">Hence my advice is to use a </span><b>relative path</b> (no
leading <tt><b>/</b></tt>) for the <tt>RewriteRule</tt> substitution string wherever practical.&nbsp;
</p> </li></ul> <ul> 
<li>
<p>Write down all of the rules that you need, and work out the dependency groups that will require
chaining, and order the groups / singletons in approximate order of frequency.&nbsp; If you are
a programmer comfortable writing procedural logic then you might find it easier to write a version
using if/then/else bracketing first, then manually convert it to the rewrite ladder logic form
by adding the necessary <tt>[or]</tt>, <tt>[chain]</tt>, <tt>[skip]</tt>, <tt>[next]</tt> and
<tt>[last]</tt> flags.&nbsp; Your aim should be to order the rules so that a single pass executes
the rewriting (or two passes, one root and one application <tt>.htaccess</tt>, in the case of
a two level split).</p> </li></ul> <ul> 
<li>
<p>Make sure that you've got the <i>Regexp</i> syntax correct, so that each pattern gives the expected
match variables for a set of test strings.</p> </li>
<li>
<p>Don’t forget that Perdir processing is cyclic and this is an intrinsic characteristic of
how the engine works.&nbsp; Nonetheless, I regard such looping as a bad practice to be avoided
wherever possible. </p> </li>
<li>
<p>Make sure that you add the necessary conditions for each rewrite rule (i) to generate any
back references needed in the replacement string, and (ii) to prevent unnecessary refiring of
the rule on following internal redirection passes.</p> </li></ul> <ul> 
<li>
<p>If you don’t have access to your own test Apache instance where you can set Rewrite debugging,
test the rules in a separate test subdirectory, adding them one at a time, because the only diagnostics
that will be available to you are the status return codes on failure.&nbsp; Be aware that patterns
might fail when you don’t expect them to.&nbsp; For example, if the above rule had been “<tt>^(.+)&nbsp;&nbsp;&nbsp;%1/$1</tt>”
then this would fail on the URI <tt>http://blog.ellisons.org.uk/</tt> as the match string would
be “” and <tt><b>(.+)</b></tt> requires at least one character.</p> </li></ul> 
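<p>As a sketch only, the points above on back-references, and on which side the strings and the
match patterns sit, can be pulled together in a root <tt>.htaccess</tt> fragment like the following.&nbsp;
The <tt>example.org</tt> hostname and the exact choice of conditions are assumptions made for
illustration, not a copy of a live rule set:</p> 
<pre>RewriteEngine on
RewriteBase   /

# Only fire on the initial request, not on a following internal redirect pass
RewriteCond %{ENV:REDIRECT_STATUS}  ^$
# RewriteCond: string (with variable expansion) on the left, match pattern on the right.
# The rule pattern's $1 is already available here.
RewriteCond %{DOCUMENT_ROOT}/$1     !-f
# The last condition pattern sets %1 for use in the substitution string
RewriteCond %{HTTP_HOST}            ^(blog|wiki|forum)\.example\.org$   [nocase]
# RewriteRule: match pattern on the left, relative substitution string on the right.
# (.*) rather than (.+) so that the empty match string for a request to "/" still matches.
RewriteRule ^(.*)$                  %1/$1                               [last]</pre>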
<p>So, wrapping up this example: if I adopted a two level <tt>.htaccess</tt> for my blog,
then this would give me this set of rules in the <tt>DOCUMENT_ROOT/blog/.htaccess</tt> file:</p> 
<pre>RewriteEngine on
RewriteBase /blog

# If the URI maps to a file that exists then stop. This will kill endless loops

RewriteCond %{REQUEST_FILENAME}     -f
RewriteRule .*                      -                   [last]

# If the request is HTML cacheable (a GET to a specific list and with no query string, the
# user is not logged on) and the HTML cache file exists then use it instead of executing PHP

RewriteCond %{HTTP_COOKIE}                                              !blog_user
RewriteCond %{REQUEST_METHOD}%{QUERY_STRING}                            =GET               [nocase]
RewriteCond %{DOCUMENT_ROOT}/blog/html_cache/$1.html                    -f
RewriteRule ^(article-\d+|index|sitemap.xml|search-\w+|rss-[0-9a-z]*)$  html_cache/$1.html [last]

# Anything else pass to index.php

RewriteRule (.*) index.php?page=$1    [qsappend,last]</pre>
<p> One complication here is the use of the <tt>RewriteBase</tt>.&nbsp; This doesn’t apply
to the rewrite rule pattern, but it <i>is</i> added in to the substitution strings and the <tt>REQUEST_URI</tt> variable.
You can see this in an extract of the rewrite log (with debugging enabled) for a request to <tt>http://blog.ellisons.org.uk/article-59</tt>
as follows (note that I’ve trimmed some of the header fields and replaced the document root
by DOCROOT to save space.)</p> 
<pre>init rewrite engine with requested uri /article-59
pass through /article-59
[perdir DOCROOT/] strip per-dir prefix: DOCROOT/article-59 -&gt; article-59
[perdir DOCROOT/] applying pattern '^(.*)' to uri 'article-59'
[perdir DOCROOT/] RewriteCond: input='' pattern=''  =&gt; matched
[perdir DOCROOT/] RewriteCond: input='DOCROOT/article-59' pattern='!-f' =&gt; matched
[perdir DOCROOT/] RewriteCond: input='blog.ellison6.home' pattern='(blog|wiki|forum)\.ellison6\.home' [NC] =&gt; matched
[perdir DOCROOT/] rewrite 'article-59' -&gt; 'blog/article-59'
[perdir DOCROOT/] add per-dir prefix: blog/article-59 -&gt; DOCROOT/blog/article-59
[perdir DOCROOT/] trying to replace prefix DOCROOT/ with /
strip matching prefix: DOCROOT/blog/article-59 -&gt; blog/article-59
add subst prefix: blog/article-59 -&gt; /blog/article-59
[perdir DOCROOT/] internal redirect with /blog/article-59 [<span style="color:#ff0000;">INTERNAL REDIRECT</span>]
[perdir DOCROOT/blog/] strip per-dir prefix: DOCROOT/blog/article-59 -&gt; article-59 
[perdir DOCROOT/blog/] applying pattern '.*' to uri 'article-59'
[perdir DOCROOT/blog/] RewriteCond: input='DOCROOT/blog/article-59' pattern='-f' =&gt; not-matched
[perdir DOCROOT/blog/] strip per-dir prefix: DOCROOT/blog/article-59 -&gt; article-59 
[perdir DOCROOT/blog/] applying pattern '^(article-\d+|index|sitemap.xml|search-\w+|rss-[0-9a-z]*)$' to uri 'article-59'
[perdir DOCROOT/blog/] RewriteCond: input='blog_email=XXXXXXXX; blog_user=XXXX; blog_token=XXXXXXXXXXXXXXXX' pattern='!blog_user' =&gt; not-matched
[perdir DOCROOT/blog/] strip per-dir prefix: DOCROOT/blog/article-59 -&gt; article-59 
[perdir DOCROOT/blog/] applying pattern '(.*)' to uri 'article-59'
[perdir DOCROOT/blog/] rewrite 'article-59' -&gt; 'index.php?page=article-59'
split uri=index.php?page=article-59 -&gt; uri=index.php, args=page=article-59
[perdir DOCROOT/blog/] add per-dir prefix: index.php -&gt; DOCROOT/blog/index.php
[perdir DOCROOT/blog/] trying to replace prefix DOCROOT/blog/ with /blog
strip matching prefix: DOCROOT/blog/index.php -&gt; index.php
add subst prefix: index.php -&gt; /blog/index.php
[perdir DOCROOT/blog/] internal redirect with /blog/index.php [<span style="color:#ff0000;">INTERNAL REDIRECT</span>]
[perdir DOCROOT/blog/] strip per-dir prefix: DOCROOT/blog/index.php -&gt; index.php
[perdir DOCROOT/blog/] applying pattern '.*' to uri 'index.php'
[perdir DOCROOT/blog/] RewriteCond: input='DOCROOT/blog/index.php' pattern='-f' =&gt; matched
[perdir DOCROOT/blog/] pass through DOCROOT/blog/index.php</pre>
<h2> Diagnosing .htaccess usage</h2> 
<p>There is a simple fact of life here: debugging <tt>.htaccess</tt> files on a shared service
is very difficult, and this is for one main reason: even though the Apache server has facilities
to carry out detailed logging, enabling logging incurs major performance penalties, so service
providers do not enable this through the Apache configuration.&nbsp; So you as an SHS user/developer
are left with a process of trial and error where the only error diagnostic that you get is the
HTTP status return code if the rewrite fails.&nbsp; Hence I do all of my <tt>.htaccess</tt> rule
debugging on a local test VM which mirrors my SHS environment.&nbsp; Here I can set the <tt>RewriteLogLevel</tt> to
dump diagnostics for the Apache instance to a <tt>rewrite.log</tt> file.&nbsp; If you can’t
do this then another trick is to use environment variables to pass parameters between rules and
to the executing script, writing intermediate test strings to environment variables using the
<tt>[E=VAR:val]</tt> constructs which are then accessible by the invoked script as the environment
variable <tt>REDIRECT_VAR</tt>.</p> 
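<p>A minimal sketch of this environment variable trick follows.&nbsp; The variable name <tt>RW_MATCH</tt> is
hypothetical, and the rule is a cut-down variant of the catch-all rule above rather than something
lifted from a production file:</p> 
<pre># Hypothetical debugging aid: record what the rule pattern captured
RewriteRule ^(article-\d+)$   index.php?page=$1   [E=RW_MATCH:$1,qsappend,last]</pre>
<p>Following the internal redirect, the invoked PHP script should then be able to read the captured
value as <tt>$_SERVER['REDIRECT_RW_MATCH']</tt> and echo or log it, which gives at least some
visibility of which rule fired and what it matched.</p> 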
<p>Because I use a Linux laptop as my PC, as I’ve discussed in previous articles, I can also
use the <tt>strace</tt> system utility to log all of the system calls executed by the Apache
child processes as follows:</p> 
<pre># Start an strace on all www-data children
sudo rm -f /tmp/strace.*
sudo strace  -u www-data -tt -ff -o /tmp/strace $(ps -o &quot;-p %p&quot; h -u www-data)</pre>
<p> This strace diagnostic enables me to look at the relative timelines and the file / directory
accesses, and the writes to the <tt>rewrite.log</tt> can then be tied up to the corresponding log
records.&nbsp; This information, plus code inspection of the <tt>mod_rewrite</tt> source, enables
me to reverse-engineer the module’s processing.</p> 
<p>I can examine the timelines, what processing takes place, for roughly how long and when,
and what I/O takes place.&nbsp; Unfortunately doing this is really in the skill domain of an
experienced developer and probably a bit daunting for the average SHS user.&nbsp; However, the
reasons that I’ve included this footnote are (i) for those readers who would like to drill
down themselves to understand what is going on in the rewrite processing; and (ii) to underline
that the recommendations that I make in the preceding section are not simply a matter of opinion,
but are underpinned by hard timing and file access data.</p> 
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[HTTP Caching Revisited]]></title>
		<description><![CDATA[
	
<p>I find the issues around Web performance very interesting.&nbsp; I’ve researched pretty
comprehensively and written a few articles in this area. I also routinely use various web tools
to instrument sites that I visit to see how they perform, and quite frankly a lot are middling
to terrible in performance terms.&nbsp; An example is the MoneyCorp GPS application that I discussed
in a <a
href="http://blog.ellisons.org.uk/article-57">previous article</a>. I have subsequently found worse sites (e.g.
the <a href="http://www.radiotimes.com/tv/">RadioTimes TV</a> website which scores 18-25 depending
on page), though in these cases the main effect is a case of loading slowly, rather than failing
to load at all as in the case of GPS.</p> 
<p>However, what I have noticed with some sites is that they seemed to perform reasonably well
from a user-perspective even though Google PageSpeed marks them down for not specifying caching
parameters.&nbsp; The relevant Google recommendations on “<a
href="http://code.google.com/speed/page-speed/docs/caching.html#LeverageBrowserCaching">Leverage
browser caching</a>” describe the use of the HTTP headers <b>Expires</b>, <b>Cache-Control
max-age</b>, <b>Last-Modified</b> and <b>Etag</b> for resources that you wish cached by the
client browser, as I have discussed previously.&nbsp; This guidance really only relates to the
<i>mandatory</i> rules for browser caching detailed in <a
href="http://www.w3.org/Protocols/rfc2616/rfc2616.html">RFC 2616 (HTTP/1.1)</a> in <a
href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.htm#sec13.2">section 13.2</a>.&nbsp;
However, in addition to these mandatory rules, most browsers also implement an advisory rule
that is discussed in section 13.2.4, and which is based on any <b>Last-Modified</b> header if
provided.&nbsp; If present, then the life of the cached resource is the age at download (the delta
of <b>Date</b> and <b>Last-Modified</b> values) divided by a factor <b>X</b>.&nbsp; The factor
“<b>X</b> = 10” is suggested in the RFC and this is what IE, Firefox and Chrome use.&nbsp;
So, for example, if a resource is 10 weeks old at download, then its cached copy will be treated
as valid for one week from download.</p> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-58</link>
		<pubDate>Wed, 29 Jun 2011 18:13:04 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-58</guid>
		<content:encoded><![CDATA[
	
<p>I find the issues around Web performance very interesting.&nbsp; I’ve researched pretty
comprehensively and written a few articles in this area. I also routinely use various web tools
to instrument sites that I visit to see how they perform, and quite frankly a lot are middling
to terrible in performance terms.&nbsp; An example is the MoneyCorp GPS application that I discussed
in a <a
href="http://blog.ellisons.org.uk/article-57">previous article</a>. I have subsequently found worse sites (e.g.
the <a href="http://www.radiotimes.com/tv/">RadioTimes TV</a> website which scores 18-25 depending
on page), though in these cases the main effect is a case of loading slowly, rather than failing
to load at all as in the case of GPS.</p> 
<p>However, what I have noticed with some sites is that they seemed to perform reasonably well
from a user-perspective even though Google PageSpeed marks them down for not specifying caching
parameters.&nbsp; The relevant Google recommendations on “<a
href="http://code.google.com/speed/page-speed/docs/caching.html#LeverageBrowserCaching">Leverage
browser caching</a>” describe the use of the HTTP headers <b>Expires</b>, <b>Cache-Control
max-age</b>, <b>Last-Modified</b> and <b>Etag</b> for resources that you wish cached by the
client browser, as I have discussed previously.&nbsp; This guidance really only relates to the
<i>mandatory</i> rules for browser caching detailed in <a
href="http://www.w3.org/Protocols/rfc2616/rfc2616.html">RFC 2616 (HTTP/1.1)</a> in <a
href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.htm#sec13.2">section 13.2</a>.&nbsp;
However, in addition to these mandatory rules, most browsers also implement an advisory rule
that is discussed in section 13.2.4, and which is based on any <b>Last-Modified</b> header if
provided.&nbsp; If present, then the life of the cached resource is the age at download (the delta
of <b>Date</b> and <b>Last-Modified</b> values) divided by a factor <b>X</b>.&nbsp; The factor
“<b>X</b> = 10” is suggested in the RFC and this is what IE, Firefox and Chrome use.&nbsp;
So, for example, if a resource is 10 weeks old at download, then its cached copy will be treated
as valid for one week from download.</p> 
<p><a name="endtaster"></a>Apache and IIS both supply <b>Date</b> and <b>Last-Modified</b> headers
by default for file-based resources, hence despite the general “best practice” advice on
specifying these headers, such file-based resources will frequently be cached.</p> 
<p>Conversely, script-based resources will not be unless the code explicitly emits the correct
headers.</p> 
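<p>To avoid relying on this heuristic for static files, explicit lifetimes can be set in the webserver
configuration.&nbsp; The following is a minimal sketch for an Apache <tt>.htaccess</tt>, assuming
that <tt>mod_expires</tt> is available on the service; the MIME types and the one month lifetime
are illustrative choices only:</p> 
<pre>&lt;IfModule mod_expires.c&gt;
    ExpiresActive On
    # Lifetime is measured from the time of the client's access
    ExpiresByType text/css                "access plus 1 month"
    ExpiresByType application/javascript  "access plus 1 month"
    ExpiresByType image/png               "access plus 1 month"
&lt;/IfModule&gt;</pre>
<p>With this in place the module emits both an <b>Expires</b> header and a matching <b>Cache-Control
max-age</b> for the listed types, so the browser no longer needs to fall back on the
<b>Last-Modified</b> calculation for them.</p> 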
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[A good web application spoilt by poor Internet performance]]></title>
		<description><![CDATA[
<p><a href="http://static.panoramio.com/photos/original/4655285.jpg"><img
height="180" src="http://mw2.google.com/mw-panoramio/photos/small/4655285.jpg" style="float: RIGHT;" width="240"/></a>My
wife and I have a cottage on the Greek island of Alonissos &ndash; at the very top of the village
in the photo to the right.&nbsp; We like the remoteness, food, walking, swimming, our terrace
with a fantastic view over the Aegean and the life-style in general.&nbsp; Internet connectivity
isn&rsquo;t a high priority when I am on the island, so I haven&rsquo;t gone through all the
hassle of getting an ADSL line; I just buy a drink or two at one of the local tavernas and use
its WiFi when I need Internet access.&nbsp; This works fine for all the access that I need: managing
the websites that I look after, Skype, access to my various bank / credit cards services, YouTube,
etc. &ndash; with one annoying exception: I need to transfer money routinely from my UK Sterling
bank account to my Greek Euro one to cover living expenses.</p> 
<p>I use <a href="http://www.moneycorp.com/Moneycorp-GPS/Online-transfers/">MoneyCorp GPS</a> to
do this.&nbsp; MoneyCorp is a market leader in this Forex sector that offers competitive rates
and seems to be widely recommended (e.g. by the <a
href="http://www.telegraph.co.uk/sponsored/offers/money-deals/8509198/Currency-exchange-is-there-a-right-way.html">Telegraph</a>).&nbsp;
The GPS application provides all the functionality that I need except that it is unusable on these
taverna connection [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-57</link>
		<pubDate>Sat, 11 Jun 2011 18:01:22 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-57</guid>
		<content:encoded><![CDATA[
<p><a href="http://static.panoramio.com/photos/original/4655285.jpg"><img
height="180" src="http://mw2.google.com/mw-panoramio/photos/small/4655285.jpg" style="float: RIGHT;" width="240"/></a>My
wife and I have a cottage on the Greek island of Alonissos &ndash; at the very top of the village
in the photo to the right.&nbsp; We like the remoteness, food, walking, swimming, our terrace
with a fantastic view over the Aegean and the life-style in general.&nbsp; Internet connectivity
isn&rsquo;t a high priority when I am on the island, so I haven&rsquo;t gone through all the
hassle of getting an ADSL line; I just buy a drink or two at one of the local tavernas and use
its WiFi when I need Internet access.&nbsp; This works fine for all the access that I need: managing
the websites that I look after, Skype, access to my various bank / credit cards services, YouTube,
etc. &ndash; with one annoying exception: I need to transfer money routinely from my UK Sterling
bank account to my Greek Euro one to cover living expenses.</p> 
<p>I use <a href="http://www.moneycorp.com/Moneycorp-GPS/Online-transfers/">MoneyCorp GPS</a> to
do this.&nbsp; MoneyCorp is a market leader in this Forex sector that offers competitive rates
and seems to be widely recommended (e.g. by the <a
href="http://www.telegraph.co.uk/sponsored/offers/money-deals/8509198/Currency-exchange-is-there-a-right-way.html">Telegraph</a>).&nbsp;
The GPS application provides all the functionality that I need except that it is unusable on these
taverna connections: it hangs and times out.&nbsp; It seems to need a reasonable internet bandwidth
to work robustly, and luckily for me there is an Internet service in the main port on the island
which offers an Ethernet connection and a reasonable Internet bandwidth, so this is an inconvenience rather
than a show-stopper. Even so, for an application implementing a money transfer service, I would
expect it to provide a functional service 24x7 from pretty much anywhere in the world, given
a reasonable minimum Internet service.</p> 
<p>So following the theme of my last few articles on optimising web applications for network
and browser performance, I decided to drill down into the GPS application to see just why it
was the worst application that I had measured with <a
href="http://code.google.com/speed/page-speed/docs/extension.html">Google Page Speed</a>, scoring
<b>36</b>/100 (my <a
href="http://blog.ellisons.org.uk/article-58">next article</a> will discuss a few more that I have
subsequently found).&nbsp; As far as I can see, the designers may have developed a good application
in functional terms, but they seem to have ignored some basic rules for web optimisation:</p> <ul> 
<li> 
<p><b>None of the text content is compressed</b>.&nbsp; Rendering the GPS home page requires
downloading some 29 HTML, CSS and JS files.&nbsp; None is compressed.&nbsp; This can be enabled
by a simple configuration option in the appropriate IIS <tt><b>web.config</b></tt> files, so
I am not sure why this hasn&rsquo;t been done.&nbsp; Compressing these 29 resources totalling
460 KiB would reduce this size by roughly a factor of 10.&nbsp; Some admins cite performance
reasons for not doing so, but this is a bogus argument because (i) the CPU overhead on modern
Intel CPUs is minimal; (ii) there is a CPU and context switch overhead saving in avoiding the
larger network transfers; and (iii) the cost of compression also has to be offset against the
saving in only having to encrypt 10% of the bytestream in the case of HTTPS.&nbsp; <i>This
one is a no-brainer</i>.</p> </li> 
<li> 
<p><b>Most static files do not have an expiry date set</b>.&nbsp; There are 22 CSS and JS files
which do not have an associated <tt><b>Expires</b></tt> tag; this is again a simple IIS <tt><b>web.config</b></tt> setting.&nbsp;
The client browser can safely use any locally cached resources when they are tagged with a future
expiry date, without querying the server.&nbsp; If any document has an associated <tt><b>Etag</b></tt> but
no valid <tt><b>Expires</b></tt> date, then the browser must issue a conditional request to
the server to download if <tt><b>Etag</b></tt> is mismatched.&nbsp; This is in effect an RPC
call that optimises out unnecessary downloads.&nbsp; However, even when the download is optimised
away, each RPC call itself still has an associated round trip time.&nbsp; Also whilst modern
browsers support multiplexing of requests, most limit the concurrent number to six, so issuing
these requests will block and serialise other load activities.&nbsp; <i>So again, setting expiries
at say now+1 month is a no-brainer</i>.</p> </li> 
<li> 
<p><b>The main document is 80% boilerplate</b>.&nbsp; There is a ticker-tape currency bar on
the GPS home page when logged in: occasionally a nice little goodie, but not really core functionality.&nbsp;
However, this content is common to <i>all</i> user home pages and it comprises 80% of the document
(some 180 KiB out of 223 KiB).&nbsp; Scrolling is already handled by a javascript so why on earth
doesn&rsquo;t the script just do an AJAX-style asynchronous (compressed) XML/JSON load of this
table data?&nbsp; The ticker-tape is above this business content so the <tt><b>&lt;div&gt;</b></tt> for
the ticker tape is easiest placed ahead of this content in the HTML body, but there is absolutely
no reason to place the ticker-tape script here.&nbsp; Best practice is to place all such furniture
scripts at the foot of the body so that their invocation will not block or prevent the actual
main page content loading.&nbsp; Doing this and compressing the remaining document content would
reduce the main document size from the current <b>223 KiB</b> to <b>less than 12 KiB</b>.&nbsp;
This is a simple application change using standard functionality within the <a
href="http://aspnetajax.componentart.com/">ComponentArt Web UI toolkit</a> used by this application.&nbsp;
There are various other dubious implementation details, and in my experience the current implementation
is something that a novice might do.</p> </li> 
<li> 
<p><b>Marshal CSS and JS files</b>.&nbsp; This is more of a nice-to-have as most users will
be repeat visitors for this use-case, and therefore (<i>if static CSS and JS have appropriate
expiry dates)</i> the browser will optimise away any such server requests by using locally cached
content.&nbsp; Nonetheless, applications are usually more responsive, especially on initial load,
if the CSS files are grouped and collated into composite style sheets: 3 big style sheets are
better than 22 small ones.&nbsp; Likewise, server-side javascript aggregation helps improve performance.&nbsp;
There seem to be two main groups of script files: the GPS application ones, and the Web UI ones.&nbsp;
It is simpler and better to aggregate and &ldquo;minify&rdquo; the former into a compressed composite
as a one-off server-side function, and then load this once into the local cache of each user.&nbsp;
In the Web UI case, again these are loaded one file per UI component, though I note that the
ComponentArt website <i>does</i> marshal Web UI components on its own pages to reduce the numbers
of script files.&nbsp; Another benefit of marshalling is that any compression can be done as
a pre-process to remove the per-request overhead as I did in my <a
href="http://blog.ellisons.org.uk/article-46">TinyMCE
loader</a>.</p> </li> </ul> 
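<p>For comparison, on an Apache service compressing the text content is a short <tt>.htaccess</tt> fragment.&nbsp;
GPS itself runs on IIS, so this is an analogue rather than its actual configuration, and the
sketch assumes that <tt>mod_deflate</tt> is loaded and that the listed MIME types are the ones
being served:</p> 
<pre>&lt;IfModule mod_deflate.c&gt;
    # Compress the text-based resource types; images are already compressed
    AddOutputFilterByType DEFLATE text/html text/css application/javascript application/json
&lt;/IfModule&gt;</pre>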
<p>Having browsed some of the other pages such as the <b>Transfers</b> and <b>Payments</b> functions,
I see that javascript components seem to be randomly loaded from either <tt><b>WebResource.axd</b></tt> or
<tt><b>ScriptResource.axd</b></tt>, with the latter being compressed, but not the former.&nbsp;
Bizarre.</p> 
<p>So in previous articles I&rsquo;ve discussed how to optimise your own web applications and
packages such as phpBB.&nbsp; Here we have an example of a corporate B2C application that is
undermined by sloppy implementation details and ignoring some basic rules for good web performance:</p> <ul> 
<li> 
<p>don't send content over the net that doesn't need to be (i.e. can be cached in the client
browser);</p> </li> 
<li> 
<p>set up the transfer defaults so that the browsers and servers don't even need to &ldquo;talk&rdquo;
about transfer if this isn't needed;</p> </li> 
<li> 
<p>when you do need to send content, make sure that it's compressed;</p> </li> 
<li> 
<p>sensibly glob up content that you do need to send, because this is more efficient and responsive.</p> </li> </ul> 
<p>I am not sure of the exact reasons for the time-out because I haven&rsquo;t any access to
the server-side logs, but the system is clearly struggling.&nbsp; The secure HTTPS protocol is
a must for this type of financial application, and this complicates some techniques for page
optimisation.&nbsp; Even so, GPS is sending out 223 &ndash; 595 KiB (depending on caching) over
this HTTPS stream, when with a few tweaks it needs to send 15 KiB to render the home page, plus
another 15 KiB or so asynchronously to initialise the ticker-tape.&nbsp; This poor implementation
means that the system traffic is perhaps 5-10x greater than it needs to be, and any poor performance
is very much self inflicted.&nbsp; All I hope is that MoneyCorp manage to fix this before the
autumn, so that I can do my Forex transactions from my local taverna on my next visit to the
island!</p>
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[The Anatomy and Timing of a Web Request – Part II]]></title>
		<description><![CDATA[

<p>In <a href="/article-55">Part I</a> of this analysis, I looked at the overall timeline of
viewing a webpage, and my main recommendations were:</p> <ul> 
<li> 
<p>The correct webserver configuration is essential to ensure that files are correctly cached
at the client’s browser and compressed to ensure that any network transfer times are kept to
an absolute minimum when content is transferred over the internet.</p> </li> 
<li> 
<p>The application should be written to minimise the number of supporting files needed to render
the page by aggregating content where practical.&nbsp; It should also ensure that when the client
browser revalidates scripted content that the script processes the request correctly and issues
a status 304 response when appropriate.</p> </li></ul> 
<p>Whilst the application changes are beyond the scope of most application installers, getting
the webserver correct, through properly configured <tt><b>.htaccess</b></tt> files can easily
improve response times by a factor of 3 or more.&nbsp; Having done this though, the application
response for the delivery of the main document content becomes the main performance constraint
and I want to use phpBB to explore the factors which drive this response time.&nbsp; Whilst I
realise that this article is in many ways a reprise of two earlier articles, it is clear from
my dialogue on the phpBB developers forum that we continue to talk at cross purposes, so I wanted
to drill down and parameterise some of these performance factors to put quantitative numbers
on this responsiveness.</p> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-56</link>
		<pubDate>Mon, 04 Apr 2011 22:12:26 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-56</guid>
		<content:encoded><![CDATA[

<p>In <a href="/article-55">Part I</a> of this analysis, I looked at the overall timeline of
viewing a webpage, and my main recommendations were:</p> <ul> 
<li> 
<p>The correct webserver configuration is essential to ensure that files are correctly cached
at the client’s browser and compressed to ensure that any network transfer times are kept to
an absolute minimum when content is transferred over the internet.</p> </li> 
<li> 
<p>The application should be written to minimise the number of supporting files needed to render
the page by aggregating content where practical.&nbsp; It should also ensure that when the client
browser revalidates scripted content that the script processes the request correctly and issues
a status 304 response when appropriate.</p> </li></ul> 
<p>Whilst the application changes are beyond the scope of most application installers, getting
the webserver correct, through properly configured <tt><b>.htaccess</b></tt> files can easily
improve response times by a factor of 3 or more.&nbsp; Having done this though, the application
response for the delivery of the main document content becomes the main performance constraint
and I want to use phpBB to explore the factors which drive this response time.&nbsp; Whilst I
realise that this article is in many ways a reprise of two earlier articles, it is clear from
my dialogue on the phpBB developers forum that we continue to talk at cross purposes, so I wanted
to drill down and parameterise some of these performance factors to put quantitative numbers
on this responsiveness.</p> 
<p><a name="endtaster"></a>To do this, I wrote a small script (see <a
href="http://blog.ellisons.org.uk/56#Listing1">Listing
1</a>) which I ran on a local server to “hit” the harness (see <a
href="http://blog.ellisons.org.uk/56#Listing2">Listing 2</a>) at my Webfusion based forum (pointed to by <tt><b>$site</b></tt>),
with the output logged to a file.&nbsp; Running the three timings serially
like this is safe because the file itself is in the target directory so all of the path inodes
are resolved outside the timing windows, and the three variants load disjoint versions of the
phpBB <tt><b>common.php</b></tt> include hierarchy:</p> <ul> 
<li> 
<p><a href="http://files.ellisons.org.uk/phpBB/LatencyScatterPlot.png"><span
 style="color : #000080;"><img
border="1" height="209" src="http://files.ellisons.org.uk/phpBB/LatencyScatterPlotThumb.png" style="float:RIGHT;" width="248" align="RIGHT"/></span></a>the
‘out-of-the-box’ load which hierarchically loads 14 source files comprising some 15.5K lines
of source;</p> </li> 
<li> 
<p>a compacted version of these 14 files which comprises 1 source file with whitespace collapsed,
comments removed and therefore a reduced 11K lines of source;</p> </li> 
<li> 
<p>a variant of the 1 source file where the single source file is <b>gzip</b> compressed and
included through a PHP zlib stream.</p> </li></ul> 
<p>I let this run for 36 hrs and dropped the results into a spreadsheet for analysis.&nbsp;
Figure 1 to the right (click to enlarge) shows a scatter plot and linear regression fits for
the three cases with the delay between queries in minutes as the X axis and time taken in mSec
as the logarithmic Y axis (hence the linear trends looking bendy).&nbsp; The formulae give the
time in milliseconds with <b>N</b> the number of minutes delay.</p> <ul> 
<li> 
<p><b>14 files</b>: 77 + 15 N (Blue X’s and trend line)</p> </li> 
<li> 
<p><a href="http://files.ellisons.org.uk/phpBB/ErrorHistogram.png"><span
 style="color : #000080;"><img
border="1" height="190" src="http://files.ellisons.org.uk/phpBB/ErrorHistogram.png" style="float:RIGHT;" width="331" align="RIGHT"/></span></a><b>1
file</b>: 71 + 0.1 N (Red Squares and trend line)</p> </li> 
<li> 
<p><b>1 gzip</b> <b>file</b>: 77 + 0.4 N (Yellow Diamonds and trend line)</p> </li></ul> 
<p>Figure 2 shows the error histograms around these three linear fits.&nbsp; The 14 file case
has a standard deviation of 211 mSec and a typical log-normal skew so the top 10% are &gt; 250
mSec greater than the trend.&nbsp; The variance on the other two cases is bugger-all (3 mSec
for the 1 file case), so in other words the response for the 1 file compacted case is a solid
71-75 mSec, the 14 file load is double that at a 5 minute interval, treble at a 10 minute interval
and the actual times are all over the place, so the top 10% are about <i><b>6x</b></i> the single
file load version.&nbsp; The 71 mSec pretty much represents a base load as this inclusion is
roughly 75% of the code needed to be compiled.</p> 
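<p>To make the "double" and "treble" figures concrete, the three fitted trend lines above can be evaluated at a few delays.&nbsp; This is only a sketch of the arithmetic: the coefficients are the ones quoted for Figure 1, while the function name <tt><b>predicted_msec</b></tt> and the choice of sample delays are illustrative assumptions.</p>

```php
<?php
// Fitted trend lines from Figure 1: time (mSec) = intercept + slope * N,
// where N is the delay in minutes since the previous request.
$fits = [
    '14 files'    => ['intercept' => 77.0, 'slope' => 15.0],
    '1 file'      => ['intercept' => 71.0, 'slope' => 0.1],
    '1 gzip file' => ['intercept' => 77.0, 'slope' => 0.4],
];

/** Predicted include time in mSec after $minutes of inactivity. */
function predicted_msec(array $fit, float $minutes): float
{
    return $fit['intercept'] + $fit['slope'] * $minutes;
}

// Print one line per sample delay, with all three cases side by side.
foreach ([0, 5, 10, 15] as $n) {
    $row = [];
    foreach ($fits as $name => $fit) {
        $row[] = sprintf('%s: %.1f', $name, predicted_msec($fit, $n));
    }
    printf("N=%2d min  %s\n", $n, implode('   ', $row));
}
```

<p>At N = 5 the 14 file trend gives 152 mSec and at N = 10 it gives 227 mSec, against roughly 72 mSec for the single file, which is where the double and treble figures come from.</p>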
<p>To put this in context, the complete time to execute the PHP script for, say, <tt><b>viewforum.php</b></tt> is
as follows.&nbsp; I have given the ballpark times in mSec for two cases: (a) where phpBB is running
on a dedicated server or VM with the system, Apache, PHP and MySQL tuned to memory-cache hot
code, tables, and files so as to avoid unnecessary physical I/O, etc, and (b) on a shared service
such as Webfusion’s.</p> 
<table width="756"> <col width="88"/> <col width="73"/> <col width="583"/> 
<tr> <td> 
<p>Time (mSec) &nbsp;&nbsp;Dedicated</p> </td> <td> 
<p>Time (mSec) &nbsp;&nbsp;Shared</p> </td> <td> 
<p>Description of Component</p> </td> </tr> 
<tr> <td> 
<p>0</p> </td> <td> 
<p>80</p> </td> <td> 
<p>PHP image activation.&nbsp; Not needed on Dedicated as PHP is permanently mapped into Apache
through mod_php</p> </td> </tr> 
<tr> <td> 
<p>0</p> </td> <td> 
<p>70</p> </td> <td> 
<p>Compile ~20,000 lines of PHP needed to run the viewtopic function.&nbsp; Not needed on Dedicated
as PHP code is cached in Opcode Cache.</p> </td> </tr> 
<tr> <td> 
<p>0</p> </td> <td> 
<p>10 – 50</p> </td> <td> 
<p>Fetch cached variables from ACM file cache.&nbsp; Not needed on Dedicated as variables cached
in memory ACM cache.</p> </td> </tr> 
<tr> <td> 
<p>15</p> </td> <td> 
<p>15</p> </td> <td> 
<p>Execute PHP opcodes needed to run the viewtopic function.</p> </td> </tr> 
<tr> <td> 
<p>15</p> </td> <td> 
<p>30</p> </td> <td> 
<p>Execute SQL queries needed to run the viewtopic function.&nbsp; Significantly less on Dedicated
as (i) connection to the mysql instance will be by local socket rather than TCP to database server;
(ii) the tables will be memory cached on a dedicated D/B.</p> </td> </tr> 
<tr> <td> 
<p>0</p> </td> <td> 
<p>50-500</p> </td> <td> 
<p>Delays in loading data from NFS served files.&nbsp; On Dedicated, these will be fully cached
in the Virtual Filesystem Cache (VFC).</p> </td> </tr> 
<tr> <td> 
<p>30</p> </td> <td> 
<p>30</p> </td> <td> 
<p>Apache Handling Overhead and stream compression.</p> </td> </tr> 
<tr> <td> 
<p>0</p> </td> <td> 
<p>0 – 1500</p> </td> <td> 
<p>Download any uncached images if Apache caching not properly defined in <tt><b>.htaccess</b></tt> files.&nbsp;
This will usually be correctly set up on Dedicated.</p> </td> </tr> 
<tr> <td> 
<p><b>60 
<br/></b> 
<br/> </p> </td> <td> 
<p><b>285-2275 
<br/></b> 
<br/> </p> </td> <td> 
<p><b>Total 
<br/></b> 
<br/> </p> </td> </tr> </table> 
<p>The reason that I’ve said “ballpark” about these timings is that I’ve collected them
on separate timing tests.&nbsp; One thing that I want to add to phpBB is a <tt><b>DEBUG_TIMING</b></tt> option
which can be turned on in <tt><b>config.php</b></tt>, so that these stats can be properly logged to
the error log.&nbsp; Nonetheless, the overall pattern is correct: phpBB is a fairly lightweight
application, and the per-request load on a dedicated VM or system should be excellent.&nbsp;
However the timings for running it on a shared server are different.&nbsp; The service time is roughly
0.3 – 2.5 seconds, depending mainly on how the forum administrator has configured the service (through
the <tt><b>.htaccess</b></tt> file) to maximise caching. </p> 
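<p>The totals row in the table above can be checked by summing the best and worst case for each component.&nbsp; The sketch below simply transcribes the table; the short component labels and the function name <tt><b>total_range</b></tt> are illustrative assumptions, not anything in phpBB.</p>

```php
<?php
// Per-component [min, max] times in mSec, transcribed from the table above.
// A single figure in the table is entered with min equal to max.
$components = [
    'PHP image activation'        => ['dedicated' => [0, 0],   'shared' => [80, 80]],
    'Compile PHP source'          => ['dedicated' => [0, 0],   'shared' => [70, 70]],
    'Fetch ACM cached variables'  => ['dedicated' => [0, 0],   'shared' => [10, 50]],
    'Execute PHP opcodes'         => ['dedicated' => [15, 15], 'shared' => [15, 15]],
    'Execute SQL queries'         => ['dedicated' => [15, 15], 'shared' => [30, 30]],
    'NFS file loading delays'     => ['dedicated' => [0, 0],   'shared' => [50, 500]],
    'Apache overhead/compression' => ['dedicated' => [30, 30], 'shared' => [30, 30]],
    'Uncached image downloads'    => ['dedicated' => [0, 0],   'shared' => [0, 1500]],
];

/** Sum the best-case and worst-case times for one hosting type. */
function total_range(array $components, string $hosting): array
{
    $min = 0;
    $max = 0;
    foreach ($components as $times) {
        $min += $times[$hosting][0];
        $max += $times[$hosting][1];
    }
    return [$min, $max];
}

foreach (['dedicated', 'shared'] as $hosting) {
    [$min, $max] = total_range($components, $hosting);
    printf("%-9s %d - %d mSec\n", $hosting, $min, $max);
}
```

<p>This reproduces the 60 mSec total for the dedicated case and the 285-2275 mSec range for the shared case, which is the 0.3 – 2.5 second service time quoted above.</p>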
<p>File access is the main performance killer on shared services, since the Virtual Filesystem Cache
(VFC) is rapidly flushed by contention with the other applications.&nbsp; However, the data storage
architecture on Linux services mitigates this because the VFC acts as a first level cache,
with the file system then accessing the shared NAS using an RPC tunnelled NFS protocol.&nbsp;
The NAS itself has a large RAM cache, so active files are normally in cache.&nbsp; As a consequence
the second order file I/O penalty is that of the RPC transport itself, which is again mitigated
because the network drivers including the TCP stack run in kernel mode, the IP Gbit Enet switched
infrastructure uses jumbo frames, and the NFS (at least on the Webfusion services) is configured
with a 32 Kbyte blocksize.&nbsp; Hence the overhead for a VFC cache-miss is an RPC to the NAS server
and this is only a few milliseconds per I/O if the data content is itself in the NAS cache.</p> 
<p>However, because phpBB needs to read some 30+ code and data files per web request in its
default configuration, as the gap between transactions increases, a higher percentage of these
I/Os result in RPC overheads, which rapidly aggregate up to become the major component of the request
response time.</p> 
<p>Lastly note that all of the times relate to the service time for the request. As soon as
the server starts to become resource limited then queuing will add multiplier factors to these
delays. </p> 
<hr/> 
<h2><a name="Listing1"></a>Listing 1</h2> 
<pre># $site, $php and $wget are assumed to be set before this loop runs
file=&quot;$site/common_comp.php&quot;
for ((i=240; $i; i=$i-1)); do
  sleep $($php -r 'echo rand(1,900);')
  date &quot;+%d %H:%M:%S&quot;
  $wget -o /dev/null -O - &quot;$file?v=0&quot;
  $wget -o /dev/null -O - &quot;$file?v=1&quot;
  $wget -o /dev/null -O - &quot;$file?v=2&quot;
done</pre> 
<h2> <a name="Listing2"></a>Listing 2</h2> 
<pre><span  style="color : #0000bb;">&lt;?php</span>
<span  style="color : #ff8000;">//the float version of microtime truncates the LSDs </span>
<span
 style="color : #007700;">function </span><span  style="color : #0000bb;">now</span><span
 style="color : #007700;">() {</span><span  style="color : #0000bb;">$x </span><span
 style="color : #007700;">= </span><span  style="color : #0000bb;">explode</span><span
 style="color : #007700;">(</span><span  style="color : #dd0000;">' '</span><span
 style="color : #007700;">, </span><span  style="color : #0000bb;">microtime</span><span
 style="color : #007700;">()); return ( (</span><span  style="color : #0000bb;">$x</span><span
 style="color : #007700;">[</span><span  style="color : #0000bb;">1</span><span
 style="color : #007700;">] % </span><span  style="color : #0000bb;">1000</span><span
 style="color : #007700;">) + </span><span  style="color : #0000bb;">$x</span><span
 style="color : #007700;">[</span><span  style="color : #0000bb;">0</span><span
 style="color : #007700;">] ); }  </span>
<span  style="color : #0000bb;">$path</span><span
 style="color : #007700;">=</span><span  style="color : #0000bb;">realpath</span><span
 style="color : #007700;">(</span><span  style="color : #dd0000;">'.'</span><span
 style="color : #007700;">);</span>
<span  style="color : #0000bb;">$common</span><span
 style="color : #007700;">=array(</span><span  style="color : #dd0000;">'./_debug_common.php'</span><span
 style="color : #007700;">, </span><span  style="color : #dd0000;">'./common.php'</span><span
 style="color : #007700;">, </span><span  style="color : #dd0000;">&quot;php://filter/zlib.inflate/resource=common.php.gz&quot;</span><span
 style="color : #007700;">);</span>
<span  style="color : #0000bb;">$v </span><span
 style="color : #007700;">= (int) </span><span  style="color : #0000bb;">$_GET</span><span
 style="color : #007700;">[</span><span  style="color : #dd0000;">'v'</span><span
 style="color : #007700;">];</span>
<span  style="color : #0000bb;">$start </span><span
 style="color : #007700;">= </span><span  style="color : #0000bb;">now</span><span
 style="color : #007700;">();</span>
<span  style="color : #0000bb;">define</span><span
 style="color : #007700;">(</span><span  style="color : #dd0000;">'IN_PHPBB'</span><span
 style="color : #007700;">, </span><span  style="color : #0000bb;">true</span><span
 style="color : #007700;">);</span>
<span  style="color : #0000bb;">$phpbb_root_path </span><span
 style="color : #007700;">= (</span><span  style="color : #0000bb;">defined</span><span
 style="color : #007700;">(</span><span  style="color : #dd0000;">'PHPBB_ROOT_PATH'</span><span
 style="color : #007700;">)) ? </span><span  style="color : #0000bb;">PHPBB_ROOT_PATH </span><span
 style="color : #007700;">: </span><span  style="color : #dd0000;">'./'</span><span
 style="color : #007700;">;</span>
<span  style="color : #0000bb;">$phpEx </span><span
 style="color : #007700;">= </span><span  style="color : #0000bb;">substr</span><span
 style="color : #007700;">(</span><span  style="color : #0000bb;">strrchr</span><span
 style="color : #007700;">(</span><span  style="color : #0000bb;">__FILE__</span><span
 style="color : #007700;">, </span><span  style="color : #dd0000;">'.'</span><span
 style="color : #007700;">), </span><span  style="color : #0000bb;">1</span><span
 style="color : #007700;">);</span>
<span  style="color : #0000bb;">header</span><span
 style="color : #007700;">(</span><span  style="color : #dd0000;">&quot;Content-Type: text/plain&quot;</span><span
 style="color : #007700;">);</span>
<span  style="color : #ff8000;">//echo &quot;including $common[$v]\n&quot;; </span>
<span
 style="color : #0000bb;">$status </span><span  style="color : #007700;">= require( </span><span
 style="color : #0000bb;">$common</span><span  style="color : #007700;">[</span><span
 style="color : #0000bb;">$v</span><span  style="color : #007700;">] );</span>
<span
 style="color : #0000bb;">$elapsed </span><span  style="color : #007700;">= </span><span
 style="color : #0000bb;">now</span><span  style="color : #007700;">() - </span><span
 style="color : #0000bb;">$start</span><span  style="color : #007700;">;</span>
<span
 style="color : #0000bb;">printf</span><span  style="color : #007700;">(</span><span
 style="color : #dd0000;">&quot;%-50.50s %9.6F (require status=%u)\n&quot;</span><span  style="color : #007700;">, </span><span
 style="color : #0000bb;">$common</span><span  style="color : #007700;">[</span><span
 style="color : #0000bb;">$v</span><span  style="color : #007700;">], </span><span
 style="color : #0000bb;">$elapsed</span><span  style="color : #007700;">, </span><span
 style="color : #0000bb;">$status</span><span  style="color : #007700;">);</span></pre> 
]]></content:encoded>
	</item>
	<item>
		<title><![CDATA[The Anatomy and Timing of a Web Request – Part I]]></title>
		<description><![CDATA[
	
<p>My academic background was mathematics, specialising in operations research and statistics.&nbsp;
I put this to good use when I first started my career in an IT consultancy, working on the
development and use of detailed simulations of large-scale Army Command and Control Systems.&nbsp;
You might wonder what on earth modelling large voice communications networks over thirty years
ago has to do with using modern web services, but in fact the conceptualisation and analytic
techniques are very similar (though the time constants involved have shrunk from seconds to
milliseconds).&nbsp; This foundation in modelling and analysing large communicating sequential
systems has proved invaluable in my work in systems optimisation during my career, and has influenced
my approach to systems engineering. </p> 
<p>What I show in this article is that:</p> <ul> 
<li>
<p>You should optimise your webserver configuration so that files are correctly cached
at the client’s browser and are compressed, so that network transfer times are kept to
an absolute minimum when content is transferred over the internet.</p> </li>
<li>
<p>The application should be written to minimise the number of supporting files needed to render
the page by aggregating content where practical.&nbsp; It should also ensure that when the client
browser revalidates scripted content, the script processes the request correctly and issues
a status 304 response when appropriate.</p> </li></ul> 
<h2>Using the Google Chrome network analyser</h2> 
<p>I now want to explore a typical phpBB query in depth, and in one specific case: displaying
the phpBB community forum’s board index.&nbsp; To understand what goes on in the interval between
a user clicking the <a href="http://www.phpbb.com/community/index.php">board index</a> link
and the complete page being assembled for viewing, you need to have a suitable tool
to instrument this process.&nbsp; I recommend using <a href="http://www.google.com/chrome">Google
Chrome</a>, because the developer tools that you need are part of the standard download (though
most modern browsers have an add-on pack which provides similar functionality).&nbsp; The main
view that I will use to do this instrumentation is the <a
href="http://code.google.com/chrome/devtools/docs/network.html">Network Panel</a>.&nbsp; You
can access this when visiting any website with Chrome by typing <b>Shift+Ctrl+i</b> whilst viewing
the page. </p> 
<p> [...] ]]></description>
		<link>http://blog.ellisons.org.uk/article-55</link>
		<pubDate>Sat, 02 Apr 2011 13:39:49 +0000</pubDate>
		<dc:creator></dc:creator>
        <guid isPermaLink="true">http://blog.ellisons.org.uk/article-55</guid>
		<content:encoded><![CDATA[
	
<p>My academic background was mathematics, specialising in operations research and statistics.&nbsp;
I put this to good use when I first started my career in an IT consultancy, working on the
development and use of detailed simulations of large-scale Army Command and Control Systems.&nbsp;
You might wonder what on earth modelling large voice communications networks over thirty years
ago has to do with using modern web services, but in fact the conceptualisation and analytic
techniques are very similar (though the time constants involved have shrunk from seconds to
milliseconds).&nbsp; This foundation in modelling and analysing large communicating sequential
systems has proved invaluable in my work in systems optimisation during my career, and has influenced
my approach to systems engineering. </p> 
<p>What I show in this article is that:</p> <ul> 
<li>
<p>You should optimise your webserver configuration so that files are correctly cached
at the client’s browser and are compressed, so that network transfer times are kept to
an absolute minimum when content is transferred over the internet.</p> </li>
<li>
<p>The application should be written to minimise the number of supporting files needed to render
the page by aggregating content where practical.&nbsp; It should also ensure that when the client
browser revalidates scripted content, the script processes the request correctly and issues
a status 304 response when appropriate.</p> </li></ul> 
<h2>Using the Google Chrome network analyser</h2> 
<p>I now want to explore a typical phpBB query in depth, and in one specific case: displaying
the phpBB community forum’s board index.&nbsp; To understand what goes on in the interval between
a user clicking the <a href="http://www.phpbb.com/community/index.php">board index</a> link
and the complete page being assembled for viewing, you need to have a suitable tool
to instrument this process.&nbsp; I recommend using <a href="http://www.google.com/chrome">Google
Chrome</a>, because the developer tools that you need are part of the standard download (though
most modern browsers have an add-on pack which provides similar functionality).&nbsp; The main
view that I will use to do this instrumentation is the <a
href="http://code.google.com/chrome/devtools/docs/network.html">Network Panel</a>.&nbsp; You
can access this when visiting any website with Chrome by typing <b>Shift+Ctrl+i</b> whilst viewing
the page. </p> 
<p><a name="endtaster"></a><a
href="http://files.ellisons.org.uk/phpBB/Chrome-NetworkView.png"><span
style="color:#000080;"><img
border="1" height="263" src="http://files.ellisons.org.uk/phpBB/Chrome-NetworkViewThumb.png" width="248" align="RIGHT" style="float:RIGHT;"/></span></a>What
I want to do first is to discuss the base case when the browser’s local cache is empty.&nbsp;
This Chrome explorer window is active so you can click on different options and objects to drill
down for more information.&nbsp; Alternatively, you can right-click and select “Export all
to HAR” which copies the analysis content to the clipboard in JSON format, then paste this
into a text editor and save it to file.&nbsp; I have written a simple PHP filter (which you
can download from <a href="http://files.ellisons.org.uk/phpBB/HARtoTSV.php.zip">here</a>) to
convert this HAR format into tab-separated values (TSV) for loading into Calc or Excel for
further analysis (which is what I do).&nbsp; To give you some idea of what this tool does, the
screen snapshot to the right shows the display for this index page.&nbsp; (Click on the image
to get the full-size version).</p> 
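<p>For readers who want to see the shape of such a conversion, here is a minimal sketch.&nbsp; It is not the downloadable filter linked above.&nbsp; It assumes only the standard HAR layout (<tt><b>log.entries[]</b></tt> with <tt><b>startedDateTime</b></tt>, <tt><b>time</b></tt>, <tt><b>request.url</b></tt>, <tt><b>response.status</b></tt> and <tt><b>response.content</b></tt>); the function name <tt><b>har_to_tsv</b></tt>, the choice of columns and the sample record are illustrative assumptions.</p>

```php
<?php
/**
 * Convert a HAR (HTTP Archive) JSON string into tab-separated lines:
 * one header line followed by one line per request.
 */
function har_to_tsv(string $harJson): string
{
    $har = json_decode($harJson, true);
    if (!is_array($har) || !isset($har['log']['entries'])) {
        throw new InvalidArgumentException('Not a HAR document');
    }
    $lines = [implode("\t", ['started', 'msec', 'status', 'bytes', 'mime', 'url'])];
    foreach ($har['log']['entries'] as $entry) {
        // Missing fields become empty cells rather than errors.
        $lines[] = implode("\t", [
            $entry['startedDateTime'] ?? '',
            round($entry['time'] ?? 0),
            $entry['response']['status'] ?? '',
            $entry['response']['content']['size'] ?? '',
            $entry['response']['content']['mimeType'] ?? '',
            $entry['request']['url'] ?? '',
        ]);
    }
    return implode("\n", $lines) . "\n";
}

// Small hand-made sample (not real measurements) to show the output shape.
$content  = ['size' => 9000, 'mimeType' => 'text/html'];
$oneEntry = [
    'startedDateTime' => '2011-04-02T13:00:00.000Z',
    'time'            => 873.4,
    'request'         => ['method' => 'GET', 'url' => 'http://example.com/index.php'],
    'response'        => ['status' => 200, 'content' => $content],
];
$sample = json_encode(['log' => ['entries' => [$oneEntry]]]);
echo har_to_tsv($sample);
```

<p>The resulting text can be saved with a <tt><b>.tsv</b></tt> extension and opened directly in a spreadsheet.</p>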
<p>If you look at the zoomed figure, then you will see that it took my browser some <b>5.02</b> <b>seconds</b> to
download the 55 files comprising some 314 Kbytes needed to render the page.&nbsp; Revisiting
this page later in the same browsing session took some <b>1.06</b> <b>seconds</b> and this time
nearly all files were already cached locally on the PC with only 2 files comprising 9 Kbytes
needing to be downloaded.&nbsp; If I did a “page refresh” then the time increased to <b>3.51
seconds</b> but with still only 2 files comprising 9 Kbytes needing to be downloaded.</p> 
<p>Note that the figures that I quote here are the main milestone reported by the network tool,
and that is when the main page has been rendered and is stable to the viewer.&nbsp; The DOM content
load might be an earlier time milestone, but if images of HTML/CSS-undefined size are to be viewed,
the view will still jerk around as these load.&nbsp; Further images may continue to load, and javascript
may download and fill in content (though these are not visually distracting to the user reading the page),
before the “onload” event fires.&nbsp; So the network tool reports all three separately.</p> 
<p>The user experience on these extremes is very different: on the first view the page is terribly
sluggish and stuttered during loading; on the second it is almost a case of click-2-3-bang and
the page is displayed.&nbsp; Note that whilst the exact timing will vary from request to request,
the general timing pattern will be similar.&nbsp; I want to look a little more into the loading
timeline and consider why these are so different.&nbsp; In the HTTP 1.1 protocol, the individual
requests are sent over shared persistent TCP/IP connections, and the HTTP 1.1 specification recommends
that the browser opens at most two such connections per host site referenced (though many browsers now open more).&nbsp; Each request can be thought
of as a <a href="http://en.wikipedia.org/wiki/Remote_procedure_call">Remote Procedure Call (RPC)</a> from
the browser, where the input parameters are a set of request headers, and the output is a set
of response headers with an optional content. </p> 
<h2>The timeline for the phpBB community website bulletin index</h2> 
<table width="899"> <col width="176"/> <col width="715"/> 
<tr> <td> 
<p><b>Timeline (mSec) 
<br/>(cache empty) 
<br/><span style="color:#4c4c4c;">(pre-cached)</span> 
<br/><span style="color:#b84747;">(page refresh) </span></b> </p> </td> <td> 
<p><b>Description</b></p> </td> </tr> 
<tr> <td> 
<p>0 - 873
<br/><span style="color:#4c4c4c;">0 - 975</span>
<br/><span style="color:#b84747;">0 – 898</span></p> </td> <td> 
<p>The core <b>index.php</b> request does the bulk of the application processing and outputs
the HTML document.&nbsp; As phpBB does all its processing before it generates its output, this appears
as a delay then a content download from the browser perspective.&nbsp; The download is streamed
into the browser parser, and the first thing that it does is parse the HTML header, so the header
javascript and CSS files referenced can be processed before the content has been loaded.</p> </td> </tr> 
<tr> <td> 
<p>598 – 2,132
<br/><span style="color:#4c4c4c;">667 - 706</span>
<br/><span style="color:#b84747;">643 - 1,498</span></p> </td> <td> 
<p>The 5 javascript files linked in the header are downloaded.&nbsp; These total 107Kbytes and
are uncompressed.</p> </td> </tr> 
<tr> <td> 
<p>1,903 – 3,204
<br/><span style="color:#4c4c4c;">708 - 732</span>
<br/><span style="color:#b84747;">1,255 – 1,986</span></p> </td> <td> 
<p>The 9 CSS files linked in the header are downloaded.&nbsp; These total 85Kb and the static
CSS files are uncompressed.&nbsp; Once these are downloaded the browser now has enough information
to start to render the page.&nbsp; This takes a few tens of mSec.&nbsp; If the image tags have
defined dimensions then the browser can allocate frame-space to hold each image ahead of loading
it and subsequently avoid reflow of the page as images download.</p> </td> </tr> 
<tr> <td> 
<p>3,246 - 5,018
<br/><span style="color:#4c4c4c;">733 - 1,024</span>
<br/><span style="color:#b84747;">2,043 – 3,658</span></p> </td> <td> 
<p>The 40 files linked in the CSS and the HTML body are downloaded.&nbsp; Image files are already
in a compressed format, so no further compression is necessary.&nbsp; These total 124 Kbytes.&nbsp;
As each is downloaded, the browser adds it to the page.</p> </td> </tr> </table> 
<p>There is a big difference for the user between a 1 second and a 5 second delay in rendering
a web page.&nbsp; These are typical timings when the phpBB site was relatively lightly loaded.&nbsp;
The variance in response means that the worst quartile will be a lot longer than these times.&nbsp;
The phpBB admins have got a reasonable set of Apache configuration settings and have also configured
a <a href="http://en.wikipedia.org/wiki/Varnish_%28software%29">Varnish</a> caching accelerator
to boost response, but these settings still fall short of optimum.&nbsp; I find the refresh time
of ~3.5 seconds quite worrying.&nbsp; A typical phpBB installer using a shared hosting service
with default Apache settings would perform at or below this 5 second mark.&nbsp; This is quite
unnecessary as the <tt><b>.htaccess</b></tt> files can be used to achieve most of the required
performance gains.</p> 
<p>I have also got an ODS spreadsheet detail (<a
href="http://files.ellisons.org.uk/phpBB/phpBBboardIndex.ods">here</a>) for those interested
in looking at the detailed figures.&nbsp; But the main points that I want to emphasise are: </p> <ul> 
<li>
<p>Loading a single web page results in a cascade of secondary requests needed to view the complete
page. </p> </li>
<li>
<p>Most of these (the static elements) can be cached on the user’s local browser cache and
you need to ensure that your web server is configured wherever possible (in the <tt><b>&lt;VirtualHost&gt;</b></tt> section
or <tt><b>.htaccess</b></tt> files on Apache and the <tt><b>web.config</b></tt> file on IIS)
to inform the client browser that they can be cached.</p> </li>
<li>
<p>You need to be aware that many users configure their browsers to revalidate local cache entries
once-per-session or even on every page, so you need to configure the <tt><b>Etag</b></tt> or
<tt><b>Last-Modified</b></tt> responses so that the browser can bypass the download.&nbsp; Scripted
content can also make use of cache negotiation, but again the script author must let the browser
know that it can negotiate by setting these responses, and I cover this point in further detail
below. </p> </li>
<li>
<p>If the content is being generated by script then remember that users will typically view
your site in a burst sequence, visiting maybe 4-6 pages in one session, so script authors need
to ask themselves “is this information likely to change on every view, once-per-session, or
even very occasionally?” and set the caching parameters accordingly.</p> </li>
<li>
<p>I have configured my OpenOffice.org phpBB forums to maximise possible caching, but the split
is still roughly 75% of user page requests which have locally cached all of this “content furniture”
vs. 25% of requests which either download or renegotiate these additional files.&nbsp; This 25%
miss-rate may seem a good ‘batting average’, but remember that this will trigger 30+ further
requests to the webserver. </p> </li>
<li>
<p>Whenever possible configure the webserver (or PHP in the case of scripts) to compress the output.&nbsp;
This has minimal compute overhead on the server, but can make a dramatic impact on perceived
user response.</p> </li>
<li>
<p>Applications developers should <i><b>minimise the number of files needed to display a given
page</b></i>, because each separate file brings with it network overheads and per request load
on the server.&nbsp; (For example, the Google homepage needs 12 requests, my blog needs 7 requests
and the phpBB community board index needs 55 requests.)&nbsp; This is done by aggregating content
whenever possible:</p> <ul> 
<li>
<p>Images used for enriching the page can be grouped together using a technique known as CSS
sprites.&nbsp; See <a
href="http://blog.ellisons.org.uk/article-48">So why CSS Sprites?</a> for a discussion of how I did this on my blog.</p> </li>
<li>
<p>Files such as cascading style sheets and javascripts can be aggregated server-side on a once-per-change
basis and the aggregate version referenced in the HTML page for a single inclusion.&nbsp; (This
is the approach adopted by TinyMCE and it uses a PHP server-side loader / aggregator that I helped
develop.&nbsp; See <a
href="http://blog.ellisons.org.uk/article-46">Rounding off the use of TinyMCE for WYSIWYG editing</a> for more details.) </p> </li></ul> </li>
<li>
<p>The per-request overhead is also dependent on the number of cookies defined for that site
as the browser passes a copy of all cookies for a given site back to the server with every request.&nbsp;
In the case of phpBB this is an extra 0.5Kb upload for 54 of the requests.&nbsp; For this reason, many sites
use a resource-only shadow site to serve the static furniture.&nbsp; As this is a second (cookie-less)
site, these requests don’t carry this overhead.&nbsp; Also most browsers will parallel up request
streams to separate sites. So most high performance sites use this trick.&nbsp; (Google uses
<a href="http://ssl.gstatic.com/">ssl.gstatic.com</a>, Wikipedia uses <a
href="http://bits.wikimedia.org/">bits.wikimedia.org</a>.) </p> </li>
<li>
<p>Applications developers also need to be aware of the inter-dependencies between files needed
to render the webpage.&nbsp; These can serialise download activities and extend the total time
needed to render the page. Examples here include:</p> <ul> 
<li>
<p>Load the CSS needed to render the page ASAP, and this means putting all CSS definitions and
links in the HTML header.&nbsp; Don’t nest CSS style sheets using the <tt><b>@import</b></tt> directive.
</p> </li>
<li>
<p>If Javascript links are placed in the HTML header then the browser will load them before rendering
the page, so if they only implement action processing (e.g. on user clicks) then move them into
the HTML body.</p> </li></ul> </li></ul> 
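<p>The server-side aggregation point in the list above can be sketched as follows.&nbsp; This is not the TinyMCE loader or any phpBB code, just a minimal illustration of the idea: concatenate the source files once, and derive a version key that changes whenever any of them changes so that the aggregate can be cached aggressively.&nbsp; The function name <tt><b>aggregate_files</b></tt>, the use of <tt><b>md5</b></tt> over modification times and the <tt><b>bundle.css?v=</b></tt> URL form are all assumptions made for the example.</p>

```php
<?php
/**
 * Concatenate several stylesheet (or script) files into one string and
 * derive a version key that changes whenever any source file changes.
 * Returns ['version' => string, 'content' => string].
 */
function aggregate_files(array $paths): array
{
    $content = '';
    $stamp   = '';
    foreach ($paths as $path) {
        $body = file_get_contents($path);
        if ($body === false) {
            throw new RuntimeException("Cannot read $path");
        }
        // A /* ... */ marker is a valid comment in both CSS and javascript.
        $content .= "/* " . basename($path) . " */\n" . $body . "\n";
        $stamp   .= $path . ':' . filemtime($path) . ':' . strlen($body) . ';';
    }
    return ['version' => md5($stamp), 'content' => $content];
}

// Demo with two throwaway files in the system temp directory.
$dir = sys_get_temp_dir();
$a = tempnam($dir, 'css');
$b = tempnam($dir, 'css');
file_put_contents($a, "body { margin: 0; }");
file_put_contents($b, "p { color: #000080; }");
$bundle = aggregate_files([$a, $b]);
printf("bundle.css?v=%s (%d bytes)\n", $bundle['version'], strlen($bundle['content']));
unlink($a);
unlink($b);
```

<p>Because the version key is part of the URL, the aggregate can be served with a long expiry time: a change to any source file produces a new URL, so the browser fetches the new version without needing to revalidate the old one.</p>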
<p>These guidelines are discussed in detail on the <a
href="http://code.google.com/speed/page-speed/docs/rules_intro.html">Google Web Performance
Best Practices</a> page. </p> 
<h2>Specific comments on the phpBB website</h2> 
<p>Just looking at this page, I can make these specific observations about phpBB and this site.
Given the expertise of the phpBB team, the fact that I can make them about this site shows that
these issues are easy to overlook, and they can make quite a difference to site performance if you aren’t careful. This is more an
issue of website configuration rather than an application coding one, because my OOo community
forums run over 2x faster than the phpBB ones (<b>1.78s</b> vs <b>5.02s</b> for clear-cache
load and <b>0.47s</b> vs <b>1.06s</b> for primed-cache load on the trial that I did.)</p> <ul> 
<li>
<p><b>[Defect]</b> The Apache settings for the phpBB site do not enable compression on static
javascript and CSS files.&nbsp; (Note that <b>style.php</b> enables zlib compression, but the
Chrome tool doesn’t report this.&nbsp; I did some double checking here because the network
delay was not consistent with the reported size.&nbsp; It turns out that if the response does
not contain a <tt><b>Content-Length</b></tt> header then Chrome reports the final size not the
size transferred.&nbsp; I checked this by commenting out the <tt><b>ob_start('ob_gzhandler');</b></tt> statement
and letting the output fall through to be compressed by the mod_deflate handler.&nbsp; Chrome now
reports the transfer as <b>67.45KB</b> / <b>12.79KB</b> but the timing is the same.)</p> </li>
<li>
<p><b>[Defect]</b> The phpBB <b>style.php </b>script neither defines nor processes the <tt><b>Etag</b></tt> or
<tt><b>Last-Modified</b></tt> responses, so the entire stylesheet can be unnecessarily downloaded
on page refresh.</p> </li>
<li>
<p><b>[Enhancement]</b> Javascripts are linked into the HTML header but should be moved
into the HTML body.</p> </li>
<li>
<p><b>[Enhancement]</b> The current template downloads multiple (5) javascripts, when these
could be marshalled and compressed for a <i>single</i> aggregate inclusion in the HTML body.&nbsp;
 </p> </li>
<li>
<p><b>[Enhancement]</b> The same aggregator callback approach could be adopted for style sheets
to reduce the number and size to be downloaded.&nbsp; Say in this case from 9 files comprising
~150Kbyte to 1 of 40Kbytes.</p> </li>
<li>
<p><b>[Enhancement]</b> The number of images could be significantly reduced using a CSS sprite
implementation.&nbsp; Sprites can’t be conveniently used for all images, but nonetheless the
number of images to be downloaded could easily be reduced from 40 to 20.&nbsp;</p> </li>
<li>
<p><b>[Enhancement]</b> Image tags should have their explicit dimensions set in the document
HTML.</p> </li>
<li>
<p><b>[Enhancement]</b> By allowing the downloading of static files from a configurable second
site, e.g.&nbsp; <b>static.phpBB.com</b> (and this might just be a DNS synonym for the forum site),
(i) this avoids having to upload these cookies on all static references, (ii) the browser will
open extra TCP channels to this site to improve download concurrency.</p> </li></ul> 
<h2>Programmatic cache negotiation </h2> 
<p>There are three aspects to ensuring that PHP scripts carry out optimum negotiation with the
client browser:</p> <ul> 
<li>
<p>Ensure that the script emits the correct <tt><b>Expires</b></tt> and <tt><b>Cache-Control</b></tt> directives
to let the browser know when it can safely cache content.</p> </li>
<li>
<p>Generate <tt><b>Etag</b></tt> or <tt><b>Last-Modified</b></tt> headers to facilitate negotiation
when the browser needs to revalidate content.</p> </li>
<li>
<p>If the browser has previously cached content with either or both of these headers, it will
attempt to revalidate content by supplying <tt><b>If-None-Match</b></tt> or <tt><b>If-Modified-Since</b></tt> request
fields. Unfortunately the CGI 1.1 specification does not guarantee that these will be supplied,
but if <tt><b>HTTP_IF_NONE_MATCH</b></tt> and <tt><b>HTTP_IF_MODIFIED_SINCE</b></tt> are supplied
then the script can use these to bypass unnecessary processing and return a <b>304 status code</b> (Not
Modified).</p> </li></ul> 
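<p>The three aspects above can be sketched in one small function.&nbsp; This is a minimal illustration under stated assumptions rather than phpBB code: the function name <tt><b>negotiate_cache</b></tt>, the <tt><b>private</b></tt> cache scope and the one hour lifetime in the usage comment are hypothetical choices, and the <tt><b>ETag</b></tt> comparison is a simple exact match that ignores weak validators.</p>

```php
<?php
/**
 * Decide how to answer a possibly conditional GET.
 * $server is a $_SERVER-style array; $etag and $mtime describe the content
 * the script is about to generate. Returns [status code, headers to send].
 */
function negotiate_cache(array $server, string $etag, int $mtime, int $maxAge): array
{
    $headers = [
        'ETag'          => '"' . $etag . '"',
        'Last-Modified' => gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
        'Cache-Control' => 'private, max-age=' . $maxAge,
        'Expires'       => gmdate('D, d M Y H:i:s', time() + $maxAge) . ' GMT',
    ];

    $status = 200;
    if (isset($server['HTTP_IF_NONE_MATCH'])) {
        // If-None-Match takes precedence over If-Modified-Since when both are sent.
        $tags = array_map('trim', explode(',', $server['HTTP_IF_NONE_MATCH']));
        if (in_array($headers['ETag'], $tags, true) || in_array('*', $tags, true)) {
            $status = 304;
        }
    } elseif (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        if ($since !== false && $mtime <= $since) {
            $status = 304;
        }
    }
    return [$status, $headers];
}

// Typical use near the top of a script, before any output is generated:
//   [$status, $headers] = negotiate_cache($_SERVER, md5($source), $lastChange, 3600);
//   http_response_code($status);
//   foreach ($headers as $name => $value) { header("$name: $value"); }
//   if ($status === 304) { exit; }   // no body is sent with a 304

// Stand-alone demonstration with a hand-made request array.
$request = ['HTTP_IF_NONE_MATCH' => '"abc123"'];
[$status] = negotiate_cache($request, 'abc123', 1301751589, 3600);
echo $status, "\n";
```

<p>The point of doing this check first is that a 304 response lets the script skip generating the content at all, so the saving is in server processing as well as in network transfer.</p>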
<p>What still constrains performance is the server-side internal overheads of responding to
this request, and this will be the subject of my next article.</p> 
]]></content:encoded>
	</item>
</channel>
</rss>

