Unsolvable Problems

Occasionally I come across problems that can’t seem to be fixed.  Often there’s a way to work around the problem so that it doesn’t have to be solved.  Sometimes, though, I just have to leave it unsolved and put up with the imperfection.  Here is a great example.

StreetsblogSF runs using the same software (WordPress/lighttpd) and on the same hardware as Streetsblog NYC, DC, and LA.  Unlike the others, however, it occasionally takes a long time, up to a minute, to load in the browser.  Go to it now and see if this happens for you.  There doesn’t seem to be any pattern to when these long loads occur.  Most of the time the site loads in just a few seconds.

I’m not sure why StreetsblogSF behaves like this.  It is set up and configured the same as the other sites, with the same webserver settings, installed plugins, etc.  I checked the server logs and found no reports of anything unusual.  I watched the site load in Firebug but got no additional information.

The only thing that is different about SF is its data, so I set out to reproduce the problem on our dev server (which uses Apache).  I tarred up and copied over all the files, then dumped the SF database and loaded it into the dev database.  I then used a testing service called WebSitePulse to test this dev rig over three days, requesting the home page at 5-minute intervals.  I ran the same test against StreetsblogSF and StreetsblogNYC.  Here are the results:

Counting the responses that took more than 5 seconds over the three-day period, StreetsblogSF had 13, far more than StreetsblogNYC (2) or the dev rig (4).
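For anyone who wants to repeat this kind of measurement without a hosted service, here is a minimal sketch of the same idea: request a page at a fixed interval, record how long each full response takes, and count those over the 5-second cutoff. This is not how WebSitePulse works internally; the function names, the 120-second timeout, and the choice to count failed requests as slow are all assumptions made for illustration.

```python
import time
import urllib.request

SLOW_THRESHOLD_S = 5.0  # the cutoff used in the post


def timed_fetch(url, timeout=120.0):
    """Fetch url once and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # read the whole body so we time the full page, not just headers
    return time.monotonic() - start


def count_slow(durations, threshold=SLOW_THRESHOLD_S):
    """Count responses that took strictly longer than `threshold` seconds."""
    return sum(1 for d in durations if d > threshold)


def poll(url, interval_s=300, days=3):
    """Request `url` roughly every `interval_s` seconds for `days` days (hypothetical helper)."""
    durations = []
    checks = int(days * 24 * 3600 // interval_s)
    for _ in range(checks):
        try:
            durations.append(timed_fetch(url))
        except OSError:
            # Network errors and timeouts are recorded as infinitely slow (an assumption).
            durations.append(float("inf"))
        time.sleep(interval_s)  # the interval drifts slightly because fetch time is not subtracted
    return durations
```

Calling `count_slow(poll(url))` with a placeholder `url` gives the number of slow responses over three days; `poll` blocks for the whole run, so in practice it would be started under something like `nohup` or cron.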

I suspected that some interaction with supercache, our caching plugin, was causing the problem on SF.  So I turned off supercache on SF for three days to test the idea.

Over this three-day period, StreetsblogSF had 87 responses that took more than 5 seconds.  I also performed some other tests.  You can see all the results here.
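To put these counts on a common scale, a rough worked calculation helps, assuming every scheduled 5-minute check actually ran (the real number of completed checks may have been slightly lower):

```latex
n = 3\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} \times 12\ \tfrac{\text{checks}}{\text{h}} = 864

\text{SF, supercache on: } \tfrac{13}{864} \approx 1.5\%, \qquad
\text{SF, supercache off: } \tfrac{87}{864} \approx 10.1\%

\text{NYC: } \tfrac{2}{864} \approx 0.23\%, \qquad
\text{dev rig: } \tfrac{4}{864} \approx 0.46\%
```

Under that assumption, turning supercache off raised SF's slow-response rate roughly sevenfold.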
From this we can make a few observations:
  • StreetsblogSF appears to produce a slow response only when a request hits the database; the jump in slow responses with supercache off (from 13 to 87) strongly suggests this
  • the system doesn’t hang every time we hit the database
  • the files aren’t the problem, or our dev rig would have exhibited similar behavior
  • the data is not the problem, or our dev rig would have exhibited similar behavior
  • the way SF is configured on our production server is not the problem, since StreetsblogNYC and the others are configured the same way

So my question is, what is the cause of this problem?  What is causing StreetsblogSF to intermittently have really slow response times while all the other Streetsblogs perform fine?

Update: just now Evan discovered that the lighttpd configuration file on our production server had two duplicate blocks configuring StreetsblogSF.  We will delete one of these and redo the test.  Stay tuned!
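Duplicated blocks like this are easy to miss by eye in a long config file. The following is a minimal sketch of a checker, assuming the site blocks are written as lighttpd host conditionals of the form `$HTTP["host"] =~ "…" {` at the start of a line. It only compares the literal operator and pattern text, so it will not catch two differently written patterns that match the same host, conditionals preceded by `else`, or blocks pulled in through `include` files.

```python
import re
import sys
from collections import defaultdict

# Matches lighttpd host conditionals such as:  $HTTP["host"] =~ "example\.org" {
HOST_COND = re.compile(r'^\s*\$HTTP\["host"\]\s*(==|!=|=~|!~)\s*"([^"]*)"')


def find_duplicate_host_blocks(lines):
    """Return {(operator, pattern): [line numbers]} for host conditions that appear more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        m = HOST_COND.match(line)
        if m:
            seen[(m.group(1), m.group(2))].append(lineno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_dup_hosts.py /path/to/lighttpd.conf  (script name is hypothetical)
    with open(sys.argv[1]) as f:
        dups = find_duplicate_host_blocks(f)
    for (op, pattern), nums in dups.items():
        print(f'$HTTP["host"] {op} "{pattern}" appears on lines {nums}')
```

Matching on the exact condition text keeps the check simple and avoids having to parse lighttpd's nested block syntax.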
