Blowout Preventer :: non-technical white paper


Consider a car: generally speaking, its efficiency (i.e. miles per gallon) increases with speed up to a point, after which the efficiency drops off. The same is true of web applications, or indeed any application supporting concurrent processing. Generally, the number of requests that can be served per second increases with the number of concurrent requests, up to a point, after which it decreases.


As more people use a site, the number of concurrent requests being handled by the server increases. The response time remains relatively stable up to a point, let’s say RC, after which the response time goes up and eventually everything grinds to a halt.


A properly set-up server will hold all the information it needs to serve pages in memory, so the greatest factor affecting the number of concurrent requests it can serve is the processor. If a processor has, let's say, 16 cores, it will be able to effectively handle up to 16 concurrent requests, after which throughput will decrease due to the overhead of managing the extra threads.

For application servers this is easily resolved by buying more servers and distributing the requests between them using a load balancer. If you have a database then scaling becomes more complicated, and the database will generally be the bottleneck in your system.

A published benchmark clearly shows how the throughput of a MySQL database varies with the number of concurrent requests on a 16-core server:



Since the database is generally the bottleneck, we need to ensure that the number of concurrent requests (R) does not exceed the critical number RC. Looking at the graph above, we could satisfy this requirement by limiting the maximum number of concurrently processed HTTP requests to the number of CPU cores:

Rmax = RC

This basic form of control is open-loop control: the controller does not monitor the output to check that the system is operating correctly.

An example of an open-loop controller is a coffee machine: it dispenses a fixed amount of coffee, so providing the cups are all the same size it will fill them exactly. If a cup is too big it will not be full; if it is too small it will overflow.

Open-loop control is suitable for well-defined systems operating under constant conditions, or systems not required to adapt to change. Consider tuning a guitar: the tuning pegs are turned until the string tension is such that each string resonates at the required frequency. Providing nothing changes, the strings will remain in tune. If the strings heat up, the string tension is reduced and they now resonate at lower frequencies.

Providing the graph holds under all circumstances, open-loop control would suffice. Unfortunately the test above is not indicative of the general case but rather an unrealistic ideal. Although the sysbench test includes read and write operations, it should be noted that the test was done using a solid-state drive, which can of course handle multiple concurrent I/O operations. Conventional hard disks effectively handle only one operation at a time, so operations which need the disk (writes, or reads of data not cached in RAM) will have an adverse effect on concurrent performance.

The crux of the problem

Generally, most database operations are reads and, providing the database is configured optimally, they will not touch the disk. In this case we can get optimal concurrent performance, as shown in the graph. Furthermore, if the application is designed correctly, most HTTP requests will not require the database at all, removing the database bottleneck, and concurrent performance will then be determined at the application layer. As mentioned earlier, the application layer can be distributed across multiple machines, achieving still greater concurrent performance.

As such, the maximum number of concurrent requests the system can optimally handle varies with the nature of the requests, just as the ability of the coffee machine to fill a cup depends on the size of the cup. Clearly, the open-loop control mechanism described earlier is not sufficient.

Feedback is required in order to regulate the number of concurrent requests by monitoring the system’s ability to process them. This is known as a closed-loop controller. The output of the system is fed back to a controller where it is compared with the required value, the result of which is used to regulate the input to the system in order to bring the output closer to the required value.


It is proposed to create an HTTP proxy which regulates the rate at which it forwards requests to the web application. Feedback is to be used to restrict the number of concurrent requests, ensuring maximum throughput for the given load in terms of both volume and nature.

Requests which cannot be immediately processed are to be either held in a queue or returned immediately with a basic HTML page informing the user that the request cannot be processed at the moment and that they should try again soon.

Peaks, by their very nature, are short-lived affairs, and by ensuring that the server is always operating at its maximum capacity, the effect of the peaks is managed and reduced. Furthermore, the proxy will prevent a surge of requests from crashing the system: all users who wait for a request to return are guaranteed that they will not wait in vain, only to see "ERROR 500". Or, to use a technical term, it prevents a blowout.

Posted in High Scalability, MySQL

Daemonize PHP (properly)

There is often a need when building a website to automate background processes, for example:

  • Sending emails: many sites don't immediately connect to an SMTP server when a user fires off an email request, but rather store the email in the database, where it is processed by a background process, thus decoupling the frontend from the SMTP server. If the SMTP server goes down, your users can still register.
  • Offline processing: let's say your site has a social element to it, and users have real-time news feeds that get updated whenever their friends do things on your site. By updating these feeds offline, there's no need for a user to wait until all 1,000,000 of their friends' feeds have been updated before a request to post a simple comment completes.

There are plenty more examples; in fact, as a rule of thumb, it's best to defer everything that doesn't have to be done immediately during an HTTP request to the background. It also helps manage peak traffic: when your site is overloaded, simply don't run the background tasks until the peak has passed. OK, so the news feeds aren't bang up to date for a short period, but at least the site works. So how does one go about automating all these background tasks?


This is probably the first port of call: implement your background task in PHP, thereby enabling you to use the database abstraction layer and all the other libraries in your application, and set CRON to run the script every minute.

Great, everything works perfectly. That is, until the data in your database grows so big that the tasks start taking longer than one minute, and you end up with lots of instances all running at the same time, potentially all processing the same data! Oops, the boss just got his registration email ten times.

So you could implement some sort of locking mechanism to prevent multiple instances, but it's all starting to get a bit fragile, with lots of interdependencies. All you wanted to do was run a simple script!
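One common lightweight locking approach on Linux is to wrap the task in flock(1), which takes an exclusive lock on a file and refuses to run the command if the lock is already held. A minimal sketch (the lock path and the echoed "task" are illustrative stand-ins; in practice the command would be your PHP script):

```shell
#!/bin/sh
# Guard the cron task with an exclusive, non-blocking lock: if a
# previous run still holds the lock, skip this run immediately
# instead of starting a second, overlapping instance.
LOCKFILE=/tmp/send-emails.lock
flock -n "$LOCKFILE" sh -c 'echo "running the task"' \
    || echo "another instance is running; skipping this run"
```

In the crontab this becomes something like `* * * * * flock -n /tmp/send-emails.lock php /path/to/task.php` (paths illustrative), so overlapping runs simply skip rather than pile up.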

Write a daemon in PHP

So it's back to the drawing board. What is needed is a daemon that will sit in the background and run a script continually, restarting it immediately after it finishes. You could write the daemon in PHP, which would interface well with your existing PHP scripts; there are even some good PHP packages which make writing a daemon in PHP easy.

So, full of enthusiasm, you launch into writing your own daemon in PHP. A week or so later, after reading all about process control (and probably pulling large chunks of hair out as you realise it wasn't as simple as you thought), you finally have something that works. You deploy it to your website, sit back and feel smug.

After a week your boss rages into your office demanding to know why nobody has registered in the past week. Red-faced, you go to check that your PHP daemon that sends emails from the email queue is still running. It isn't. If you haven't already been fired, your next step would probably be to find out what happened. Looking through your PHP error log you find a "FATAL ERROR" from your daemon script, dated exactly one week ago.

PHP can throw fatal errors for all sorts of reasons, and there's no way to recover from them. Also, PHP is renowned for its rather flaky memory management; whilst this is fine for web scripts that render a page and then exit, it makes PHP highly unsuitable for long-running applications.

Enter The Fat Controller

How can we solve the problem of running PHP scripts as a daemon? I faced this problem a year ago and solved it by writing The Fat Controller, a program written in C that runs as a daemon and can continually run PHP scripts. As it is written in C it is highly stable and can run for months or years without problems, and since the daemon runs separately from the PHP scripts, nothing that happens in a PHP script can affect The Fat Controller itself.

The Fat Controller is also very flexible, easy to install and configure, and supports running multiple instances of a PHP script at the same time to achieve parallel processing. You can even configure it to dynamically control the number of parallel processes depending on how much work there is.

You can read more about The Fat Controller here

Posted in Linux, PHP

Programmatically set environment for Symfony project

I have recently been working on a web project built on Symfony 1.4. There is a team of developers, each with their own installation, and we had the problem that each needed their own particular configuration settings. Symfony allows multiple configurations to be specified using "environments", but the problem is how to get each installation to choose the correct environment without changing index.php, which of course would affect everyone else each time it was committed to Subversion.

The solution was to create a simple class that looks for the existence of a file, environment.cfg, in the configuration directory. The contents of the file specify which environment to use. This file is not committed to Subversion, so each developer can specify their own environment without affecting anything in Subversion.

We added a few lines to our index.php file like this:

// Load the class which chooses the environment

// Instantiate the class, specifying the default environment as "live"
// (used if there is no environment.cfg file)
$environment = new ApplicationEnvironment("live");

// Try and read environment.cfg

// Create a configuration object using the chosen environment
$configuration = ProjectConfiguration::getApplicationConfiguration('frontend', $environment->getEnvironment(), $environment->getDebugMode());

Note that the debug mode option is also set by the ApplicationEnvironment object. This is determined by the URL:

// in ApplicationEnvironment.class.php
    public function getDebugMode()
    {
        $debugAddresses = array("");
        return in_array($_SERVER['SERVER_NAME'], $debugAddresses);
    }

We could have used the environment.cfg file to specify this too, for example on another line, but we also needed to control debug mode on our live server: when the site is accessed at its normal address it must not be in debug mode, but when accessed via a special address (which of course is HTTP password protected) it is.


Next we had the problem of setting the environment for command-line tasks. We could of course rely on each developer to pass --env=my-environment each time, but it would be much nicer to have this done automatically. The solution was to create a simple script that resides in the application root directory next to symfony.php and looks something like this:


$environment = new ApplicationEnvironment("live");

$envSet = false;

foreach ($argv as $i => $arg) {
    if (substr($arg, 0, 6) == "--env=") {
        $envSet = true;
    }
}

if (!$envSet) {
    $argv[] = sprintf("--env=%s", $environment->getEnvironment());

    $_SERVER['argv'] = $argv;
    $_SERVER['argc'] = $argc;
}


It’s a bit of a nasty mess, but it works at least.



Posted in PHP

svn: ‘’ was not present on the resource

I recently got this somewhat cryptic error message when trying to merge something in Subversion. In the root directory of my working copy, I ran svn merge, something along the lines of:

svn merge -c 1234 ./

Only to get:
svn: '' was not present on the resource

I had absolutely no idea what it was on about, and a fair bit of Googling turned up nothing. Then, just for fun, I tried going one level up from my working copy and trying again:

cd ../
svn merge -c 1234 ./myworkingcopy/

And lo and behold – it worked. I haven’t the foggiest idea why, but it does. Any ideas anyone?

Using SVN v1.6.6

Posted in Linux

cast from pointer to integer of different size – cast to pointer from integer of different size

On a 32-bit system my code compiled without any problem. On a 64-bit system I get the warnings:

warning: cast from pointer to integer of different size
warning: cast to pointer from integer of different size

It turned out the problem was how I was passing an integer to a new thread with pthread_create(). The final argument is a void pointer, which can be used to pass a pointer to some data. In my case I was casting an integer to a void pointer in the parent thread, and then casting it back to an integer inside the thread.

On a 32-bit system both void pointers and integers occupy 32 bits, so everything was fine. On my 64-bit system, integers are still 32 bits but void pointers are 64 bits.

One workaround is to use intptr_t from <stdint.h> instead of int: C99 defines it as an integer type wide enough to hold a pointer. (Plain long happens to match pointer size on typical 32-bit and 64-bit Linux systems, but that is not guaranteed everywhere, 64-bit Windows being the usual counterexample.)

The proper solution, though, is not to cast integers to pointers in the first place, but to use pointers throughout:

    #include <stdio.h>

    void hello(void *i)
    {
        int *pointer;
        /* Cast the void pointer to an integer pointer */
        pointer = (int *) i;
        /* Display the value pointed to by the pointer */
        printf("Value is: %d\n", *pointer);
    }

    int main()
    {
        int value, *pointer;
        /* Assign a value */
        value = 11;
        /* Display the original value */
        printf("Value is: %d\n", value);
        /* Set the pointer to the location of the value */
        pointer = &value;
        /* Call the function, casting the pointer to a void pointer */
        hello((void *)pointer);
        return 0;
    }
Of course, when using threads it should be taken into account that the value referenced by the pointer could be changed by any of the threads.

Posted in C

Redirect and append STDOUT and STDERR to a log file

Redirecting STDOUT to a file is easy:

command > foo.log

Redirecting STDERR to a file is also easy:

command 2> foo.log

Redirecting both STDOUT and STDERR to a file is also fairly straightforward:

command &> foo.log

But what if we want to append to the log rather than truncating it each time we write? This is done for STDOUT by using the >> operator, and for STDERR by using 2>>. Unfortunately &>> is not available in older shells (Bash only added it in version 4), so how do we append both STDOUT and STDERR?

The solution is to redirect STDERR to STDOUT, and then redirect STDOUT to a file using the append operator:

command 1>> foo.log 2>&1
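The ordering matters: STDOUT must be pointed at the file before STDERR is pointed at STDOUT. A quick sanity check (the file name is just for illustration) writes one line to each stream and confirms both land in the log:

```shell
#!/bin/sh
# Emit one line on stdout and one on stderr, appending both to the
# same log: stdout is redirected to the file first, then stderr is
# redirected to wherever stdout now points.
log=/tmp/both.log
: > "$log"    # start with an empty log for the demonstration
{ echo "to stdout"; echo "to stderr" 1>&2; } 1>> "$log" 2>&1
cat "$log"
```

Reversing the order (`2>&1 1>> foo.log`) would copy STDERR to the terminal, not the file, because STDERR is duplicated from STDOUT before STDOUT is redirected.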

Posted in Linux

Multitasking: PHP in parallel

Update: I have recently solved the issue of multitasking in PHP with a standalone application, The Fat Controller, which handles multitasking for you. Read more here.

Like most websites, mine periodically sends out an e-newsletter to its user base. This is done by a simple PHP script that opens a socket to a mail server and works through the user database, sending an email to each user. This approach worked fine until recently, when the number of users went up dramatically and it started taking an infeasibly long time to send emails to all of them.

After optimising the script as much as possible, the only remaining option was to run many instances of the script simultaneously. PHP doesn't support multithreading, and although PHP does support POSIX-style process control (the pcntl functions), I decided to create a wrapper in C that would handle multiple instances of PHP. (I haven't found anything technically wrong with PHP's process control; I just needed something stable, reliable and potentially able to multitask other scripts, such as Python.)

How it works

The wrapper I made (see below for the code listing) is fairly simple. It works by creating a series of threads, each of which forks and creates a PHP instance (replacing the child process) that runs the emailing script. The thread in the parent process waits for the PHP process to finish and then ends itself. Once a thread ends, the main thread creates another one, thereby maintaining a constant number of threads. Signal handling is done by a separate thread, which can either tell the main thread to stop creating new threads and thereby safely shut down the program, or directly terminate all child processes and thus abruptly end it.

Almost done

So now I had a way of running any number of PHP processes simultaneously and was almost ready to send emails at lightning speed; however, one problem remained. Once an email has been sent to a user, the database is updated so that I know which user has received which email, preventing the same user being sent the same email more than once. The problem is: what if one instance of the PHP script reads a user and, before it has time to send the email and update the database, another PHP instance reads the same user and also sends them the email? One way to solve this would be to lock the database table before reading and updating each row, but this would slow the database down, which is not ideal.

The solution I came up with was to limit the number of simultaneous PHP instances to 8 and have the wrapper pass each PHP process a unique instance ID from 0 to 7. When selecting users from the database, the script looks at the 3 least significant bits (LSBs) of the IDs in the recipients table and selects only those rows whose 3 LSBs match the instance ID passed from the wrapper. For example, a recipient with ID 13 (binary 1101) has the LSBs 101, i.e. 5, and so is only ever selected by instance 5.

SELECT *, (
	if( (`id` | 1) = `id`, 1, 0) +
	if( (`id` | 2) = `id`, 2, 0) +
	if( (`id` | 4) = `id`, 4, 0)
) AS `IID`
FROM `recipients`
WHERE `sent` = 0
HAVING `IID` = :INSTANCE_ID

Just bind the instance ID passed from the wrapper into the query and each instance will now select recipients that are unique to that ID – no two concurrent instances can select the same recipients. Just don’t forget to update each row once each email has been sent.

Note: using OFFSET in a LIMIT clause is inefficient; see the post "Scalable MySQL: Avoid offset for large tables" below for an alternative.

Finishing touches

We're almost there; just one more little feature is required and the system will be perfect (sort of). If a child process has nothing to do, it returns immediately, so we would end up with processes being constantly created and destroyed, which is not the most efficient use of resources. The solution was to check the exit status of the PHP process. If it is zero then all is OK: the thread ends and is available to be restarted immediately as normal. If, however, the exit status is non-zero, the thread is marked as "sleeping" and cannot be restarted by the main thread for another 30 seconds.

If the PHP script doesn’t find any rows in the database to process then it returns 250 (a non-zero, non-reserved exit status) and so the calling thread in the wrapper sleeps for 30 seconds before trying again to see if there are any new items in the database to process.
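The back-off rule can be sketched in shell (the function name and the 250 convention here are illustrative; the real wrapper implements this per thread in C): exit status 0 means "run again at once", anything else means wait 30 seconds first.

```shell
#!/bin/sh
# Decide how long to wait before the next run, given the worker's
# exit status: 0 = useful work done, restart immediately;
# non-zero (e.g. 250 for "no work found") = back off.
next_delay() {
    if [ "$1" -eq 0 ]; then
        echo 0
    else
        echo 30
    fi
}

# The driving loop would then be something like:
#   while true; do
#       php worker.php; sleep "$(next_delay $?)"
#   done
echo "delay after success: $(next_delay 0)s"
echo "delay after 'no work' (250): $(next_delay 250)s"
```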

This has the added advantage that if anything goes wrong in the PHP script, such as a fatal error, then the thread will sleep and you won't end up with perpetual thread cycling.

Code listing

Here is the code for the wrapper. It should be fairly straightforward; with the above notes and the inline comments you should be able to figure out what's going on. Just don't forget to compile with the pthread library. Comments are of course very welcome!


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NUM_THREADS 8

/* slots[i]: 0 = free, 1 = reserved, >1 = child PID, <0 = sleeping until -(value) */
pthread_mutex_t mutexSignal;
int slots[NUM_THREADS];
int handledSignal = -1;

void *signalHandler(void *arg);
void waitForThreads();
void killall();
void *task(void *i);

/*
 * Each thread will do this
 */
void *task(void *i)
{
    sigset_t signalSet;
    int iid, stat_loc;
    char *tid;
    iid = (int)(long)i;

    /* Say hello and show the thread number */
    printf("Thread %d: starting\n", iid);

    /* Spawn a child to run the program. */
    pid_t pid = fork();
    switch (pid) {
    case 0:
        /* Child: unblock signals and replace ourselves with PHP */
        sigfillset(&signalSet);
        pthread_sigmask(SIG_UNBLOCK, &signalSet, NULL);
        tid = malloc(11 * sizeof(char));
        sprintf(tid, "tid=%d", iid);
        {
            char *argv[] = {"php", "./sleep.php", tid, NULL};
            execv("/usr/bin/php", argv);
        }
        exit(EXIT_FAILURE); /* only if execv fails */
    case -1:
        printf("Thread %d: Fork failed\n", iid);
        slots[iid] = 0;
        break;
    default: /* parent process */
        slots[iid] = pid;
        if (pid == waitpid(pid, &stat_loc, WUNTRACED)) { /* wait for child to exit */
            printf("Thread %d: Child finished with exit code: %d", iid, WEXITSTATUS(stat_loc));
            if (WEXITSTATUS(stat_loc) != 0) {
                /* Sleep this thread, set available time to now+30s */
                printf(" Going to sleep\n");
                slots[iid] = -1 * (time(0) + 30);
            } else {
                /* Free this thread slot */
                printf(" Returning to pool\n");
                slots[iid] = 0;
            }
        } else {
            printf("Thread %d: Bad exit\n", iid);
            slots[iid] = 0;
        }
    }

    printf("Thread %d: Finished\n", iid);

    /* Terminate the thread */
    pthread_exit((void *)i);
}

/*
 * Forcibly terminate all running child processes
 */
void killall()
{
    int i;
    for (i = 0; i < NUM_THREADS; i++)
        if (slots[i] > 1) /* slot holds a child PID */
            kill(slots[i], SIGTERM);
}

/*
 * Wait until no slot holds a running child
 */
void waitForThreads()
{
    int i, busy = 1;
    while (busy) {
        busy = 0;
        for (i = 0; i < NUM_THREADS; i++)
            if (slots[i] > 1)
                busy = 1;
        sleep(1);
    }
}

/*
 * Dedicated signal handling thread: signals are blocked in every other
 * thread, so sigwait() here is the only place they are received
 */
void *signalHandler(void *arg)
{
    sigset_t signalSet;
    int sig;
    sigfillset(&signalSet);
    for (;;) {
        sigwait(&signalSet, &sig);
        pthread_mutex_lock(&mutexSignal);
        handledSignal = sig;
        pthread_mutex_unlock(&mutexSignal);
    }
    return NULL;
}

int main()
{
    int i, running = 1;
    pthread_t sigThread, thread;
    sigset_t signalSet;

    /* Block all signals; they will be handled by the signal thread */
    sigfillset(&signalSet);
    pthread_sigmask(SIG_BLOCK, &signalSet, NULL);
    pthread_mutex_init(&mutexSignal, NULL);
    pthread_create(&sigThread, NULL, signalHandler, NULL);

    while (running > 0) {
        /* Find an empty slot */
        for (i = 0; i < NUM_THREADS; i++) {
            if (slots[i] == 0) {
                /* Start a new worker thread in this slot */
                slots[i] = 1; /* reserve the slot until the thread stores its PID */
                pthread_create(&thread, NULL, task, (void *)(long)i);
                pthread_detach(thread);
            } else if (slots[i] < 0 && time(0) > (-1 * slots[i])) {
                /* Thread has slept long enough so let's wake it up */
                slots[i] = 0;
            }
        }

        switch (handledSignal) {
        case -1:
            /* No signal pending */
            break;
        case 0:
            /* The case for signals we're not interested in */
            handledSignal = -1;
            break;
        case SIGTERM:
        case SIGQUIT:
            /* Safe shutdown: stop creating threads, let children finish */
            printf("Main: SIGQUIT\n");
            handledSignal = -1;
            running = -1;
            break;
        case SIGINT:
            /* Abrupt shutdown: terminate all child processes */
            printf("Main: SIGINT\n");
            handledSignal = -1;
            running = -1;
            killall();
            break;
        }

        /* Sleep for a bit - all this thread creation is hard work! */
        sleep(1);
    }

    waitForThreads();
    printf("Main: finished\n");
    return 0;
}


And here’s a sample PHP script (but you can of course have the wrapper call anything).

Exercises left to the reader

The above code is not really intended as production-ready (although I have actually used it in production), and there are still plenty of loose ends that could do with tidying up. One thing you might want to do is detach the child PHP processes from the parent process's standard output streams, thereby effectively sending them to the background. For inspiration, take a look at this basic daemonising program: A simple daemon in C.

Another nice addition might be to allow the child process to be specified as a command line argument to the wrapper. It would also be useful to be able to specify the sleep time for threads and the maximum number of concurrent threads; that way you wouldn't have to recompile each time you changed something.

If you find this useful or interesting then comments would be greatly welcomed!

Posted in C, PHP

Scalable MySQL: Avoid offset for large tables

It's fairly common to need to iterate through all the rows in a given table and perform an action on each. The usual way to do this is to fetch rows from the database in batches using LIMIT, specifying an offset and the number of rows to be returned:

SELECT * FROM `myBigTable` LIMIT :BATCH_SIZE OFFSET :OFFSET;

After each batch of rows is processed, the offset is increased by the size of the batch and the process is repeated until all rows have been processed.

Now that's not exactly rocket science, but there is a problem: as the offset increases, the time taken for the query to execute progressively increases, which can mean that processing very large tables takes an extremely long time. The reason is that OFFSET works on the position of rows in the result, which is not indexed, so to find the row at offset x the database engine must scan through all x preceding rows.

The obvious solution would be to use the primary key instead of an offset:

SELECT * FROM `myBigTable` WHERE `id` > :OFFSET LIMIT :BATCH_SIZE;
The problem here is that MySQL doesn't guarantee that rows are returned in primary-key order, so you could end up processing some rows twice and, worse still, some rows not at all. Luckily this is easily solved by adding ORDER BY:

SELECT * FROM `myBigTable` WHERE `id` > :OFFSET ORDER BY `id` ASC LIMIT :BATCH_SIZE;

This might seem a bit counterintuitive, using ORDER BY when we want to speed things up, but remember we're ordering on a uniquely indexed column, which is fast anyway; compared to the alternative of stepping through the table with an ever-growing offset, it's a huge improvement.


The general rule of thumb is “never use offset in a limit clause”. For small tables you probably won’t notice any difference, but with tables with over a million rows you’re going to see huge performance increases.

Remember, this doesn't just apply to iterating through tables but whenever OFFSET is used. For example, when implementing pagination of large tables: if the first few pages load quickly but subsequent pages load progressively more slowly, it's likely this is the cause (or you are missing some indices!).

Posted in High Scalability, MySQL

SVN: Entry has unexpectedly changed special status

I just tried to commit my Working Copy when I got the error message:

svn: Entry 'jquery-latest.js' has unexpectedly changed special status

It turns out that someone had created a symlink to whatever is the latest version of jQuery which, apart from being a somewhat dubious solution, should be fine. However, it seems a Windows user then checked out this file and somehow changed it, either directly or by some strange Windowsy magic. Anyway, the changed file was corrupted, and the result is the rather cryptic message above.

I tried to remove the offending file:

svn remove jquery-latest.js

But that resulted in:

svn: 'jquery-latest.js' is in the way of the resource actually under version control

Which was simply resolved by using the --force option:

svn remove --force jquery-latest.js

I recreated the symlink, committed it and it seems now to be ok.

Posted in Uncategorised

Spinning command line cursor in Java

As a follow-up to my previous article on displaying a spinning cursor to display program activity, I have created a more useful example that could be adapted for use.

The original problem was this: I was writing a program that performed a particular task, and I needed some way of showing the program was alive and functioning correctly whilst doing the task.

In Windows this is done by changing the mouse cursor into an hourglass; unfortunately, on the command line we don't have such GUI luxuries. One classic option is to create a spinning line by displaying the characters | / - \ in turn.

The example below creates two child threads; one to do a particular job (in this case slowly count to 10) and another thread to display a spinning cursor.

/**
 * Spinning cursor test class
 * @author Nick Giles
 */
public class SpinTest {

    /**
     * Thread which does the stuff we're interested in
     */
    private Thread stuff;

    /**
     * Thread class which does stuff in the background
     */
    private class Stuff implements Runnable {
        public void run() {
            int result = 0;
            try {
                // Slowly count to 10, although you can of course put whatever
                // you want here, presumably something a bit more useful
                while (result < 10) {
                    Thread.sleep(1000);
                    result++;
                }
            } catch (InterruptedException e) {
                // If interrupted then quietly end the thread, you may well
                // want to handle this in some way
            }
        }
    }

    /**
     * Spinner thread
     */
    private class Spinner implements Runnable {
        public void run() {
            String[] phases = {"|", "/", "-", "\\"};
            try {
                while (true) {
                    for (String phase : phases) {
                        // Print a phase, pause, then backspace over it so
                        // the next phase appears in the same place
                        System.out.print(phase);
                        Thread.sleep(100);
                        System.out.print("\b");
                    }
                }
            } catch (InterruptedException e) {
                // No need to do anything if interrupted
            }
        }
    }

    /**
     * Handles all shutdown functions
     */
    public class ShutdownHandler extends Thread {
        public void run() {
            // On interrupt, stop doing stuff
            stuff.interrupt();
        }
    }

    public void doStuff() {
        try {
            // Attach an object to handle shutdown signals (i.e. ctrl+c)
            Runtime.getRuntime().addShutdownHook(new ShutdownHandler());
            // Create a new thread for spinning the cursor
            Thread spinner = new Thread(new Spinner());
            // Create a new thread for doing stuff
            this.stuff = new Thread(new Stuff());
            // Nice message to user
            System.out.printf("Doing stuff...  ");
            // Start doing stuff
            stuff.start();
            // Start spinning the cursor
            spinner.start();
            // Check the thread doing stuff is still doing stuff
            while (stuff.isAlive()) {
                Thread.sleep(250);
            }
            // The thread doing stuff has finished, so stop spinning
            spinner.interrupt();
            // Wait for the spinning thread to terminate
            spinner.join();
            System.out.println("The End.");
        } catch (InterruptedException e) {
            // Nothing sensible to do here
        }
    }

    /**
     * Main method
     */
    public static void main(String[] args) {
        SpinTest test = new SpinTest();
        test.doStuff();
    }
}
It’s a fairly basic implementation but it should be fairly simple to extend it for a real-world application.

Posted in Java