What is it?
The Blowout Preventer is an HTTP proxy that sits in front of a website’s load balancer and:
- Ensures the website can never be overloaded by requests
- Guarantees all users a maximum response time
- Dynamically controls concurrency for optimum throughput
What does it do?
The Blowout Preventer is a proxy which accepts all incoming requests to a website, forwards those requests on to the website and relays back the responses to users. It dynamically controls the number of requests being concurrently processed by a website to achieve maximum throughput of requests. Just like ABS brakes on your car use feedback to limit the amount of force on the brakes to achieve maximum braking effect and prevent instability, the Blowout Preventer monitors the request completion rate to determine optimum concurrency.
When limiting concurrency, arriving requests are put into a queue and forwarded on to the website once the concurrency drops below the limit. However, before a request is placed into the queue, the Blowout Preventer calculates how long that request would have to wait in the queue, and if it is longer than a set maximum, a static response is immediately sent saying “Sorry, we’re busy, please try again in a few minutes.” This is much better than letting users wait and wait, or even worse, making them hit “refresh” and send yet another request.
As concurrency is limited to the optimum level, a website can never be knocked offline by being tipped into instability by a flood of requests – hence the name!
Who is it for?
We’ve all been there before: you build a website, users arrive, it gets slower and slower as more people use the site until it becomes unusable and might even crash. So what do you do? Bigger hardware? Optimise the code? Usually a combination of both is required. Then you notice the traffic spikes, the ones which might happen just once a month or just very occasionally when someone posts a link to your site on Reddit. Do you really want to invest in hardware and software for an event which might happen for just a few minutes, once in a blue moon? Do you really want to take the risk that your website could crash if your latest marketing campaign is more successful than you thought? What if you simply cannot scale the site fast enough to cope with the growing traffic? Even if you build a website which can scale to handle truly massive concurrency, you might well achieve better response times by limiting concurrency, as we’ll see later.
The root causes vary from site to site, but ultimately it boils down to the number of requests being processed at the same time – concurrency. As the concurrency increases, the time needed to process each request remains constant, up until a point, after which the time to process each request begins to increase, ultimately leading to a decrease in the overall rate at which requests are processed. Imagine filling a bucket with a hole in the bottom. Provided the rate at which the water pours out of the bottom is greater than or equal to the rate at which water is poured into the bucket, the level of water inside the bucket will not increase. The same is true for websites – provided requests can be served at a faster rate than they arrive, concurrency will not increase. However, if requests cannot be served as quickly as they arrive, then the concurrency will increase, just like the level of water in the bucket. The increased concurrency exacerbates the problem by causing the time needed to process each request to increase, further reducing the rate at which the server can process requests. Ultimately this positive feedback leads to system instability and potential crashes – a blowout. This is what happens when your site gets swamped by users.
Clearly there is a need to prevent concurrency exceeding this critical maximum. Even if your site has plenty of headroom, a sudden influx of requests could knock you offline instantly. Nor does it need to be a particularly large volume of users: imagine that a relatively small number of users each happen to initiate a highly intensive operation, such as a large database update. This would effectively render the database offline; other users would then wait for it to become free, causing requests to quickly clog the system and possibly tipping it into instability.
As the Blowout Preventer constantly monitors request throughput and concurrency, it can detect such changes immediately and take action automatically before problems occur.
This is the critical difference between the Blowout Preventer and passive monitoring solutions.
Concurrency limiting not only prevents blowouts, but also ensures your site is operating at its maximum efficiency at any moment, based on the current demands of the users. I’ll try to illustrate this with an example. Below is a graph showing how the throughput (requests per second) of one particular page of a website varies with increasing concurrency:
Apart from a couple of anomalies at concurrencies of 24 and 30, we can see that the throughput rises linearly with concurrency up until a concurrency of about 8, after which the throughput tends to level off. So we can assume that this particular page of this website scales well and we can quite happily allow the concurrency to go right up to 30 and maybe even beyond – right? Well, let’s see what happens now if we superimpose the average request times on the graph. The request time is the time taken for each request, i.e. the time users have to wait for the page.
Again ignoring those two anomalies, we can see that up to a concurrency of 5, the request times remain constant with increasing concurrency. After this point, request times increase linearly with concurrency. You’ll notice that there is a point after which request times increase with increasing concurrency yet there is no noticeable increase in throughput. Surely it would be better to limit the concurrency before this happens? Absolutely it would, and that’s exactly what the Blowout Preventer does.
Adaptive concurrency control
No doubt you will have noticed that the above examples are very simplistic. Clearly the optimum concurrency for requests purely for static content will be much greater than for requests instigating complex database write operations. So a one-size-fits-all approach to concurrency limiting isn’t going to work: we’d either have to set the limit too low and risk needlessly throttling our site some of the time, or set it too high and risk being pushed into instability if, by chance, all the requests happen to do something terribly complicated. This is where the Blowout Preventer’s adaptive concurrency control algorithm comes in. If we model a website as a black box which processes HTTP requests, then for any given mix of concurrent requests there will be an optimum concurrency: if most requests are for static content it will be high, and conversely if most of the requests initiate complicated calculations it will be low. As the Blowout Preventer is a proxy in front of this black box, it keeps statistics on request times, concurrency and throughput, and constantly adjusts the concurrency limit to what it calculates to be the optimum.
I started this project about three years ago and spent the first two years working on the adaptive concurrency algorithm. Using mathematical models built in Octave, I was able to simulate almost real-life conditions to refine the algorithm. This year I began building the proxy and now I am almost ready for the first tests. The proxy itself is written in C and uses an event-driven architecture, so handling massive numbers of concurrent TCP connections is no problem. As well as this, it is multithreaded with one thread for each CPU core to ensure that the Blowout Preventer cannot become a bottleneck itself!
This article has described what the Blowout Preventer is – an HTTP proxy which sits in front of a website, dynamically limiting concurrency to ensure the system never becomes unstable, to ensure optimum efficiency and guarantee that all users get a response within a specified waiting time.
By writing this article, I hope to find out what the level of interest is in the project, if anyone would be interested and willing to test an early version and also if there are any comments on the idea of adaptive concurrency limiting.
You can keep up-to-date with news and progress on Twitter: