Consider a car: generally speaking, its efficiency (i.e. miles per gallon) increases with speed up to a point, after which it drops off. The same is true for web applications, or indeed any application that supports concurrent processing. Generally, the number of requests that can be served per second increases with the number of concurrent requests up to a point, after which it decreases.
As more people use a site, the number of concurrent requests being handled by the server increases. The response time remains relatively stable up to a point, let’s say RC, after which the response time goes up and eventually everything grinds to a halt.
A properly set-up server will hold all the information it needs to serve pages in memory and so the greatest factor affecting the number of concurrent requests it can serve is the processor. If a processor has, let’s say 16 cores, it will be able to effectively handle up to 16 concurrent requests, after which the throughput will decrease due to overheads caused by handling the threads.
For application servers this is easily resolved by buying more servers and distributing the requests between them using a load balancer. If you have a database then scaling becomes more complicated and the database generally will be the bottleneck in your system.
A benchmark published on mysqlperformanceblog.com clearly shows how the throughput of a MySQL database varies with the number of concurrent requests on a 16-core server:
As the database is generally the bottleneck, we need to ensure that the number of concurrent requests (R) does not exceed the critical number RC. Judging by the graph above, we could satisfy this requirement by limiting the maximum number of concurrently processed HTTP requests to the number of CPU cores:
Rmax = RC
This basic form of controller is an open-loop controller: it does not monitor the output to check that the system is operating correctly.
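A fixed limit of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `R_MAX` and `handle_with_fixed_limit` are hypothetical, and the limit is taken from the core count once and never revisited, which is exactly what makes it open-loop.

```python
import os
import threading

# Assumption: the limit is set once from the core count and never adjusted.
R_MAX = os.cpu_count() or 1

# One slot per permitted concurrent request.
_slots = threading.BoundedSemaphore(R_MAX)


def handle_with_fixed_limit(handler, request):
    """Run handler(request), allowing at most R_MAX calls at the same time."""
    with _slots:  # blocks while R_MAX requests are already in flight
        return handler(request)
```

Nothing here measures throughput or response time, so the limit stays the same whether the requests are cheap or expensive.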
An example of an open-loop controller is a coffee machine. Providing the cups are all the same size, it will fill them with coffee. If the cup is too big, it will not be filled; if the cup is too small, it will overflow.
Open-loop control is suitable for well-defined systems that operate under constant conditions or are not required to adapt to change. Consider tuning a guitar: the tuning pegs are turned until the string tension is such that each string resonates at the required frequency. Providing nothing changes, the strings will remain in tune. If the strings heat up, the string tension is reduced and they now resonate at lower frequencies.
Providing the graph holds under all circumstances, open-loop control would suffice. Unfortunately, the test above is not indicative of the general case but rather an unrealistic ideal. Although the sysbench test includes read and write operations, it should be noted that the test was run on a solid-state drive, which can handle multiple concurrent I/O operations. A conventional hard disk can effectively service only one operation at a time, so operations that need the disk (writes, or reads of data not cached in RAM) will have an adverse effect on concurrent performance.
The crux of the problem
Generally, most database operations are read operations and, providing the database is configured optimally, it will not touch the HDD. In this case we can get optimum concurrent performance, as shown in the graph. If the application is designed correctly, most HTTP requests will not require the database at all, thus removing the database bottleneck and so concurrent performance will be determined at the application layer. As mentioned earlier, the application layer can be distributed across multiple machines, achieving still greater concurrent performance.
As such, the maximum number of concurrent requests the system can optimally handle varies with the nature of the requests, just as the ability of the coffee machine to fill a cup depends on the size of the cup. Clearly, the open-loop control mechanism described earlier is not sufficient.
Feedback is required in order to regulate the number of concurrent requests by monitoring the system’s ability to process them. This is known as a closed-loop controller. The output of the system is fed back to a controller where it is compared with the required value, the result of which is used to regulate the input to the system in order to bring the output closer to the required value.
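The feedback loop described above might be sketched as follows. The article does not specify a control law, so this example assumes a simple additive-increase, multiplicative-decrease rule driven by measured response time. The class name, its fields and the default values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyController:
    """Closed-loop limit: compares measured response time with a target."""

    target_seconds: float  # the required value (set point)
    limit: int = 1         # current maximum number of concurrent requests
    min_limit: int = 1
    max_limit: int = 1024
    backoff: float = 0.5   # assumed multiplicative decrease factor

    def update(self, measured_seconds: float) -> int:
        """Feed back one measurement and return the adjusted limit."""
        error = self.target_seconds - measured_seconds
        if error >= 0:
            # Output meets the target: cautiously admit one more request.
            self.limit = min(self.max_limit, self.limit + 1)
        else:
            # Output is worse than the target: cut the limit back sharply.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        return self.limit


if __name__ == "__main__":
    controller = ConcurrencyController(target_seconds=0.2, limit=16)
    for sample in (0.10, 0.12, 0.35, 0.15):
        print(controller.update(sample))
```

The limit creeps upwards while response times stay within the target and drops when they do not, so it follows the nature of the load rather than a figure fixed in advance. Measured throughput could be used as the feedback signal instead.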
It is proposed to create an HTTP proxy which regulates the rate at which it proxies requests to the web application. Feedback is to be used to restrict the number of concurrent requests to ensure maximum throughput for the given load in terms of both volume and nature.
Requests which cannot be immediately processed are to be either held in a queue or returned immediately with a basic HTML page informing the user that the request cannot be processed at the moment and that they should try again soon.
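The queue-or-reject decision could look something like the sketch below. It is a single-threaded illustration only: `AdmissionGate`, `BUSY_PAGE` and the method names are hypothetical, and a real proxy would need locking and would probably send the page with an HTTP 503 status, which is an assumption rather than part of the proposal.

```python
from collections import deque

# Hypothetical page returned to users whose request cannot be accepted.
BUSY_PAGE = (
    "<html><body><p>The site is busy at the moment. "
    "Please try again soon.</p></body></html>"
)


class AdmissionGate:
    """Decides whether a request is forwarded, queued or turned away."""

    def __init__(self, limit, queue_size):
        self.limit = limit            # maximum concurrent requests
        self.queue_size = queue_size  # maximum requests held waiting
        self.in_flight = 0
        self.waiting = deque()

    def arrive(self, request):
        """Return 'forward', 'queued' or 'rejected' for a new request."""
        if self.in_flight < self.limit:
            self.in_flight += 1
            return "forward"
        if len(self.waiting) < self.queue_size:
            self.waiting.append(request)
            return "queued"
        return "rejected"  # caller responds with BUSY_PAGE

    def finish(self):
        """Call when a forwarded request completes.

        Returns the next queued request to forward, or None.
        """
        if self.waiting:
            # The freed slot passes straight to the oldest waiting request.
            return self.waiting.popleft()
        self.in_flight -= 1
        return None
```

With a limit of 2 and a queue of 1, the first two arrivals are forwarded, the third is queued and the fourth is rejected. When one of the forwarded requests finishes, the queued request takes its slot.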
Peaks, by their very nature, are short-lived affairs, and by ensuring that the server is always operating at its maximum capacity, the effect of the peaks is managed and reduced. Furthermore, the proxy will prevent a surge of requests from crashing the system or, to use a technical term, prevent a blowout: users who wait for a request to return are guaranteed that they will not wait in vain, only to see “ERROR 500”.