<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>High Fibre Programming &#187; MySQL</title>
	<atom:link href="http://www.4pmp.com/category/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.4pmp.com</link>
	<description>PHP, MySQL, C, Java, Linux and other great after dinner speech topics</description>
	<lastBuildDate>Tue, 17 Jan 2012 09:10:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>Blowout Preventer :: non-technical white paper</title>
		<link>http://www.4pmp.com/2011/02/blowout-preventer/</link>
		<comments>http://www.4pmp.com/2011/02/blowout-preventer/#comments</comments>
		<pubDate>Mon, 14 Feb 2011 22:27:06 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[High Scalability]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[blowout preventer]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[feedback]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[threads]]></category>

		<guid isPermaLink="false">http://www.4pmp.com/?p=412</guid>
		<description><![CDATA[Background Consider a car, generally speaking the efficiency (i.e. miles per gallon) increases with speed, up to a point, after which the efficiency drops off. The same is true for web applications, or indeed any applications supporting concurrent processing. Generally, the number of requests that can be served per second increases with the number of [...]]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p>Consider a car: generally speaking, its efficiency (i.e. miles per gallon) increases with speed up to a point, after which it drops off.   The same is true for web applications, or indeed any application supporting concurrent processing.   Generally, the number of requests that can be served per second increases with the number of concurrent requests up to a point, after which it decreases.</p>
<h2>Problem</h2>
<p>As more people use a site, the number of concurrent requests being handled by the server increases.   The response time remains relatively stable up to a point, let&#8217;s say R<sub>C</sub>, after which the response time goes up and eventually everything grinds to a halt.</p>
<h2>Situation</h2>
<p>A properly set-up server will hold all the information it needs to serve pages in memory, so the greatest factor affecting the number of concurrent requests it can serve is the processor.   If a processor has, let&#8217;s say, 16 cores, it will be able to handle up to 16 concurrent requests effectively, after which throughput will decrease due to the overhead of managing the threads.</p>
<p>For application servers this is easily resolved by buying more servers and distributing the requests between them using a load balancer.   If you have a database then scaling becomes more complicated and the database generally will be the bottleneck in your system.</p>
<p>A benchmark published on mysqlperformanceblog.com clearly shows how the throughput of a MySQL database varies with the number of concurrent requests on a 16-core server:</p>
<p><a href="http://www.4pmp.com/wp-content/uploads/2011/02/sysbench_oltp.png"><img class="aligncenter size-medium wp-image-419" title="MySQL 5.5 Concurrent performance benchmark" src="http://www.4pmp.com/wp-content/uploads/2011/02/sysbench_oltp.png" alt="" /></a></p>
<p><strong>Reference:</strong> <a href="http://www.mysqlperformanceblog.com/2010/02/28/mysql-5-5-m2-scalability/">http://www.mysqlperformanceblog.com/2010/02/28/mysql-5-5-m2-scalability/</a></p>
<h2>Solution</h2>
<p>As the database is generally the bottleneck, we need to ensure that the number of concurrent requests (R) does not exceed the critical number R<sub>C</sub>.   Judging by the graph above, we could satisfy this requirement by limiting the maximum number of concurrently processed HTTP requests to the number of CPU cores:</p>
<p style="text-align: center;">R<sub>max</sub> = R<sub>C</sub></p>
<p>This basic form of control is open-loop: the controller does not monitor the output to check that the system is operating correctly.</p>
<p style="margin-left: 26px;"><em>An example of an open-loop controller is a coffee machine.   Provided the cups are all the same size, it will fill them with coffee.   If the cup is too big it will not be full; if the cup is too small it will overflow.<br />
</em></p>
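<p>As a rough illustration of the open-loop idea, the sketch below caps the number of in-flight requests at a fixed R<sub>max</sub> and never looks at how the system is actually coping.   The class name OpenLoopLimiter and the value 16 (the core count from the benchmark above) are assumptions made for the example, not part of any existing tool.</p>

```python
import threading


class OpenLoopLimiter:
    """Caps in-flight requests at a fixed r_max; it never inspects the output."""

    def __init__(self, r_max):
        self._slots = threading.BoundedSemaphore(r_max)

    def try_acquire(self):
        # Non-blocking: returns False once all r_max slots are taken.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call when a request has finished to free its slot.
        self._slots.release()


limiter = OpenLoopLimiter(r_max=16)  # 16 is assumed from the 16-core benchmark
admitted = [limiter.try_acquire() for _ in range(20)]
print(admitted.count(True), admitted.count(False))  # prints: 16 4
```

<p>Like the coffee machine, this limiter does the same thing whatever the size of the cup: a request that takes a second on disk counts the same as one served from memory.</p>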
<p>Open-loop control is suitable for well-defined systems that operate under constant conditions or are not required to adapt to change.   Consider tuning a guitar: the tuning pegs are turned until the string tension is such that each string resonates at the required frequency.   Provided nothing changes, the strings will remain in tune.   If the strings heat up, their tension drops and they resonate at lower frequencies.</p>
<p>If the graph held under all circumstances then open-loop control would suffice.   Unfortunately the test above is not indicative of the general case but rather an unrealistic ideal.   Although the sysbench test includes read and write operations, it should be noted that the test was done using a solid-state drive, which of course can handle multiple concurrent IO operations.   A conventional HDD can effectively serve only one operation at a time, so operations which need to use the HDD (write operations, or read operations for data not cached in RAM) will have an adverse effect on concurrent performance.</p>
<h2>The crux of the problem</h2>
<p>Generally, most database operations are read operations and, provided the database is configured optimally, they will not touch the HDD.   In this case we can get optimum concurrent performance, as shown in the graph.   If the application is designed correctly, most HTTP requests will not require the database at all.   This removes the database bottleneck, so concurrent performance will be determined at the application layer.   As mentioned earlier, the application layer can be distributed across multiple machines, achieving still greater concurrent performance.</p>
<p>As such, the maximum number of concurrent requests the system can optimally handle varies with the nature of the requests – just as the ability of the coffee machine to fill a cup depends on the size of the cup.   Clearly, the open-loop control mechanism described earlier is not sufficient.</p>
<p>Feedback is required in order to regulate the number of concurrent requests by monitoring the system&#8217;s ability to process them.   This is known as a closed-loop controller.   The output of the system is fed back to a controller where it is compared with the required value, the result of which is used to regulate the input to the system in order to bring the output closer to the required value.</p>
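<p>One feedback step could look something like the sketch below.   It assumes the monitored output is the average response time and that the required value is a target in milliseconds.   The additive-increase, multiplicative-decrease rule is simply one well-known way to react to the comparison, chosen here for illustration.   The function name next_limit, the floor and ceiling, and the sample measurements are all hypothetical.</p>

```python
def next_limit(limit, measured_ms, target_ms, floor=1, ceiling=256):
    """One closed-loop step: compare the measured output with the required
    value and adjust the concurrency limit (the input) accordingly."""
    if measured_ms > target_ms:
        # Output is worse than required: back off quickly (halve the limit).
        proposed = limit // 2
    else:
        # Output meets the requirement: cautiously probe for more capacity.
        proposed = limit + 1
    # Keep the limit inside sane bounds whatever the measurements say.
    return max(floor, min(ceiling, proposed))


limit = 16
for measured in [80, 90, 95, 250, 120, 90]:  # hypothetical response times in ms
    limit = next_limit(limit, measured, target_ms=100)
    print(limit)  # prints 17, 18, 19, 9, 4, 5 (one per line)
```

<p>The limit creeps up while responses stay within the target and drops sharply as soon as they do not, so it tracks the nature of the current load rather than a number fixed in advance.</p>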
<h2>Proposal</h2>
<p>It is proposed to create an HTTP proxy which regulates the rate at which it proxies requests to the web application.   Feedback is to be used to restrict the number of concurrent requests to ensure maximum throughput for the given load in terms of both volume and nature.</p>
<p>Requests which cannot be immediately processed are to be either held in a queue or returned immediately with a basic HTML page informing the user that the request cannot be processed at the moment and that they should try again soon.</p>
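<p>The sketch below shows one way the two options could be combined: requests wait in a bounded queue while there is room, and are turned away once it is full.   Combining the two is an assumption made for this example, as are the class name Admission and its parameters.   The sketch is single-threaded for clarity; a real proxy would need locking around the counters.</p>

```python
import queue


class Admission:
    """Decides what happens to a request the backend cannot take right now."""

    def __init__(self, limit, max_waiting):
        self.limit = limit        # current concurrency limit
        self.in_flight = 0        # requests currently being proxied
        self.waiting = queue.Queue(maxsize=max_waiting)

    def offer(self, request):
        if self.limit > self.in_flight:
            self.in_flight += 1
            return "proxy"
        try:
            self.waiting.put_nowait(request)
            return "queued"
        except queue.Full:
            # Caller answers with the basic "try again soon" page.
            return "rejected"

    def done(self):
        """Call when a proxied request finishes; returns the next queued request or None."""
        self.in_flight -= 1
        try:
            request = self.waiting.get_nowait()
        except queue.Empty:
            return None
        self.in_flight += 1
        return request


gate = Admission(limit=2, max_waiting=1)
print([gate.offer(name) for name in ["a", "b", "c", "d"]])
# prints: ['proxy', 'proxy', 'queued', 'rejected']
print(gate.done())  # prints: c
```

<p>In a complete proxy the limit attribute would be the value produced by the feedback loop rather than a constant.</p>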
<p>Peaks, by their very nature, are short-lived affairs, and by ensuring that the server is always operating at its maximum capacity, the effect of the peaks is managed and reduced.   Furthermore, the proxy will prevent a surge of requests from crashing the system: users who wait for a request to return are guaranteed not to wait in vain, only to see “ERROR 500”.   In other words, to use a technical term, the proxy prevents a blowout.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.4pmp.com/2011/02/blowout-preventer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scalable MySQL: Avoid offset for large tables</title>
		<link>http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/</link>
		<comments>http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 21:24:57 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[High Scalability]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[large]]></category>
		<category><![CDATA[limit]]></category>
		<category><![CDATA[offset]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[table]]></category>

		<guid isPermaLink="false">http://www.4pmp.com/?p=263</guid>
		<description><![CDATA[It&#8217;s fairly common to need to iterate through all the rows in a given table and perform an action on each. The usual way to do this is fetch rows from the database in batches using LIMIT, specifying an offset and number of rows to be returned. SELECT * FROM `myBigTable` LIMIT :OFFSET, :ROW_COUNT; After [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s fairly common to need to iterate through all the rows in a given table and perform an action on each.   The usual way to do this is fetch rows from the database in batches using LIMIT, specifying an offset and number of rows to be returned.   </p>
<pre class="brush: sql; gutter: false;">SELECT * FROM `myBigTable` LIMIT :OFFSET, :ROW_COUNT;</pre>
<p>After each batch of rows is processed, the offset is increased by the size of the batch and the process is repeated until all rows have been processed.</p>
<p>Now that&#8217;s not exactly rocket science, but there is a problem &#8211; as the offset increases, the time taken for the query to execute progressively increases, which can mean that processing very large tables takes an extremely long time.   The reason is that an offset refers to the position of a row, which is not indexed, so to find the row at offset x the database engine must iterate through all the rows from 0 to x.</p>
<p>The obvious solution would be to use the primary key instead of offset:</p>
<pre class="brush: sql; gutter: false;">SELECT * FROM `myBigTable` WHERE `id` &gt; :OFFSET LIMIT :BATCH_SIZE;</pre>
<p>The problem here is that, without an ORDER BY clause, MySQL doesn&#8217;t guarantee that rows are returned in primary key order, so you could end up processing some rows twice and, worse still, some rows not at all.   Luckily this is easily solved by using ORDER BY:</p>
<pre class="brush: sql; gutter: false;">SELECT * FROM `myBigTable` WHERE `id` &gt; :OFFSET ORDER BY `id` ASC LIMIT :BATCH_SIZE;</pre>
<p>Using ORDER BY might seem counterintuitive when we want to speed things up, but remember we&#8217;re ordering on a uniquely indexed column, which will be fast anyway; compared to the alternative of iterating through each row using an offset, it&#8217;s a huge improvement.   Note that :OFFSET is no longer a row count here: it is the id of the last row processed in the previous batch.</p>
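<p>The whole loop can be sketched as follows.   To keep the example runnable on its own it uses Python with an in-memory SQLite database standing in for MySQL; the payload column, the sample ids and the function name iterate_in_batches are made up for illustration, and ids are assumed to be positive integers.</p>

```python
import sqlite3


def iterate_in_batches(conn, batch_size):
    """Yield every row of myBigTable in primary-key order, one batch at a time."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM myBigTable "
            "WHERE id > ? ORDER BY id ASC LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield from rows
        # Resume after the last id seen, not after a number of rows.
        last_id = rows[-1][0]


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE myBigTable (id INTEGER PRIMARY KEY, payload TEXT)")
# Ids with gaps, as left behind by deleted rows.
conn.executemany(
    "INSERT INTO myBigTable (id, payload) VALUES (?, ?)",
    [(i, "row %d" % i) for i in (1, 2, 5, 8, 13, 21, 34)],
)
ids = [row[0] for row in iterate_in_batches(conn, batch_size=3)]
print(ids)  # prints: [1, 2, 5, 8, 13, 21, 34]
```

<p>Each query starts from an index lookup on the last id, so the cost of fetching a batch does not grow as the loop moves deeper into the table.</p>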
<h4>Conclusion</h4>
<p>The general rule of thumb is &#8220;never use offset in a limit clause&#8221;.   For small tables you probably won&#8217;t notice any difference, but for tables with over a million rows you&#8217;re going to see huge performance increases.</p>
<p>Remember, this doesn&#8217;t just apply to iterating through tables but to any use of offset.   For example, when paginating large tables, if the first few pages load quickly but subsequent pages load progressively slower, then this is likely the cause (or you are missing some indices!).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Lost connection to MySQL server during query</title>
		<link>http://www.4pmp.com/2009/08/lost-connection-to-mysql-server-during-query/</link>
		<comments>http://www.4pmp.com/2009/08/lost-connection-to-mysql-server-during-query/#comments</comments>
		<pubDate>Tue, 11 Aug 2009 20:32:23 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[connections]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[fork]]></category>
		<category><![CDATA[persistent]]></category>

		<guid isPermaLink="false">http://blog.4pmp.com/?p=21</guid>
		<description><![CDATA[A member of the team came to me the other day with a MySQL error that I&#8217;d not seen before: Lost connection to MySQL server during query The MySQL manual (http://dev.mysql.com/doc/refman/5.0/en/gone-away.html) says that it can be caused by: applications that fork child processes, all of which try to use the same connection to the MySQL [...]]]></description>
			<content:encoded><![CDATA[<p>A member of the team came to me the other day with a MySQL error that I&#8217;d not seen before:</p>
<p><em>Lost connection to MySQL server during query</em></p>
<p>The MySQL manual (<a href="http://dev.mysql.com/doc/refman/5.0/en/gone-away.html">http://dev.mysql.com/doc/refman/5.0/en/gone-away.html</a>) says that it can be caused by:</p>
<blockquote><p>applications that fork child processes, all of which try to use the same connection to the MySQL server
</p></blockquote>
<p>This was exactly our case, as we were creating a benchmarking tool that spawned child processes to query the website concurrently.   So we checked the code to make sure that no database connections were shared between processes, and also that all connections were closed immediately after being used&#8230; but we still got the same error.</p>
<p>After a few minutes of scratching our heads, it dawned on us that we were using <strong>persistent connections</strong>, and of course when connection objects are destroyed the actual connections are not closed but are returned to a pool of connections, ready for re-use.</p>
<p>We changed the connections to non-persistent and Bob was our proverbial uncle!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.4pmp.com/2009/08/lost-connection-to-mysql-server-during-query/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

