Archive

Posts Tagged ‘MySQL’

Scalable MySQL: Avoid offset for large tables

February 27th, 2010 Nick 1 comment

It’s fairly common to need to iterate through all the rows in a given table and perform an action on each. The usual way to do this is fetch rows from the database in batches using LIMIT, specifying an offset and number of rows to be returned.

SELECT * FROM `myBigTable` LIMIT :OFFSET, :ROW_COUNT;

After each batch of rows is processed, the offset is increased by the size of the batch and the process is repeated until all rows have been processed.

Now that’s not exactly rocket science, but there is a problem – as the offset increases, the time taken for the query to execute progressively increases, which can mean processing very large tables will take an extremely long time. The reason is because offset works on the physical position of rows in the table which is not indexed. So to find a row at offset x, the database engine must iterate through all the rows from 0 to x.

The obvious solution would be to use the primary key instead of offset:

SELECT * FROM `myBigTable` WHERE `id` > :OFFSET LIMIT :BATCH_SIZE;

The problem here is that MySQL doesn’t guarantee that rows appear in the table in the same order as the primary key, so you could end up processing some rows twice, and worse still, some rows not at all. Luckily this can be easily solved by using ORDER BY:

SELECT * FROM `myBigTable` WHERE `id` > :OFFSET ORDER BY `id` ASC;

This might seem a bit counterintuitive using ORDER BY when we want to speed things up, but remember we’re ordering on a uniquely indexed column which will be fast anyway, and compared to the alternative of iterating through each row using an offset, it’s a huge improvement.

Conclusion

The general rule of thumb is “never use offset in a limit clause”. For small tables you probably won’t notice any difference, but with tables with over a million rows you’re going to see huge performance increases.

Remember, this doesn’t just apply to iterating through tables but whenever offset is used. For example implementing pagination of large tables, if the first few pages load quickly but subsequent pages load progressively slower then it’s likely this is the cause (or you are missing some indices!)

Lost connection to MySQL server during query

August 11th, 2009 Nick No comments

A member of the team came to me the other day with a MySQL error that I’d not seen before:

Lost connection to MySQL server during query

The MySQL manual (http://dev.mysql.com/doc/refman/5.0/en/gone-away.html) says that it can be caused by:

applications that fork child processes, all of which try to use the same connection to the MySQL server

Which was exactly our case as we were creating a benchmarking tool that created child processes to concurrently query the website.   So we checked the code to make sure that no database connections were shared between processes, and also that all connections were closed immediately after being used… but we still got the same error.

After a few minutes of scratching our heads, it dawned on us that we were using persistent connections, and of course when connection objects are destroyed the actual connections are not closed but are returned to a pool of connections, ready for re-use.

We changed the connections to non-persistent and Bob was our proverbial uncle!

Categories: MySQL Tags: , , , ,