PEAR Cache_Lite – efficient group cleaning

After using PEAR Cache_Lite for a while, we began to notice that as traffic increased, the web servers spent more and more time thrashing their discs. On closer inspection, we found that the servers were almost constantly scanning the entire cache directory structure.

Whenever you call Cache_Lite::clean() to remove a group of cached elements, it scans the entire cache directory structure looking for cache files that have the correct group hash in the filename. This was problematic for us because we stored a lot of data in groups. With messages, for example, all the pages of a user’s messages were stored in one group, and whenever someone sent a message, the system deleted the cache group containing the recipient’s messages. As the cache directory structure grew, it took longer and longer to scan, and with increasing traffic the web servers were soon doing little else.
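To make the cost concrete, here is a rough Python sketch of a group clean that works by scanning. It is an illustration only, not Cache_Lite's actual code: the file naming scheme (cache_<md5 of group>_<md5 of id>) and the helper names are assumptions made for the example. The point is that every file in the cache is examined, however small the group being removed.

```python
import hashlib
import os
import tempfile


def cache_file_name(group, cache_id):
    # Assumed naming scheme for this sketch: cache_<md5(group)>_<md5(id)>
    group_hash = hashlib.md5(group.encode()).hexdigest()
    id_hash = hashlib.md5(cache_id.encode()).hexdigest()
    return f"cache_{group_hash}_{id_hash}"


def clean_group(cache_dir, group):
    """Delete every file belonging to `group`.

    Returns (files_examined, files_deleted).
    """
    prefix = "cache_" + hashlib.md5(group.encode()).hexdigest() + "_"
    examined = deleted = 0
    # The whole tree is walked, whatever the size of the group.
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            examined += 1
            if name.startswith(prefix):
                os.remove(os.path.join(root, name))
                deleted += 1
    return examined, deleted


def demo():
    # 50 users with 4 cached message pages each, one group per user.
    with tempfile.TemporaryDirectory() as cache_dir:
        for user in range(50):
            for page in range(4):
                name = cache_file_name(f"messages_{user}", f"page_{page}")
                with open(os.path.join(cache_dir, name), "w") as fh:
                    fh.write("cached page")
        return clean_group(cache_dir, "messages_7")
```

In this toy setup, demo() looks at all 200 files in order to delete the 4 that belong to one user.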

The solution I came up with was to prefix the name of each group with a number, which was itself cached. When a request arrives for a cached item in group “messages”, the cache system looks up the cached group identifier number and prepends it to the group name, resulting in an internal group name like “1234_messages”.

The overhead is an extra cache “get”, but the advantage is that to expire a whole group you only have to increment the identifier number by one (get, increment, save). When the group is next accessed, the internal group name becomes “1235_messages”, under which nothing has been cached yet, so the application regenerates the cache.
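The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code we ran: DictCache is a hypothetical in-memory stand-in for the real cache, and VersionedGroupCache, VERSION_GROUP and expire_group are names invented for the example.

```python
class DictCache:
    """Hypothetical in-memory stand-in for a cache keyed by (group, id)."""

    def __init__(self):
        self._store = {}

    def get(self, cache_id, group="default"):
        # Returns None on a miss.
        return self._store.get((group, cache_id))

    def save(self, data, cache_id, group="default"):
        self._store[(group, cache_id)] = data


class VersionedGroupCache:
    # Hypothetical group that holds one counter per logical group.
    VERSION_GROUP = "group_versions"

    def __init__(self, backend):
        self.backend = backend

    def _version(self, group):
        # The extra "get": look up the group's current identifier number.
        version = self.backend.get(group, self.VERSION_GROUP)
        if version is None:
            version = 1
            self.backend.save(version, group, self.VERSION_GROUP)
        return version

    def _internal_group(self, group):
        # e.g. "messages" -> "1_messages"
        return f"{self._version(group)}_{group}"

    def get(self, cache_id, group):
        return self.backend.get(cache_id, self._internal_group(group))

    def save(self, data, cache_id, group):
        self.backend.save(data, cache_id, self._internal_group(group))

    def expire_group(self, group):
        # Get, increment, save: nothing is scanned and nothing is deleted.
        self.backend.save(self._version(group) + 1, group, self.VERSION_GROUP)


cache = VersionedGroupCache(DictCache())
cache.save(["hello"], "page_1", "messages")
hit = cache.get("page_1", "messages")    # the stored page comes back
cache.expire_group("messages")
miss = cache.get("page_1", "messages")   # None: lookups now use "2_messages"
```

Note that in this sketch the counter restarts at 1 if its own cache entry is lost, so the counter needs to outlive the data it guards.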

In my opinion this additional “get” is a price worth paying: it’s a very quick operation, and expiring a group this way is many times faster than scanning the whole cache directory.

Finally

You might be thinking to yourself, “What about all those expired cache files just left on the disc?” Well, we set up a cron job to run every day and delete all files older than three days. As none of the caches lasted longer than three days, this was a safe duration.
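A minimal sketch of such a cleanup job in Python follows. It assumes that a file's modification time is the right measure of its age, and the function and constant names are made up for the example; a one-line find command in the crontab would do the same job.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 3 * 24 * 60 * 60  # three days


def purge_old_files(cache_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete files under cache_dir whose mtime is older than max_age seconds.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if now - os.path.getmtime(path) > max_age:
                os.remove(path)
                removed.append(path)
    return removed


def demo_purge():
    # One fresh file and one backdated by four days; only the fresh one survives.
    with tempfile.TemporaryDirectory() as cache_dir:
        fresh = os.path.join(cache_dir, "fresh")
        stale = os.path.join(cache_dir, "stale")
        for path in (fresh, stale):
            with open(path, "w") as fh:
                fh.write("x")
        four_days_ago = time.time() - 4 * 24 * 60 * 60
        os.utime(stale, (four_days_ago, four_days_ago))
        purge_old_files(cache_dir)
        return sorted(os.listdir(cache_dir))
```

Because this runs once a day rather than on every group expiry, the full directory walk is no longer on the request path.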

In fact, we don’t use disc caching anywhere near as much as we did. We now use Memcached for most things, but for small and often-used caches (such as IDs) we still use the disc cache, as it’s by far the fastest.
