Using memcached to complement MySQL
Many websites store their data in a MySQL database. I would almost say that this is the defacto standard on the web. Access to this database is provided by the MySQL server service. Retrieving the data you need, in the format you need, can be an expensive (in terms of computational power and memory) process. Especially when your tables grow to many rows (100,000 or more) and your query need lots of joins. This can require a lot of disk i/o when your tables don’t fit in memory.
In dynamic websites most queries are issued many times, for example once for each visitor. The result of the query stays mostly the same, especially in websites that are not updated often. That’s most websites that are not user-content or community driven. When your websites receives a reasonable amount of visitors, your web server may suffer under the load of the many identical queries issued.
MySQL has tried to alleviate this problem by including the query cache feature in their MySQL server product. This cache will remember the results of each query and return them from cache if the underlying tables have not changed. This sounds like a great idea, but there are some problems:
- The query cache is enabled by default, but its size is set to zero. So effectively, it might as well have been disabled in the default configuration.
- If any of the underlying tables is modified (INSERT, UPDATE, DELETE or any other non-SELECT query) all cached results that belonged to that table are instantly discarded. This makes the query cache pretty much worthless in a write intensive environment. Changing a table can lead to massive cache purging.
- The query cache is on the same server as the MySQL server process, so the memory used for the query cache cannot be used for disk cache, the many InnoDB caches, or the MyISAM key cache.
- It doesn’t work for prepared statements or queries that start with anything other than “SELECT”.
- The query cache is not very fast.
There is an alternative to the query cache, memcached. Memcached is a service which only purpose it is to provide fast in-memory caching. Some advantages are:
- Memcached can be deployed on a separate server. One memcached server can cache the results of an entire MySQL pool.
- It can be used as a multi-purpose caching solution, and cache anything. It can handle huge amounts of memory, 128GB is no problem (if you need such huge caches).
- It’s super fast. Everything is stored in memory, and it’s non-blocking (no locks) and scales great. Facebook was able to scale a single server to serve 200,000 cache requests per second.
The main disadvantage is that you have to take care of cache invalidation yourself and that it adds any additional layer to you LAMP stack. It is possible to set up memcached in a high availability server environment, but this requires additional servers. If you require that caches cannot be lost in case of hardware failure, you need 2 – 3 times as many servers.
Another disadvantage is that you need physical servers if you want to use a reasonable cache size (> 4GB). Memory is ridicoulously expensive in virtual servers and in cloud services such as Amazon EC2. A Dell server can be upgraded to 128GB for just over $4,000. Amazon EC2 instances max out at 15GB.
As stated, cache invalidation is more difficult with memcached, but becomes really easy if you never change any object you wuld like to cache, but just create a new objects. For example, suppose you have an object that is modified. Then you would create a new object with the changed content, and update any references pointing to the old object to point to the new object. If an object is no longer requested from memcached, memcached will delete the object from its cache after an expiration time.
The only thing left then is caching the references to the objects. At iWink, we use a counter that we increment everytime the references are modified. When the counter changes, the old caches are invalidated. Because objects are separately cached, most of our caches stays valid and no performance hit is noticeable.
We have implemented memcached at iWink to help alleviate the load on the database servers. Memcached gets a cache hit / miss ratio of about 20 to 1, which is pretty good and has really helped us increase the speed of our web sites. Compared to other alternatives to improve the performance of our database stack such as using the query cache or adding MySQL slaves memcached has been a really great choice.
If you would like to read more about memcached at iWink, you can read our iWink Lab article about memcached (Dutch).
About this entry
You’re currently reading “Using memcached to complement MySQL,” an entry on Willem Stuursma
- Published:
- July 23, 2009 / 15:41
- Category:
- mysql
- Tags:
- linkedin, memcached, mysql, mysql performance, performance, php
No comments yet
Jump to comment form | comments rss [?] | trackback uri [?]