The cache eviction death spiral

October 18, 2010

Here's a cache behavior I've touched on in passing several times but I feel like covering explicitly now. It's what I've taken to call the cache eviction death spiral, where cache evictions start you down a path to grinding death.

Cache evictions of useful data have an obvious problem; they slow down the program when you have to pull the needed data back into cache. But in a contended cache under pressure from multiple sources, they have a second effect as well; because your program is now running slower, it is touching the data that it needs less often, 'cooling down' the data and making it more likely to be evicted from the cache in turn. Of course this then repeats, worse.

(Technically this only happens in caches that are based on some variety of least-recently-used or least-frequently-used. That would be pretty much all of them.)

This is the cache eviction death spiral. It can happen in any situation where the cache is being contended over (as opposed to the single process case where the cache just isn't big enough), and does not necessarily result in a winner; if you have enough sources of contention, they can all thrash each other to death, with everyone too busy stealing entries from other processes in order to get their own entries back into memory to do any real work.

An especially dangerous situation for a cache eviction death spiral is when some users of the cache have a much easier time keeping their data hot (or in general claiming cache space) than other users. This can create a brutal feedback cycle that rapidly more or less freezes the slow users out of the cache entirely (or at least confines them to whatever small portion the fast users don't need). Generally this results in them grinding to a complete halt (although performance monitoring may claim that they are very active).

If you're designing a cache in a system it pays to consider if you're susceptible to this issue, especially if your system can wind up starving some users. (I don't know if the literature has good ways around this, but if I was searching I would start with the operating system people; this crops up a lot in operating system caching.)

Written on 18 October 2010.
« Read IO is generally synchronous (unlike write IO)
A module that I want in the standard library: Parsing »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Oct 18 00:10:13 2010
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.