Why ZFS L2ARC is probably not a good fit for our setup

August 26, 2016

Back when I wrote up our second generation of ZFS fileservers, I mentioned in an aside that we expected to eventually add some L2ARCs to our pools. Since then, I've started to think that L2ARCs are not a good fit for our particular and somewhat peculiar setup. The simple problem is that as far as I know, there is no such thing as an L2ARC that's shared between pools.

The easy and popular way to think about L2ARC is right there in its name; it's a second level cache for the in-RAM ARC. As with traditional kernel buffer caches, the in-RAM ARC is a shared, global cache for all ZFS pools on your system, where space is allocated to active data regardless of where it comes from. If you have multiple pools and one pool is very hot, data from it can wind up taking up most of the ARC; when the pool cools down and others start being active, the ARC shifts to caching them instead.

L2ARC doesn't behave like this, because a given L2ARC device can't be shared between pools. You don't and can't have a global L2ARC with X GB of fast SSD space that simply holds the overflow from the ARC regardless of where that overflow came from. Instead you must decide up front how much of your total L2ARC space each pool will get (and I believe that the total amount of L2ARC space you have affects how much RAM gets used for L2ARC metadata). A hot pool cannot expand its L2ARC usage beyond what you gave it, and a cool pool cannot donate some of its unneeded space to a hot pool.

My impression is that many people operate ZFS servers where there are only one or two active pools (plus maybe the system pool). For these people, a per-pool L2ARC is effectively the same or almost the same as a global L2ARC. We are not in this situation. For administrative reasons we split different people and groups into different pools, which means that each of our three main fileservers has nine or ten ZFS pools.

As far as I can see, adding a decent-sized L2ARC for each pool would rapidly put us over the normal recommendations for total L2ARC size (we have only 64 GB of RAM on each fileserver). Adding an L2ARC to each pool that is small enough to keep the total size down is likely to result in L2ARCs that are more decorative than meaningful. And splitting the difference probably doesn't really help either side; we might well wind up with both excessive RAM use for L2ARC metadata and L2ARCs that are too small to be really useful. If we're going to spend money here at all, it would probably be more sensible and useful to add more RAM.
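To put rough numbers on the RAM cost, here's a back-of-the-envelope sketch. Every block cached in L2ARC needs a header kept in RAM; the ~180 bytes per header, the per-pool L2ARC size, and the block sizes below are all illustrative assumptions for the sake of the arithmetic, not measured values for any particular ZFS version:

```python
# Back-of-the-envelope L2ARC header RAM cost across many pools.
# HEADER_BYTES is an assumed per-block overhead for illustration;
# real header sizes vary by ZFS version.
HEADER_BYTES = 180
POOLS = 10                 # pools per fileserver
L2ARC_PER_POOL_GB = 60     # hypothetical per-pool cache device size

def l2arc_ram_overhead_gb(total_l2arc_gb, avg_block_kb):
    """RAM consumed by L2ARC headers for a given average block size."""
    blocks = (total_l2arc_gb * 1024 * 1024) / avg_block_kb
    return blocks * HEADER_BYTES / (1024 ** 3)

total_gb = POOLS * L2ARC_PER_POOL_GB   # 600 GB of L2ARC in total
for block_kb in (8, 32, 128):
    print(f"{block_kb:3d} KB blocks: "
          f"{l2arc_ram_overhead_gb(total_gb, block_kb):.1f} GB of RAM")
```

The point of the sketch is that the overhead is dominated by block size: with 128 KB blocks the header cost is modest, but with lots of small blocks the same amount of L2ARC can eat a noticeable fraction of a 64 GB fileserver's RAM.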

(RAM costs more per gigabyte than SSD space, but it's automatically global and thus balanced dynamically between pools. Whatever hot data needs it just gets it, no tuning required.)

All of this leads me to the conclusion that L2ARC is probably not a good fit for a situation like ours, where you have a bunch of pools and your activity (and desire for L2 caching) is spread relatively evenly over them all. You can maybe make it work, and it might improve things a bit, but the ratio of effort to performance improvement doesn't seem likely to be all that favorable.

(This old entry of mine has some information on L2ARC metadata memory requirements, although I don't know if things have changed since 2013.)


Last modified: Fri Aug 26 23:25:54 2016