ZFS On Linux (still) needs better ways to control the ZFS ARC
Over on the Fediverse I said something about the ZFS ARC:
It has been '0' days since I wished for a way to directly set ZFS on Linux's 'arc_c' internal parameter for the target size of the ZFS ARC.
Why yes, our ARCs are still collapsing for mysterious reasons on our ZoL fileservers.
(I was going to say that 'collapsing' is a relative term, but on checking our metrics, we've seen some really remarkably low ARCs for fileservers with 192 GB of RAM. It looks like we had one drop as low as 56 GB in the past week.)
The ZFS ARC is ZFS's version of a disk cache. For various reasons, ZFS on Linux keeps its ARC separate from the kernel's regular disk caches that are used for other filesystems, and ZFS tunes the ARC size and other parameters separately, instead of the whole thing being integrated into the kernel's general memory tuning.
An important parameter for ARC sizing is arc_c, which is the target size for the ARC, as opposed to its current size. The ARC's current size may drop significantly under memory pressure, but it will grow back to arc_c given time. If arc_c also drops, the ARC will generally not grow its memory use very fast; first ZFS has to decide to raise arc_c, and then it has to have the ARC grow to that new size.
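To make the difference between the target size and the current size concrete, here is a minimal sketch that reads both from the ARC kstats. It assumes the usual ZFS on Linux kstat file /proc/spl/kstat/zfs/arcstats, where arc_c appears as the field 'c' and the current size as 'size'; the function names parse_arcstats and summarize are made up for this illustration.

```python
from pathlib import Path

# Assumed location of the ARC kstats on ZFS on Linux.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Parse kstat text into {name: int}, skipping the header lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "<name> <type> <value>". The two header
        # rows either have more fields or a non-numeric third field.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def summarize(stats):
    """Return the ARC's current size, target, and limits in GiB."""
    gib = 1024 ** 3
    return {
        "size_gib": stats["size"] / gib,    # current ARC size
        "target_gib": stats["c"] / gib,     # arc_c, the target size
        "min_gib": stats["c_min"] / gib,
        "max_gib": stats["c_max"] / gib,
    }


if __name__ == "__main__":
    if ARCSTATS.exists():
        for name, value in summarize(parse_arcstats(ARCSTATS.read_text())).items():
            print(f"{name}: {value:.1f}")
```

If 'target_gib' is far below 'max_gib' on an otherwise idle machine, that is the collapsed target size I'm talking about.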
As you might guess from my Fediverse post, ZFS On Linux doesn't directly expose any way to set arc_c. If your ARC target size has collapsed down to ridiculously low numbers, there's no straightforward way to change it. Sometimes you can change the ZFS module parameter zfs_arc_max and this seems to give ZFS a kick; otherwise, the only option left is the brute force approach of temporarily setting a high zfs_arc_min (which has the obvious side effect of raising arc_c to this new minimum value if necessary). However, setting zfs_arc_min has historically been dangerous.
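For illustration, changing a ZFS module parameter at runtime can be sketched as below. This assumes the parameters are writable files under /sys/module/zfs/parameters (the normal place for them) and that you are root; the helper name set_zfs_param is hypothetical. Whether a given change actually moves arc_c is exactly the uncertain part, so treat this as a sketch of the mechanism and not as a fix.

```python
from pathlib import Path

# Assumed location of the ZFS module parameters.
PARAM_DIR = Path("/sys/module/zfs/parameters")


def set_zfs_param(name, value, param_dir=PARAM_DIR):
    """Write an integer module parameter and return its previous value.

    The previous value is returned so that a temporary change (such as
    a high zfs_arc_min) can be put back afterward.
    """
    path = Path(param_dir) / name
    old = int(path.read_text().strip())
    path.write_text(f"{int(value)}\n")
    return old
```

A temporary change would then look like 'old = set_zfs_param("zfs_arc_max", 128 * 1024**3)', followed later by 'set_zfs_param("zfs_arc_max", old)' to restore it. The 128 GiB figure is only an example value.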
There's also another internal ARC variable that says whether or not the ARC can grow; this is arc_no_grow in /proc/spl/kstat/zfs/arcstats. If this is '1', I'm not sure that having a high arc_c does you any good. This too is not something that you can control, and it's not even obvious how decisions are made about it.
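Since neither arc_c nor arc_no_grow can be set directly, about the best you can do is watch them. The following is a minimal sketch of such a check, working on a dict of values already read from the ARC kstats (with arc_c under the key 'c'). The function name arc_warnings and the 50% threshold are arbitrary choices for this example, not anything ZFS itself defines.

```python
def arc_warnings(stats, min_fraction=0.5):
    """Return human-readable warnings about a dict of ARC kstat values.

    stats must contain integer 'c' and 'c_max' entries and may contain
    'arc_no_grow'. min_fraction is an arbitrary alerting threshold.
    """
    warnings = []
    if stats.get("arc_no_grow", 0):
        warnings.append("arc_no_grow is set; the ARC will not grow right now")
    c, c_max = stats["c"], stats["c_max"]
    if c < min_fraction * c_max:
        warnings.append(f"arc_c is only {c / c_max:.0%} of c_max")
    return warnings
```

This only tells you that something has gone wrong, not why, which is the whole problem.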
ZFS On Linux having annoying issues with ARC size isn't new; we had this problem on the 18.04 versions of our fileservers, and I've had it periodically on my desktop machines. Since the problems with ARC sizing keep not getting fixed in ZFS On Linux, I've come around to the idea that system administrators should at least have a hammer that we can use to tell ZFS On Linux that it's wrong and the ARC target size should really be 'X', for some X.
(Alternately, we at least need better documentation on all of the ARC related metrics and probably better metrics, so that we can understand what it did and why. Yes, I know, I have bpftrace and other eBPF tools so in theory I can instrument the kernel code. I should not have to be a system programmer here.)