2023-04-25
Understanding ZFS ARC hit (and miss) kstat statistics
The ZFS ARC exposes a number of kstat statistics about its hit and miss performance, which are obviously quite relevant for understanding if your ARC size and possibly its failure to grow are badly affecting you, or if your ARC hit rate is fine even with a smaller than expected ARC size. Complicating the picture are things like 'MFU hits' and 'MFU ghost hits', where it may not be clear how they relate to plain 'ARC hits'.
There are a number of different things that live in the ZFS ARC, each of which has its own size. Further, the disk blocks in the ARC (both 'data' and 'metadata') are divided between a Most Recently Used (MRU) portion and a Most Frequently Used (MFU) portion (I believe other things like headers aren't in either the MRU or MFU). As covered in eg ELI5: ZFS Caching, the MFU and MRU also have 'ghost' versions of themselves; to simplify, these track what would be in memory if the MFU (or MRU) portion used all of memory.
The MRU, MFU, and the ghost versions of themselves give us our first
set of four hit statistics: 'mru_hits', 'mfu_hits', 'mru_ghost_hits',
and 'mfu_ghost_hits'. These track blocks that were found in the real
MRU or the real MFU, in which case they were actually in RAM, or found
in the ghost MRU or MFU, in which case they weren't in RAM but
theoretically could have been. As covered in ELI5: ZFS Caching, ZFS
tracks the hit rates of the ghost MRU and MFU as signs for when to
change the balance between the size of the MRU and MFU. If a block
wasn't even in the ghost MFU or MRU, there is no specific kstat for it
and we have to deduce that by comparing MRU and MFU ghost hits with
general misses.
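As a rough sketch of that deduction, assuming you've already pulled the counters into a dict, something like the following would do. The function name and the sample numbers are made up for illustration, and it leans on the assumption that every ghost hit is also counted as a general miss.

```python
def split_misses(stats):
    """Split overall ARC misses by which ghost list (if any) knew the block."""
    mru_ghost = stats["mru_ghost_hits"]
    mfu_ghost = stats["mfu_ghost_hits"]
    # Assumption: every ghost hit is also counted in the general 'misses'
    # kstat, so whatever is left over was in neither ghost list.
    neither = stats["misses"] - (mru_ghost + mfu_ghost)
    return {"mru_ghost": mru_ghost, "mfu_ghost": mfu_ghost, "neither": neither}


# Made-up counter values for illustration, not from a real system.
sample = {"misses": 1000, "mru_ghost_hits": 120, "mfu_ghost_hits": 80}
print(split_misses(sample))
```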
However, what we really care about for ARC hits and misses is whether
the block actually was in the ARC (in RAM) or whether it had to be
read off disk. This is what the general 'hits' and 'misses' kstats
track, and they do this independently of the MRU and MFU hits (and
ghost 'hits'). At this level, all hits and misses can be broken down
into one of four categories: demand data, demand metadata, prefetch
data, and prefetch metadata (more on this breakdown is in my entry on
ARC prefetch stats). Each of these four has hit and miss kstats
associated with it, named things like 'demand_data_misses'. As far as
I understand it, a 'prefetch' hit or miss means that ZFS was trying to
prefetch something and either already found it in the ARC or didn't. A
'demand' read is from ZFS needing it right away.
(This implies that the same ZFS disk block can be a prefetch miss, which reads it into the ARC from disk, and then later a demand hit, when the prefetching paid off and the actual read found it in the ARC.)
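To make the four-way breakdown concrete, here is a minimal sketch that computes a per-category hit rate from a dict of counters. It assumes the other seven kstats are named on the same pattern as 'demand_data_misses'; the function name and the sample values are hypothetical.

```python
CATEGORIES = ("demand_data", "demand_metadata", "prefetch_data", "prefetch_metadata")


def category_breakdown(stats):
    """Return hits, misses and hit rate for each demand/prefetch x data/metadata category."""
    out = {}
    for cat in CATEGORIES:
        hits = stats[cat + "_hits"]
        misses = stats[cat + "_misses"]
        total = hits + misses
        # Leave the rate undefined for a category that has seen no traffic.
        rate = hits / total if total else None
        out[cat] = {"hits": hits, "misses": misses, "hit_rate": rate}
    return out


# Made-up counter values for illustration, not from a real system.
sample = {
    "demand_data_hits": 900, "demand_data_misses": 100,
    "demand_metadata_hits": 450, "demand_metadata_misses": 50,
    "prefetch_data_hits": 30, "prefetch_data_misses": 70,
    "prefetch_metadata_hits": 0, "prefetch_metadata_misses": 0,
}
for cat, info in category_breakdown(sample).items():
    print(cat, info)
```

Given the parenthetical above, a low prefetch hit rate isn't necessarily bad; a prefetch miss that later turns into a demand hit is prefetching doing its job.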
In the latest development version of OpenZFS, which will eventually
become 2.2, there is an additional category of 'iohits'. An 'iohit'
happens when ZFS wants a disk block that already has active IO issued
to read it into the ARC, perhaps because there is active prefetching
on it. Like 'hits' and 'misses', this has the four demand vs prefetch
and data vs metadata counters associated with it. I'm not quite sure
how these iohits are counted in OpenZFS 2.1, and some of them may slip
through the cracks depending on the exact properties associated with
the read (although the change that introduced iohits suggests that
they may previously have been counted as 'hits').
If you want to see how your ARC is doing, you want to look at the overall hits and misses. The MRU and MFU hits, especially the 'ghost' hits (which are really misses), strike me as less interesting. If you have ARC misses happening (which lead to actual read IO) and you want to know roughly why, you want to look at the breakdown of the demand vs prefetch and data vs metadata 'misses' kstats.
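For the overall picture, here is a minimal sketch that parses the 'name type data' layout of the Linux arcstats file (normally /proc/spl/kstat/zfs/arcstats) and computes an overall hit rate. The function names and the embedded sample text are made up for illustration, and counting 'iohits' in the denominator when that kstat exists is my assumption about how you'd want to treat them.

```python
def parse_arcstats(text):
    """Parse 'name type data' lines from arcstats text into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines have exactly three fields ending in a number; this
        # skips the two header lines at the top of the file.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def overall_hit_rate(stats):
    """Fraction of ARC accesses that were plain hits, or None with no traffic."""
    # Assumption: where an 'iohits' kstat exists, count it as an access
    # that was neither a plain hit nor a plain miss.
    total = stats["hits"] + stats["misses"] + stats.get("iohits", 0)
    return stats["hits"] / total if total else None


# Made-up sample in the arcstats layout, not from a real system. On a
# live Linux machine you would read /proc/spl/kstat/zfs/arcstats instead.
SAMPLE = """\
13 1 0x01 123 33456 4561234 7890123
name                            type data
hits                            4    9000
misses                          4    1000
demand_data_misses              4    600
"""

stats = parse_arcstats(SAMPLE)
print(overall_hit_rate(stats))
```

These counters are cumulative, so for a picture of what's happening right now you'd want to take two samples some time apart and work from the differences.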
It's tempting to look at MRU and MFU ghost 'hits' as a percentage
of misses, but I'm not sure this tells you much; it's certainly not
very high on our fileservers.
Somewhat to my surprise, the sum of MFU and MRU hits is just slightly
under the overall number of ARC 'hits' on all of our fileservers
(which use ZoL 2.1). However, they're exactly the same on my desktops,
which run the development version of ZFS on Linux and so have an
'iohits' kstat. So possibly in 2.1, you can infer the number of
'iohits' from the difference between overall hits and MRU + MFU hits.
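If that guess is right, the inference is a one-liner. This is only a sketch of the speculation above, not something confirmed against the OpenZFS source; the function name and sample numbers are hypothetical.

```python
def inferred_iohits(stats):
    """Return the real 'iohits' kstat if present, else a guess for 2.1."""
    if "iohits" in stats:
        return stats["iohits"]
    # Speculative: on 2.1, assume hits that are in neither mru_hits nor
    # mfu_hits are what later versions count as iohits.
    return stats["hits"] - (stats["mru_hits"] + stats["mfu_hits"])


# Made-up counter values for illustration, not from a real system.
sample = {"hits": 10000, "mru_hits": 3000, "mfu_hits": 6900}
print(inferred_iohits(sample))
```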
(I evidently worked much of this out years ago since our ZFS ARC stats displays in our Grafana ZFS dashboards work this way, but I clearly didn't write it down back then. This time around, I'm fixing that for future me.)