2023-04-16
Some important ARC memory statistics exposed by ZFS on Linux (as of ZoL 2.1)
The ZFS ARC is ZFS's version of a disk cache, and ZFS on Linux reports various information about it in /proc/spl/kstat/zfs/arcstats. Some of this is information on how big the ZFS ARC is and wants to be, but other parts contain important information on how ZFS views the system's overall memory situation. The general meaning of this information is system independent (I believe it exists on FreeBSD and Illumos, as well as ZFS on Linux), but how it's determined and derived is system specific and I've only looked into the situation on Linux.
As covered, the critical ARC size parameter for determining if it will grow, shrink, or stay the same size is 'c', also known as 'arc_c', which is what the ARC considers the overall target size. ZFS also exposes three memory sizes, memory_all_bytes, memory_free_bytes, and memory_available_bytes. The 'all' number is how much total RAM ZFS thinks the system has; the 'free' number is how much memory ZFS thinks is free in general; and 'available' is how much memory ZFS feels it has available to it at the moment, which can go negative. If the 'available' number goes negative, the ARC shrinks; if it's (enough) positive, the ARC can grow.
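As a sketch, these numbers can be pulled out of arcstats programmatically. The parser below assumes the usual kstat layout of 'name type value' data lines; the sample text is invented for illustration, not output from any real system:

```python
# Sketch: parse the name/type/value lines of /proc/spl/kstat/zfs/arcstats
# and pull out the ARC target size and the three memory numbers. The
# SAMPLE text below is made up for illustration.

def parse_arcstats(text):
    """Parse kstat 'name type value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines are 'name type value'; skip headers and other lines.
        # lstrip('-') lets us accept a negative memory_available_bytes.
        if len(fields) == 3 and fields[2].lstrip('-').isdigit():
            stats[fields[0]] = int(fields[2])
    return stats

SAMPLE = """\
name                            type data
c                               4    103079215104
memory_all_bytes                4    206158430208
memory_free_bytes               4    12884901888
memory_available_bytes          4    6442450944
"""

stats = parse_arcstats(SAMPLE)
# Since 'available' is 'free' minus arc_sys_free, we can recover ZFS's
# idea of arc_sys_free from the two exposed numbers.
arc_sys_free = stats["memory_free_bytes"] - stats["memory_available_bytes"]
print(arc_sys_free / 2**30)  # 6.0 GiB on this made-up 192 GiB system
```

On a real system you would feed it the contents of /proc/spl/kstat/zfs/arcstats instead of SAMPLE.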
On Linux, the code that determines these is in arc_os.c.
On most Linux systems, the 'free' number is the number of free pages plus the number of inactive file pages, which are visible in /proc/vmstat as nr_free_pages and nr_inactive_file. On all Linux systems, the 'available' number is the 'free' number minus 'arc_sys_free', which is normally somewhat over 1/32nd of your total RAM and doesn't get adjusted on the fly by ZFS. You can set this through the zfs_arc_sys_free parameter.
(The manual page says that arc_sys_free is normally 1/64th of RAM, but the actual code says 1/32nd plus stuff.)
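For concreteness, here is the rough arithmetic of that discrepancy on a 192 GB machine, ignoring the 'plus stuff' part of the actual code:

```python
# Rough arithmetic for the 1/32nd versus 1/64th discrepancy on a
# 192 GiB machine (ignoring the 'plus stuff' in the actual code).
total_ram = 192 * 2**30
print(total_ram // 32 / 2**30)  # 6.0 GiB if arc_sys_free is 1/32nd
print(total_ram // 64 / 2**30)  # 3.0 GiB if it were 1/64th, as documented
```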
Whether or not the ARC can grow at the moment is shown in 'arc_no_grow', which is 1 if the ARC can't grow at the moment. Generally, this will turn on and stay on if 'available' is less than 1/32nd of 'arc_c' (the 1/32nd bit is determined by 'arc_no_grow_shift', which is an internal variable and so not subject to tuning in ZFS on Linux). One implication of this is that it gets harder and harder for the ARC target size to grow toward its maximum, because you need more and more free memory as 'arc_c' gets larger. On our ZFS fileservers with 192 GB of RAM we set the maximum ARC size to about 155 GB, so at the top end we need the 'free' memory number to reach over 10 GB. It looks like we have gotten there sometimes, but it doesn't happen very often.
(Most of our fileservers also spend 80% to 90% of their time with 'arc_no_grow' being 1.)
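The threshold arithmetic for our fileservers can be sketched out directly. This assumes arc_no_grow_shift has its usual value of 5 (giving the 1/32nd) and that arc_sys_free is exactly 1/32nd of RAM, ignoring the 'plus stuff':

```python
# Sketch of the growth threshold arithmetic: the ARC can grow only
# while 'available' is at least arc_c >> arc_no_grow_shift (1/32nd of
# arc_c), and 'available' is 'free' minus arc_sys_free. Numbers are
# our fileservers': 192 GiB of RAM, ~155 GiB maximum ARC size.
GiB = 2**30
arc_no_grow_shift = 5            # 1 << 5 == 32, hence the 1/32nd

arc_sys_free = 192 * GiB // 32   # ~6 GiB, ignoring the 'plus stuff'
arc_c = 155 * GiB                # target size at its configured maximum

need_available = arc_c >> arc_no_grow_shift  # ~4.84 GiB
need_free = need_available + arc_sys_free    # what 'free' must reach
print(need_free / GiB)  # 10.84375, ie 'free' must be over 10 GiB
```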
The situation for 'arc_no_grow' is checked once a second, so even without explicit memory pressure ARC growth will turn off when 'available' drops low enough; once 'arc_c' is large, this may be most of the time because of the minimum requirement above. If 'available' becomes negative (ie, if the 'free' memory drops below 'arc_sys_free'), then ZFS will consider there to be a 'memory pressure event' and ARC growth can't turn back on until at least zfs_arc_grow_retry seconds later, which defaults to five seconds. It's likely but not certain that this will trigger the ARC target size shrinking.
If 'arc_need_free' is non-zero, this means that ZFS on Linux is in the process of trying to shrink the ARC by (at least) that many bytes. This statistic is not used inside ZFS on Linux; it purely exposes some state information, and I think it can be zero even if the ARC is currently reclaiming memory.
Sidebar: The ARC's target size versus its actual size
It's entirely possible for the ARC to drop its memory usage without dropping its target size (for example, if you delete a big file that's been cached in the ARC, I think the ARC may drop the cached blocks for the file). Over the last week, our fileservers have had the target size be up to 40 GB more than the current size.
Differences the other way (when the target size is below the actual size) seem to be much smaller. Even going back four weeks, the largest shortfall is only a little over a GB. The obvious guess is that ZFS is quite prompt at shrinking the ARC alongside shrinking its target size.
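One quick way to watch this gap is to compare the target size ('c') against the actual size ('size', another field exposed in arcstats). The numbers below are invented for illustration:

```python
# Watch the gap the sidebar describes: the ARC's target size ('c')
# versus its actual size ('size'), both exposed in arcstats.
def arc_gap(stats):
    """Return target size minus actual size, in bytes (can be negative)."""
    return stats["c"] - stats["size"]

# Invented example values: target 160 GiB, actual 130 GiB.
stats = {"c": 160 * 2**30, "size": 130 * 2**30}
print(arc_gap(stats) / 2**30)  # 30.0: target is 30 GiB above actual
```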