2021-04-14
Some things on ZFS (on Linux) per-dataset basic IO statistics
Sufficiently recent versions of OpenZFS on Linux have not just performance statistics for overall pool IO, but also some additional per-dataset IO statistics. Conveniently, these IO statistics are exposed through the Prometheus host agent, so if you're using Prometheus (as we are), you don't have to write something to collect and manipulate them yourself. However, what these statistics actually mean is a little bit underexplained.
(I believe these first appeared in ZFS 0.8.0, based on the project's git history.)
The per-dataset statistics appear in files in /proc/spl/kstat/zfs/<pool> that are called objset-0x<hex>. A typical such file looks like this:
    28 1 0x01 7 2160 5523398096 127953381091805
    name             type data
    dataset_name     7    ssddata/homes
    writes           4    718760
    nwritten         4    7745788975
    reads            4    29614153
    nread            4    616619157258
    nunlinks         4    77194
    nunlinked        4    77189
(For what the header means, see kstat_seq_show_headers() in spl-kstat.c.)
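If you want to poke at these outside of Prometheus, the format is easy to parse. Here's a minimal Python sketch of doing so; the pool name and the hex objset file name are hypothetical stand-ins (loosely taken from the example above), and my reading of the type column is that type 4 is a uint64 counter and type 7 is a string.

    from pathlib import Path

    def parse_objset(path):
        """Parse one objset-0x<hex> kstat file into a dict of fields."""
        stats = {}
        # Skip the kstat header line and the 'name type data' column
        # header; the remaining rows are 'name type value'. Type 4 is
        # a uint64 counter and type 7 a string (the dataset_name).
        for line in Path(path).read_text().splitlines()[2:]:
            if not line.strip():
                continue
            name, ktype, value = line.split(None, 2)
            stats[name] = int(value) if ktype == "4" else value.strip()
        return stats

    # Hypothetical path; your pool name and objset ID will differ.
    print(parse_objset("/proc/spl/kstat/zfs/ssddata/objset-0x1c"))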
Paradoxically, the easiest fields to explain are the last two, nunlinks and nunlinked. These reflect the number of files, directories, and so on that have been queued for deletion in the ZFS delete queue, and the number of things that have actually been deleted (they may start out at non-zero; see dataset_kstats.h). In many cases these two numbers will be the same, because you have no pending deletes. In this case they differ because there are some files in my home directory that have been deleted but that are still in use by programs.
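As an illustration of using these two fields, here's a sketch (reusing parse_objset() from above, and not any official tooling) that walks every pool's objset files and reports datasets that still have entries sitting on the delete queue:

    from pathlib import Path

    # Assumes parse_objset() from the earlier sketch.
    for objset in sorted(Path("/proc/spl/kstat/zfs").glob("*/objset-*")):
        st = parse_objset(objset)
        pending = st["nunlinks"] - st["nunlinked"]
        if pending:
            print(f"{st['dataset_name']}: {pending} pending deletions")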
The writes, nwritten, reads, and nread fields count the number of writes and reads and the bytes written and read, but what makes them complicated is what is and isn't included in them. I believe the simple version is that they count normal user level read and write IO performed through explicit system calls, starting with read() and write() but probably including various other related system calls. They definitely don't count internal ZFS IO to do things like read directories, and I don't think they count IO done through mmap()'d files. However, it appears that they may include some IO to read (and perhaps write) ZFS xattrs, if you use those extensively. They may not include user level IO that is performed as direct IO; I'm not sure. This isn't documented explicitly and the code is unclear to me.
I have no idea if these read and write statistics count NFS IO (I have to assume that the nunlinks and nunlinked statistics do count things deleted over NFS). Not counting NFS IO would make them much less useful in our fileserver environment, because we couldn't use them to find active filesystems. Of course, even if these dataset statistics don't include NFS IO now (as of ZFS 2.0.4 and an impending ZFS 2.1.0 release), they may well in the future. If you're tempted to use these dataset statistics, you should probably conduct some experiments to see how they react to your specific IO load.
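One rough shape for such an experiment, again reusing parse_objset() and with hypothetical paths (substitute your own objset file and a file on the dataset, or on an NFS client mount of it if NFS IO is what you're testing): snapshot the counters, generate some known IO, and diff.

    import os

    # Hypothetical stand-ins for your pool's objset file and a test file.
    OBJSET = "/proc/spl/kstat/zfs/ssddata/objset-0x1c"
    TESTFILE = "/ssddata/homes/iostat-test"

    before = parse_objset(OBJSET)
    with open(TESTFILE, "wb") as f:
        f.write(b"x" * 1024 * 1024)   # one 1 MiB write()
        f.flush()
        os.fsync(f.fileno())
    after = parse_objset(OBJSET)

    for field in ("writes", "nwritten", "reads", "nread"):
        print(field, after[field] - before[field])
    os.unlink(TESTFILE)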
(Our fileservers are currently running Ubuntu 18.04, which has an Ubuntu version of ZFS 0.7.5. This is recent enough to have the pool level IO statistics, but it doesn't have these per-dataset ones.)
Update: Based on some experimentation, the Ubuntu 20.04 version of ZFS on Linux (0.8.3) does update these per-dataset read and write statistics for NFS IO.