Some things on ZFS (on Linux) per-dataset basic IO statistics

April 14, 2021

Sufficiently recent versions of OpenZFS on Linux have not just performance statistics for overall pool IO but also some additional per-dataset IO statistics. Conveniently, these IO statistics are exposed through the Prometheus host agent, so if you're using Prometheus (as we are), you don't have to write something to collect and manipulate them yourself. However, what these statistics actually mean is a little underexplained.

(I believe these first appeared in ZFS 0.8.0, based on the project's git history.)

The per-dataset statistics appear in files in /proc/spl/kstat/zfs/<pool> that are called objset-0x<hex>. A typical such file looks like this:

28 1 0x01 7 2160 5523398096 127953381091805
name             type data
dataset_name     7    ssddata/homes
writes           4    718760
nwritten         4    7745788975
reads            4    29614153
nread            4    616619157258
nunlinks         4    77194
nunlinked        4    77189

(For what the header means, see kstat_seq_show_headers() in spl-kstat.c.)
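Since the file format is so simple, it's easy to parse by hand. Here's a minimal sketch of doing that in Python; the type codes it assumes (4 for a uint64 counter, 7 for a string) are taken from the sample file above, and `parse_objset` is my own name for the helper:

```python
def parse_objset(text):
    """Parse the text of a /proc/spl/kstat/zfs/<pool>/objset-0x<hex> file
    into a dict mapping statistic names to values.

    The first line is the kstat header and the second is the column
    names ('name type data'); the real statistics follow.  In the sample
    file, type 7 is a string (dataset_name) and type 4 is a uint64
    counter, so we convert accordingly.
    """
    stats = {}
    for line in text.splitlines()[2:]:   # skip header and column names
        name, ktype, data = line.split(None, 2)
        stats[name] = data if ktype == "7" else int(data)
    return stats
```

Reading the actual file is then just `parse_objset(open(path).read())` for each objset-* path in the pool's directory.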

Paradoxically, the easiest fields to explain are the last two, nunlinks and nunlinked. What these reflect is the number of files, directories, and so on that have been queued for deletion in the ZFS delete queue and the number of things that have actually been deleted (they may start out at non-zero, see dataset_kstats.h). In many cases these two numbers will be the same, because you have no pending deletes. In this case, there are some files in my home directory that have been deleted but that are still in use by programs.
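As a concrete illustration with the numbers from the sample file above, the count of pending deletions is just the difference between the two counters:

```python
# Pending deletions are the gap between unlinks queued on the ZFS
# delete queue and unlinks actually completed.  Values are from the
# sample objset file above.
nunlinks = 77194    # queued for deletion
nunlinked = 77189   # actually deleted
pending = nunlinks - nunlinked
print(pending)      # 5 things deleted but not yet reclaimed
```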

The writes, nwritten, reads, and nread fields count the number of writes and reads and the bytes written and read, but what makes them complicated is what is and isn't included in them. I believe the simple version is that they count normal user level read and write IO performed through explicit system calls, starting with read() and write() but probably including various other related system calls. They definitely don't count internal ZFS IO to do things like read directories, and I don't think they count IO done through mmap()'d files. However, it appears that they may include some IO to read (and perhaps write) ZFS xattrs, if you use those extensively. They may not include user level IO that is performed as direct IO; I'm not sure. This isn't documented explicitly and the code is unclear to me.
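Since all four of these fields are cumulative counters, turning them into something useful means sampling them twice and taking the difference, which is what Prometheus's rate() does for you. A minimal sketch of the same calculation by hand, assuming you've already parsed each sample of the objset file into a dict of counters (the `io_rates` name is mine):

```python
def io_rates(before, after, interval):
    """Turn two snapshots of an objset file's counters, taken
    `interval` seconds apart, into per-second rates.  The counters
    only ever increase (barring dataset destruction), so a plain
    difference is enough.
    """
    return {k: (after[k] - before[k]) / interval
            for k in ("reads", "nread", "writes", "nwritten")}

# Hypothetical snapshots taken two seconds apart:
before = {"reads": 100, "nread": 50000, "writes": 10, "nwritten": 8192}
after = {"reads": 110, "nread": 54096, "writes": 15, "nwritten": 10240}
print(io_rates(before, after, 2))
# e.g. nwritten of 1024.0 means 1 KiB/sec of write bandwidth
```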

I have no idea if these read and write statistics count NFS IO (I have to assume that the nunlinks and nunlinked statistics do count things deleted over NFS). Not counting NFS IO would make them much less useful in our fileserver environment, because we couldn't use them to find active filesystems. Of course, even if these dataset statistics don't include NFS IO now (as of ZFS 2.0.4 and an impending ZFS 2.1.0 release), they may well in the future. If you're tempted to use these dataset statistics, you should probably conduct some experiments to see how they react to your specific IO load.

(Our fileservers are currently running Ubuntu 18.04, which has an Ubuntu version of ZFS on Linux 0.7.5. This is recent enough to have the pool level IO statistics, but it doesn't have these per dataset ones.)

Update: Based on some experimentation, the Ubuntu 20.04 version of ZFS on Linux 0.8.3 does update these per-dataset read and write statistics for NFS IO.
