The state of getting per-pool IO statistics in ZFS on Linux as of version 2.1

July 23, 2022

Historically, ZFS on Linux has had two ways to get per-pool IO statistics. The 'zpool iostat' subcommand would report one set, obtained directly from the kernel, and ZFS also exposed a second set in /proc/spl/kstat/zfs/<pool>/io, which contained core useful information (although how to interpret some of the numbers wasn't entirely clear). The /proc kstats were more or less fixed, but the set of 'zpool iostat' statistics has grown over time; you can get an idea of this growth by looking at the addition of extended statistics entries in a 'history' view of include/sys/fs/zfs.h.

(There are also per-dataset statistics.)
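For illustration, here is a minimal Python sketch of how a program could read the old per-pool kstats on a pre-2.1 system. It assumes the usual kstat_io layout of a header line, a line of column names, and a line of cumulative counter values; treat it as a sketch, not a reference parser.

    # Sketch of reading the old per-pool IO kstats (pre-2.1 ZFS on Linux
    # only; the file no longer exists in 2.1+).  Assumes the usual
    # kstat_io layout: a kstat header line, then a line of column names,
    # then a line of cumulative values.
    def read_pool_io_kstats(pool):
        path = "/proc/spl/kstat/zfs/%s/io" % pool
        with open(path) as f:
            lines = f.read().splitlines()
        # lines[0] is the kstat header; lines[1] names the columns
        # (read/write counts and bytes plus various cumulative time
        # counters); lines[2] has the values.
        names = lines[1].split()
        values = [int(v) for v in lines[2].split()]
        return dict(zip(names, values))

    # Example use on a pre-2.1 system:
    #   stats = read_pool_io_kstats("tank")
    #   print(stats)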

Because the /proc per-pool kstats were easy for programs to get at (you just had to read a file), my impression is that they were widely supported. In particular I know that the Prometheus host agent supports them. We used this for our Ubuntu 18.04 ZFS fileservers to build detailed ZFS dashboards in our monitoring system. Unfortunately, these kstats were removed in ZFS on Linux 2.1.0 for reasons that I will let you read about in the pre-2.1 commit message or the pull request (also the mainline removal commit). Ubuntu 22.04 ships with ZFS on Linux 2.1.2, so this is likely to start affecting people as they move to it.

This kstats removal has been somewhat awkward. The Prometheus host agent now has no more ZFS pool IO statistics, for example (and until bug #2068 is fixed it doesn't have any ZFS information at all). One reason that it's awkward for metrics programs to do better is that extracting the statistics that 'zpool iostat' uses requires either building with and linking to the libzfs code or writing an implementation of it yourself.

Another issue is that in my opinion the existing 'zpool iostat' statistics are incomplete. Two omissions are that the available latency information doesn't include a true cumulative wait time and that there's no information about the amount of time one or more IOs were in flight, which is crucial for several derived metrics like the average queue size and utilization.

(Zpool iostat prints summary averages that would seem to require having the cumulative time available, but it actually estimates it from the midpoints of its histogram buckets.)
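To make this concrete, here is a small sketch of the iostat-style arithmetic you could do if cumulative counters for busy time and for the "IOs in flight over time" product were available. The counter names here are invented purely for the example; 'zpool iostat' doesn't actually expose them.

    # Illustrative only: deriving utilization and average queue size from
    # two samples of hypothetical cumulative counters.  'busy_time_ns' is
    # time at least one IO was in flight; 'queue_time_sum_ns' is the sum
    # over time of the number of IOs in flight.  Neither is available
    # from 'zpool iostat' today.
    def derived_metrics(prev, cur, wall_delta_ns):
        busy_delta = cur["busy_time_ns"] - prev["busy_time_ns"]
        qsum_delta = cur["queue_time_sum_ns"] - prev["queue_time_sum_ns"]
        utilization = busy_delta / wall_delta_ns      # fraction of time busy
        avg_queue_size = qsum_delta / wall_delta_ns   # average IOs in flight
        return utilization, avg_queue_size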

To get statistics today you have a number of options. First, you can pick through the zpool iostat manual page and run it by hand to generate copious output, although this doesn't give you access to all of the IO statistics (for example, there are size histograms that aren't currently exposed). If you want things in a metrics system, the ZFS on Linux project provides zpool_influxdb as an official InfluxDB format metrics exporter, but when I looked at it I didn't really like using it with Prometheus. There's a native Prometheus zfs_exporter project, but it's explicitly marked experimental and in my opinion needs a number of changes to make it truly useful (for example, in its current state it only provides per-vdev statistics, although it's easy to hack the code a bit to report per-pool stats too).

(I have a hacked up version that I may put in public at some point; I've used it to create a usable Grafana dashboard for an experimental Ubuntu 22.04 fileserver we're working on.)
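If you only need the basic per-pool numbers and don't mind shelling out to a command, one pragmatic approach is to parse 'zpool iostat' in its scripted mode. Here is a rough Python sketch; it assumes the usual seven tab-separated columns from 'zpool iostat -Hp' (pool name, allocated space, free space, read and write operations, and read and write bandwidth).

    # Rough sketch of scripting 'zpool iostat' directly instead of using
    # the removed kstats.  'zpool iostat -Hp' prints one tab-separated
    # line per pool with exact (parsable) values.
    import subprocess

    def zpool_iostat():
        out = subprocess.run(["zpool", "iostat", "-Hp"],
                             capture_output=True, text=True, check=True)
        stats = {}
        for line in out.stdout.splitlines():
            # Assumes seven columns; adjust if your version differs.
            name, alloc, free, rops, wops, rbw, wbw = line.split("\t")
            stats[name] = {"alloc": int(alloc), "free": int(free),
                           "read_ops": float(rops), "write_ops": float(wops),
                           "read_bw": float(rbw), "write_bw": float(wbw)}
        return stats

    # Example use:
    #   for pool, s in zpool_iostat().items():
    #       print(pool, s["read_ops"], s["write_ops"])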

I haven't looked to see if there are any 'top'-like programs that can produce live ZFS IO statistics under the current state of affairs. Since 'zpool iostat's histogram output is potentially so overwhelming (especially if you want to drill down to per-disk details), it could be helpful to have something that lets you navigate around it.

If you want per-dataset statistics from the Prometheus host agent, you'll currently have to hand-modify a version and build it yourself (as of version 1.3.1). Hopefully this issue will be fixed at some point, but in general the host agent seems to get only slow updates, even in the face of known issues with fixes integrated.
