I wish ZFS pools kept a persistent count of various errors

March 12, 2022

Famously, ZFS pools will report a count of read, write, and checksum errors at the level of the pool, vdevs, and individual devices, counts that persist over reboots (and thus pool exports and imports). Equally famously, ZFS expects you to clear these counts when (and if) you resolve problems; for example, if you want to see whether you have a persistent checksum problem or a one-time thing, you'll normally clear the error count and re-scrub the pool. This makes these error counts a (persistent) count of recent errors, not a persistent count of errors over the lifetime of the pool, vdev, or device.

What I've come to wish for over the time we've been running our ZFS fileservers is just such a persistent count (as well as persistent information about how many total bytes have been healed or found with unfixable errors). For long term management, it's nice to know this sort of slowly accumulating information. You can look at it at any point to see if something stands out, and you can capture it in metrics systems to track growth over time. Without it, you're relying on fallible human memory (or equally fallible human tracking) to notice that your checksum errors are increasing on this sort of disk, or on disks this old, or keep showing up once every few months on this hardware, and other things like that.
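If you want to feed these counts into a metrics system today, one approach is to scrape them out of 'zpool status -p' output (the '-p' keeps counts as plain numbers instead of things like '1.2K'). A minimal sketch of such a parser, where the sample output and device names are purely illustrative:

```python
def parse_zpool_errors(status_text):
    """Extract per-device READ/WRITE/CKSUM error counts from
    'zpool status -p' style output.
    Returns {name: (read, write, cksum)}."""
    counts = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The config section starts at the NAME/STATE/... header line.
        if fields[:1] == ["NAME"]:
            in_config = True
            continue
        if not in_config or len(fields) < 5:
            continue
        name, state, read, write, cksum = fields[:5]
        # Skip non-device lines (e.g. the trailing 'errors:' line).
        if read.isdigit() and write.isdigit() and cksum.isdigit():
            counts[name] = (int(read), int(write), int(cksum))
    return counts

# Illustrative sample of 'zpool status -p' output:
sample = """\
  pool: tank
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        ONLINE       0     0     0
\t  mirror-0  ONLINE       0     0     0
\t    sda     ONLINE       0     0     2
\t    sdb     ONLINE       0     0     0

errors: No known data errors
"""

print(parse_zpool_errors(sample))
# {'tank': (0, 0, 0), 'mirror-0': (0, 0, 0),
#  'sda': (0, 0, 2), 'sdb': (0, 0, 0)}
```

A metrics exporter would run this periodically against the real command output and emit the counts with device and pool labels.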

(ZFS has pool history, but even 'zpool history -i' seems to not include fixable errors under at least some circumstances.)

In my view, the ideal implementation would have persistent error counts on all levels, from the pool down to individual devices. The individual device count would be dropped when a device was removed from the pool (through replacement, for example), but the pool persistent count would live as long as the pool did and the vdev persistent count would, in most cases, be just as long-lived. Since ZFS pool, vdev, and device data is relatively free-form (it's effectively a key-value store), it wouldn't be too hard to add this to ZFS as a new pool feature.
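Pending such a pool feature, you can approximate lifetime counts outside of ZFS by keeping your own state: remember the last counts you saw, and when a count goes down (which means someone ran 'zpool clear'), treat the new value as starting from zero again. A minimal sketch of that bookkeeping, where the state file layout and the (read, write, cksum) tuple shape are my own assumptions:

```python
import json
import os
import tempfile

def accumulate(state_file, current):
    """Fold current per-device (read, write, cksum) counts into
    lifetime totals that survive 'zpool clear'."""
    state = {"last": {}, "total": {}}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)
    for dev, now in current.items():
        last = state["last"].get(dev, [0, 0, 0])
        total = state["total"].get(dev, [0, 0, 0])
        for i in range(3):
            # A count lower than last time means the errors were
            # cleared and the counter restarted from zero, so the
            # whole new count is fresh; otherwise add the delta.
            delta = now[i] if now[i] < last[i] else now[i] - last[i]
            total[i] += delta
        state["last"][dev] = list(now)
        state["total"][dev] = total
    with open(state_file, "w") as f:
        json.dump(state, f)
    return state["total"]

# Illustrative usage with a throwaway state file:
state_path = os.path.join(tempfile.mkdtemp(), "zfs-lifetime.json")
accumulate(state_path, {"sda": (0, 0, 2)})   # two checksum errors seen
accumulate(state_path, {"sda": (0, 0, 0)})   # counts after 'zpool clear'
totals = accumulate(state_path, {"sda": (0, 0, 1)})
print(totals["sda"])  # [0, 0, 3]: errors accumulated across the clear
```

Run from cron against parsed 'zpool status' output, this gives you roughly the lifetime counts I'm wishing ZFS itself kept.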

Today, of course, you can do this through good record keeping and perhaps good logging. On Linux at least, the ZFS event daemon gives you an opportunity to write a persistent log of all ZFS disk errors somewhere. On Illumos, syseventadm can probably be made to do the same thing. Detailed logging can also give you more information than counts do; for example, you can see whether problems recur in the same spot on the disk, or whether they move around.

(Of course 'the same spot on the disk' is not terribly meaningful these days, especially on solid state disks.)
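As a sketch of what such ZED-based logging could look like: ZED hands event details to its zedlet scripts in ZEVENT_* environment variables, so a zedlet only has to format those and append them somewhere persistent. The log path and the particular set of variables used here are my own choices, not anything ZED mandates:

```python
import os
import time

def format_event(env):
    """Build one log line from the ZEVENT_* environment variables
    that ZED sets when it invokes a zedlet."""
    return " ".join([
        time.strftime("%Y-%m-%d %H:%M:%S"),
        env.get("ZEVENT_CLASS", "unknown"),      # e.g. ereport.fs.zfs.checksum
        env.get("ZEVENT_POOL", "-"),
        env.get("ZEVENT_VDEV_PATH", "-"),
    ])

def log_event(logfile="/var/log/zfs-errors.log"):
    # The log path is an assumption; use whatever persistent
    # location suits your record keeping.
    with open(logfile, "a") as f:
        f.write(format_event(os.environ) + "\n")
```

Installed as a zedlet (conventionally a script under /etc/zfs/zed.d/ named for the event class it handles), this would accumulate exactly the long-term error record that the pool itself doesn't keep.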

Comments on this page:

I don't think there's anything wrong with keeping these values in ZFS, but I also don't see what the advantage is.

If you're interested in keeping this kind of information, you probably also want to be able to correlate it with S.M.A.R.T. attributes or SCSI defects, as well as with ATA/SCSI READ/WRITE errors in the OS, and to collect a number of other things like unstructured log files, I/O rates, hardware information, et cetera ad nauseam.

All of these things can be collected and acted upon using event monitoring and alerting software like Prometheus, ELK, Zabbix, Open-/LibreNMS, or all the other pieces of software that exist for this exact purpose.



Last modified: Sat Mar 12 22:32:23 2022