What we use ZFS on Linux's ZED 'zedlets' for

January 12, 2024

One of the components of OpenZFS is the ZFS Event Daemon ('zed'). Old ZFS hands will understand me if I say that it's the OpenZFS equivalent of the Solaris/Illumos fault management system as applied to ZFS; for other people, it's best described as ZFS's system for handling (kernel) ZFS events such as ZFS pools experiencing disk errors. Although the manual page obfuscates this a bit, ZED works by running scripts (or programs in general) from a particular directory, normally /etc/zfs/zed.d, and it chooses which scripts to run for a particular event based on their names. OpenZFS ships with a number of these scripts, which are called 'zedlets', and you can add your own, which we do in our ZFS fileserver environment.
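
As a rough illustration of the name-based selection, here is a small shell sketch of the rule as the zed(8) manual page describes it: a zedlet is run for an event if its filename starts with 'all' or with the event's class or subclass, followed by a non-alphabetic character. The function name zedlet_matches is made up for this example and it only checks the subclass and the 'all' prefix; the real matching happens inside the ZED daemon and may differ in details.

```shell
#!/bin/sh
# Sketch only: approximates how ZED picks zedlets by filename.
# zedlet_matches is a hypothetical helper, not part of OpenZFS.

# Usage: zedlet_matches <zedlet-filename> <event-subclass>
# Succeeds if the filename starts with "all" or with the subclass,
# followed by a non-alphabetic character (such as '-' or '.').
zedlet_matches() {
    name=$1
    subclass=$2
    for prefix in all "$subclass"; do
        case "$name" in
            "$prefix"[!A-Za-z]*) return 0 ;;
        esac
    done
    return 1
}

# Example: which of these standard zedlet names would a
# 'statechange' event select under this rule?
for z in all-syslog.sh statechange-notify.sh resilver_finish-notify.sh; do
    if zedlet_matches "$z" statechange; then
        echo "statechange would run: $z"
    fi
done
```

Under this rule, a zedlet named for one event subclass is skipped for every other event, while anything named 'all-...' sees every event.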

The standard ZED setup supports a number of relatively standard notification methods, including email; we enable this in our /etc/zfs/zed.d/zed.rc. The email you get through these standard notifications is a bit generic, but it's a useful starting point and fallback. Beyond this, we add three zedlets of our own:

  • one zedlet simply syslogs full details about almost all events by doing almost literally the following:

    printenv | grep -F 'ZEVENT_' | sort | fmt -999 |
      logger -p daemon.info -t 'cslab-zevents'
    

    ZED has an 'all-syslog.sh' zedlet that's normally enabled, but it doesn't capture absolutely everything this way and it believes in reformatting information a bit. We wanted to capture full event information so we could do as complete a reconstruction of things as possible later.

  • one zedlet syslogs when vdev state changes happen (and what they are) and immediately triggers our ZFS status reporting and spares handling system. Because ZED treats individual disks as vdevs, this is triggered for things like loss of disks and disk read, write, or checksum errors. Our own system for this will then email us a report about issues and start any sparing that's necessary (which will probably result in more email).

  • one zedlet syslogs when resilvers complete and triggers a run of our ZFS status reporting and spares handling system. This will report to us when a pool becomes healthy again and possibly start another round of sparing if we were holding back to not have too many resilvers happening at once.
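
To make the second kind of zedlet a bit more concrete, here is a minimal sketch of logging a vdev state change from the ZEVENT_* environment variables that ZED passes to zedlets. This is not our actual zedlet. The function names, the syslog tag and the LOG_CMD override are invented for the example, and the variable names (ZEVENT_POOL, ZEVENT_VDEV_PATH, ZEVENT_VDEV_STATE_STR) are the ones I believe the stock statechange zedlet uses; check them against the output of the first zedlet on your own system.

```shell
#!/bin/sh
# Sketch only: a hypothetical 'statechange-...' zedlet body.
# Function names, the syslog tag, and LOG_CMD are assumptions.

# Build a one-line summary from the event's environment variables,
# using '?' for anything ZED did not provide.
statechange_summary() {
    printf 'pool=%s vdev=%s state=%s' \
        "${ZEVENT_POOL:-?}" \
        "${ZEVENT_VDEV_PATH:-?}" \
        "${ZEVENT_VDEV_STATE_STR:-?}"
}

# Send the summary to syslog. LOG_CMD can be overridden (for example
# with 'echo') to try the sketch without touching syslog.
log_statechange() {
    ${LOG_CMD:-logger -p daemon.info -t example-statechange} \
        "$(statechange_summary)"
}

# Only log when we were actually invoked for an event.
if [ -n "${ZEVENT_SUBCLASS:-}" ]; then
    log_statechange
fi
```

A real zedlet of this sort would go on to kick off the status reporting after logging.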

Because ZED has a hard-coded ten-second timeout on zedlets, we have to run our status reporting and spares handling in the background of the zedlet, which in turn means we need some straightforward shell locking so that events arriving close together don't start several runs at once.
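
One simple way to do this sort of thing, sketched under assumptions rather than copied from our scripts, is to use mkdir as an atomic lock and run the slow work in a background subshell. LOCKDIR, REPORT_CMD and run_locked are hypothetical names, and REPORT_CMD defaults to 'true' as a stand-in for the real reporting program.

```shell
#!/bin/sh
# Sketch only: background a slow job from a zedlet under a lock.
# LOCKDIR and REPORT_CMD are hypothetical placeholders.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/example-zfs-report.lock}"
REPORT_CMD="${REPORT_CMD:-true}"   # stand-in for the slow program

run_locked() {
    # mkdir is atomic: it either creates the directory or fails,
    # so only one caller at a time can get past this point.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        return 0    # another run holds the lock; skip this one
    fi
    status=0
    $REPORT_CMD || status=$?
    rmdir "$LOCKDIR"
    return "$status"
}

# Run in a background subshell, detached from the zedlet's standard
# streams, so the zedlet itself can finish well within the timeout.
( run_locked ) </dev/null >/dev/null 2>&1 &
```

This version has obvious limits: a run that is killed partway through leaves a stale lock directory behind, and an event that arrives while a run is in progress is skipped instead of queued. On Linux, flock(1) from util-linux is an alternative that avoids the stale lock problem.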

The net effect of this setup is that we'll generally get at least two emails if a disk has problems. One email will be generically formatted and come from the standard ZED email notification generated by the various '*-notify.sh' zedlets. The second email comes from our own ZFS status reporting system, using our own tools to report and summarize ZFS pool status with informative (for us) disk names and so on.

Sidebar: Why we have our own email reporting

A typical status report can look something like this:

Subject: sanhealthmon: details of ZFS pool problems on sanshui
Newly degraded pools:
  fs16-matter-02 fs16-rahulgk-01 fs16-vision-02

[...]
pool:     fs16-rahulgk-01
overall:  problems
problems: disk(s) have repaired errors
config:
  mirror      ONLINE   
    disk01/0  ONLINE   
    disk09/0  REPAIRED (errors: 1 read/0 write/0 checksum)
[...]

This is a lot more readable (for us) than decoding the equivalent in the normal ZFS email, and it also often summarizes the state of multiple pools if all of them have experienced errors simultaneously (because, for example, they all use the same physical disk and that physical disk has had a problem).
