In praise of ZFS On Linux's ZED 'ZFS Event Daemon'

July 19, 2020

I've written before about how our current Linux ZFS fileservers work much like our old OmniOS fileservers. However, not everything is quite the same between ZFS on Linux and traditional Solaris/OmniOS ZFS. One of the most welcome differences for us is ZED, the ZFS Event Daemon. What makes ZED so great is that it provides a very simple way to take action when ZFS events happen.

When a ZFS event happens, ZED looks through a directory (generally /etc/zfs/zed.d) to find scripts (or programs) that should be run in response to the event. Each script is run with a bunch of environment variables set to describe what's going on, and it can use those environment variables to figure out what the event is. ZED decides what things to run based on their names; generally you wind up with script names like all-cslab.sh (which is run on all events) and resilver_finish-cslab.sh (which is run when a resilver finishes).
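To make the name-based selection concrete, here is a minimal sketch of the rule as I understand it from the zed(8) manual page: a script is run if its name starts with either "all" or the event's subclass, followed by a non-alphabetic character. This is a simplified model for illustration, not ZED's actual code; the function name zedlet_matches and the scrub_finish-cslab.sh script name are invented for the example.

```shell
#!/bin/sh
# Simplified model of ZED's name-based selection (not ZED's real code).
# A script matches if its name starts with "all" or with the event
# subclass, and the next character is not a letter.
zedlet_matches() {
    name=$1
    subclass=$2
    for prefix in all "$subclass"; do
        case "$name" in
            "$prefix"[!A-Za-z]*) return 0 ;;
        esac
    done
    return 1
}

# Which of these scripts would run for a resilver_finish event?
# (scrub_finish-cslab.sh is a hypothetical name used for contrast.)
for script in all-cslab.sh resilver_finish-cslab.sh scrub_finish-cslab.sh; do
    if zedlet_matches "$script" resilver_finish; then
        echo "would run: $script"
    fi
done
```

The check on the following character is what keeps a hypothetical 'allocate.sh' from being treated as an 'all' script.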

Because these are just a collection of individual files, you're free to add your own without colliding with or having to alter the standard 'ZEDLETs' provided by ZFS on Linux. Your additions can do anything you want them to, ranging from the simple to the complex. For instance, our simplest ZEDLET just syslogs all of the ZED environment variables:

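#!/bin/sh
PATH=/usr/bin:/usr/sbin:/bin:/sbin:$PATH
export PATH
# Skip history events entirely.
if [ "$ZEVENT_SUBCLASS" = "history_event" ]; then
        exit 0
fi
# Leave the event timestamps out of what gets logged.
unset ZEVENT_TIME
unset ZEVENT_TIME_STRING
# Log every remaining ZEVENT_ variable, sorted, as one long syslog line.
printenv | grep -F 'ZEVENT_' | sort | fmt -999 |
    logger -p daemon.info -t 'cslab-zevents'
exit 0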

(There's a standard 'all-syslog.sh' ZEDLET, but it doesn't syslog all of the information in the zevents. Capturing all of the information is especially useful if you want to write additional ZEDLETs and aren't quite sure what they should look for or what environment variables have useful information.)
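If you want to see what this sort of logging produces without waiting for a real event, one option is to dry-run the pipeline by hand. The following is only a sketch: it prints to standard output instead of calling logger, the format_zevents function name is invented here, and the variable values are made up (real events carry many more variables). I believe ZEVENT_POOL is one of the variables ZED sets for pool events, but check your own syslog output to be sure.

```shell
#!/bin/sh
# Dry run of the logging pipeline: print the line instead of syslogging it.
format_zevents() {
    printenv | grep -F 'ZEVENT_' | sort | fmt -999
}

# Made-up values purely for illustration. The exports happen inside the
# command substitution's subshell, so they do not leak into this shell.
line=$(
    export ZEVENT_SUBCLASS=scrub_finish ZEVENT_POOL=tank
    format_zevents
)
printf '%s\n' "$line"
```

The result is a single line of sorted NAME=value pairs, which is the shape you would then search for in syslog.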

It can take a bit of time and experimentation to sort out what ZFS events are generated (and with what information available) in response to various things happening to and in your ZFS pools. But once you have figured it out, ZED gives you a way to trigger and drive all sorts of system management activities. These can be active (like taking action if devices fail) or passive (like adding markers in your metrics system or performance dashboards for when ZFS scrubs or resilvers start and end, so you can correlate this with other things happening).
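As an illustration of the passive case, here is a minimal sketch of a ZEDLET that records a marker line whenever a scrub or resilver starts or finishes. It is not one of our real scripts. MARKER_FILE and record_marker are names invented for the example, appending to a file stands in for whatever your metrics system actually wants, and I'm assuming that ZED sets ZEVENT_POOL for these events. To have one script handle several events, you would presumably install it under several names (for example as symlinks) so that ZED's name matching picks it up for each one.

```shell
#!/bin/sh
# Hypothetical ZEDLET sketch: record scrub/resilver start and end markers.
# MARKER_FILE is an invented name for this example, not a ZED setting.
MARKER_FILE=${MARKER_FILE:-/tmp/zfs-markers.log}

record_marker() {
    # $1 = event subclass, $2 = pool name, $3 = timestamp (epoch seconds)
    case "$1" in
        scrub_start|scrub_finish|resilver_start|resilver_finish)
            printf '%s pool=%s event=%s\n' "$3" "$2" "$1" >>"$MARKER_FILE"
            ;;
        *)
            # Not an event we track; do nothing.
            ;;
    esac
}

# ZED provides the ZEVENT_ variables; outside ZED this call records nothing.
record_marker "${ZEVENT_SUBCLASS:-}" "${ZEVENT_POOL:-unknown}" "$(date +%s)"
```

Keeping the event filter inside the script means it stays harmless even if it is accidentally installed under an 'all-' name.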

Coming from Solaris and OmniOS, where there was no such simple system for reacting to things happening in your ZFS pools, ZED was a breath of fresh air for us. More than anything else, it feels like how ZFS events should have been handled from the start, so that system administrators could flexibly meet their own local needs rather than having to accept whatever the Solaris Fault Management system wanted to give them.

PS: Because ZFS on Linux is now OpenZFS, I believe that ZED will probably eventually show up in FreeBSD (if it isn't already there). Perhaps it will even some day be ported back to Illumos.


Comments on this page:

By crest at 2020-07-20 01:52:29:

On FreeBSD ZFS events are exported via devctl(4) to the generic devd(8) daemon that also handles other events (e.g. NIC link state changes, hotplugging).

There is also zfsd(8) that just deals with zfs errors by replacing failed disks with spares.

I am pretty sure we have something almost exactly analogous in illumos, and which has been there since the Solaris days: https://illumos.org/man/1M/syseventadm

It runs programs in response to sysevents, some of which are generated by ZFS.

Oh wow, this looks very cool. I recently built a "ZFS Playground" which can be used to run ZFS in a VM and simulate things like disk failures and device corruption. I'm gonna see about integrating ZED so I can gain a better understanding of it.

My ZFS Playground is at https://github.com/dmuth/zfs-playground if you're curious. Feel free to fool around with it, if you're so inclined.

Written on 19 July 2020.

Last modified: Sun Jul 19 21:58:54 2020