In praise of ZFS On Linux's ZED 'ZFS Event Daemon'
I've written before (here) about how our current Linux ZFS fileservers work much like our old OmniOS fileservers. However, not everything is quite the same between ZFS on Linux and traditional Solaris/OmniOS ZFS. One of the most welcome differences for us is ZED, the ZFS Event Daemon. What makes ZED so great is that it provides a very simple way to take action when ZFS events happen.
When a ZFS event happens, ZED looks through a directory (generally /etc/zfs/zed.d) to find scripts (or programs) that should be run in response to the event. Each script is run with a bunch of environment variables set to describe what's going on, and it can use those environment variables to figure out what the event is. ZED decides what things to run based on their names; generally you wind up with script names like all-cslab.sh (which is run on all events) and resilver_finish-cslab.sh (which is run when a resilver finishes).
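To make the naming rule a bit more concrete, here is a rough sketch of the matching as I understand it from the zed manual page: a ZEDLET is picked if its name starts with 'all' or with the event's subclass, followed by a non-alphabetic character. The zedlet_matches function is a made-up helper for illustration, not something ZED provides, and the real matching lives inside ZED itself (and may have additional rules).

```shell
#!/bin/sh
# Hypothetical helper (not part of ZED): succeed if a ZEDLET file name
# would be selected for a given event subclass under the assumed rule
# "prefix is 'all' or the subclass, followed by a non-alphabetic character".
zedlet_matches() {
    name=$1
    subclass=$2
    for prefix in all "$subclass"; do
        # Skip an empty subclass so it can't match everything.
        [ -n "$prefix" ] || continue
        case $name in
            "$prefix"[!A-Za-z]*) return 0 ;;
        esac
    done
    return 1
}
```

With this, 'zedlet_matches all-cslab.sh scrub_start' succeeds for any subclass, while 'zedlet_matches resilver_finish-cslab.sh scrub_finish' fails.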
Because these are just a collection of individual files, you're free to add your own without colliding with or having to alter the standard 'ZEDLETs' provided by ZFS on Linux. Your additions can do anything you want them to, ranging from the simple to the complex. For instance, our simplest ZEDLET simply syslogs all of the ZED environment variables:
    PATH=/usr/bin:/usr/sbin:/bin:/sbin:$PATH
    export PATH
    if [ "$ZEVENT_SUBCLASS" = "history_event" ]; then
        exit 0
    fi
    unset ZEVENT_TIME
    unset ZEVENT_TIME_STRING
    printenv | fgrep 'ZEVENT_' | sort | fmt -999 | logger -p daemon.info -t 'cslab-zevents'
    exit 0
(There's a standard 'all-syslog.sh' ZEDLET, but it doesn't syslog all of the information in the zevents. Capturing all of the information is especially useful if you want to write additional ZEDLETs and aren't quite sure what they should look for or what environment variables have useful information.)
It can take a bit of time and experimentation to sort out what ZFS events are generated (and with what information available) in response to various things happening to and in your ZFS pools. But once you have figured it out, ZED gives you a way to trigger and drive all sorts of system management activities. These can be active (like taking action if devices fail) or passive (like adding markers in your metrics system or performance dashboards for when ZFS scrubs or resilvers start and end, so you can correlate this with other things happening).
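As an illustration of the passive case, here is a minimal sketch of a ZEDLET that logs a marker line when a scrub or resilver starts or ends. It is not one of our actual ZEDLETs. It assumes the event subclasses are named scrub_start, scrub_finish, resilver_start and resilver_finish and that ZED sets ZEVENT_POOL to the pool name; check both against what your own ZED actually reports. The marker_for_event function and the 'zfs-marker' syslog tag are made up for this example, and a real version would probably push to your metrics system instead of syslog.

```shell
#!/bin/sh
# Sketch of a marker ZEDLET; install under a name like all-markers.sh
# so that it is run on all events and filters them itself.

# Hypothetical helper: print a marker line for the events we care about.
# $1 = event subclass, $2 = pool name. Fails for any other event.
marker_for_event() {
    case $1 in
        scrub_start|resilver_start)
            printf '%s %s started\n' "$2" "${1%_start}" ;;
        scrub_finish|resilver_finish)
            printf '%s %s finished\n' "$2" "${1%_finish}" ;;
        *)
            return 1 ;;
    esac
}

# Only act when ZED has actually handed us an event.
if [ -n "${ZEVENT_SUBCLASS:-}" ]; then
    # ZEVENT_POOL is assumed to hold the pool name for these events.
    if marker=$(marker_for_event "$ZEVENT_SUBCLASS" "${ZEVENT_POOL:-unknown}"); then
        logger -p daemon.info -t 'zfs-marker' "$marker"
    fi
    exit 0
fi
```

Keeping the event-to-marker mapping in its own function means you can try it by hand (for example 'marker_for_event scrub_start tank') without waiting for a real scrub.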
Coming from Solaris and OmniOS, where there was no such simple system for reacting to things happening in your ZFS pools, ZED was a breath of fresh air for us. More than anything else, it feels like how ZFS events should have been handled from the start, so that system administrators could flexibly meet their own local needs rather than having to accept whatever the Solaris Fault Management system wanted to give them.
PS: Because ZFS on Linux is now OpenZFS, I believe that ZED will probably eventually show up in FreeBSD (if it isn't already there). Perhaps it will even some day be ported back to Illumos.
Written on 19 July 2020.