The ZFS opacity problem and its effect on manageability
I've alluded to this before in passing, but one of my great frustrations with ZFS is that it has basically no public interface or API for getting information about its state; it is an almost completely opaque box. Do you want to know configuration information about your pools? Do you want to know state information about how healthy or damaged they are? Tough. You can't have it, or rather your programs and systems can't have it. Not with a public, reliable interface at any rate. There is no library that you can call, no program that dumps out comprehensive information in a parseable and reliable format. In short, ZFS is just not very observable.
Oh, sure, Solaris sort of makes some of the information you want available in the form of zpool status. But zpool is a frontend, not a tool; its output is intended for people to read, and so the information is incomplete and Solaris developers feel free to change the output around to make it look better. (And it sometimes helpfully lies to you.)
This opacity hurts. It hurts monitoring systems, which have no good, reliable way of watching ZFS pool status so that they can do things like email you if a pool degrades. It hurts tools that want to live on top of ZFS in complex environments (such as SANs) in order to do things like check that your disk usage and layout constraints are being respected. It hurts attempts to add more sophisticated (and site-local) handling of things like spare replacement. It even hurts things like site inventories, where you want to make sure that you have a complete and accurate record of the filesystem setup on every server.
Solaris itself cannot possibly provide everything that everyone needs for ZFS management, and I wish that ZFS would stop trying to pretend otherwise.
Sidebar: extracting information from ZFS
If your systems need this information anyways, the current state of ZFS gives you two equally unappetizing choices. First, you can parse the output of zpool status and other ZFS commands, hoping that you can get what you need and can make the resulting lash-up work reliably. Second, you can use undocumented interfaces to get the information directly, at the cost of dealing with changes in them. (This was a lot easier in the days when OpenSolaris source code was being updated.)
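To give a feel for what the first choice looks like in practice, here is a minimal Python sketch that scrapes pool names and states out of zpool status style text. It assumes the output keeps its "pool:" and "state:" header lines, which is exactly the kind of assumption that can break when the output is rearranged. The sample text is illustrative rather than captured from a real system, and the function names (pool_states, unhealthy) are hypothetical.

```python
import re

# Illustrative sample modeled on the general shape of `zpool status` output.
# It is NOT captured from a real system; real output varies between releases.
SAMPLE = """\
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          c0t0d0    ONLINE       0     0     0

errors: No known data errors

  pool: data
 state: DEGRADED
status: One or more devices could not be opened.
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  UNAVAIL      0     0     0

errors: No known data errors
"""

# Match only the "pool:" and "state:" header lines; "status:" and the
# config table rows deliberately do not match.
FIELD_RE = re.compile(r"^\s*(pool|state):\s*(\S+)\s*$")


def pool_states(text):
    """Return {pool name: state} scraped from zpool-status-like text."""
    states = {}
    current = None
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "pool":
            current = value
        elif current is not None:
            # A "state:" line belongs to the most recent "pool:" line.
            states[current] = value
            current = None
    return states


def unhealthy(states):
    """Return the sorted names of pools whose state is not ONLINE."""
    return sorted(name for name, state in states.items() if state != "ONLINE")


if __name__ == "__main__":
    for name in unhealthy(pool_states(SAMPLE)):
        print("pool needs attention:", name)
```

In real use the text would come from running the command (for example through subprocess), and everything beyond the pool-level state, such as which device failed, needs still more fragile parsing of the config table.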
We've done both. I'm not happy with either.