The ZFS opacity problem and its effect on manageability

June 23, 2011

I've alluded to this before in passing, but one of my great frustrations with ZFS is that it has basically no public interface or API for getting information about its state; it is an almost completely opaque box. Do you want to know configuration information about your pools? Do you want to know state information about how healthy or damaged they are? Tough. You can't have it, or rather your programs and systems can't have it. Not with a public, reliable interface at any rate. There is no library that you can call, no program that dumps out comprehensive information in a parseable and reliable format. In short, ZFS is just not very observable.

Oh, sure, Solaris sort of makes some of the information you want available in the form of zpool status. But zpool is a frontend, not a tool; its output is intended for people to read, and so the information is incomplete and Solaris developers feel free to change the output around to make it look better. (And it sometimes helpfully lies to you.)

This opacity hurts. It hurts monitoring systems, which have no good reliable way of watching ZFS pool status so that they can do things like email you if a pool degrades. It hurts tools that want to live on top of ZFS in complex environments (such as SANs) in order to do things like check that your disk usage and layout constraints are being respected. It hurts attempts to add more sophisticated (and site-local) handling of things like spares replacement. It even hurts things like site inventories, where you want to make sure that you have a complete and accurate record of the filesystem setup on every server.

Solaris itself cannot possibly provide everything that everyone needs for ZFS management, and I wish that ZFS would stop trying to pretend otherwise.

Sidebar: extracting information from ZFS

If your systems need this information anyways, the current state of ZFS gives you two equally unappetizing choices. First, you can parse the output of zpool status and other ZFS commands, hoping that you can get what you need and can make the resulting lash-up work reliably. Second, you can use undocumented interfaces to directly get the information, at the cost of dealing with changes in them. (This was a lot easier in the days when OpenSolaris source code was being updated.)

We've done both. I'm not happy with either.
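
To make the first choice concrete, here is a minimal sketch in Python of what scraping zpool status can look like. It assumes the usual human-readable layout, where each pool is introduced by a "pool:" line followed by a "state:" line. The function names and the set of known states are my own illustrative choices, not any official interface. Since the output format is not a stable contract, the sketch fails loudly on anything it does not recognize rather than guessing.

```python
import subprocess

# Pool states this sketch expects to see. This list is an assumption,
# not a documented contract; anything else is treated as an error.
KNOWN_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_pool_states(text):
    """Return {pool_name: state} scraped from 'zpool status'-style text.

    Raises ValueError when the text does not look the way we expect,
    so that a format change shows up as a failure and not as silence.
    """
    pools = {}
    current = None  # pool whose 'state:' line we are still waiting for
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a 'key: value' line (e.g. the device table)
        value = value.strip()
        if key == "pool":
            if current is not None:
                raise ValueError("no state line found for pool %r" % current)
            current = value
        elif key == "state" and current is not None:
            if value not in KNOWN_STATES:
                raise ValueError("unknown state %r for pool %r" % (value, current))
            pools[current] = value
            current = None
    if current is not None:
        raise ValueError("no state line found for pool %r" % current)
    return pools


def unhealthy_pools(pools):
    """Return the sorted names of pools that are not ONLINE."""
    return sorted(name for name, state in pools.items() if state != "ONLINE")


def read_zpool_status():
    """Run 'zpool status' and return its output (requires zpool on PATH)."""
    result = subprocess.run(["zpool", "status"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

A monitoring script would call unhealthy_pools(parse_pool_states(read_zpool_status())) and alert on a non-empty result or on a ValueError. Even this much only recovers the overall pool state; anything finer-grained means scraping the device table, which is exactly the sort of lash-up I was complaining about.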

Written on 23 June 2011.

Last modified: Thu Jun 23 01:01:30 2011