I'm done with building tools around 'zpool status' output

March 31, 2014

Back when our fileserver environment was young, I built a number of local tools and scripts that relied on 'zpool status' to get information about pools, pool states, and so on. The problem with using 'zpool status' is of course that it is not an API; it's something intended for presentation to users, and so people feel free to change its output from time to time. At the time, using zpool's output seemed like the best option despite this, or more exactly the best (or easiest) of a bad lot of options.

Well, I'm done with that.

We're in the process of migrating to OmniOS. As I've had to touch scripts and programs to update them for OmniOS's changes in the output of 'zpool status', I've instead been migrating them away from using zpool at all in favour of having them rely on a local ZFS status reporting tool. This migration isn't complete (some tools haven't needed changes yet and I'm letting them be), but it's already simplified my life in various ways.

One of those ways is that now we control the tools. We can guarantee stable output and we can make them output exactly what we want. We can even make them output the same thing on both our current Solaris machines and our new OmniOS machines so that higher level tooling is insulated from what OS version it's running on. This is very handy and not something that would be easy to do with 'zpool status'.
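To make 'stable output' a bit more concrete, here is a minimal sketch of the sort of thing I mean. This is not our actual tool; the `#zfsreport` header, the field names, and the `format_report` function are all made up for illustration. The idea is simply that the format carries a version marker and always emits every field, so whatever reads it never has to guess.

```python
# Hypothetical stable report format; the header text and field names are
# illustrative assumptions, not the output of any real tool.
FORMAT_VERSION = 1
FIELDS = ("pool", "state")


def format_report(records):
    """Render records as tab-separated lines under a versioned header."""
    lines = ["#zfsreport v%d %s" % (FORMAT_VERSION, " ".join(FIELDS))]
    for rec in records:
        # Every field is always emitted. A record missing a field raises
        # KeyError here instead of silently producing a short line.
        lines.append("\t".join(str(rec[f]) for f in FIELDS))
    return "\n".join(lines) + "\n"
```

Because the header names the version and the fields, the same format can be produced on different OS versions, and a consumer can refuse input it doesn't recognize.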

The other, more subtle way that this makes my life better is that I now have much more confidence that things are not going to subtly break on me. One problem with using zpool's output is that all sorts of things can change about it and things that use it may not notice, especially if the output starts omitting things to, for example, 'simplify' the default output. Since our tools are abusing private APIs they may well break (and may well break more than zpool's output), but when they break we can make sure that it's a loud break. The result is much more binary; if our tools work at all they're almost certainly accurate. A script's interpretation of zpool's output is not necessarily so.
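As a rough sketch of what I mean by a loud break, consider a reader for a hypothetical tab-separated status report with a version header line. The format, the `parse_report` name, and the set of states it accepts are assumptions for the example, not a description of our real tools. The point is only that anything unexpected raises an error instead of being quietly skipped.

```python
class ReportFormatError(Exception):
    """Raised when the input does not look exactly as expected."""


# Assumed header and states for this sketch; adjust to whatever your own
# reporting tool actually emits.
EXPECTED_HEADER = "#zfsreport v1 pool state"
KNOWN_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_report(text):
    """Return {pool: state}, raising on anything we don't understand."""
    lines = text.splitlines()
    if not lines or lines[0] != EXPECTED_HEADER:
        raise ReportFormatError("unexpected header: %r" % (lines[:1],))
    pools = {}
    for lineno, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != 2:
            raise ReportFormatError(
                "line %d: expected 2 fields, got %d" % (lineno, len(fields)))
        pool, state = fields
        if state not in KNOWN_STATES:
            raise ReportFormatError(
                "line %d: unknown state %r" % (lineno, state))
        if pool in pools:
            raise ReportFormatError(
                "line %d: duplicate pool %r" % (lineno, pool))
        pools[pool] = state
    return pools
```

A script that scrapes presentation output tends to do the opposite: lines it doesn't recognize fall through a regexp and vanish, and the script carries on with an incomplete picture.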

(Omitting things by default is not theoretical. In between S10U8 and OmniOS, 'zfs list' went from including snapshots by default to excluding them by default. This broke some of our code that was parsing 'zfs list' output to identify snapshots, and in a subtle way; the code just thought there weren't any when there were. This is of course a completely fair change, since 'zfs list' is not an API and this probably makes things better for ordinary users.)
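If you do have to stay with 'zfs list' for this, one defensive approach is to ask for snapshots explicitly instead of relying on whatever the default is. The sketch below uses the `-H` (no header, tab-separated), `-o name`, and `-t snapshot` options of 'zfs list'; the function names are mine and made up for illustration. It is still parsing command output and not an API, so treat it as damage limitation and not a fix.

```python
import subprocess


def snapshot_list_command():
    """Build a 'zfs list' command that asks for snapshots explicitly."""
    # -H: script-friendly output, -o name: only the name column,
    # -t snapshot: do not depend on which types are listed by default.
    return ["zfs", "list", "-H", "-o", "name", "-t", "snapshot"]


def parse_snapshot_names(output):
    """Return snapshot names, raising if a line isn't a snapshot name."""
    names = []
    for line in output.splitlines():
        if not line:
            continue
        if "@" not in line:
            # Snapshot names have the form filesystem@snapname, so anything
            # else means the output is not what we asked for.
            raise ValueError("not a snapshot name: %r" % (line,))
        names.append(line)
    return names


def list_snapshots():
    """Run the command and parse its output; needs a system with zfs."""
    out = subprocess.check_output(snapshot_list_command(),
                                  universal_newlines=True)
    return parse_snapshot_names(out)
```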

I accept that rolling our own tools has some additional costs and some risks. But I'd rather own those costs and risks explicitly than have similar ones arise implicitly because I'm relying on a necessarily imperfect understanding of zpool's output.

Actually, writing this entry has made me realize that it's only half of the story. The other half is going to take another entry.

Written on 31 March 2014.

Last modified: Mon Mar 31 23:22:29 2014