I'm done with building tools around 'zpool status' output

Back when our fileserver environment was young,
I built a number of local tools and scripts that relied on 'zpool
status' to get information about pools, pool states, and so on. The
problem with using 'zpool status' is of course that it is not an API,
it's something intended for presentation to users, and so as a result
people feel free to change its output from time to time. At the time
using zpool's output seemed like the best option despite this, or more
exactly the best (or easiest) of a bad lot of options.

Well, I'm done with that.

We're in the process of migrating to OmniOS. As I've had to touch
scripts and programs to update them for OmniOS's changes in the output
of 'zpool status', I've instead been migrating them away from using
zpool at all in favour of having them rely on a local ZFS status
reporting tool. This migration isn't complete (some tools haven't
needed changes yet and I'm letting them be), but it's already
simplified my life in various ways.

One of those ways is that now we control the tools. We can guarantee
stable output and we can make them output exactly what we want. We
can even make them output the same thing on both our current Solaris
machines and our new OmniOS machines so that higher level tooling is
insulated from what OS version it's running on. This is very handy and
not something that would be easy to do with 'zpool status'.

The other, more subtle way that this makes my life better is that I now
have much more confidence that things are not going to subtly break on
me. One problem with using zpool's output is that all sorts of things
can change about it and things that use it may not notice, especially
if the output starts omitting things to, for example, 'simplify' the
default output. Since our tools are abusing private APIs they may well
break (and may well break more than zpool's output), but when they
break we can make sure that it's a loud break. The result is much more
binary; if our tools work at all they're almost certainly accurate. A
script's interpretation of zpool's output is not necessarily so.

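
A minimal sketch of what a 'loud break' can look like in a consumer script. It assumes a hypothetical reporting tool that prints one 'pool <name> <state>' line per pool; that line format and the names parse_pool_states and StatusFormatError are invented for illustration and are not the actual local tool. The point is only that anything unexpected raises an error instead of being silently skipped.

```python
class StatusFormatError(Exception):
    """Raised when the status report does not look the way we expect."""


# Health states documented for ZFS pools and devices.
KNOWN_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_pool_states(text):
    """Parse hypothetical 'pool <name> <state>' lines into a dict.

    Anything unexpected raises StatusFormatError instead of being skipped.
    """
    pools = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue  # blank lines are the only thing we tolerate
        if len(fields) != 3 or fields[0] != "pool":
            raise StatusFormatError(
                "line %d: unexpected format: %r" % (lineno, line))
        _, name, state = fields
        if state not in KNOWN_STATES:
            raise StatusFormatError(
                "line %d: unknown pool state %r" % (lineno, state))
        if name in pools:
            raise StatusFormatError(
                "line %d: duplicate pool %r" % (lineno, name))
        pools[name] = state
    if not pools:
        # An empty report is treated as a failure, not as "no pools".
        raise StatusFormatError("no pools reported")
    return pools
```

With this shape, a changed or truncated report stops the script with an error rather than producing a plausible but wrong answer.
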
(Omitting things by default is not theoretical. In between S10U8 and
OmniOS, 'zfs list' went from including snapshots by default to
excluding them by default. This broke some of our code that was parsing
'zfs list' output to identify snapshots, and in a subtle way; the
code just thought there weren't any when there were. This is of course
a completely fair change, since 'zfs list' is not an API and this
probably makes things better for ordinary users.)

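
For code that still has to parse 'zfs list', one way to reduce exposure to changing defaults is to ask explicitly for what you want. The sketch below is an illustration, not the code that broke: it requests only snapshots with '-t snapshot', uses '-H' for header-free scripted output, and '-o name' to get a single column. The helper names are hypothetical, and it assumes the zfs version in use supports these options.

```python
import subprocess


def parse_snapshot_names(text):
    """Parse 'zfs list -H -t snapshot -o name' output, one name per line."""
    names = []
    for line in text.splitlines():
        if not line:
            continue
        if "@" not in line:
            # Snapshot names always have the form dataset@snapname, so
            # anything else means the output is not what we asked for.
            raise ValueError("not a snapshot name: %r" % line)
        names.append(line)
    return names


def list_snapshots(run=subprocess.check_output):
    """Return all snapshot names, requesting the type explicitly.

    'run' is injectable so the parsing can be exercised without ZFS.
    """
    out = run(["zfs", "list", "-H", "-t", "snapshot", "-o", "name"])
    return parse_snapshot_names(out.decode())
```

Stating the type explicitly means the result no longer depends on whether snapshots are included by default, although it is still parsing a command's output and not using an API.
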
I accept that rolling our own tools has some additional costs and has
some risks. But I'd rather own those costs and those risks explicitly
than have similar ones arise implicitly because I'm relying on a
necessarily imperfect understanding of zpool's output.

Actually, writing this entry has made me realize that it's only half of the story. The other half is going to take another entry.