Wandering Thoughts archives

2012-02-26

What information I want out of ZFS tools and libraries

Back in comments on my observation that Solaris 11 is closed source, Joshua M. Clulow noted that the Illumos people are working on making a better (and presumably public) version of libzfs, the nominal interface for dealing with ZFS. Although I've moved slowly on this, I think it's time to write down my thoughts about what I want for dealing with ZFS.

First off, my needs are probably somewhat unusual. I don't actually want to do anything to ZFS through libzfs; I just want to extract information. I also mostly don't care if I get an actual C-level API or simply some tools that give me information; either is about as convenient to me, since I'm actually going to consume the information in a non-C environment (either shell scripts or Python, depending on just what we're doing).

What I do need is three things: a stable and documented interface, information in a form that I can easily parse and interpret reliably, and complete information (not just things that have been cooked into some user-friendly form that elides details). The output of the current zpool and zfs commands is none of these three; the exact output is neither stable nor documented, it's very hard to parse, and it's not complete. What we currently get through (ab)using Solaris's current libzfs is complete and easy to 'parse' (C structures are easy to deal with in one sense), but it's not stable or documented.
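
To make the parsing problem concrete, here is roughly how I'd have to scrape 'zfs list' from Python today. The -H (no headers, tab-separated) and -o flags are real, but the exact output format is an undocumented detail that nothing promises will stay stable, which is exactly my complaint:

    # Sketch: scraping 'zfs list -H' output from Python.
    # The tab-separated format is observed behavior, not documented.
    import subprocess

    def zfs_list(fields=("name", "used", "avail", "mountpoint")):
        out = subprocess.check_output(
            ["zfs", "list", "-H", "-o", ",".join(fields)])
        for line in out.decode().splitlines():
            yield dict(zip(fields, line.split("\t")))

    for fs in zfs_list():
        print(fs["name"], fs["used"])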

(I have a moderate bias towards a stable C API for libzfs because at this point I'd rather roll my own information extraction stuff than trust ZFS's own commands, and it's harder to cheat or omit things in a C API. And I don't have to worry that people will feel that, eg, XML is the perfect output format.)
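
If I did get a stable C API, consuming it from Python via ctypes would look something like the following sketch. Every function name here is invented; they are stand-ins for whatever a documented, stable API would actually expose:

    # Hypothetical sketch of consuming a stable C libzfs from Python.
    # None of these function names exist; they stand in for whatever
    # a real documented API would provide.
    import ctypes

    lzfs = ctypes.CDLL("libzfs.so")  # assumes a public, stable library

    lzfs.zfs_api_open.restype = ctypes.c_void_p
    lzfs.zfs_api_pool_health.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lzfs.zfs_api_pool_health.restype = ctypes.c_char_p

    handle = lzfs.zfs_api_open()
    print(lzfs.zfs_api_pool_health(handle, b"tank"))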

Currently, we need two sorts of information; we need configuration information and pool state information. Configuration information covers things like what disks the pool uses and how it's organized, what filesystems there are, what snapshots there are, and so on. We use this both passively (we periodically record basic information about all pools for tracking purposes) and actively (knowing what disks are in use and how is a vital part of our spares system). Pool state information covers the health of disks in the pool and the state of things like resilvers and scrubs; we use this both for ongoing health monitoring and as part of our spares system.
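
For the passive recording side, the sort of thing we run periodically is no more complicated than this sketch (the log directory and the exact choice of commands here are illustrative, not our actual setup):

    # Sketch: periodically record basic pool information for tracking.
    # The log directory is made up for illustration.
    import subprocess, time

    def record_pool_info(logdir="/var/log/zfs-tracking"):
        stamp = time.strftime("%Y-%m-%d-%H%M")
        for cmd in (["zpool", "list", "-H"], ["zpool", "status"]):
            out = subprocess.check_output(cmd)
            with open("%s/%s-%s" % (logdir, cmd[1], stamp), "wb") as f:
                f.write(out)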

(We don't currently need to extract performance data but we might at some point in the future.)

As for what specific pieces of configuration and state information we want, the likely answer is 'all of it'. If ZFS tracks it at all, I'm at least potentially interested in it.

Sidebar: how to test a proposed ZFS API

My rather obvious advice to anyone designing a public API for getting ZFS information is to test it by rewriting the information display portions of zpool and zfs using only the public API. If you can't do it at all, the API has obviously failed. However, if the API doesn't give you any extra information over what those two commands need today, it also fails, because neither command displays most of the available information about configuration and state.

Generally you should be able to use the API to write an absurdly more verbose version of zpool status, one that will deluge you with a pile of detailed information.
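
In sketch form, such a test might look like this; every name here is invented, standing in for whatever the proposed API actually exposes:

    # Hypothetical test: re-implement a very verbose 'zpool status'
    # using only the proposed public API. All names are invented.
    def verbose_status(api):
        for pool in api.pools():
            print(pool.name, pool.health)
            for vdev in pool.vdev_tree():  # the full tree, not a summary
                print(" ", vdev.type, vdev.path, vdev.state,
                      vdev.read_errors, vdev.write_errors, vdev.cksum_errors)
            scrub = pool.scrub_status()
            if scrub:
                # everything ZFS tracks, not just a cooked one-liner
                print("  scrub:", scrub.state, scrub.examined, scrub.errors)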

ZFSInformationDesire written at 22:15:29

2012-02-01

A ZFS pool scrub wish: suspending scrubs

Like sensible people, we scrub our pools periodically in order to turn up latent problems. Because pool scrubs have a visible impact on responsiveness (at least in the lightly patched Solaris 10 update 8 that we're running), we only run scrubs on weekends (and only scrub one pool per fileserver). However, we've recently started running into problems where pool scrubs slow the fileservers down enough that backups have started failing.

The obvious way around this is to switch things to only doing scrubs when backups aren't running. Except there's a problem: we run backups every day, they run for a fairly long time, and some of our pools take up to fifteen hours to scrub. If we only scrub when backups aren't running, there just isn't the fifteen-hour gap that our biggest pools need.

(It's possible that they would scrub somewhat faster if they never overlapped with backups, but that's only a vague possibility. And as the pools get more data, they'll take longer and longer to scrub.)

Which brings me to my wish: I wish you could suspend ZFS pool scrubs. Not stop them and start them again from the start, but just put one to sleep: tell the pool to remember where the scrub was and do no further scrub IO for now, then later resume the scrub from where it left off. This would allow us to fit even big scrubs around the backups, and in fact we could schedule scrubs much more liberally than we do right now. For example, we might have a couple of hours in the early morning on weekdays, after backups have finished, that we could use to get some scrubbing in.

(I'd be perfectly happy if this was only an in-memory pause, so that if you rebooted your system or exported the pool you lost it and had to start from scratch. As an in-memory pause it ought to be relatively simple to implement.)
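
To be explicit about what we'd do with this, the workflow around backups is simple; the '-p' (pause) and '-r' (resume) flags in this sketch are invented, since no such thing exists in the ZFS we run:

    # Sketch of the wished-for workflow. The 'zpool scrub -p' and
    # 'zpool scrub -r' flags are invented; they do not exist in our ZFS.
    import subprocess

    def pause_scrub(pool):
        subprocess.check_call(["zpool", "scrub", "-p", pool])

    def resume_scrub(pool):
        subprocess.check_call(["zpool", "scrub", "-r", pool])

    # before backups start:  pause_scrub("tank")
    # after backups finish:  resume_scrub("tank")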

PS: I checked and this doesn't seem to be in Illumos, at least based on the current Illumos zpool manpage.

ZFSScrubWish written at 11:38:13

