A ZFS pool scrub wish: suspending scrubs

February 1, 2012

Like sensible people, we scrub our pools periodically in order to turn up latent problems. Because pool scrubs have a visible impact on responsiveness (at least in the lightly patched Solaris 10 update 8 that we're running), we only run scrubs on weekends (and only scrub one pool per fileserver). However, we've recently started running into problems where pool scrubs slow the fileservers down enough that backups have started failing.

The obvious way around this is to only run scrubs when backups aren't running. Except there's a problem: we run backups every day, they run for a fairly long time every day, and some of our pools take up to fifteen hours to scrub. If we only scrub when backups aren't running, there simply isn't the fifteen-hour gap that our biggest pools need.

(It's possible that they would scrub somewhat faster if they never overlapped with backups, but that's only a vague possibility. And as the pools get more data, they'll take longer and longer to scrub.)

Which brings me to my wish: I wish you could suspend ZFS pool scrubs. Not stop them and later restart them from the beginning, but just put one to sleep: tell the pool to remember where the scrub was but do no further scrub IO for now, then later resume the scrub from where it left off. This would allow us to fit even big scrubs around the backups, and in fact we could schedule scrubs much more liberally than we do right now. For example, we might have a couple of hours in a weekday early morning after backups have finished that we could use to get some scrubbing in.

(I'd be perfectly happy if this was only an in-memory pause, so that if you rebooted your system or exported the pool you lost it and had to start from scratch. As an in-memory pause it ought to be relatively simple to implement.)
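
To make the idea concrete, here is a toy Python sketch of what an in-memory pause might look like. None of this is ZFS code or a real ZFS interface: `PausableScrub`, `windows_needed`, and the block counts are all made up for illustration, and a real scrub walks pool metadata rather than a simple block counter. The only point is that the scrub position is a cursor that survives a pause but not a reset (standing in for a reboot or a pool export).

```python
class PausableScrub:
    """Toy model of a scrub whose position lives only in memory (hypothetical)."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.next_block = 0   # cursor: first block not yet checked
        self.paused = False

    def pause(self):
        # Remember where we are, but do no further scrub IO.
        self.paused = True

    def resume(self):
        self.paused = False

    def reset(self):
        # Stands in for a reboot or pool export: the position is lost.
        self.next_block = 0
        self.paused = False

    def run(self, budget):
        """Check up to `budget` blocks; return how many were checked."""
        if self.paused:
            return 0
        end = min(self.next_block + budget, self.total_blocks)
        done = end - self.next_block
        self.next_block = end
        return done

    @property
    def finished(self):
        return self.next_block >= self.total_blocks


def windows_needed(scrub, blocks_per_window):
    """Count the idle windows needed to finish, pausing between windows."""
    if blocks_per_window <= 0:
        raise ValueError("blocks_per_window must be positive")
    windows = 0
    while not scrub.finished:
        scrub.resume()
        scrub.run(blocks_per_window)
        scrub.pause()
        windows += 1
    return windows


# Assumed numbers: treat one "block" as one hour of scrub work,
# with two free hours after each night's backups.
scrub = PausableScrub(total_blocks=15)
print(windows_needed(scrub, blocks_per_window=2))  # prints 8
```

Under those assumed numbers, a fifteen hour scrub that could never fit into a single gap would finish in eight two-hour windows, provided nothing reset it along the way.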

PS: I checked and this doesn't seem to be in Illumos, at least based on the current Illumos zpool manpage.

Written on 01 February 2012.

Last modified: Wed Feb 1 11:38:13 2012