ZFS and multi-pool shared spares

June 18, 2010

One of the features of ZFS's spares handling is that if you have multiple pools, you can share spare disks between them. This lets you have a single global set of spares that are used by whichever pool needs them, or more complex sharing schemes if you really want them (as you might in, say, a SAN environment).
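As a concrete illustration of how this works, you simply add the same disk as a spare to each pool that should be able to use it; ZFS lists it in every pool's configuration but will only activate it in one pool at a time. (Pool and device names here are made up for illustration.)

```shell
# Add the same disk as a shared spare to two different pools.
# ZFS records it in both pools' configurations.
zpool add tank1 spare c5t0d0
zpool add tank2 spare c5t0d0

# Both pools now show c5t0d0 under 'spares' in their status output.
zpool status tank1
zpool status tank2
```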

Our experience is that you want to be really cautious with multi-pool shared spares, because they're buggy and they don't necessarily work very well in some failure scenarios. Overall, they feel far more like a first-cut feature than something that is either very useful or thoroughly tested. My strong general impression is that the Solaris engineering effort is almost entirely focused on what they see as the common case, where pools have dedicated spare devices; shared spares are a corner case that gets relatively little attention and development, a basic feature that was thrown into the code because it looked easy and sort of met a need.

(In fact, our experience has been so negative that we are slowly building our own spare handling system.)

The bugs are the most serious issue. Solaris versions before Solaris 10 update 8 have significant bugs in adding and removing spares, such that you can wind up with useless yet 'stuck' spares (we have some painful experience with this) or be unable to remove dead spares from pools. Even Solaris 10 update 8 has not completely fixed the spares problem; we have one system with one particular pool that simply will not share spares with other pools.
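(For reference, removing an idle spare is supposed to be a one-command operation; on the affected Solaris versions this is exactly the operation that could fail or leave the spare wedged in the pool's configuration. Names are again made up.)

```shell
# Remove an idle spare from a pool. On buggy Solaris versions this
# could fail outright or leave the spare 'stuck' in the configuration.
zpool remove tank1 c5t0d0
```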

(If we added a spare to that pool and to any other pool on the system, the spare got a corrupted GUID in one of the pools. All of the other pools on the system could and do share spares with each other.)

Setting aside the general issues with ZFS spare handling, shared spares work acceptably in simple situations. If you have a single failure, ZFS will activate a spare in whichever pool needs it, things will resilver, and you will be fine. The problems come when you have a failure large enough that you need more spares than you have, because ZFS (of course) has no idea of prioritization for which disks in which pools get replaced with spares; with simultaneous failures, it basically picks randomly. The resulting spare assignment can be essentially useless, and as an unpleasant bonus the resilver IO load can destroy your system's performance.

(I don't blame ZFS for not handling this case, since how to prioritize spare deployment is a local policy decision, but it does make shared spares less useful in some situations. And it would be nice to have some control over the situation so that you could actually implement a local policy; instead, ZFS and Solaris have locked everything up inside a series of black boxes.)
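To make the 'local policy' point concrete, here is a hypothetical sketch of the kind of decision logic you'd want to be able to run yourself: given several degraded pools and too few spares, pick which pool gets the next spare. All pool names and priorities are invented for illustration; a real wrapper would get the degraded pools from 'zpool status' and then attach the spare itself with 'zpool replace'.

```shell
#!/bin/sh
# Hypothetical local spare-deployment policy: smaller number means
# 'repair this pool first'. Pool names here are made up.
priority_of() {
    case "$1" in
        homepool)  echo 10 ;;   # user home directories: top priority
        mailpool)  echo 20 ;;
        scratch)   echo 90 ;;   # scratch space can wait
        *)         echo 50 ;;   # default for anything unlisted
    esac
}

# Pick the highest-priority pool from a list of degraded pools
# (in real life, this list would come from 'zpool status -x').
pick_pool() {
    best=""; best_prio=999
    for pool in "$@"; do
        prio=$(priority_of "$pool")
        if [ "$prio" -lt "$best_prio" ]; then
            best="$pool"; best_prio="$prio"
        fi
    done
    echo "$best"
}

pick_pool scratch mailpool homepool   # prints: homepool
```

This is exactly the sort of simple, site-specific decision that's impossible to express today, because the spare-activation machinery is buried inside ZFS itself.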

