ZFS resilvers are a whole-pool activity

October 2, 2010

In a conventional RAID system where the array is made up of multiple mirrors, mirror resynchronization is a single-mirror affair; the other mirrors in the array are not affected. This is not how ZFS works.

One of the consequences of ZFS scrubs and resilvers being nonlinear is that resilvers do not neatly confine their activity to only the disks of the vdev being resilvered. Instead, ZFS may need to traverse data structures that live in other vdevs in order to find out what data is live on the resilvering vdev. (However, ZFS does try to do as little extra IO as possible.)

This makes a resilver a whole-pool affair (which is really what you'd expect, given that scrubs and resilvers use basically the same code). The most important consequence is that starting a resilver on a second vdev restarts any ongoing resilver from the beginning, no matter how close the existing resilver was to completion.

So: if you have a disk failure in one mirror vdev, activate a spare, and then have a second disk fail in another mirror vdev and activate another spare, work on resilvering the first spare will immediately restart from scratch. Depending on how fast your resilver goes, this may cost you a significant amount of time. This is unlike traditional RAID systems, where you can start a new mirror resync on one mirror without doing anything to an almost-complete mirror resync on another mirror.
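If you activate spares from a script, one way to avoid restarting a resilver by accident is to check for an active resilver first. What follows is only a minimal sketch in Python. It assumes that the output of `zpool status` contains the phrase "resilver in progress" while a resilver is running, which you should verify against your own systems; the function names and the sample status text are made up for illustration.

```python
import subprocess


def read_pool_status(pool: str) -> str:
    """Return the text printed by 'zpool status <pool>'."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def resilver_in_progress(status_text: str) -> bool:
    """Guess whether a resilver is running from 'zpool status' text.

    Assumption: an active resilver shows up as a line containing
    'resilver in progress'. Check this against your ZFS version.
    """
    return any("resilver in progress" in line
               for line in status_text.splitlines())


# Illustrative status fragments, not captured from a real system.
SAMPLE_BUSY = """\
  pool: tank
 state: DEGRADED
 scrub: resilver in progress for 2h10m, 85.00% done, 0h22m to go
"""
SAMPLE_IDLE = """\
  pool: tank
 state: ONLINE
 scrub: none requested
"""

if __name__ == "__main__":
    print(resilver_in_progress(SAMPLE_BUSY))
    print(resilver_in_progress(SAMPLE_IDLE))
```

A script could call `resilver_in_progress(read_pool_status("tank"))` before activating a second spare and then apply whatever local policy you prefer (wait, alert a human, or go ahead anyway).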

(We have pools that take hours to resilver, as we found out recently. And yes, I wound up restarting a resilver from scratch in just this way, losing a chunk of time in the process.)

This has obvious implications for how you want to deal with disk failures, whether in scripts or by hand. Also, I think that there is no universal answer on whether or not to abort an existing resilver in order to start an additional one, although with sufficient MTTDL math you can probably work out the mathematical answer (based on whatever MTTDL model and numbers you want to use). This implies that there is no single right answer that can be coded into Solaris for you; there are always local policy decisions and risk factors to be considered.

(In part the balance is between time to return a pool to total redundancy and time to return each vdev to redundancy. If you wait for your first resilver to finish, you get partial redundancy faster and total redundancy slower.)
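To make the balance concrete, here is a rough back-of-the-envelope model in Python. It is a sketch under simplifying assumptions of my own, not a description of ZFS internals: every resilver pass takes the same `full_hours` regardless of how many vdevs it covers, and the first resilver is `fraction_done` complete at the moment the second disk fails. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hours from the second disk failure until each vdev is redundant."""
    first_vdev: float
    second_vdev: float

    @property
    def whole_pool(self) -> float:
        # The pool is fully redundant only when both vdevs are.
        return max(self.first_vdev, self.second_vdev)


def wait_for_first(full_hours: float, fraction_done: float) -> Outcome:
    """Let the running resilver finish, then start the second one."""
    remaining = (1.0 - fraction_done) * full_hours
    return Outcome(first_vdev=remaining,
                   second_vdev=remaining + full_hours)


def restart_now(full_hours: float) -> Outcome:
    """Activate the second spare immediately; one pass covers both vdevs."""
    return Outcome(first_vdev=full_hours, second_vdev=full_hours)


if __name__ == "__main__":
    # Hypothetical numbers: a 6 hour resilver that is 75% done.
    for name, outcome in [("wait", wait_for_first(6.0, 0.75)),
                          ("restart", restart_now(6.0))]:
        print(name, outcome.first_vdev, outcome.second_vdev,
              outcome.whole_pool)
```

With these made-up numbers, waiting gets the first vdev redundant after 1.5 hours but leaves the second exposed for 7.5 hours, while restarting immediately makes both redundant after 6 hours. Which is better depends on the risk model you plug in.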


Last modified: Sat Oct 2 01:46:38 2010