What determines how much work a ZFS resilver has to do

September 14, 2012

We lost another fileserver iSCSI backend the other day, for the first time in a long while, and this has gotten me thinking about how expensive a ZFS resilver is. What I really want is some way to start guessing at how long a resilver will take (or at least how long it will take relative to another potential resilver).

In a simple world, the time to resilver would be proportional to the amount of data on the vdev being resilvered. ZFS doesn't work this way. Because ZFS scrubs and resilvers are nonlinear (they walk the pool's block tree instead of sweeping the disks from start to end), a ZFS resilver has to look at all of the pool's metadata in order to find all of the data it needs to copy (for mirroring) or recreate (for raidzN). So I think the answer is that a resilver takes time proportional to the size of the pool's metadata (including directories) plus the amount of data on the vdev or vdevs that are being resilvered. Scanning pool metadata is probably going to be largely seek bound; scanning and copying the data itself will hopefully be more linear.

Unfortunately I don't believe that our version of Solaris keeps track of the metadata space usage for ZFS pools (and I don't think you can deduce a good number for it from other information). You can get overall space usage but not information on how much of that space is directories, inodes, and so on versus actual file contents. However, if you just want a relative comparison you can assume that two pools have the same ratio of metadata to data and then directly compare pool sizes.

Where this approach hits the rocks for me is how to scale the relative contributions of the size of the vdevs being resilvered and the size of the theoretical pool metadata (or alternately how to factor in the size of the resilvering vdevs). Without doing much analysis I think that you want to take the total pool size, subtract the size of the vdev(s) being resilvered, scale down by some factor, and then add the size of the resilvering vdevs back in again.

(I think you need to specifically consider the size of the resilvering vdev(s). If pool A and pool B are both 500 GB but the vdev to be resilvered is 20 GB in pool A and 200 GB in pool B, pool A seems likely to finish resilvering first.)
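To make the guess above concrete, here is a minimal sketch of that scoring in Python. Everything in it is an assumption for illustration: the function name `resilver_score` is made up, the result is a unitless number that is only good for comparing pools with each other, and the `metadata_factor` of 0.1 is a placeholder, not a measured value (picking a sensible factor is exactly the part I don't know how to do).

```python
def resilver_score(pool_gb, resilver_vdev_gb, metadata_factor=0.1):
    """Return a unitless relative resilver cost for comparing pools.

    pool_gb          -- total pool size in GB
    resilver_vdev_gb -- size in GB of the vdev(s) being resilvered
    metadata_factor  -- ASSUMED scale-down for the rest of the pool, a
                        stand-in for "how much metadata must be walked";
                        0.1 is a placeholder, not a measurement
    """
    if not 0 <= resilver_vdev_gb <= pool_gb:
        raise ValueError("resilvering vdevs cannot be larger than the pool")
    if metadata_factor < 0:
        raise ValueError("metadata_factor must not be negative")

    # Take the total pool size and subtract the resilvering vdev(s) ...
    rest_of_pool_gb = pool_gb - resilver_vdev_gb
    # ... scale the remainder down, then add the resilvering vdevs back in.
    return rest_of_pool_gb * metadata_factor + resilver_vdev_gb


# The two 500 GB pools from the example above.
score_a = resilver_score(500, 20)
score_b = resilver_score(500, 200)
print("pool A:", score_a)
print("pool B:", score_b)
print("pool A expected to finish first:", score_a < score_b)
```

With any scale-down factor below 1, pool A scores lower than pool B, which matches the intuition that it should finish first. How far apart the two scores are depends entirely on the factor you guess.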

Last modified: Fri Sep 14 03:23:10 2012