ZFS scrubs and resilvers are not sequential IO
July 6, 2010
Here is something that is probably not as widely known as it could be: ZFS scrubs and resilvers are not done with sequential IO, the way conventional RAID resynchronization and checking are done.
Conventional RAID resyncs run in linear order from the start of each disk to the end. This means that they're strictly sequential IO if they're left undisturbed by user IO (which is one reason that you can destroy conventional RAID resync performance by doing enough random user IO).
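To make the cost of interruptions concrete, here is a toy model in Python. It is a hypothetical illustration, not how any real RAID implementation is written, and `count_seeks`, `linear_resync`, and `resync_with_user_io` are made-up names. It treats a disk as numbered blocks and counts a 'seek' whenever the next block isn't the one directly after the previous block.

```python
import random

def count_seeks(order):
    """Count seeks in a sequence of block numbers. Toy model: any step to a
    block other than the one directly after the previous block is a seek."""
    return sum(1 for prev, cur in zip(order, order[1:]) if cur != prev + 1)

def linear_resync(n_blocks):
    """A conventional RAID resync: every block, start of disk to end."""
    return list(range(n_blocks))

def resync_with_user_io(n_blocks, every, rng):
    """The same resync, with one random user read slipped in after every
    `every` resync blocks; each one pulls the heads away and then back."""
    order = []
    for block in range(n_blocks):
        order.append(block)
        if (block + 1) % every == 0:
            order.append(rng.randrange(n_blocks))
    return order

print(count_seeks(linear_resync(10_000)))  # 0
print(count_seeks(resync_with_user_io(10_000, 10, random.Random(1))))
```

With one random read per ten resync blocks, the second count comes out at roughly two seeks per user read (one away, one back), which is the mechanism by which random user IO wrecks resync speed.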
ZFS scrubs and resilvers don't work like this; instead, they effectively walk 'down' the data structures that make up the ZFS pool and its filesystems, starting from the uberblocks and ending up at file data blocks. This isn't surprising for scrubs, since ZFS stores each block's checksum in the block pointer that refers to it, so a block can only be verified once it has been reached through its parent.
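Here is a toy sketch of why a top-down walk is not sequential. The `Block`, `walk_down`, and `head_travel` names are hypothetical, and real ZFS block pointers and traversal code are far more involved. Walking the tree visits blocks in tree order rather than disk order, so the heads jump back and forth; reading the same addresses in sorted order, in one sweep, moves the heads much less.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Block:
    """Hypothetical stand-in for an on-disk block: its address and the
    child blocks it points to."""
    addr: int
    children: list[Block] = field(default_factory=list)

def walk_down(root):
    """Depth-first walk from the root (think uberblock) down to the leaves
    (think file data blocks), returning addresses in visit order."""
    order = []
    stack = [root]
    while stack:
        blk = stack.pop()
        order.append(blk.addr)
        stack.extend(reversed(blk.children))  # visit children left to right
    return order

def head_travel(order):
    """Total distance the heads move, in blocks, to visit `order`."""
    return sum(abs(cur - prev) for prev, cur in zip(order, order[1:]))

# A small tree whose blocks were written at different times and so ended up
# scattered across the disk.
root = Block(900, [
    Block(10, [Block(11), Block(700), Block(12)]),
    Block(500, [Block(501), Block(40)]),
])

tree_order = walk_down(root)
print(tree_order)                       # [900, 10, 11, 700, 12, 500, 501, 40]
print(head_travel(tree_order))          # 3218
print(head_travel(sorted(tree_order)))  # 890
```

The sorted sweep is roughly what a linear RAID resync gets for free; the tree walk is what you get when the traversal order is dictated by the data structures instead of by disk position.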
This has a number of consequences. One of them is that the more fragmented your pool is (that is, the more you have randomly created, deleted, and overwritten files and portions of files), the slower it will likely scrub and resilver. This is because fragmentation scatters things over the disk(s), which requires more seeks and gives the scrubbing process less chance for fast sequential IO. (Remember that modern hard disks can only do about 100 to 120 seeks a second.)
(I think that a corollary to this is that lots of little files will make your ZFS pools scrub slower, especially if you create and delete them randomly all over the filesystem. An old-style Usenet spool filesystem would probably be a ZFS worst case.)
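Some rough arithmetic shows why both fragmentation and piles of little files hurt. This Python sketch rests on assumed numbers: 100 MB/s of streaming throughput, 110 seeks a second (the middle of the range above), ZFS's default 128 KiB recordsize, and the pessimistic worst case of one seek per block. It ignores all metadata, and the function names are illustrative only.

```python
KIB = 2**10
TIB = 2**40
RECORDSIZE = 128 * KIB  # ZFS's default recordsize

def data_blocks(file_size, n_files, recordsize=RECORDSIZE):
    """Rough data block count for n_files files of file_size bytes each:
    each file is cut into recordsize pieces, rounding up, so even a tiny
    file costs at least one block. Metadata (dnodes, indirect blocks) is
    ignored."""
    per_file = (file_size + recordsize - 1) // recordsize
    return per_file * n_files

def streaming_hours(data_bytes, bytes_per_sec):
    """Time to read data_bytes in one uninterrupted sequential pass."""
    return data_bytes / bytes_per_sec / 3600

def seek_bound_hours(n_blocks, seeks_per_sec):
    """Worst case: one seek per block, counting only the seek time."""
    return n_blocks / seeks_per_sec / 3600

big_file = data_blocks(TIB, 1)                        # 1 TiB in one file
small_files = data_blocks(4 * KIB, TIB // (4 * KIB))  # 1 TiB of 4 KiB files

print(big_file, small_files)                      # 8388608 268435456
print(round(streaming_hours(TIB, 100e6), 1))      # 3.1
print(round(seek_bound_hours(big_file, 110), 1))  # 21.2
print(round(seek_bound_hours(small_files, 110)))  # 678
```

In this model a fully fragmented terabyte takes about seven times longer than a streaming read, and the same terabyte stored as 4 KiB files is worse again by a factor of 32, purely because there are 32 times as many blocks to seek to.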
I'm not sure how (or if) ZFS scrubbing deals with changes to the ZFS pool. ZFS's copy-on-write design means that scrubbing won't get confused by updates, but if it chases them, it could do a potentially unbounded amount of work if you keep deleting old data and creating new data fast enough; if it doesn't chase updates, it may miss recent problems.
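The two possibilities in the paragraph above can be put as a toy rate model. This is an illustrative framing with hypothetical names and numbers, and it says nothing about which choice ZFS actually makes. It assumes each block written during the scan adds one unit of work that a chasing scan must eventually do, and it ignores deletions of not-yet-scanned data.

```python
import math

def bounded_scan_seconds(initial_blocks, scan_rate):
    """A scan that only checks blocks that existed when it started."""
    return initial_blocks / scan_rate

def chasing_scan_seconds(initial_blocks, scan_rate, new_block_rate):
    """A scan that also checks every block written while it runs. Its backlog
    shrinks by (scan_rate - new_block_rate) blocks a second, so it never
    finishes if new blocks show up at least as fast as it can check them."""
    if new_block_rate >= scan_rate:
        return math.inf
    return initial_blocks / (scan_rate - new_block_rate)

print(round(bounded_scan_seconds(1_000_000, 110)))  # 9091
print(chasing_scan_seconds(1_000_000, 110, 10))     # 10000.0
print(chasing_scan_seconds(1_000_000, 110, 110))    # inf
```

The bounded scan's running time is fixed when it starts, at the cost of never looking at the newest blocks; the chasing scan covers everything, but its finish time depends on how busy the pool is.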
(This information rattles around the ZFS mailing list, which is where I picked it up from.)