'Scanned' versus 'issued' numbers for ZFS scrubs (and resilvers)

February 9, 2019

Sufficiently recent versions of ZFS have new 'zpool status' output during scrubs and resilvers. The traditional old output looks like:

scan: scrub in progress since Sat Feb  9 18:30:40 2019
      125G scanned out of 1.74T at 1.34G/s, 0h20m to go
      0B repaired, 7.02% done

(As you can probably tell from the IO rate, this is an SSD-based pool.)

The new output adds an '<X> issued at <RATE>' note to the second line, and in fact it can show some very interesting numbers:

scan: scrub in progress since Sat Feb  9 18:36:33 2019
      215G scanned at 2.24G/s, 27.6G issued at 294M/s, 215G total
      0B repaired, 12.80% done, 0 days 00:10:54 to go

Or (with just the important line):

      271G scanned at 910M/s, 14.5G issued at 48.6M/s, 271G total
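If you want to put numbers on the gap between 'scanned' and 'issued', the new progress line is regular enough to parse. This is only a sketch: parse_progress and to_bytes are hypothetical helpers, and I'm assuming that zpool's K/M/G/T suffixes are powers of 1024.

```python
import re

# Suffix exponents, assuming zpool's human-readable sizes are powers of 1024.
UNIT_EXP = {"B": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6}
SIZE = r"([\d.]+)([BKMGTPE])"
PROGRESS_RE = re.compile(
    rf"{SIZE} scanned at {SIZE}/s, {SIZE} issued at {SIZE}/s, {SIZE} total"
)


def to_bytes(number: str, unit: str) -> float:
    return float(number) * 1024 ** UNIT_EXP[unit]


def parse_progress(line: str) -> dict:
    """Pull the five figures out of a new-style progress line (hypothetical helper)."""
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError(f"not a scanned/issued progress line: {line!r}")
    g = m.groups()
    # Groups come in (number, unit) pairs, in the order they appear in the line.
    scanned, scan_rate, issued, issue_rate, total = (
        to_bytes(g[i], g[i + 1]) for i in range(0, len(g), 2)
    )
    return {
        "scanned": scanned,
        "scan_rate": scan_rate,
        "issued": issued,
        "issue_rate": issue_rate,
        "total": total,
        # IO that scanning has found and queued but that hasn't been issued yet
        "backlog": scanned - issued,
    }


p = parse_progress("271G scanned at 910M/s, 14.5G issued at 48.6M/s, 271G total")
print(f"backlog: {p['backlog'] / 1024**3:.1f}G")  # prints backlog: 256.5G
```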

In both cases, this claims to have 'scanned' the entire pool but has only 'issued' a much smaller amount of IO. As it turns out, this is a glaring clue as to what is going on, which is that these are the new sequential scrubs in action. Sequential scrubs (and resilvers) split the non-sequential process of scanning the pool into two sides, scanning through metadata to figure out what IOs to issue and then, separately, issuing the IOs after they have been sorted into order (I am pulling this from this presentation, via). A longer discussion of this is in the comment at the start of ZFS on Linux's dsl_scan.c.
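As a rough illustration of that two-phase split, here is a toy model. It is not how dsl_scan.c is actually structured; BlockIO, sequential_scrub and max_queued are hypothetical names, and the 'memory limit' is simply a cap on how many IOs may sit in the queue before they are sorted by on-disk offset and issued.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIB = 2 ** 20


@dataclass(frozen=True)
class BlockIO:
    """A block to be read, identified by on-disk offset and size (toy model)."""
    offset: int
    size: int


def sequential_scrub(blocks: Iterable[BlockIO], max_queued: int) -> List[BlockIO]:
    """Toy two-phase scrub.

    Scan phase: walk the blocks in the order metadata yields them and queue an
    IO for each. When the queue reaches max_queued entries (a stand-in for the
    memory limit), or when scanning finishes, sort the queue by offset and
    issue it. Returns the IOs in the order they were issued.
    """
    issued: List[BlockIO] = []
    queue: List[BlockIO] = []

    def issue_queue() -> None:
        # Issue phase: put queued IOs into on-disk order before issuing them.
        queue.sort(key=lambda b: b.offset)
        issued.extend(queue)
        queue.clear()

    for blk in blocks:          # 'scanned' grows here
        queue.append(blk)
        if len(queue) >= max_queued:
            issue_queue()       # 'issued' catches up here
    issue_queue()               # drain what's left after the last block is scanned
    return issued


# Blocks as a metadata walk might find them: scattered across the disk.
found = [BlockIO(off * MIB, 128 * 1024) for off in (40, 10, 30, 20, 70, 50, 60)]
print([b.offset // MIB for b in sequential_scrub(found, max_queued=4)])
# prints [10, 20, 30, 40, 50, 60, 70]
```

With a small max_queued, IOs are only sorted within each batch; with a large one (the 'whole pool scanned before much is issued' case above) everything gets sorted into a single sequential pass.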

This split is what the new 'issued' number is telling you about. In sequential scrubs and resilvers, 'scanned' is how much metadata and data ZFS has been able to consider and queue up IO for, while 'issued' is how much IO has been actively queued to vdevs. Note that it is not physical IO; instead it is progress through what 'zpool list' reports as ALLOC space, as covered in my entry on ZFS scrub rates and speeds.

(All of these pools I'm showing output from use mirrored vdevs, so the actual physical IO is twice the 'issued' figures.)
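My reading of the ZFS on Linux 0.8 zpool code (treat this as an assumption) is that both '% done' and the time to go are computed from 'issued', not 'scanned'. The second example above is consistent with that, as this small sketch shows; scrub_progress and physical_io are hypothetical helpers, and physical_io generalizes the two-way mirror case on the assumption that a scrub reads every side of a mirror.

```python
from typing import Tuple

GIB_IN_MIB = 1024  # keep everything in MiB and MiB/s


def scrub_progress(issued: float, total: float, issue_rate: float) -> Tuple[float, float]:
    """Return (percent done, seconds to go), both driven by 'issued' rather than 'scanned'."""
    percent_done = 100.0 * issued / total
    seconds_left = (total - issued) / issue_rate
    return percent_done, seconds_left


def physical_io(issued: float, mirror_width: int) -> float:
    """Assumes a scrub reads every side of a mirror, so physical IO is issued * width."""
    return issued * mirror_width


# Numbers from the second example: 27.6G issued at 294M/s, 215G total.
pct, secs = scrub_progress(27.6 * GIB_IN_MIB, 215 * GIB_IN_MIB, 294.0)
mins, rem = divmod(int(secs), 60)
print(f"{pct:.2f}% done, {mins}m{rem:02d}s to go")  # prints 12.84% done, 10m52s to go
print(f"physical reads so far on 2-way mirrors: {physical_io(27.6, 2):.1f}G")
```

zpool itself reported 12.80% and 00:10:54; the small differences are what you'd expect from working backward from the rounded figures it prints.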

As we can see from these examples, it is possible for ZFS to completely 'scan' your pool before issuing much IO. This generally requires that your pool is relatively small and that you have a reasonable amount of memory, because ZFS limits how much memory it will use for all of those lists of not-yet-issued IOs that it is sorting into order. Once your pool is fully scanned, the reported scan rate will steadily decay, because it's computed from the total time the scrub or resilver has been running, not the time ZFS took to reach 100% scanned.
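Here is a small model of that decay. It assumes, purely for illustration, that metadata scanning runs at a constant speed until the whole pool is scanned; the point is only that once 'scanned' stops growing, dividing it by an ever-larger elapsed time makes the reported rate fall off as 1/t. The helper names are hypothetical.

```python
GIB_IN_MIB = 1024  # work in MiB, so sizes and rates share units


def reported_scan_rate(scanned_mib: float, elapsed_s: float) -> float:
    """Scan rate as a whole-run average: total scanned over total elapsed time."""
    return scanned_mib / elapsed_s


def scanned_by(t: float, total_mib: float, scan_speed: float) -> float:
    """Hypothetical: scanning proceeds at a constant speed until it covers the pool."""
    return min(total_mib, scan_speed * t)


total = 271 * GIB_IN_MIB   # the 271G pool from the example above
speed = 910.0              # MiB/s, assumed constant while scanning (about 305s total)
for t in (100, 300, 600, 1200, 2400):
    rate = reported_scan_rate(scanned_by(t, total, speed), t)
    # Stays at 910 until scanning finishes, then halves each time t doubles.
    print(f"{t:5d}s elapsed: {rate:6.1f} MiB/s")
```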

(In the current ZFS on Linux code, this memory limit appears to be a per-pool one. On the one hand this means that you can scan several pools at once without one pool limiting the others. On the other hand, this means that scanning multiple pools at once may use more memory than you're expecting.)
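To make the per-pool point concrete, here is a back-of-the-envelope sketch. The 1/20th-of-RAM share and the 64 GiB machine are made-up numbers for illustration, not ZFS defaults I'm vouching for; the only thing the sketch takes from the entry is that each pool gets its own limit, so the worst case grows with the number of pools being scrubbed at once.

```python
GIB = 2 ** 30


def worst_case_sort_memory(ram_bytes: int, pools_scrubbing: int, per_pool_divisor: int) -> int:
    """Upper bound on memory for not-yet-issued IO lists when each pool
    has its own limit of ram_bytes // per_pool_divisor (a hypothetical limit)."""
    return pools_scrubbing * (ram_bytes // per_pool_divisor)


ram = 64 * GIB
for pools in (1, 2, 4):
    need = worst_case_sort_memory(ram, pools, per_pool_divisor=20)
    print(f"{pools} pool(s) scrubbing: up to {need / GIB:.1f} GiB")
```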

Sequential scrubs and resilvers are in FreeBSD 12 and will appear in ZFS on Linux 0.8.0 whenever that is released (ZoL is currently at 0.8.0-rc3). Somewhat to my surprise, they don't seem to be in Illumos yet.

(This entry was sparked by reading this question to the BSD Now program, via BSD Now 281, which I stumbled over due to my Referer links.)

Comments on this page:

Just wanted to thank you for explaining this; everything makes sense now for what happens during scrubbing of my ZFS mirrored pool of HDs.

Written on 09 February 2019.


Last modified: Sat Feb 9 19:21:15 2019
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.