2019-02-09
'Scanned' versus 'issued' numbers for ZFS scrubs (and resilvers)
Sufficiently recent versions of ZFS have new 'zpool status' output
during scrubs and resilvers. The traditional old output looks like:
  scan: scrub in progress since Sat Feb 9 18:30:40 2019
    125G scanned out of 1.74T at 1.34G/s, 0h20m to go
    0B repaired, 7.02% done
(As you can probably tell from the IO rate, this is a SSD-based pool.)
The new output adds an '<X> issued at <RATE>' note in the second line, and in fact you can get some very interesting output in it:
  scan: scrub in progress since Sat Feb 9 18:36:33 2019
    215G scanned at 2.24G/s, 27.6G issued at 294M/s, 215G total
    0B repaired, 12.80% done, 0 days 00:10:54 to go
Or (with just the important line):
  271G scanned at 910M/s, 14.5G issued at 48.6M/s, 271G total
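(If you want to pull these figures out of 'zpool status' output programmatically, here is a small, purely hypothetical Python sketch. The regular expression is my own guess based only on the examples here, not on anything in the ZFS code, so treat it accordingly.)

  import re

  # Matches lines like:
  #   215G scanned at 2.24G/s, 27.6G issued at 294M/s, 215G total
  # The pattern is guessed from the examples above, not taken from ZFS itself.
  SCAN_LINE = re.compile(
      r"(?P<scanned>\S+) scanned at (?P<scan_rate>\S+), "
      r"(?P<issued>\S+) issued at (?P<issue_rate>\S+), "
      r"(?P<total>\S+) total"
  )

  def parse_scan_line(line):
      """Return the scanned/issued/total figures as a dict, or None."""
      m = SCAN_LINE.search(line)
      return m.groupdict() if m else None

  print(parse_scan_line("215G scanned at 2.24G/s, 27.6G issued at 294M/s, 215G total"))
  # -> {'scanned': '215G', 'scan_rate': '2.24G/s', 'issued': '27.6G',
  #     'issue_rate': '294M/s', 'total': '215G'}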
In both of these examples, ZFS claims to have 'scanned' the entire pool but has only 'issued' a much smaller amount of IO. As it turns out, this is a glaring clue as to what is going on, which is that these are the new sequential scrubs in action. Sequential scrubs (and resilvers) split the non-sequential process of scanning the pool into two sides, scanning through metadata to figure out what IOs to issue and then, separately, issuing the IOs after they have been sorted into order (I am pulling this from this presentation, via). A longer discussion of this is in the comment at the start of ZFS on Linux's dsl_scan.c.
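(As a deliberately simplified and hypothetical illustration of the idea, not the actual dsl_scan.c logic, the split looks conceptually like this; the real code is far more involved, with memory limits, per-vdev queues, range trees, and so on. Here block pointers are modelled as plain (offset, size) tuples.)

  import random

  def walk_block_pointers(n=8):
      """Stand-in for walking pool metadata: yields blocks in 'logical' order,
      which is effectively random with respect to on-disk offset."""
      for _ in range(n):
          yield (random.randrange(0, 1 << 30), 128 * 1024)

  def scan_phase():
      """Phase 1: walk metadata and queue up blocks; this is what 'scanned' counts."""
      queued = list(walk_block_pointers())
      # Sort by on-disk offset so the later reads are (mostly) sequential.
      queued.sort(key=lambda blk: blk[0])
      return queued

  def issue_phase(queued):
      """Phase 2: issue the sorted reads; this is what 'issued' counts."""
      for offset, size in queued:
          print(f"read {size} bytes at offset {offset}")   # stand-in for real IO

  issue_phase(scan_phase())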
This split is what the new 'issued' number is telling you about.
In sequential scrubs and resilvers, 'scanned' is how much metadata
and data ZFS has been able to consider and queue up IO for, while
'issued' is how much IO has been actively queued to vdevs. Note
that it is not physical IO; instead it is progress through what
'zpool list' reports as ALLOC space, as covered in my entry
on ZFS scrub rates and speeds.
(All of these pools I'm showing output from use mirrored vdevs, so the actual physical IO is twice the 'issued' figures.)
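(As a side note, a bit of arithmetic on the second example suggests that the reported '% done' tracks the 'issued' figure, not the 'scanned' one; this is my inference from the numbers, not something the output itself documents.)

  # Figures taken from the second new-format example above, in GiB.
  scanned, issued, total = 215.0, 27.6, 215.0

  print(f"issued/total:  {issued / total:.1%}")    # ~12.8%, matching '12.80% done'
  print(f"scanned/total: {scanned / total:.1%}")   # 100%, so not what '% done' uses

  # These pools are two-way mirrors, so the physical read IO so far is
  # roughly twice the 'issued' figure (both halves of each mirror get read).
  print(f"physical reads so far: about {issued * 2:.1f} GiB")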
As we can see from these examples, it is possible for ZFS to completely 'scan' your pool before issuing much IO. This is generally going to require that your pool is relatively small and also that you have a reasonable amount of memory, because ZFS limits how much memory it will use for all of those lists of not yet issued IOs that it is sorting into order. Once your pool is fully scanned, the reported scan rate will steadily decay, because it's computed based on the total time the scrub or resilver has been running, not the amount of time that ZFS took to hit 100% scanned.
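(As a hypothetical illustration of this decay, with made-up numbers that are roughly in line with my second example:)

  # A pool whose 271 GiB was fully scanned about five minutes into the scrub.
  scanned_gib = 271.0

  for elapsed_min in (5, 10, 30, 60):
      # 'zpool status' divides the scanned total by the full elapsed scrub time,
      # so the reported scan rate keeps dropping after scanning has finished.
      rate_mib_s = scanned_gib * 1024 / (elapsed_min * 60)
      print(f"{elapsed_min:2d} minutes in: {rate_mib_s:6.1f} MiB/s scanned rate")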
(In the current ZFS on Linux code, this memory limit appears to be a per-pool one. On the one hand this means that you can scan several pools at once without one pool limiting the others. On the other hand, this means that scanning multiple pools at once may use more memory than you're expecting.)
Sequential scrubs and resilvers are in FreeBSD 12 and will appear in ZFS on Linux 0.8.0 whenever that is released (ZoL is currently at 0.8.0-rc3). They don't seem to be in Illumos yet, somewhat to my surprise.
(This entry was sparked by reading this question to the BSD Now program, via BSD Now 281, which I stumbled over due to my Referer links.)