The limits of what ZFS scrubs check

December 28, 2015

In the ZFS community, there is a widespread view that ZFS scrubs are the equivalent of fsck for ordinary filesystems and so check for and find at least as many error conditions as fsck does. Unfortunately this view of ZFS scrubs is subtly misleading and can lead you to expect them to do things that they simply don't.

The simple version of what a ZFS scrub does is that it verifies the checksum for every copy of every (active) block in the ZFS pool. It also explicitly verifies parity blocks for RAIDZ vdevs (which a normal error-free read does not). In the process of doing this verification, the scrub must walk the entire object tree of the pool from the top downwards, which has the side effect of more or less verifying this hierarchy; certainly if there's something like a directory entry that points to an invalid thing, you will get a checksum error somewhere in the process.

However, this is all that a ZFS scrub verifies. In particular, it does not check the consistency and validity of metadata that isn't necessary to walk the ZFS object tree. This includes things like much of the inode data that is returned by stat() calls, and also internal structural information that is not necessary to walk the tree. Such information is simply tacitly assumed to be correct if its checksum verifies.

What this means at a broad level is that while a ZFS scrub guards against on-disk corruption of data that was correct when it was written, it does not protect against internal corruption of data. If RAM errors or ZFS bugs cause corrupt data to be written, the checksum is computed over the already-corrupt data, so a ZFS scrub will not detect it even though it may be obvious in, for example, an ls -l. This is not just a theoretical issue, and has been encountered on multiple platforms.
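To make the distinction concrete, here is a toy model in Python. It is not how ZFS is implemented; the block layout, the use of SHA-256, and the names write_block, scrub_ok and mode_is_sane are all made up for illustration. It only shows why a checksum-only check catches damage that happens after the write but not damage that happens before it.

```python
import hashlib
import stat

# File types that a sanity check would accept (illustrative, not exhaustive).
KNOWN_TYPES = {
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
    stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
}


def write_block(payload):
    """Toy 'write': store the payload together with a checksum of it."""
    return payload, hashlib.sha256(payload).digest()


def scrub_ok(block):
    """Toy 'scrub': only checks that the payload still matches its checksum."""
    payload, checksum = block
    return hashlib.sha256(payload).digest() == checksum


def mode_is_sane(payload):
    """Toy fsck-style check: the payload is a 4-byte mode with a known type."""
    mode = int.from_bytes(payload, "big")
    return stat.S_IFMT(mode) in KNOWN_TYPES


if __name__ == "__main__":
    good_mode = (stat.S_IFREG | 0o644).to_bytes(4, "big")

    # Case 1: correct data that is damaged on disk after being written.
    payload, checksum = write_block(good_mode)
    damaged = (bytes([payload[0] ^ 0x01]) + payload[1:], checksum)
    print("on-disk damage: scrub passes?", scrub_ok(damaged))

    # Case 2: data that was already wrong in memory (file type bits lost)
    # before the checksum was computed.
    bad_mode = (0o644).to_bytes(4, "big")
    block = write_block(bad_mode)
    print("in-memory damage: scrub passes?", scrub_ok(block),
          "mode sane?", mode_is_sane(block[0]))
```

In the second case the toy scrub is perfectly happy, because the checksum faithfully describes the bad data; only a check that looks at what the data means notices the problem.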

(I also believe that ZFS scrubs don't try to do full consistency checks on ZFS's tracking of free disk blocks. I'm not sure if they even try to check that all in-use blocks are actually marked that way.)

This means that a ZFS scrub does somewhat different checks than a traditional fsck. Unlike a scrub, a traditional fsck can't verify block integrity except indirectly, but it does a lot of explicit consistency checks of things like inode modes to make sure they're sane, and it does verify that the filesystem's idea of free space is correct.

It would be possible to make ZFS scrubs do additional checks, and this may happen at some point. But it is not the state of affairs today, so today you can have a ZFS pool with corruption that nevertheless passes ZFS scrubs with no errors. In extreme cases, you may wind up with a pool that panics the system. You can do a certain amount of verification yourself, for example by writing a program that walks the entire filesystem to verify that there are no inodes with crazy modes. And if you make your backups with a conventional system that works through the filesystem (instead of with ZFS snapshot replication), your backups will do a certain amount of verification themselves just by walking the filesystem and trying to read all of the files (sooner or later).
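As a rough sketch of the kind of do-it-yourself verification program mentioned above, something like the following Python would walk a filesystem tree, lstat() everything, and complain about modes with an unrecognized file type or unexpected extra bits. The names check_mode and scan are my own, and the definition of a "crazy" mode here is an assumption; it only knows the standard POSIX file types, so platform-specific types (such as Solaris doors) would need to be added where they exist. With read_data set it also reads every regular file, which is roughly the incidental checking a filesystem-level backup gives you.

```python
import os
import stat
import sys

# Standard POSIX file types; platform-specific ones are deliberately left out.
KNOWN_TYPES = {
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
    stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
}
TYPE_AND_PERM_BITS = 0o170000 | 0o7777


def check_mode(mode):
    """Return a list of complaints about a raw st_mode (empty if it looks sane)."""
    problems = []
    if stat.S_IFMT(mode) not in KNOWN_TYPES:
        problems.append("unknown file type bits %#o" % stat.S_IFMT(mode))
    if mode & ~TYPE_AND_PERM_BITS:
        problems.append("unexpected extra mode bits %#o" % (mode & ~TYPE_AND_PERM_BITS))
    return problems


def check_path(path, read_data):
    """Yield (path, problem) pairs for a single filesystem object."""
    try:
        st = os.lstat(path)
    except OSError as exc:
        yield path, "lstat failed: %s" % exc
        return
    for problem in check_mode(st.st_mode):
        yield path, problem
    if read_data and stat.S_ISREG(st.st_mode):
        try:
            with open(path, "rb") as f:
                while f.read(1 << 20):  # read and discard, 1 MiB at a time
                    pass
        except OSError as exc:
            yield path, "read failed: %s" % exc


def scan(root, read_data=False):
    """Yield (path, problem) pairs for root and everything underneath it."""
    if not os.path.lexists(root):
        yield root, "does not exist"
        return
    yield from check_path(root, read_data)
    walk_errors = []
    for dirpath, dirnames, filenames in os.walk(root, onerror=walk_errors.append):
        for name in dirnames + filenames:
            yield from check_path(os.path.join(dirpath, name), read_data)
    for exc in walk_errors:
        yield exc.filename or root, "directory listing failed: %s" % exc


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    found = 0
    for path, problem in scan(top, read_data="--read" in sys.argv[2:]):
        found += 1
        print("%s: %s" % (path, problem))
    sys.exit(1 if found else 0)
```

This does not stop at filesystem boundaries and it can only see what the kernel is willing to report through lstat(), so it is a partial check at best; a sufficiently damaged filesystem may panic the system before the program ever gets to complain.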


Last modified: Mon Dec 28 02:44:19 2015