Some things on the GUID checksum in ZFS pool uberblocks

July 26, 2019

When I talked about how 'zpool import' generates its view of a pool's configuration, I mentioned that an additional kernel check of the pool configuration is that ZFS uberblocks have a simple 'checksum' of all of the GUIDs of the vdev tree. When the kernel is considering a pool configuration, it rejects it if the sum of the GUIDs in the vdev tree doesn't match the GUID sum from the uberblock.

(The documentation of the disk format claims that it's only the checksum of the leaf vdevs, but as far as I can see from the code it's all vdevs.)

I was all set to write about how this interacts with the vdev configurations that are in ZFS labels, but as it turns out this is no longer applicable. In versions of ZFS that have better ZFS pool recovery, the vdev tree that's used is the one that's read from the pool's Meta Object Set (MOS), not the pool configuration that was passed in from user level by 'zpool import'. Any mismatch between the uberblock GUID sum and the vdev tree GUID sum likely indicates a serious consistency problem somewhere.

(For the user level vdev tree, the difference between having a vdev's configuration and having all of its disks available is potentially important. As we saw yesterday, the ZFS label of every device that's part of a vdev has a complete copy of that vdev's configuration, including all of the GUIDs of its elements. Given a single intact ZFS label for a vdev, you can construct a configuration with all of the GUIDs filled in and thus pass the uberblock GUID sum validation, even if you don't have enough disks to actually use the vdev.)

The ZFS uberblock update sequence guarantees that the ZFS disk labels and their embedded vdev configurations should always be up to date with the current uberblock's GUID sum. Now that I know about the embedded uberblock GUID sum, it's pretty clear why the uberblock must be synced on all vdevs when the vdev or pool configuration is considered 'dirty'. The moment that the GUID sum of the current vdev tree changes, you'd better update everything to match it.

(The GUID sum changes if any rearrangement of the vdev tree happens. This includes replacing one disk with another, since each disk has a unique GUID sum. In case you're curious, the ZFS disk label always has the full tree for a top level vdev, including the special 'replacing' and 'spare' sub-vdevs that show up during these operations.)

PS: My guess from a not very extensive look through the kernel code is that it's very hard to tell from user level if you have a genuine uberblock GUID sum mismatch or another problem that returns the same extended error code to user level. The good news is that I think the only other case that returns VDEV_AUX_BAD_GUID_SUM is if you have missing log device(s).

Written on 26 July 2019.
« How 'zpool import' generates its view of a pool's configuration
What I want out of my window manager »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Jul 26 22:51:41 2019
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.