Some things on the GUID checksum in ZFS pool uberblocks
When I talked about how 'zpool import' generates its view of a
pool's configuration, I mentioned that an
additional kernel check of the pool configuration relies on ZFS
uberblocks containing a simple 'checksum' of all of
the GUIDs in the vdev tree. When the kernel is considering
a pool configuration, it rejects the configuration if the sum of the GUIDs
in its vdev tree doesn't match the GUID sum from the uberblock.
(The documentation of the disk format claims that it's only the checksum of the leaf vdevs, but as far as I can see from the code it's all vdevs.)
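(To make the check concrete, here's a minimal sketch of the idea, not the actual OpenZFS kernel code. The 'ub_guid_sum' name is, as far as I know, the real uberblock field, but the structures here are simplified stand-ins: walk the vdev tree, add up every vdev's GUID, and compare the total against the sum recorded in the uberblock.)

  #include <stdint.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Simplified stand-ins for the real kernel structures. */
  struct vdev {
      uint64_t guid;            /* this vdev's own GUID */
      struct vdev **children;   /* child vdevs, if any */
      size_t nchildren;
  };

  struct uberblock {
      uint64_t ub_guid_sum;     /* GUID sum recorded when it was written */
  };

  /* Sum the GUIDs of this vdev and everything below it. */
  static uint64_t
  vdev_tree_guid_sum(const struct vdev *vd)
  {
      uint64_t sum = vd->guid;
      for (size_t i = 0; i < vd->nchildren; i++)
          sum += vdev_tree_guid_sum(vd->children[i]);
      return sum;
  }

  /* A pool configuration is only acceptable if the sums match. */
  static bool
  guid_sum_matches(const struct vdev *root, const struct uberblock *ub)
  {
      return vdev_tree_guid_sum(root) == ub->ub_guid_sum;
  }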
I was all set to write about how this interacts with the vdev
configurations that are in ZFS labels, but
as it turns out this is no longer applicable. In versions of ZFS
that have better ZFS pool recovery,
the vdev tree that's used is the one that's read from the pool's
Meta Object Set (MOS), not the pool configuration that was passed
in from user level by 'zpool import'. Any mismatch between the
uberblock GUID sum and the vdev tree GUID sum likely indicates a
serious consistency problem somewhere.
(For the user level vdev tree, the difference between having a vdev's configuration and having all of its disks available is potentially important. As we saw yesterday, the ZFS label of every device that's part of a vdev has a complete copy of that vdev's configuration, including all of the GUIDs of its elements. Given a single intact ZFS label for a vdev, you can construct a configuration with all of the GUIDs filled in and thus pass the uberblock GUID sum validation, even if you don't have enough disks to actually use the vdev.)
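(As a user-level illustration of why one intact label is enough, here's a hedged sketch that totals up the GUIDs from the vdev tree nvlist embedded in a label. The "guid" and "children" names are the on-disk ZPOOL_CONFIG_GUID and ZPOOL_CONFIG_CHILDREN keys; the function itself is my illustration, not code from OpenZFS, and it needs nothing beyond the nvlist from a single device's label.)

  #include <libnvpair.h>

  /* Sum the GUIDs of an entire vdev tree, given only the label's
     embedded configuration nvlist. */
  static uint64_t
  label_guid_sum(nvlist_t *vdev_tree)
  {
      uint64_t guid = 0, sum = 0;
      nvlist_t **children;
      uint_t nchildren;

      if (nvlist_lookup_uint64(vdev_tree, "guid", &guid) == 0)
          sum += guid;

      if (nvlist_lookup_nvlist_array(vdev_tree, "children",
          &children, &nchildren) == 0) {
          for (uint_t i = 0; i < nchildren; i++)
              sum += label_guid_sum(children[i]);
      }
      return sum;
  }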
The ZFS uberblock update sequence guarantees that the ZFS disk labels and their embedded vdev configurations are always up to date with the current uberblock's GUID sum. Now that I know about the GUID sum embedded in the uberblock, it's pretty clear why the uberblock must be synced on all vdevs when the vdev or pool configuration is considered 'dirty'. The moment that the GUID sum of the current vdev tree changes, you'd better update everything to match it.
(The GUID sum changes if any rearrangement of the vdev tree happens.
This includes replacing one disk with another, since each disk has
a unique GUID. In case you're curious, the ZFS disk label always
has the full tree for a top level vdev, including the special
'replacing' and 'spare' sub-vdevs that show up during these
operations.)
PS: My guess from a not very extensive look through the kernel code
is that it's very hard to tell from user level whether you have a genuine
uberblock GUID sum mismatch or another problem that returns the
same extended error code. The good news is that I
think the only other case that returns VDEV_AUX_BAD_GUID_SUM
is if you have missing log device(s).