How 'zpool import' generates its view of a pool's configuration
A full bore ZFS pool import happens in two stages: 'zpool import' puts
together a vdev configuration for the pool and passes it to the kernel,
and then the kernel reads the real pool configuration from ZFS objects
in the pool's Meta Object Set.
How 'zpool import' does this is outlined at a high level by a comment in
zutil_import.c; to summarize the comment, the configuration is created by
assembling and merging together information from the ZFS label of each
device. There is an important limitation to this process, which is that a
device's ZFS label only describes the vdev that device is part of, not
the overall pool configuration.
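To make the shape of this concrete, here is a hypothetical little sketch in
Python (not the real C code in zutil_import.c) of the kind of grouping
involved; the dictionary keys follow the label fields that 'zdb -l' prints
(shown below), but the function itself is purely illustrative:

  # Sketch only: group whatever labels we could read by pool GUID and then
  # by top-level vdev GUID, since each label only describes the vdev that
  # its own device belongs to.
  from collections import defaultdict

  def group_labels(labels):
      # 'labels' is a list of dicts with at least 'pool_guid', 'top_guid',
      # and 'txg' keys, one per readable device label.
      pools = defaultdict(lambda: defaultdict(list))
      for lb in labels:
          pools[lb['pool_guid']][lb['top_guid']].append(lb)
      return pools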
To show you what I mean, here are the relevant portions of a ZFS label
(as dumped by 'zdb -l') for a device from one of our pools:
  txg: 5059313
  pool_guid: 756813639445667425
  top_guid: 4603657949260704837
  guid: 13307730581331167197
  vdev_children: 5
  vdev_tree:
      type: 'mirror'
      id: 3
      guid: 4603657949260704837
      is_log: 0
      children[0]:
          type: 'disk'
          id: 0
          guid: 7328257775812323847
          path: '/dev/disk/by-path/pci-0000:19:00.0-sas-phy3-lun-0-part6'
      children[1]:
          type: 'disk'
          id: 1
          guid: 13307730581331167197
          path: '/dev/disk/by-path/pci-0000:00:17.0-ata-4-part6'
(For much more detail that is somewhat out of date, see the ZFS On-Disk Specifications [pdf].)
Based on this label, 'zpool import' knows what the GUID of this
vdev is, which disk of the vdev it's dealing with and where the
other disk or disks in it are supposed to be found, the pool's GUID,
how many vdevs the pool has in total (it has 5) and which specific
vdev this is (it's the fourth of five; vdev numbering starts from
0). But it doesn't know anything about the other vdevs, except
that they exist (or should exist).
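In other words, a single label gives 'zpool import' a specific but limited
bundle of information. Here is a hypothetical sketch (again in Python, not
the real code) of pulling out just that information from a label represented
as a dictionary with the fields from the 'zdb -l' dump above:

  def summarize_label(label):
      # What one label tells us; the field names are as in 'zdb -l' output.
      vt = label['vdev_tree']
      return {
          'pool_guid': label['pool_guid'],           # which pool this device is from
          'vdev_children': label['vdev_children'],   # how many top-level vdevs the pool has
          'vdev_id': vt['id'],                       # which of those vdevs this is (0-based)
          'vdev_guid': vt['guid'],
          'vdev_type': vt['type'],                   # eg 'mirror'
          'expected_disks': [(c['guid'], c['path']) for c in vt['children']],
      }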
When zpool assembles the pool configuration, it will use the best
information it has for each vdev, where the 'best' is taken to be
the vdev label with the highest txg
(transaction group number).
The label with the highest txg for the entire pool is used to
determine how many vdevs the pool is supposed to have. Note that
there's no check that the best label for a particular vdev has a
txg that is anywhere near the pool's (assumed) current txg. This
means that if all of the modern devices for a particular vdev
disappear and a very old device for it reappears, it's possible for
zpool to assemble a (user-level) configuration that claims that the
old device is that vdev (or the only component available for that
vdev, which might be enough if the vdev is a mirror).
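Here is a hypothetical Python sketch of that selection (it glosses over a
great deal of what the real zutil_import.c code has to cope with, and it
assumes all of the labels belong to the same pool):

  def assemble_config(labels):
      # Keep the highest-txg label seen for each top-level vdev (keyed by
      # the vdev's id), and let the overall highest-txg label say how many
      # top-level vdevs the pool is supposed to have.
      best = {}
      for lb in labels:
          vid = lb['vdev_tree']['id']
          if vid not in best or lb['txg'] > best[vid]['txg']:
              best[vid] = lb
      newest = max(labels, key=lambda lb: lb['txg'])
      # Nothing here checks that best[i]['txg'] is anywhere near
      # newest['txg'], which is exactly the issue described above.
      return {i: best.get(i) for i in range(newest['vdev_children'])}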
If zpool can't find any labels for a particular vdev, all it can
do in the configuration is fill in an artificial 'there is a vdev
missing' marker; it doesn't even know whether it was a raidz or a
mirrored vdev, or how much data is on it. When 'zpool import'
prints the resulting configuration, it doesn't explicitly show these
missing vdevs; if I'm reading the code right, your only clue as to
where they are is that the pool configuration will abruptly skip
from, eg, 'mirror-0' to 'mirror-2' without reporting 'mirror-1'.
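Continuing the hypothetical sketch (which, again, is not the actual zpool
code), the gap in the numbering falls out naturally if you name each vdev
from its type and id and simply have nothing to print for an empty slot:

  def describe_config(config):
      # 'config' is the id -> label mapping from assemble_config() above.
      lines = []
      for vdev_id, label in sorted(config.items()):
          if label is None:
              # All we know is 'there should be a vdev here'; there is no
              # type or contents to print for it.
              continue
          vt = label['vdev_tree']
          lines.append("%s-%d" % (vt['type'], vdev_id))    # eg 'mirror-0', 'mirror-2'
          for child in vt['children']:
              lines.append("  " + child['path'])
      return "\n".join(lines)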
There's an additional requirement for a working pool configuration,
although it's only checked by the kernel, not zpool. The pool
uberblocks have a ub_guid_sum
field, which must match the sum
of all GUIDs in the vdev tree. If the GUID sum doesn't match, you'll
get one of those frustrating 'a device is missing somewhere' errors
on pool import. An entirely missing vdev naturally forces this to
happen, since all of its GUIDs are unknown and obviously not
contributing what they should be to this sum. I don't know how this
interacts with better ZFS pool recovery.
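To make the arithmetic concrete, here is the same sort of hypothetical
sketch for the GUID sum; exactly which GUIDs ZFS includes in the sum is
simplified here, and the real ub_guid_sum is read from the uberblock rather
than computed at this level:

  def guid_sum(config):
      # Sum the GUIDs of the vdevs in the assembled tree, modulo 2**64 (the
      # on-disk field is 64 bits). A wholly missing vdev contributes nothing,
      # so in practice the result won't match the uberblock's ub_guid_sum.
      total = 0
      for label in config.values():
          if label is None:
              continue
          vt = label['vdev_tree']
          total += vt['guid']
          for child in vt['children']:
              total += child['guid']
      return total % (2 ** 64)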