2017-08-16
The three different names ZFS stores for each vdev disk (on Illumos)
I sort of mentioned yesterday that ZFS keeps
information on several different ways of identifying disks in pools.
To be specific, it keeps three different names or ways of identifying
each disk. You can see this with 'zdb -C' on a pool, so here's a
representative sample:
# zdb -C rpool
MOS Configuration:
        [...]
        children[0]:
            type: 'disk'
            id: 0
            guid: 15557853432972548123
            path: '/dev/dsk/c3t0d0s0'
            devid: 'id1,sd@SATA_____INTEL_SSDSC2BB08__BTWL4114016X080KGN/a'
            phys_path: '/pci@0,0/pci15d9,714@1f,2/disk@0,0:a'
        [...]
The guid is ZFS's internal identifier for the disk, and is stored
on the disk itself as part of the disk label. Since you have to
find the disk to read it, it's not something that ZFS uses to find
disks, although it is part of verifying that ZFS has found the
right one. The three actual names for the disk are reported here
as path, devid aka 'device id', and phys_path aka 'physical path'.
The path is straightforward; it's the filesystem path to the
device, which here is a conventional OmniOS (Illumos, Solaris)
cNtNdNsN name typical of a plain, non-multipathed disk. As this
is a directly attached SATA disk, the phys_path shows us the
PCI information about the controller for the disk in the form of
a PCI device name. If we pulled this disk and replaced it with a
new one, both of those would stay the same, since with a directly
attached disk they're based on physical topology and neither has
changed. However, the devid is clearly based on the disk's
identity information; it has the vendor name, the 'product id',
and the serial number (as returned by the disk itself in response
to SATA inquiry commands). This will be more or less the same
regardless of where the disk is connected to the system, so ZFS
(and anything else) can find the disk wherever it is.
(I believe that the 'id1,sd@' portion of the devid is simply
giving us a namespace for the rest of it. See 'prtconf -v' for
another representation of all of this information and much more.)
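As a concrete (and purely illustrative) way of looking at that
structure, here's a small Python fragment that pulls apart the
example devid from the zdb output above. It just splits on the
separators visible in that particular devid; it is not how Illumos
actually encodes or decodes devids.

import re

def split_devid(devid):
    # 'id1,sd@' namespaces the rest; the trailing '/a' is the minor
    # (slice) name.
    namespace, _, rest = devid.partition('@')
    body, _, minor = rest.partition('/')
    # Runs of underscores pad and separate the identity fields in
    # this example.
    fields = [f for f in re.split(r'_+', body) if f]
    return namespace, fields, minor

print(split_devid('id1,sd@SATA_____INTEL_SSDSC2BB08__BTWL4114016X080KGN/a'))
# -> ('id1,sd', ['SATA', 'INTEL', 'SSDSC2BB08', 'BTWL4114016X080KGN'], 'a')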
Multipathed disks (such as the iSCSI disks on our fileservers)
look somewhat different. For them, the filesystem device name
(and thus path) looks like c5t<long identifier>d0s0, the physical
path is /scsi_vhci/disk@g<long identifier>, and the devid is not
particularly useful in finding the specific physical disk because
our iSCSI targets generate synthetic disk 'serial numbers' based
on their slot position (and the target's hostname, which at least
lets me see which target a particular OmniOS-level multipathed
disk is supposed to be coming from). As it happens, I already
know that OmniOS multipathing identifies disks only by their
device ids, so all three names are functionally the same thing,
just expressed in different forms.
If you remove a disk entirely, all three of these names go away for both plain directly attached disks and multipath disks. If you replace a plain disk with a new or different one, the filesystem path and physical path will normally still work but the devid of the old disk is gone; ZFS can open the disk but will report that it has a missing or corrupt label. If you replace a multipathed disk with a new one and the true disk serial number is visible to OmniOS, all of the old names go away since they're all (partly) based on the disk's serial number, and ZFS will report the disk as missing entirely (often simply reporting it by GUID).
Sidebar: Which disk name ZFS uses when bringing up a pool
Which name or form of device identification ZFS uses is a bit
complicated. To simplify a complicated situation (see
vdev_disk_open in vdev_disk.c) as best I can, the normal sequence
is that ZFS starts out by trying the filesystem path but verifying
the devid. If this fails, it tries the devid, the physical path,
and finally the filesystem path again (but without verifying the
devid this time).
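To make that ordering concrete, here's a compressed Python sketch
of it as I read vdev_disk_open. The open_by_* and devid_matches
helpers are hypothetical stand-ins for the real in-kernel
routines; only the order of attempts matters here.

def open_vdev_disk(path, devid, phys_path,
                   open_by_path, open_by_devid, open_by_phys_path,
                   devid_matches):
    # 1. Try the recorded filesystem path, but only accept the device
    #    if its devid matches the one in the pool configuration.
    dev = open_by_path(path)
    if dev is not None and devid_matches(dev, devid):
        return dev
    # 2. Otherwise hunt for the device by its devid, wherever it is
    #    now attached.
    dev = open_by_devid(devid)
    if dev is not None:
        return dev
    # 3. Then try the physical (hardware topology) path.
    dev = open_by_phys_path(phys_path)
    if dev is not None:
        return dev
    # 4. Finally, retry the filesystem path without the devid check;
    #    the label verification described below catches a wrong disk.
    return open_by_path(path)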
Since ZFS verifies the disk label's GUID and other details after opening the disk, there is no risk that finding a random disk this way (for example by the physical path) will confuse ZFS. It'll just cause ZFS to report things like 'missing or corrupt disk label' instead of 'missing device'.
Things I do and don't know about how ZFS brings pools up during boot
If you import a ZFS pool explicitly, through 'zpool import', the
user-mode side of the process normally searches through all of the
available disks in order to find the component devices of the pool.
Because it does this explicit search, it will find pool devices
even if they've been shuffled around in a way that causes them to
be renamed, or even (I think) drastically transformed, for example
by being dd'd to a new disk. This is pretty much what you'd expect,
since ZFS can't really read what the pool thinks its configuration
is until it assembles the pool. When it imports such a pool, I
believe that ZFS rewrites the information kept about where to
find each device so that it's correct for the current state of
your system.
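As a very rough illustration of what that user-level search
amounts to (with read_vdev_label as a hypothetical stand-in for
the real label-reading code in libzfs), the logic is more or less:
scan every disk, read its ZFS label if it has one, and group the
disks by the pool GUID in their labels.

import os

def find_pools(read_vdev_label, devdir='/dev/dsk'):
    pools = {}
    for name in os.listdir(devdir):
        label = read_vdev_label(os.path.join(devdir, name))
        if label is None:
            continue  # not a ZFS disk, or no readable label
        # The label says which pool and which vdev this disk belongs
        # to, so the pool configuration can be reassembled from
        # whatever names the disks have right now.
        pools.setdefault(label['pool_guid'], []).append((name, label))
    return pools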
This is not what happens when the system boots. To the best of
my knowledge, for non-root pools the ZFS kernel module directly
reads /etc/zfs/zpool.cache during module initialization and
converts it into a series of in-memory pool configurations, all
of them in an unactivated state. At some point, magic things
attempt to activate some or all of these pools, which causes the
kernel to attempt to open all of the devices listed as part of
the pool configuration and verify that they are indeed part of
the pool. The process of opening devices only uses the names and
other identification of the devices that's in the pool
configuration; however, one of those identifications is a 'devid',
which for many devices is basically the model and serial number
of the disk. So I believe that under at least some circumstances
the kernel will still be able to find disks that have been
shuffled around, because it will basically seek out that model
plus serial number wherever it's (now) connected to the system.
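A rough user-level illustration of what 'seek out that model plus
serial number' means, with current_devid as a hypothetical
stand-in for however you obtain a device's devid (the kernel has
its own devid routines and doesn't walk /dev/dsk):

import os

def find_disk_by_devid(wanted_devid, current_devid, devdir='/dev/dsk'):
    for name in os.listdir(devdir):
        dev = os.path.join(devdir, name)
        if current_devid(dev) == wanted_devid:
            return dev   # same model and serial number, wherever it is
    return None          # the disk is genuinely not present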
(See vdev_disk_open in vdev_disk.c for the gory details, but you
also need to understand Illumos devids. The various device
information available for disks in a pool can be seen with
'zdb -C <pool>'.)
To the best of my knowledge, this in-kernel activation makes no
attempt to hunt around on other disks to complete the pool's
configuration the way that 'zpool import' will. In theory,
assuming that finding disks by their devid works, this shouldn't
matter most or basically all of the time; if that disk is there
at all, it should be reporting its model and serial number and I
think the kernel will find it. But I don't know for sure. I also
don't know how the kernel acts if some disks take a while to show
up, for example iSCSI disks.
(I suspect that the kernel only makes one attempt at pool activation and doesn't retry things if more devices show up later. But this entire area is pretty opaque to me.)
These days you also have your root filesystems on a ZFS pool, the
root pool. There are definitely some special code paths that seem
to be invoked during boot for a ZFS root pool, but I don't have
enough knowledge of the Illumos boot time environment to understand
how they work and how they're different from the process of loading
and starting non-root pools. I used to hear that root pools were
more fragile if devices moved around and you might have to boot
from alternate media in order to explicitly 'zpool import' and
'zpool export' the root pool to reset its device names, but that
may be only folklore and superstition at this point.