Wandering Thoughts archives

2017-08-16

The three different names ZFS stores for each vdev disk (on Illumos)

I sort of mentioned yesterday that ZFS keeps information on several different ways of identifying disks in pools. To be specific, it keeps three different names or ways of identifying each disk. You can see this with 'zdb -C' on a pool, so here's a representative sample:

# zdb -C rpool
MOS Configuration:
[...]
  children[0]:
    type: 'disk'
    id: 0
    guid: 15557853432972548123
    path: '/dev/dsk/c3t0d0s0'
    devid: 'id1,sd@SATA_____INTEL_SSDSC2BB08__BTWL4114016X080KGN/a'
    phys_path: '/pci@0,0/pci15d9,714@1f,2/disk@0,0:a'
    [...]

The guid is ZFS's internal identifier for the disk, and is stored on the disk itself as part of the disk label. Since you have to find the disk to read it, it's not something that ZFS uses to find disks, although it is part of verifying that ZFS has found the right one. The three actual names for the disk are reported here as path, devid aka 'device id', and phys_path aka 'physical path'.

The path is straightforward; it's the filesystem path to the device, which here is a conventional OmniOS (Illumos, Solaris) cNtNdNsN name typical of a plain, non-multipathed disk. As this is a directly attached SATA disk, the phys_path shows us the PCI information about the controller for the disk in the form of a PCI device name. If we pulled this disk and replaced it with a new one, both of those would stay the same, since with a directly attached disk they're based on physical topology and neither has changed. However, the devid is clearly based on the disks's identity information; it has the vendor name, the 'product id', and the serial number (as returned by the disk itself in response to SATA inquiry commands). This will be the same more or less regardless of where the disk is connected to the system, so ZFS (and anything else) can find the disk wherever it is.

(I believe that the 'id1,sd@' portion of the devid is simply giving us a namespace for the rest of it. See 'prtconf -v' for another representation of all of this information and much more.)

Multipathed disks (such as the iSCSI disks on our fileservers) look somewhat different. For them, the filesystem device name (and thus path) looks like c5t<long identifier>d0s0, the physical path is /scsivhci/disk@g<long identifier>, and the devid_ is not particularly useful in finding the specific physical disk because our iSCSI targets generate synthetic disk 'serial numbers' based on their slot position (and the target's hostname, which at least lets me see which target a particular OmniOS-level multipathed disk is supposed to be coming from). As it happens, I already know that OmniOS multipathing identifies disks only by their device ids, so all three names are functionally the same thing, just expressed in different forms.

If you remove a disk entirely, all three of these names go away for both plain directly attached disks and multipath disks. If you replace a plain disk with a new or different one, the filesystem path and physical path will normally still work but the devid of the old disk is gone; ZFS can open the disk but will report that it has a missing or corrupt label. If you replace a multipathed disk with a new one and the true disk serial number is visible to OmniOS, all of the old names go away since they're all (partly) based on the disk's serial number, and ZFS will report the disk as missing entirely (often simply reporting it by GUID).

Sidebar: Which disk name ZFS uses when bringing up a pool

Which name or form of device identification ZFS uses is a bit complicated. To simplify a complicated situation (see vdev_disk_open in vdev_disk.c) as best I can, the normal sequence is that ZFS starts out by trying the filesystem path but verifying the devid. If this fails, it tries the devid, the physical path, and finally the filesystem path again (but without verifying the devid this time).

Since ZFS verifies the disk label's GUID and other details after opening the disk, there is no risk that finding a random disk this way (for example by the physical path) will confuse ZFS. It'll just cause ZFS to report things like 'missing or corrupt disk label' instead of 'missing device'.

ZFSDiskNames written at 23:47:46; Add Comment

Things I do and don't know about how ZFS brings pools up during boot

If you import a ZFS pool explicitly, through 'zpool import', the user-mode side of the process normally searches through all of the available disks in order to find the component devices of the pool. Because it does this explicit search, it will find pool devices even if they've been shuffled around in a way that causes them to be renamed, or even (I think) drastically transformed, for example by being dd'd to a new disk. This is pretty much what you'd expect, since ZFS can't really read what the pool thinks its configuration is until it assembles the pool. When it imports such a pool, I believe that ZFS rewrites the information kept about where to find each device so that it's correct for the current state of your system.

This is not what happens when the system boots. To the best of my knowledge, for non-root pools the ZFS kernel module directly reads /etc/zfs/zpool.cache during module initialization and converts it into a series of in-memory pool configurations for pools, which are all in an unactivated state. At some point, magic things attempt to activate some or all of these pools, which causes the kernel to attempt to open all of the devices listed as part of the pool configuration and verify that they are indeed part of the pool. The process of opening devices only uses the names and other identification of the devices that's in the pool configuration; however, one identification is a 'devid', which for many devices is basically the model and serial number of the disk. So I believe that under at least some circumstances the kernel will still be able to find disks that have been shuffled around, because it will basically seek out that model plus serial number wherever it's (now) connected to the system.

(See vdev_disk_open in vdev_disk.c for the gory details, but you also need to understand Illumos devids. The various device information available for disks in a pool can be seen with 'zdb -C <pool>'.)

To the best of my knowledge, this in-kernel activation makes no attempt to hunt around on other disks to complete the pool's configuration the way that 'zpool import' will. In theory, assuming that finding disks by their devid works, this shouldn't matter most or basically all of the time; if that disk is there at all, it should be reporting its model and serial number and I think the kernel will find it. But I don't know for sure. I also don't know how the kernel acts if some disks take a while to show up, for example iSCSI disks.

(I suspect that the kernel only makes one attempt at pool activation and doesn't retry things if more devices show up later. But this entire area is pretty opaque to me.)

These days you also have your root filesystems on a ZFS pool, the root pool. There are definitely some special code paths that seem to be invoked during boot for a ZFS root pool, but I don't have enough knowledge of the Illumos boot time environment to understand how they work and how they're different from the process of loading and starting non-root pools. I used to hear that root pools were more fragile if devices moved around and you might have to boot from alternate media in order to explicitly 'zpool import' and 'zpool export' the root pool in order to reset its device names, but that may be only folklore and superstition at this point.

ZFSPoolBootUnknowns written at 00:36:46; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.