How ZFS on Linux names disks in ZFS pools

August 18, 2017

Yesterday I covered how on Illumos and Solaris, disks in ZFS pools have three names: the filesystem path, the 'physical path' (a PCI device name, similar to the information that lspci gives), and a 'devid', which has the vendor, model name, and serial number of the disk. While these are Solaris concepts, Linux has similar things and you could at least mock up equivalents of them in the kernel.

ZFS on Linux doesn't try to do this. Instead of having three names, it has only one:

# zdb -C vmware2
MOS Configuration:
[...]
  children[0]:
    type: 'disk'
    id: 0
    guid: 8206543908042244108
    path: '/dev/disk/by-id/ata-ST500DM002-1BC142_Z2AA6A4E-part1'
    whole_disk: 0
[...]

ZoL stores only the filesystem path to the device, using whatever path you told it to use. To get the equivalent of Solaris devids and physical paths, you need to use the right sort of filesystem path. Solaris devids roughly map to /dev/disk/by-id names and physical paths roughly map to /dev/disk/by-path names (and there isn't really an equivalent to Solaris /dev/dsk names, which are more stable than Linux /dev/sd* names).
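
As a rough illustration of this mapping, here is a minimal shell sketch that pulls the path: values out of 'zdb -C' output and reports which naming scheme each one uses. The function names vdev_paths and classify_vdev_path are made up for this example, and the sed pattern assumes the output layout shown above.

```shell
# Hypothetical helpers; neither is part of ZFS on Linux.

# Print the value of every "path:" line from 'zdb -C' output on stdin.
# Assumes the "path: '...'" layout shown in the zdb output above.
vdev_paths() {
    sed -n "s/^[[:space:]]*path: '\(.*\)'[[:space:]]*\$/\1/p"
}

# Report which naming scheme a single vdev path uses.
classify_vdev_path() {
    case "$1" in
        /dev/disk/by-id/*)   echo "by-id" ;;        # roughly a Solaris devid
        /dev/disk/by-path/*) echo "by-path" ;;      # roughly a physical path
        /dev/sd*|/dev/hd*)   echo "kernel-name" ;;  # may change between boots
        *)                   echo "other" ;;
    esac
}

# Possible usage (needs a real pool, so it is left commented out):
#   zdb -C vmware2 | vdev_paths | while IFS= read -r p; do
#       echo "$p: $(classify_vdev_path "$p")"
#   done
```

For the pool above, this should classify its one disk as using a by-id name.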

The comment about this in vdev_disk_open in vdev_disk.c discusses this in some detail, and it's worth repeating it in full:

Devices are always opened by the path provided at configuration time. This means that if the provided path is a udev by-id path then drives may be recabled without an issue. If the provided path is a udev by-path path, then the physical location information will be preserved. This can be critical for more complicated configurations where drives are located in specific physical locations to maximize the systems tolerance to component failure. Alternatively, you can provide your own udev rule to flexibly map the drives as you see fit. It is not advised that you use the /dev/[hd]d devices which may be reordered due to probing order. Devices in the wrong locations will be detected by the higher level vdev validation.

(It's a shame that this information exists only as a comment in a source file that most people will never look at. It should probably be in large type in the ZFS on Linux zpool manpage.)

This means that with ZFS on Linux, you get only one try for the disk to be there; there's no fallback the way there is on Illumos for ordinary disks. If you've pulled an old disk and put in a new one and you use by-id names, ZoL will see the old disk as completely missing. If you use by-path names and you move a disk around, ZoL will not wind up finding the disk in its new location the way ZFS on Illumos probably would.
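
Since the stored path is the only thing ZoL will try, one quick sanity check is whether each stored path currently exists. This is only a sketch: missing_vdev_paths is a hypothetical name, and it tests that something exists at the path, not that the right disk is behind it.

```shell
# Hypothetical helper: read vdev paths on stdin, one per line, and
# print the ones that do not currently exist. A by-id symlink for a
# pulled disk is gone (or dangling), so it shows up here.
missing_vdev_paths() {
    while IFS= read -r p; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}
```

You would feed it the path values taken from 'zdb -C' output for your pool.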

(The net effect of this is that with ZFS on Linux you should normally see a lot more 'missing device' errors and a lot fewer 'corrupt or missing disk label' errors than you would in the same circumstances on Illumos or Solaris.)

At this point, you might wonder how you change what sort of name ZFS on Linux is using for disks in your pool(s). Although I haven't done this myself, my understanding is that you export the pool and then import it again using the -d option to zpool import, pointing it at the directory that has the sort of names you want (for example /dev/disk/by-id). With -d, the import process will end up finding the disks for the pool using the type of names that you want, and then actually importing the pool will rewrite the saved path data in the pool's configuration (and /etc/zfs/zpool.cache) to use these new names as a side effect.
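
In shell terms, my understanding of the procedure looks something like the following sketch. It is untested against a real pool; switch_vdev_names is a hypothetical wrapper (not a ZFS command), and by default it only prints the commands it would run. The pool has to be exportable, so nothing can be using it at the time.

```shell
# Sketch only: prints the commands unless RUN=1 is set in the environment.
# 'switch_vdev_names' is a hypothetical wrapper, not part of ZFS on Linux.
#   $1: pool name
#   $2: directory of device names to import with (default /dev/disk/by-id)
switch_vdev_names() {
    pool=$1
    dir=${2:-/dev/disk/by-id}
    if [ "${RUN:-0}" = 1 ]; then
        # Only attempt the import if the export succeeded.
        zpool export "$pool" && zpool import -d "$dir" "$pool"
    else
        echo "zpool export $pool"
        echo "zpool import -d $dir $pool"
    fi
}

# Example dry run:
#   switch_vdev_names vmware2 /dev/disk/by-path
```

Afterward, 'zdb -C' on the pool should show path: values from the directory you gave to -d.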

(I'm not entirely sure how I feel about this with ZFS on Linux. I think I can see some relatively obscure failure modes where no form of disk naming works as well as things do in Illumos. On the other hand, in practice using /dev/disk/by-id names is probably at least as good an experience as Illumos provides, and the disk names are always clear and explicit. What you see is what you get, somewhat unlike Illumos.)

Written on 18 August 2017.


Last modified: Fri Aug 18 02:35:11 2017