How ZFS on Linux names disks in ZFS pools
Yesterday I covered how on Illumos and Solaris, disks in ZFS pools
have three names: the filesystem path, the 'physical path' (a PCI
device name, similar to the information that lspci gives), and a
'devid', which combines the vendor, model name, and serial number
of the disk. While these are Solaris concepts, Linux has similar
things and you could at least mock up equivalents of them in the
kernel.
ZFS on Linux doesn't try to do this. Instead of having three names, it has only one:
# zdb -C vmware2
MOS Configuration:
        [...]
        children[0]:
            type: 'disk'
            id: 0
            guid: 8206543908042244108
            path: '/dev/disk/by-id/ata-ST500DM002-1BC142_Z2AA6A4E-part1'
            whole_disk: 0
            [...]
ZoL stores only the filesystem path to the device, using whatever
path you told it to use. To get the equivalent of Solaris devids
and physical paths, you need to use the right sort of filesystem
path: Solaris devids roughly map to /dev/disk/by-id names and
physical paths map to /dev/disk/by-path names (and there isn't
really an equivalent of Solaris /dev/dsk names, which are more
stable than Linux /dev/sd* names).
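If you're curious which by-id and by-path names a particular disk
currently has, you can just look at the symlinks that udev maintains.
As a purely illustrative sketch (the 'sda' here is an example device
name, not anything special):

# ls -l /dev/disk/by-id/ | grep sda
# ls -l /dev/disk/by-path/ | grep sda

Each symlink points back at the kernel's /dev/sd* name, so this also
tells you which stable name currently corresponds to which (unstable)
kernel name.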
The comment about this in vdev_disk_open in vdev_disk.c discusses
the issue in some detail, and it's worth repeating in full:
Devices are always opened by the path provided at configuration time. This means that if the provided path is a udev by-id path then drives may be recabled without an issue. If the provided path is a udev by-path path, then the physical location information will be preserved. This can be critical for more complicated configurations where drives are located in specific physical locations to maximize the systems tolerance to component failure. Alternatively, you can provide your own udev rule to flexibly map the drives as you see fit. It is not advised that you use the /dev/[hd]d devices which may be reordered due to probing order. Devices in the wrong locations will be detected by the higher level vdev validation.
(It's a shame that this information exists only as a comment in a
source file that most people will never look at. It should probably
be in large type in the ZFS on Linux zpool
manpage.)
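One of the options the comment mentions is writing your own udev
rule to map drives however you see fit. I haven't done this myself,
but as a hypothetical sketch (the rules file name, the symlink name,
and the idea of matching on ID_SERIAL are all illustrative
assumptions on my part, not anything ZFS on Linux itself requires;
the serial string is taken from the zdb output earlier), a rule
along these lines would give a specific disk a stable, human-chosen
name that you could then use when creating or importing a pool:

# /etc/udev/rules.d/60-local-disk-names.rules (hypothetical sketch)
# Match the whole disk with this serial number and add a stable symlink
# for it, which will show up as /dev/disk/by-location/bay0.
KERNEL=="sd?", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="ST500DM002-1BC142_Z2AA6A4E", SYMLINK+="disk/by-location/bay0"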
This means that with ZFS on Linux, you get only one try for the
disk to be there; there's no fallback the way there is on Illumos
for ordinary disks. If you've pulled an old disk and put in a new
one and you use by-id
names, ZoL will see the old disk as completely
missing. If you use by-path
names and you move a disk around, ZoL
will not wind up finding the disk in its new location the way ZFS
on Illumos probably would.
(The net effect of this is that with ZFS on Linux you should normally see a lot more 'missing device' errors and a lot fewer 'corrupt or missing disk label' errors than you would in the same circumstances on Illumos or Solaris.)
At this point, you might wonder how you change what sort of name
ZFS on Linux is using for disks in your pool(s). Although I haven't
done this myself, my understanding is that you export the pool and
then import it again using the -d option to zpool import. With -d,
the import process will find the disks for the pool using the type
of names that you want, and actually importing the pool will then
rewrite the saved path data in the pool's configuration (and
/etc/zfs/zpool.cache) to use these new names as a side effect.
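As a concrete sketch (the pool name 'tank' here is just an example),
switching a pool over to by-id names would look something like:

# zpool export tank
# zpool import -d /dev/disk/by-id tank

The -d argument simply tells zpool import which directory to search
for devices in, so you could equally point it at /dev/disk/by-path
if physical location names are what you want.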
(I'm not entirely sure how I feel about this with ZFS on Linux. I
think I can see some relatively obscure failure modes where no form
of disk naming works as well as things do in Illumos. On the other
hand, in practice using /dev/disk/by-id
names is probably at least
as good an experience as Illumos provides, and the disk names are
always clear and explicit. What you see is what you get, somewhat
unlike Illumos.)