Wandering Thoughts archives

2024-04-11

Getting the underlying disks of a Linux software RAID array

Due to the pre-beta Ubuntu 24.04 issue I found with grub updates on systems with software RAID root filesystems and BIOS MBR booting, for a while I thought we'd need something that rewrote a debconf key to change it from naming the software RAID of the root filesystem to naming the devices it was on. So I spent a bit of time working out how best to do that, which I'm going to write down for any future use.

At one level this question seems silly, because the devices are right there in /proc/mdstat (once we know which software RAID array the root filesystem is mounted from). However, you have to parse them out of /proc/mdstat and be careful to get it right, so we'd ideally like an easier way, which is to use lsblk:

# lsblk -n -p --list --output TYPE,NAME -s /dev/md0
raid1 /dev/md0
part  /dev/sda2
disk  /dev/sda
part  /dev/sdb2
disk  /dev/sdb

We want the 'disk' type devices. Having the basic /dev names is good enough for some purposes (for example, directly invoking grub-install), but we may want to use /dev/disk/by-id names in things like debconf keys for greater stability if our system has additional data disks and their 'sdX' names may get renumbered at some point.
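
As a minimal sketch of pulling out just the 'disk' entries, here is a small shell filter over the TYPE,NAME output shown above. The function names only_disks and raid_disks are made up for illustration, and this assumes device names contain no whitespace.

```shell
# Filter: read "TYPE NAME" lines (as printed by lsblk --output TYPE,NAME)
# on standard input and print the NAME of each whole disk, once each.
only_disks() {
    awk '$1 == "disk" && !seen[$2]++ { print $2 }'
}

# Hypothetical wrapper: list the whole disks underneath a software RAID
# device such as /dev/md0, using the same lsblk invocation as above.
raid_disks() {
    lsblk -n -p --list --output TYPE,NAME -s "$1" | only_disks
}
```

With this, 'raid_disks /dev/md0' on the example system should print /dev/sda and /dev/sdb, one per line. If you don't already know the RAID device, something like 'findmnt -n -o SOURCE /' should report what the root filesystem is mounted from, assuming the root filesystem sits directly on the software RAID.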

To get the by-id names, you have two options, depending on how old your lsblk is. Sufficiently recent versions of lsblk support an 'ID-LINK' field, so you can use it to directly get the name you want (just add it as an output field in the lsblk invocation above). Otherwise, the easiest way to do this is with udevadm:

udevadm info -q symlink /dev/sda | fmt -1 | sort

Since there are a bunch of /dev/disk/by-id names, you'll need to decide which one you pick and which ones you exclude. For our systems, it looks like we'd exclude 'wwn-' and 'nvme-eui.' names, probably exclude any 'scsi-' name that was all hex digits, and then take the alphabetically first option. Since lsblk's 'ID-LINK' field basically does this sort of thing for you, it's the better option if you can use it.
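
A rough sketch of that selection rule as a shell filter follows. It assumes the input is the list of symlink names that 'udevadm info -q symlink' prints (separated by spaces or newlines, with or without a leading /dev/), and the function name pick_by_id is hypothetical. The exclusions are the ones described above for our systems; yours may need different ones.

```shell
# Read udev symlink names on standard input and print one preferred
# /dev/disk/by-id name: drop 'wwn-' and 'nvme-eui.' names, drop 'scsi-'
# names that are entirely hex digits, then take the alphabetically first.
pick_by_id() {
    tr ' ' '\n' |
    grep -E '^(/dev/)?disk/by-id/' |
    grep -v -E '/by-id/(wwn-|nvme-eui\.)' |
    grep -v -E '/by-id/scsi-[0-9a-fA-F]+$' |
    LC_ALL=C sort |
    head -n 1
}
```

You would use it as 'udevadm info -q symlink /dev/sda | pick_by_id'. If every by-id name is excluded, it prints nothing, so callers should check for empty output.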

Going from a software RAID to the EFI System Partitions (ESPs) on its component disks is possible but harder (and you may need to do this if the relevant debconf settings have gotten scrambled). Given a disk, lsblk can report all of the components of it and what their partition type is:

# lsblk --list --output FSTYPE,PARTTYPE,NAME -n -p /dev/nvme0n1
ext4                                                   /dev/md0
                                                       /dev/nvme0n1
vfat              c12a7328-f81f-11d2-ba4b-00a0c93ec93b /dev/nvme0n1p1
linux_raid_member 0fc63daf-8483-4772-8e79-3d69d8477de4 /dev/nvme0n1p2

If a disk has an ESP, it will be a 'vfat' filesystem with the partition type GUID shown here (c12a7328-f81f-11d2-ba4b-00a0c93ec93b), which is the one assigned to indicate an ESP. In many Linux environments you can skip checking for the GUID and simply assume that any 'vfat' filesystem on your servers is there because it's the ESP. If you see this partition type GUID but lsblk doesn't report a vfat filesystem, what you have is a potential ESP that was set up during partitioning but never formatted as a (vfat) filesystem. To do this completely properly you would need to mount these filesystems and check that they have the right contents, but here we'd just assume that a vfat filesystem with the right partition type GUID had been set up properly by the installer (or by whoever did the disk replacement).

(A partition type GUID of '21686148-6449-6e6f-744e-656564454649' indicates a BIOS boot partition, which is often present on modern installs that use BIOS MBR booting.)
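
The ESP check described above can be sketched as a filter over the FSTYPE,PARTTYPE,NAME output. This is only an illustration: the names ESP_GUID, esp_partitions and disk_esps are made up, and it relies on a formatted ESP being the only kind of line where all three columns are filled in with 'vfat' and the ESP type GUID.

```shell
# GPT partition type GUID that marks an EFI System Partition.
ESP_GUID='c12a7328-f81f-11d2-ba4b-00a0c93ec93b'

# Filter: read lsblk --output FSTYPE,PARTTYPE,NAME lines on standard input
# and print the names of partitions that are vfat and have the ESP type.
# An unformatted ESP has an empty FSTYPE column, so its line has only two
# fields and is skipped.
esp_partitions() {
    awk -v guid="$ESP_GUID" \
        'NF == 3 && $1 == "vfat" && tolower($2) == guid { print $3 }'
}

# Hypothetical wrapper: report the formatted ESPs on one disk.
disk_esps() {
    lsblk --list --output FSTYPE,PARTTYPE,NAME -n -p "$1" | esp_partitions
}
```

Running 'disk_esps /dev/nvme0n1' on the example disk should print /dev/nvme0n1p1. This does not mount anything, so it makes the same assumption as above that a vfat filesystem with the right type GUID was set up properly.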

linux/SoftwareRaidToItsDisks written at 22:19:07

