2024-10-10
Linux software RAID and changing your system's hostname
Today, I changed the hostname of an old Linux system (for reasons) and rebooted it. To my surprise, the system did not come up afterward, but instead got stuck in systemd's emergency mode for a chain of reasons that boiled down to there being no '/dev/md0'. Changing the hostname back to its old value and rebooting the system again caused it to come up fine. After some diagnostic work, I believe I understand what happened and how to work around it if it affects us in the future.
One of the issues that Linux RAID auto-assembly faces is the question of what it should call the assembled array. People want their RAID array names to stay fixed (so /dev/md0 is always /dev/md0), and so the name is part of the RAID array's metadata, but at the same time you have the problem of what happens if you connect up two sets of disks that both want to be 'md0'. Part of the answer is mdadm.conf, which can give arrays names based on their UUID. If your mdadm.conf says 'ARRAY /dev/md10 ... UUID=<x>' and mdadm finds a matching array, then in theory it can be confident you want that one to be /dev/md10 and it should rename anything else that claims to be /dev/md10.
However, suppose that your array is not specified in mdadm.conf. In that case, another software RAID array feature kicks in, which is that arrays can have a 'home host'. If the array is on its home host, it will get the name it claims it has, such as '/dev/md0'. Otherwise, well, let me quote from the 'Auto-Assembly' section of the mdadm manual page:
[...] Arrays which do not obviously belong to this host are given names that are expected not to conflict with anything local, and are started "read-auto" so that nothing is written to any device until the array is written to. i.e. automatic resync etc is delayed.
As is covered in the documentation for the '--homehost' option in the mdadm manual page, on modern 1.x superblock formats the home host is embedded into the name of the RAID array. You can see this with 'mdadm --detail', which can report things like:
Name : ubuntu-server:0
Name : <host>:25  (local to host <host>)
Both of these have a 'home host'; in the first case the home host is 'ubuntu-server', and in the second case the home host is the current machine's hostname. Well, its 'hostname' as far as mdadm is concerned, which can be set in part through mdadm.conf's 'HOMEHOST' directive. Let me repeat that: mdadm by default identifies home hosts by their hostname, not by any more stable identifier.
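To make the '<host>:<name>' structure concrete, here is a minimal Python sketch that splits a 'Name :' line from 'mdadm --detail' into its home host and array name and compares the home host against a hostname. The function names are my own invention, the hostname 'apps0' is hypothetical, and the comparison is a deliberately simplified model of what mdadm does, not its actual logic.

```python
import re
from typing import NamedTuple, Optional


class ArrayName(NamedTuple):
    home_host: Optional[str]  # None when the name has no '<host>:' prefix
    name: str


def parse_name_line(line: str) -> ArrayName:
    """Split a 'Name : ...' line from 'mdadm --detail' into home host and name."""
    match = re.match(r"\s*Name\s*:\s*(\S.*)$", line)
    if match is None:
        raise ValueError(f"not a Name line: {line!r}")
    value = match.group(1)
    # Drop the trailing '(local to host ...)' annotation, if present.
    value = re.sub(r"\s*\(local to host .*\)\s*$", "", value).strip()
    host, sep, name = value.partition(":")
    if not sep:
        # No ':' at all, so the array has no embedded home host.
        return ArrayName(None, value)
    return ArrayName(host, name)


def is_on_home_host(array: ArrayName, hostname: str) -> bool:
    """Simplified model: the embedded home host must equal this hostname."""
    return array.home_host is not None and array.home_host == hostname


if __name__ == "__main__":
    array = parse_name_line("Name : ubuntu-server:0")
    print(array, is_on_home_host(array, "apps0"))
```

Under this model, renaming the machine changes the second argument to is_on_home_host() while the name stored on disk stays the same, which is the mismatch described above.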
So if you change a machine's hostname and you have arrays with home hosts that aren't listed in your mdadm.conf, their /dev/mdN device names will change when you reboot. This is what happened to me, as we hadn't added the array to the machine's mdadm.conf.
(Contrary to some ways to read the mdadm manual page, arrays are not renamed if they're in mdadm.conf. Otherwise we'd have noticed this a long time ago on our Ubuntu servers, where all of the arrays created in the installer have the home host of 'ubuntu-server', which is obviously not any machine's actual hostname.)
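Since being listed in mdadm.conf is what protects an array here, one useful check before changing a hostname is whether every array's UUID appears on an ARRAY line. The following is a rough Python sketch of that check. It assumes (from my understanding of the mdadm.conf format) that keywords are case-insensitive and can be abbreviated to three characters, that lines starting with whitespace continue the previous line, and that a word starting with '#' begins a comment. The function names and the UUIDs in the test data are made up for illustration.

```python
from typing import Set


def _normalize_uuid(uuid: str) -> str:
    """Lowercase and strip separators so different spellings compare equal."""
    return uuid.lower().replace(":", "").replace("-", "")


def array_uuids_in_conf(conf_text: str) -> Set[str]:
    """Collect the UUIDs named on ARRAY lines in mdadm.conf-style text."""
    uuids: Set[str] = set()
    in_array = False
    for raw in conf_text.splitlines():
        words = []
        for word in raw.split():
            if word.startswith("#"):
                break  # the rest of the line is a comment
            words.append(word)
        if not words:
            continue
        if not raw[0].isspace():
            # A new keyword line; continuation lines start with whitespace.
            keyword = words[0].lower()
            in_array = len(keyword) >= 3 and "array".startswith(keyword)
            words = words[1:]
        if in_array:
            for word in words:
                key, sep, value = word.partition("=")
                if sep and key.lower() == "uuid":
                    uuids.add(_normalize_uuid(value))
    return uuids


def is_listed(conf_text: str, uuid: str) -> bool:
    """True if the given array UUID appears on some ARRAY line."""
    return _normalize_uuid(uuid) in array_uuids_in_conf(conf_text)
```

The UUID to look up would come from the 'UUID :' line of 'mdadm --detail' for each array on the machine.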
Setting the home host value to the machine's current hostname when an array is created is the mdadm default behavior, although you can turn this off with the right mdadm.conf HOMEHOST setting. You can also tell mdadm to consider all arrays to be on their home host, regardless of the home host embedded into their names.
(The latter is 'HOMEHOST <ignore>', the former by itself is 'HOMEHOST <none>', and it's currently valid to combine them both as 'HOMEHOST <ignore> <none>', although this isn't quite documented in the manual page.)
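As a way of keeping the HOMEHOST words straight, here is a small Python model of how I read their effect: '<none>' means no home host is recorded, '<ignore>' means every array is treated as local, and anything else is taken as an explicit host name. The HomeHostPolicy type and homehost_policy() function are hypothetical names of mine, and this is a simplification of the behavior described above rather than a transcription of mdadm's code.

```python
from typing import Iterable, NamedTuple, Optional


class HomeHostPolicy(NamedTuple):
    home_host: Optional[str]  # host recorded in new arrays; None means no home host
    ignore: bool              # True means every array is treated as local


def homehost_policy(words: Iterable[str], system_hostname: str) -> HomeHostPolicy:
    """Model the effect of the words on a 'HOMEHOST ...' line."""
    home_host: Optional[str] = system_hostname  # default: the machine's hostname
    ignore = False
    for word in words:
        lowered = word.lower()
        if lowered == "<ignore>":
            ignore = True
        elif lowered == "<none>":
            home_host = None
        elif lowered == "<system>":
            home_host = system_hostname
        else:
            home_host = word  # an explicit host name
    return HomeHostPolicy(home_host, ignore)


if __name__ == "__main__":
    # 'apps0' is a made-up hostname.
    print(homehost_policy(["<ignore>", "<none>"], "apps0"))
```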
PS: Some uses of software RAID arrays won't care about their names. For example, if the arrays are used for filesystems and your /etc/fstab specifies the device of the filesystem using 'UUID=' or '/dev/disk/by-id/md-uuid-...' (which seems to be common on Ubuntu), a changed /dev/mdN name doesn't matter.
PPS: For 1.x superblocks, the array name as a whole can only be 32 characters long, which obviously limits how long a home host name you can have, especially since you need a ':' in there as well as an array number or the like. If you create a RAID array on a system with a too-long hostname, the name of the resulting array will not be in the '<host>:<name>' format that gives an array a home host; instead, mdadm will set the name of the array to just the base name (either whatever name you specified, or the N of the 'mdN' device you told it to use).
(It turns out that I managed to do this by accident on my home desktop, which has a long fully qualified name, by creating an array with the name 'ssd root'. The combination turns out to be 33 characters long, so the RAID array just got the name 'ssd root' instead of '<host>:ssd root'.)
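The length rule can be sketched as a few lines of Python. This follows the description above and treats 32 as the largest combined length that fits; the exact boundary inside mdadm may be one character lower, so consider that an assumption. The stored_array_name() function and the 24-character placeholder hostname are invented for illustration.

```python
MAX_NAME_LENGTH = 32  # size limit for 1.x superblock array names, per the text above


def stored_array_name(home_host: str, base_name: str,
                      limit: int = MAX_NAME_LENGTH) -> str:
    """Return '<host>:<name>' if it fits within the limit, else just the base name."""
    combined = f"{home_host}:{base_name}"
    if home_host and len(combined) <= limit:
        return combined
    # Too long (or no home host): the array gets no home host prefix.
    return base_name


if __name__ == "__main__":
    long_host = "h" * 24  # stand-in for a long fully qualified hostname
    # 24 + 1 + 8 = 33 characters, so the prefix is dropped.
    print(repr(stored_array_name(long_host, "ssd root")))
    print(repr(stored_array_name("apps0", "0")))
```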