Wandering Thoughts archives

2024-09-23

Mostly getting redundant UEFI boot disks on modern Ubuntu (especially 24.04)

When I wrote about how our primary goal for mirrored (system) disks is increased redundancy, including being able to reboot the system after the primary disk failed, vowhite asked in a comment if there was any trick to getting this working with UEFI. The answer is 'sort of', and it's mostly the same as what you want to do with BIOS MBR booting.

In the Ubuntu installer, when you set up redundant system disks it's long been the case that you wanted to explicitly tell the installer to use the second disk as an additional boot device (in addition to setting up a software RAID mirror of the root filesystem across both disks). In the BIOS MBR world, this installed GRUB bootblocks on the disk; in the UEFI world, this causes the installer to set up an extra EFI System Partition (ESP) on the second drive and populate it with the same sort of things as the ESP on the first drive.

(The 'first' and the 'second' drive are not necessarily what you think they are, since the Ubuntu installer doesn't always present drives to you in their enumeration order.)

I believe that this dates from Ubuntu 22.04, when Ubuntu seems to have added support for multi-disk UEFI. Ubuntu will mount one of these ESPs (the one it considers the 'first') on /boot/efi, and as part of multi-disk UEFI support it will also arrange to update the other ESP. You can see what other disk Ubuntu expects to find this ESP on by looking at the debconf selection 'grub-efi/install_devices'. For perfectly sensible reasons this will identify disks by their disk IDs (as found in /dev/disk/by-id), and it normally lists both ESPs.
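
One way to look at this setting on a live system (a sketch, assuming the usual grub-efi-amd64 package on an amd64 install):

    # Show the recorded ESP install devices:
    debconf-show grub-efi-amd64 | grep install_devices
    # Or query debconf directly:
    echo 'GET grub-efi/install_devices' | debconf-communicate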

All of this is great but it leaves you with two problems if the disk with your primary ESP fails. The first is the question of whether your system's UEFI firmware will automatically boot off the second ESP. I believe that UEFI firmware will often do this, and you can specifically set this up with EFI boot entries through tools like efibootmgr; possibly current Ubuntu installers do this for you automatically if it seems necessary.
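
As a hedged sketch of what doing that by hand looks like (the disk, partition number, and shim path here are all assumptions; adjust them for your system):

    efibootmgr -v      # list current UEFI boot entries
    # Add an entry for the second ESP, assuming it's partition 1 of
    # /dev/sdb and Ubuntu's usual shim path is in use:
    efibootmgr --create --disk /dev/sdb --part 1 \
      --label 'ubuntu (second disk)' --loader '\EFI\ubuntu\shimx64.efi'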

The bigger problem is the /boot/efi mount. If the primary disk fails, a mounted /boot/efi will start having disk IO errors and then if the system reboots, Ubuntu will probably be unable to find and mount /boot/efi from the now gone or error-prone primary disk. If this is a significant concern, I think you need to make the /boot/efi mount 'nofail' in /etc/fstab (per fstab(5)). Energetic people might want to go further and make it either 'noauto' so that it's not even mounted normally, or perhaps mark it as a systemd automounted filesystem with 'x-systemd.automount' (per systemd.mount).
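
For illustration, such an /etc/fstab line might look like this (the UUID is a made-up placeholder; Ubuntu normally mounts the ESP with 'umask=0077'):

    UUID=ABCD-1234  /boot/efi  vfat  umask=0077,nofail  0  1
    # Or the systemd automount variant:
    UUID=ABCD-1234  /boot/efi  vfat  umask=0077,nofail,x-systemd.automount  0  1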

(The disclaimer is that I don't know how Ubuntu will react if /boot/efi isn't mounted at all or is a systemd automount mountpoint. I think that GRUB updates will cope with having it not mounted at all.)

If any disk with an ESP on it fails and has to be replaced, you have to recreate a new ESP on that disk and then, I believe, run 'dpkg-reconfigure grub-efi-amd64', which will ask you to select the ESPs you want to be automatically updated. You may then need to manually run '/usr/lib/grub/grub-multi-install --target=x86_64-efi', which will populate the new ESP (or it may be automatically run through the reconfigure). I'm not sure about this because we haven't had any UEFI system disks fail yet.
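
In concrete terms, the sequence I expect (but haven't had to test against a real failure) is:

    dpkg-reconfigure grub-efi-amd64
    # If the new ESP wasn't populated as part of the reconfigure:
    /usr/lib/grub/grub-multi-install --target=x86_64-efi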

(The ESP is a vfat formatted filesystem, which can be set up with mkfs.vfat, and has specific requirements for its GUIDs and so on, which you'll have to set up by hand in the partitioning tool of your choice or perhaps automatically by copying the partitioning of the surviving system disk to your new disk.)
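
A minimal sketch of the 'copy the partitioning' approach, assuming /dev/sda is the surviving disk, /dev/sdb is the blank replacement, and the ESP is partition 1 (sgdisk comes from the gdisk package):

    sgdisk -R=/dev/sdb /dev/sda   # replicate sda's partition table onto sdb
    sgdisk -G /dev/sdb            # randomize GUIDs so the disks don't collide
    mkfs.vfat -F 32 /dev/sdb1     # create the new ESP's filesystem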

If it was the primary disk that failed, you will probably want to update /etc/fstab to get /boot/efi from a place that still exists (probably with 'nofail' and perhaps with 'noauto'). This might be easy to overlook if the primary disk fails without the system rebooting, leaving you an unpleasant surprise on the next reboot.
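
To find what to put in /etc/fstab for the surviving ESP, something like this should do (assuming the surviving ESP is /dev/sda1):

    blkid /dev/sda1            # prints the filesystem UUID for fstab
    ls -l /dev/disk/by-uuid/   # another way to match UUIDs to devices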

The general difference between UEFI and BIOS MBR booting for this is that in BIOS MBR booting, there's no /boot/efi to cause problems and running 'grub-install' against your replacement disk is a lot easier than creating and setting up the ESP. As I found out, a properly set up BIOS MBR system also 'knows' in debconf what devices you have GRUB installed on, and you'll need to update this (probably with 'dpkg-reconfigure grub-pc') when you replace a system disk.
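
The BIOS MBR equivalent is short (again a sketch, assuming the replacement disk is /dev/sdb):

    grub-install /dev/sdb      # write GRUB bootblocks to the new disk
    dpkg-reconfigure grub-pc   # update the recorded install devices in debconf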

(We've been able to avoid this so far because in Ubuntu 20.04 and 22.04, 'grub-install' isn't run during GRUB package updates for BIOS MBR systems so no errors actually show up. If we install any 24.04 systems with BIOS MBR booting and they have system disk failures, we'll have to remember to deal with it.)

(See also my entry on multi-disk UEFI in Ubuntu 22.04, which goes deeper into some details. That entry was written before I knew that a 'grub-*/install_devices' setting of a software RAID array was actually an error on Ubuntu's part, although I'd still like GRUB's UEFI and BIOS MBR scripts to support it.)

linux/UbuntuUEFIRedundantBootDisks written at 22:44:59;

