Mostly getting redundant UEFI boot disks on modern Ubuntu (especially 24.04)

September 23, 2024

When I wrote about how our primary goal for mirrored (system) disks is increased redundancy, including being able to reboot the system after the primary disk failed, vowhite asked in a comment if there was any trick to getting this working with UEFI. The answer is 'sort of', and it's mostly the same as what you want to do with BIOS MBR booting.

In the Ubuntu installer, when you set up redundant system disks it's long been the case that you want to explicitly tell the installer to use the second disk as an additional boot device (in addition to setting up a software RAID mirror of the root filesystem across both disks). In the BIOS MBR world, this installs GRUB bootblocks on the second disk; in the UEFI world, it causes the installer to set up an extra EFI System Partition (ESP) on the second drive and populate it with the same sort of things as the ESP on the first drive.
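
If you want to double-check what the installer did, one hedged way (with placeholder device names, and assuming a reasonably recent util-linux) is to look at the partition types on both disks:

  # an ESP shows up with the 'EFI System' partition type
  lsblk -o NAME,SIZE,PARTTYPENAME,FSTYPE,MOUNTPOINT /dev/sda /dev/sdb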

(The 'first' and the 'second' drive are not necessarily what you think they are, since the Ubuntu installer doesn't always present drives to you in their enumeration order.)

I believe that this dates from Ubuntu 22.04, when Ubuntu seems to have added support for multi-disk UEFI. Ubuntu will mount one of these ESPs (the one it considers the 'first') on /boot/efi, and as part of multi-disk UEFI support it will also arrange to update the other ESP. You can see which ESPs Ubuntu expects to update by looking at the debconf selection 'grub-efi/install_devices'. For perfectly sensible reasons this identifies them by their disk IDs (as found in /dev/disk/by-id), and it normally lists both ESPs.
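
A quick way to look at this setting (assuming the usual grub-efi-amd64 package on amd64 systems) is through debconf's own tools:

  # list grub-efi-amd64's debconf answers and pick out install_devices
  debconf-show grub-efi-amd64 | grep install_devices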

All of this is great, but it leaves you with two problems if the disk with your primary ESP fails. The first is the question of whether your system's firmware will automatically boot off the second ESP. I believe that UEFI firmware will often do this, and you can specifically set this up with EFI boot entries through tools like efibootmgr; possibly current Ubuntu installers do this for you automatically if it seems necessary.
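
As a sketch of inspecting and adding boot entries with efibootmgr, where the disk, partition number, loader path, and label are all assumptions about a particular system:

  # list the current UEFI boot entries and the boot order
  efibootmgr -v

  # add an entry for the second disk's ESP (assumed to be partition 1 of /dev/sdb)
  efibootmgr -c -d /dev/sdb -p 1 -L 'ubuntu (second disk)' -l '\EFI\ubuntu\shimx64.efi'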

The bigger problem is the /boot/efi mount. If the primary disk fails, a mounted /boot/efi will start having disk IO errors and then if the system reboots, Ubuntu will probably be unable to find and mount /boot/efi from the now gone or error-prone primary disk. If this is a significant concern, I think you need to make the /boot/efi mount 'nofail' in /etc/fstab (per fstab(5)). Energetic people might want to go further and make it either 'noauto' so that it's not even mounted normally, or perhaps mark it as a systemd automounted filesystem with 'x-systemd.automount' (per systemd.mount).
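
A hedged example of what such /etc/fstab lines might look like (the UUID is a placeholder, and umask=0077 is the usual Ubuntu default option for the ESP):

  # mount /boot/efi normally, but don't fail the boot if the disk is gone
  UUID=ABCD-1234  /boot/efi  vfat  umask=0077,nofail  0  1

  # or, don't mount it at boot and only automount it on demand
  UUID=ABCD-1234  /boot/efi  vfat  umask=0077,noauto,x-systemd.automount  0  1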

(The disclaimer is that I don't know how Ubuntu will react if /boot/efi isn't mounted at all or is a systemd automount mountpoint. I think that GRUB updates will cope with having it not mounted at all.)

If any disk with an ESP on it fails and has to be replaced, you have to recreate a new ESP on that disk and then, I believe, run 'dpkg-reconfigure grub-efi-amd64', which will ask you to select the ESPs you want to be automatically updated. You may then need to manually run '/usr/lib/grub/grub-multi-install --target=x86_64-efi', which will populate the new ESP (or it may be automatically run through the reconfigure). I'm not sure about this because we haven't had any UEFI system disks fail yet.
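
A sketch of the commands involved; since we haven't been through this for real, treat it as an outline rather than a tested recipe:

  # re-select which ESPs GRUB should keep updated
  dpkg-reconfigure grub-efi-amd64

  # (re)populate all selected ESPs, if the reconfigure didn't already do it
  /usr/lib/grub/grub-multi-install --target=x86_64-efi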

(The ESP is a vfat-formatted filesystem, which can be set up with mkfs.vfat, and the partition has specific requirements for its type GUID and so on, which you'll have to set up by hand in the partitioning tool of your choice or perhaps automatically by copying the partitioning of the surviving system disk to your new disk.)
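
One hedged way to do the 'copy the partitioning' version with sgdisk, assuming /dev/sda is the surviving disk, /dev/sdb is the blank replacement, and the ESP is partition 1 (double-check all of these before running anything):

  # replicate the surviving disk's GPT onto the new disk, then randomize its GUIDs
  sgdisk -R /dev/sdb /dev/sda
  sgdisk -G /dev/sdb

  # create the FAT32 filesystem for the new ESP
  mkfs.vfat -F 32 /dev/sdb1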

If it was the primary disk that failed, you will probably want to update /etc/fstab to get /boot/efi from a place that still exists (probably with 'nofail' and perhaps with 'noauto'). This might be somewhat easy to overlook if the primary disk fails without the system rebooting, at which point you'd get an unpleasant surprise on the next system reboot.
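
Either way, you can get the UUID of the ESP you now want /etc/fstab to point at with blkid (the device name is a placeholder, and a freshly created vfat filesystem will have a new UUID):

  # print the filesystem UUID (and type) of the ESP you want mounted
  blkid /dev/sdb1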

The general difference between UEFI and BIOS MBR booting for this is that in BIOS MBR booting, there's no /boot/efi to cause problems and running 'grub-install' against your replacement disk is a lot easier than creating and setting up the ESP. As I found out, a properly set up BIOS MBR system also 'knows' in debconf what devices you have GRUB installed on, and you'll need to update this (probably with 'dpkg-reconfigure grub-pc') when you replace a system disk.
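
A sketch of the BIOS MBR equivalent, assuming /dev/sdb is the replacement disk:

  # put GRUB's boot blocks on the replacement disk
  grub-install /dev/sdb

  # update debconf's idea of which disks have GRUB installed
  dpkg-reconfigure grub-pc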

(We've been able to avoid this so far because in Ubuntu 20.04 and 22.04, 'grub-install' isn't run during GRUB package updates for BIOS MBR systems so no errors actually show up. If we install any 24.04 systems with BIOS MBR booting and they have system disk failures, we'll have to remember to deal with it.)

(See also my entry on multi-disk UEFI in Ubuntu 22.04, which goes deeper into some details. That entry was written before I knew that a 'grub-*/install_devices' setting of a software RAID array was actually an error on Ubuntu's part, although I'd still like GRUB's UEFI and BIOS MBR scripts to support it.)


Comments on this page:

By vowhite at 2024-09-24 11:11:47:

Thanks. efibootmgr was a tool I wasn't really aware of, and after playing with it a bit I was able to add my second disk. Debian doesn't seem to auto-copy GRUB as Ubuntu does, but their wiki shows how to add a hook to do that. efibootmgr adds to the front of the boot order by default, which may be undesirable for a "backup" disk (although, if the aforementioned hook is working, it shouldn't matter), so it might be better to use -C rather than -c; and, either way, then set --bootorder manually.

So, the general summary:

  • ensure both disks have EFI partitions with the same contents
  • run "efibootmgr" to see the configuration
  • use efibootmgr's -C option to add the second disk, if necessary, with -l specifying the same loader path as the existing one, -d specifying the disk, and -L giving a display label
  • use efibootmgr -o to ensure both are at the head of the boot order
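
As a concrete sketch of that, where the device, partition, loader path, label, and entry numbers are all assumptions:

  # show the current entries and boot order
  efibootmgr

  # add the second disk's ESP without putting it at the front of the boot order
  efibootmgr -C -d /dev/sdb -p 1 -L 'debian (second disk)' -l '\EFI\debian\shimx64.efi'

  # then set the order explicitly, with both disks at the head
  efibootmgr -o 0000,0001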

After rebooting, something (probably my Asus firmware) had appended a few unwanted items to the boot order, including a removable USB drive's MBR for some reason. Anyway, I didn't try unplugging a disk, but it looks like this will basically work.
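
(Entries like that can be removed with efibootmgr's -B option; the entry number below is made up.)

  # delete the unwanted Boot0003 entry
  efibootmgr -b 0003 -B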

By Ian Z aka nobrowser at 2024-09-24 22:13:34:

There's kernelstub as an "easier" wrapper around efibootmgr:

https://github.com/isantop/kernelstub

But, after having to look under the hood because it kept doing unexpected things (see below), it's really horrible code, and best avoided.

https://github.com/isantop/kernelstub/issues/40

By nanaya at 2024-09-26 00:43:48:

With zfsbootmenu I just copy the EFI file to each partition and leave them alone until the boot pool is upgraded. No need to permanently mount them or anything. It's been truly a joy compared to dealing with grub.

By ValdikSS at 2024-09-29 19:53:01:

>The bigger problem is the /boot/efi mount. If the primary disk fails, a mounted /boot/efi will start having disk IO errors and then if the system reboots

You can have it on RAID1, and mount it as a /dev/mdN device.

You need to make a partition slightly larger than the underlying file system (I usually leave 4 MB spare, although the RAID metadata uses not much more than 64K), and create the RAID with metadata version 1.0; this version of the metadata places itself at the end of the partition, while the current default, v1.2, lives at the beginning.

This way your UEFI firmware will detect the ESP as a regular FAT32 partition and won't know anything about the RAID, while your OS keeps the bootloader on both disks in sync.
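
A minimal sketch of that setup, assuming the two ESP-sized partitions are /dev/sda1 and /dev/sdb1 (placeholders):

  # mirror the two partitions with the metadata at the end, so firmware sees plain FAT32
  mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/sda1 /dev/sdb1

  # put the FAT32 filesystem on the mirror and mount it as /boot/efi
  mkfs.vfat -F 32 /dev/md0
  mount /dev/md0 /boot/efi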
