My failure to migrate my workstation from MBR booting to UEFI

February 9, 2018

I wrote earlier about how I planned to migrate my work Fedora system from MBR booting to UEFI booting once I moved to its new hardware, complete with a plan of what to do. Today I put that plan into action but unfortunately it didn't go well. I've now reverted to MBR booting and plan to stay on it for at least the lifetime of this hardware (probably at least five years). Here is what happened, the various surprises I ran into, and what went wrong.

To start with, when I made my rushed switch to the new hardware I forgot to do one step I'd planned; I didn't save the output of efibootmgr -v from my scratch Fedora install. I don't think this made a real difference in the eventual outcome, but I would have at least liked to know. (I did 'save' the scratch /boot/efi contents in that I set the scratch install's disk aside and later retrieved /boot/efi from it.)

The initial steps to set up my /boot/efi on my real (U)EFI system partition went fine but led to my first surprise. It turns out that grub2-mkconfig won't create an EFI enabled grub.cfg and efibootmgr won't work at all unless your system was booted with UEFI. This creates an obvious but unfortunate chicken and egg situation when you're trying to transition to UEFI, so I created my (U)EFI grub.cfg by copying my existing MBR one and changing 'linux ...' and 'initrd ...' to 'linuxefi ...' and 'initrdefi ...'. I figured that I would have to manually set up a suitable UEFI boot entry in the Asus BIOS, as I had with my Dell XPS 13 laptop.
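The hand edit described above could be scripted roughly as follows. This is only a sketch in Python, not what was actually run. The function name to_efi_commands is made up for illustration, and accepting the 'linux16' and 'initrd16' spellings as well is an assumption about what an MBR grub.cfg might contain.

```python
import re

# A GRUB kernel or initrd command at the start of a line, after optional
# indentation. The optional "16" suffix is an assumption: some BIOS-boot
# grub.cfg files use 'linux16'/'initrd16' instead of 'linux'/'initrd'.
_BOOT_CMD = re.compile(r"^([ \t]*)(linux|initrd)(?:16)?([ \t])", re.MULTILINE)


def to_efi_commands(grub_cfg_text: str) -> str:
    """Rewrite 'linux ...' and 'initrd ...' lines to 'linuxefi ...' and
    'initrdefi ...', leaving every other line untouched."""
    return _BOOT_CMD.sub(
        lambda m: f"{m.group(1)}{m.group(2)}efi{m.group(3)}", grub_cfg_text
    )
```

Lines that already say 'linuxefi' or 'initrdefi' don't match the pattern, so running the text through the function a second time changes nothing.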

When I rebooted, I got my second surprise. Unlike Dell's laptop BIOS, the Asus BIOS does not let you configure UEFI boot entries by hand. Instead it automatically and magically hunts around your EFI system partitions to look for whatever plausible EFI boot things it can find, adds them as UEFI boot entries, and then generally tries to boot one. In my case what it tried to boot was what it labeled as 'RedHat Boot Manager', aka EFI/redhat/grub.efi, which was from a very left over grub-efi package from Fedora 18. Since I had not set up any sort of configuration for it, this did not go well (and I don't know if it'd have worked in general). I did eventually manage to get the BIOS to boot its 'Fedora' UEFI boot entry, which was booting EFI/fedora/shim.efi and using my grub.cfg.

(Why not shimx64.efi? I don't know. Both were present, but the Asus BIOS ignored shimx64.efi as it ignored several other EFI things that I believe were bootable.)

Booting my Fedora kernel through UEFI was visibly different. Based on the presence of a row of penguins at the top of the screen, it appears to have come up in framebuffer mode as opposed to the basic text mode that both Grub and it used in my MBR boot setup (based on kernel-parameters.txt, these penguins can be turned off with the kernel command line argument logo.nologo, which I plan to use in the future). With the kernel booted via UEFI, I could use efibootmgr to dump the setup and run grub2-mkconfig to create a brand new proper EFI-based grub.cfg.
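Since both efibootmgr and grub2-mkconfig behave differently depending on how the system was booted, it can help to check that first. On Linux, a kernel that was booted through UEFI exposes a /sys/firmware/efi directory. Here is a minimal sketch of that check in Python. The function name and the sysfs_root parameter are hypothetical and exist only to keep the example testable.

```python
import os


def booted_via_uefi(sysfs_root: str = "/sys") -> bool:
    """Return True if the running kernel exposes EFI firmware data,
    which it only does when it was booted through UEFI."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


if __name__ == "__main__":
    print("UEFI boot" if booted_via_uefi() else "legacy (MBR) boot")
```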

Unfortunately, using that grub.cfg caused my system to hang on boot just after the kernel printed messages about initializing the amdgpu driver (if left alone, dracut eventually timed out and dropped me into a RAM-based rescue environment). At the time I was already aggravated and naturally suspicious of the amdgpu driver (because I'd already had problems with it), so I flailed around commenting out more and more weird grub bits of the generated grub.cfg to make it look like my old one, with no success. At a much greater distance from the whole situation, I've now run kdiff3 against the two grub.cfgs and have discovered that grub2-mkconfig probably left out vital kernel command line arguments that tell the boot environment how to find my actual root filesystem. This would explain why the boot sat around for a while before timing out; it was waiting in the hopes that something with the right UUID would magically show up, which would have let it continue the boot. I don't know why a modern Fedora system doesn't print a clear message about 'your root filesystem is missing', but either it doesn't or I failed to see it.
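The comparison that kdiff3 eventually made visible can also be done mechanically: pull the kernel command line out of each grub.cfg and report the arguments that the new file dropped. The following Python sketch does this under some assumptions. The function names are invented here, kernels are matched by the file name of their image only, and the device names and UUIDs in the usage note are placeholders rather than values from the real system.

```python
import posixpath
import re

# A 'linux', 'linux16' or 'linuxefi' line: the kernel image path, then its arguments.
_KERNEL_LINE = re.compile(
    r"^[ \t]*linux(?:16|efi)?[ \t]+(\S+)[ \t]*(.*)$", re.MULTILINE
)


def kernel_args(grub_cfg_text: str) -> dict[str, set[str]]:
    """Map each kernel image's file name to the set of arguments passed to it."""
    args: dict[str, set[str]] = {}
    for m in _KERNEL_LINE.finditer(grub_cfg_text):
        name = posixpath.basename(m.group(1))
        args.setdefault(name, set()).update(m.group(2).split())
    return args


def missing_args(old_cfg: str, new_cfg: str) -> dict[str, set[str]]:
    """For each kernel in old_cfg, list the arguments that new_cfg lacks."""
    old, new = kernel_args(old_cfg), kernel_args(new_cfg)
    result = {}
    for name, old_set in old.items():
        dropped = old_set - new.get(name, set())
        if dropped:
            result[name] = dropped
    return result
```

Given an old line such as 'linux /boot/vmlinuz-X root=/dev/md20 ro rd.md.uuid=aaaa' and a new line 'linuxefi /boot/vmlinuz-X ro', missing_args() reports root=/dev/md20 and rd.md.uuid=aaaa for vmlinuz-X, which is the sort of omission that leaves dracut waiting for a root filesystem that never appears.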

(I also don't know why grub2-mkconfig left out the magic rd.md.uuid arguments that tell dracut what software RAID devices to assemble, but it did.)

Next, I theorized wildly that booting with shim.efi instead of shimx64.efi was part of my problems, so I flailed around with efibootmgr and was eventually successful at creating UEFI boot entries that used shimx64.efi and booting with them (using my grub.cfg). However this failed to fix my problems with the generated grub.cfg.

At this point I gave up on UEFI booting because too many things were going wrong that I didn't understand. I moved /boot/efi back into my root filesystem (which includes /boot as a whole) and completely erased the EFI system partitions on both of my system SSDs by mkfsing them as ext4 filesystems. The Asus BIOS then reverted to entirely MBR booting and has probably magically erased all of those UEFI boot entries that it magically set up.

(I didn't want to leave the EFI system partitions as anything that the BIOS could understand because I was nervous about the BIOS doing something bad if I had a valid but empty EFI system partition, or an EFI system partition with contents but without any boot loaders.)

Even after working out what was probably wrong, I've decided that I'm going to leave things this way. This system only boots Linux and I don't particularly care about Secure Boot, so as far as I can see UEFI is inferior to MBR booting in practice (for example, I can't mirror /boot/efi across both my system SSDs; I'd have to set up some other mechanism to keep the backup EFI system partition in sync with the primary version). MBR booting involves less BIOS magic, works better for the case that I care about (which is not doing weird graphics things for as long as possible during boot), works perfectly today, and gives me valuable features that I would lose with (U)EFI. All I have to remember to do is to update my MBR bootblocks periodically.
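For illustration, the 'some other mechanism' for keeping a backup EFI system partition in sync could be as simple as a one-way file mirror run after every change to the primary. This Python sketch is purely hypothetical and was never part of this setup. The function name is invented, and it assumes both partitions are already mounted somewhere (say the primary on /boot/efi and the backup on a made-up mount point such as /boot/efi2).

```python
import filecmp
import os
import shutil


def mirror_tree(primary: str, backup: str) -> list[str]:
    """Make backup's files match primary's; return the relative paths touched."""
    changed = []
    # Copy files that are missing from the backup or whose contents differ.
    for dirpath, _dirnames, filenames in os.walk(primary):
        rel = os.path.relpath(dirpath, primary)
        target_dir = os.path.normpath(os.path.join(backup, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if not (os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False)):
                shutil.copyfile(src, dst)
                changed.append(os.path.normpath(os.path.join(rel, name)))
    # Walk the backup bottom-up and delete whatever the primary no longer has.
    for dirpath, dirnames, filenames in os.walk(backup, topdown=False):
        rel = os.path.relpath(dirpath, backup)
        for name in filenames:
            if not os.path.isfile(os.path.join(primary, rel, name)):
                os.remove(os.path.join(dirpath, name))
                changed.append(os.path.normpath(os.path.join(rel, name)))
        for name in dirnames:
            if not os.path.isdir(os.path.join(primary, rel, name)):
                os.rmdir(os.path.join(dirpath, name))
    return changed
```

Even with something like this, the backup is only as current as the last run, which is exactly the kind of extra moving part that a mirrored /boot avoids.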

I've taken some lessons learned from this whole episode, ones that I hope to remember to apply the next time around, but they don't go in this entry (partly because it's already long enough). The big meta-lesson is that a lot of things go wrong when I'm rushed, and also if a machine is my main workstation I'm always going to be rushed when dealing with any issues with it.

Written on 09 February 2018.
