My failure to migrate my workstation from MBR booting to UEFI
I wrote earlier about how I planned to migrate my work Fedora system from MBR booting to UEFI booting once I moved to its new hardware, complete with a plan of what to do. Today I put that plan into action but unfortunately it didn't go well. I've now reverted to MBR booting and plan to stay on it for at least the lifetime of this hardware (probably at least five years). Here is what happened, the various surprises I ran into, and what went wrong.
To start with, when I made my rushed switch to the new hardware I forgot to do one step I'd planned; I didn't save the output of efibootmgr -v from my scratch Fedora install. I don't think this made a real difference in the eventual outcome, but I would have at least liked to know. (I did 'save' the scratch /boot/efi contents in that I set the scratch install's disk aside and later retrieved /boot/efi from it.)
The initial steps to set up my /boot/efi on my real (U)EFI system partition went fine but led to my first surprise. It turns out that grub2-mkconfig won't create an EFI-enabled grub.cfg and efibootmgr won't work at all unless your system was booted with UEFI. This creates an obvious but unfortunate chicken and egg situation when you're trying to transition to UEFI, so I created my (U)EFI grub.cfg by copying my existing MBR one and changing 'linux ...' and 'initrd ...' to 'linuxefi ...' and 'initrdefi ...'. I figured that I would have to manually set up a suitable UEFI boot entry in the Asus BIOS, as I had with my Dell XPS 13 laptop.
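As a rough sketch of the two pieces involved here (not what I actually ran; I did the edit by hand), the following Python checks whether the running system was booted through UEFI and rewrites the kernel and initrd commands in a copy of grub.cfg. The function names are made up for illustration, and it assumes a GRUB build that has the 'linuxefi' and 'initrdefi' commands, as Fedora's did here.

```python
import os
import re


def booted_via_uefi(sysfs_dir="/sys/firmware/efi"):
    """Linux only creates /sys/firmware/efi when the kernel was started via UEFI.

    Without it there are no EFI variables for efibootmgr to work with.
    """
    return os.path.isdir(sysfs_dir)


# A 'linux' or 'initrd' command at the start of a (possibly indented) line.
# The lookahead keeps already converted 'linuxefi'/'initrdefi' lines untouched.
_BOOT_CMD = re.compile(r"^([ \t]*)(linux|initrd)(?=[ \t])", re.MULTILINE)


def to_efi_commands(grub_cfg_text):
    """Return grub.cfg text with 'linux'/'initrd' changed to their EFI variants."""
    return _BOOT_CMD.sub(r"\1\2efi", grub_cfg_text)


if __name__ == "__main__":
    # Hypothetical menu entry; the kernel path and arguments are placeholders.
    sample = (
        "menuentry 'Fedora' {\n"
        "\tlinux /vmlinuz-example root=UUID=example ro\n"
        "\tinitrd /initramfs-example.img\n"
        "}\n"
    )
    print("booted via UEFI:", booted_via_uefi())
    print(to_efi_commands(sample), end="")
```

Everything else in the file (the menuentry lines, the kernel arguments) is left exactly as it was, which is the point of starting from a known-working MBR grub.cfg.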
When I rebooted, I got my second surprise. Unlike Dell's laptop BIOS, the Asus BIOS does not let you configure UEFI boot entries by hand. Instead it automatically and magically hunts around your EFI system partitions to look for whatever plausible EFI boot things it can find, adds them as UEFI boot entries, and then generally tries to boot one. In my case what it tried to boot was what it labeled as 'RedHat Boot Manager', aka EFI/redhat/grub.efi, which was left over from an old Fedora 18 grub-efi package. Since I had not set up any sort of configuration for it, this did not go well (and I don't know if it'd have worked in general). I did eventually manage to get the BIOS to boot its 'Fedora' UEFI boot entry, which was booting EFI/fedora/shim.efi and using my grub.cfg.
(Why not shimx64.efi? I don't know. Both were present, but the Asus BIOS ignored shimx64.efi, as it ignored several other EFI things that I believe were bootable.)
Booting my Fedora kernel through UEFI was visibly different. Based on the presence of a row of penguins at the top of the screen, it appears to have come up in framebuffer mode as opposed to the basic text mode that both Grub and it used in my MBR boot setup (based on kernel-parameters.txt, these penguins can be turned off with the kernel command line argument logo.nologo, which I plan to use in the future). With the kernel booted via UEFI, I could use efibootmgr to dump the setup and run grub2-mkconfig to create a brand new proper EFI-based grub.cfg.
Unfortunately, using that grub.cfg caused my system to hang on boot just after the kernel printed messages about initializing the amdgpu driver (if left alone, dracut eventually timed out and dropped me into a RAM-based rescue environment). At the time I was already aggravated and naturally suspicious of the amdgpu driver (because I'd already had problems with it), so I flailed around commenting out more and more weird grub bits of the generated grub.cfg to make it look like my old one, with no success. At a much greater distance from the whole situation, I've now run kdiff3 against the two grub.cfgs and have discovered that grub2-mkconfig probably left out vital kernel command line arguments that tell the boot environment how to find my actual root filesystem. This would explain why the boot sat around for a while before timing out; it was waiting in the hopes that something with the right UUID would magically show up, which would have let it continue the boot. I don't know why a modern Fedora system doesn't print a clear message about 'your root filesystem is missing', but either it doesn't or I failed to see it.
(I also don't know why grub2-mkconfig left out the magic rd.md.uuid arguments that tell dracut what software RAID devices to assemble, but it did.)
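A narrower check than a full kdiff3 run would probably have found this faster. The following is a minimal Python sketch that pulls the kernel command line out of two grub.cfg files and reports arguments present in the old one but absent from the new one. The function names are made up for illustration, it only looks at the first kernel line in each file, and the UUIDs in the example are placeholders.

```python
# GRUB commands that load a Linux kernel in BIOS and EFI configurations.
KERNEL_CMDS = ("linux", "linux16", "linuxefi")


def kernel_args(grub_cfg_text):
    """Return the set of kernel arguments from the first kernel line found."""
    for line in grub_cfg_text.splitlines():
        words = line.split()
        if words and words[0] in KERNEL_CMDS:
            # words[0] is the GRUB command and words[1] is the kernel image.
            return set(words[2:])
    return set()


def missing_args(old_cfg_text, new_cfg_text):
    """Arguments on the old kernel line that the new kernel line lacks."""
    return sorted(kernel_args(old_cfg_text) - kernel_args(new_cfg_text))


if __name__ == "__main__":
    # Placeholder configurations, not taken from any real system.
    old = "\tlinux /vmlinuz-example root=UUID=1234 ro rd.md.uuid=aaaa rd.md.uuid=bbbb\n"
    new = "\tlinuxefi /vmlinuz-example ro rhgb quiet\n"
    for arg in missing_args(old, new):
        print("missing from generated grub.cfg:", arg)
```

A missing root= or rd.md.uuid= argument showing up in this list is a strong hint that the initramfs will sit waiting for devices it was never told about.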
Next, I theorized wildly that booting with shim.efi instead of shimx64.efi was part of my problems, so I flailed around with efibootmgr and was eventually successful at creating UEFI boot entries that used shimx64.efi and booting with them (using my grub.cfg). However this failed to fix my problems with the generated grub.cfg.
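For reference, creating a boot entry with efibootmgr takes the disk, the partition number of the EFI system partition, a label, and the loader path written with backslashes. The Python sketch below only builds and prints such a command line rather than running it; the disk name, partition number, and label are assumptions for illustration, not the values from my system.

```python
import shlex


def efibootmgr_create_args(disk, part, label, loader):
    """Build an efibootmgr command line that creates a new UEFI boot entry."""
    return [
        "efibootmgr",
        "--create",
        "--disk", disk,        # whole-disk device that holds the ESP
        "--part", str(part),   # partition number of the ESP on that disk
        "--label", label,      # name shown in the firmware boot menu
        "--loader", loader,    # path on the ESP, using backslashes
    ]


if __name__ == "__main__":
    # Hypothetical values: /dev/sda, partition 1, and the label are assumptions.
    args = efibootmgr_create_args(
        "/dev/sda", 1, "Fedora (shimx64)", r"\EFI\fedora\shimx64.efi"
    )
    # Print the command for review; it must be run as root on a UEFI-booted system.
    print(shlex.join(args))
```

Printing the command instead of executing it leaves room to check the disk and partition before touching the firmware's boot entries.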
At this point I gave up on UEFI booting because there were too many things going wrong that I didn't understand. I moved /boot/efi back into my root filesystem (which includes /boot as a whole) and completely erased the EFI system partitions on both of my system SSDs by mkfsing them as ext4 filesystems. The Asus BIOS then reverted to entirely MBR booting and has probably magically erased all of those UEFI boot entries that it magically set up.
(I didn't want to leave the EFI system partitions as anything that the BIOS could understand because I was nervous about the BIOS doing something bad if I had a valid but empty EFI system partition, or an EFI system partition with contents but without any boot loaders.)
Even after working out what was probably wrong, I've decided that I'm going to leave things this way. This system only boots Linux and I don't particularly care about Secure Boot, so as far as I can see UEFI is inferior to MBR booting in practice (for example, I can't mirror /boot/efi across both my system SSDs; I'd have to set up some other mechanism to keep the backup EFI system partition in sync with the primary version). MBR booting involves less BIOS magic, works better for the case that I care about (which is not doing weird graphics things for as long as possible during boot), works perfectly today, and gives me valuable features that I would lose with (U)EFI. All I have to remember to do is to update my MBR bootblocks periodically.
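To make the 'some other mechanism' concrete, here is one possible Python sketch (not something I set up) of a one-way mirror from the primary EFI system partition to a backup one. It assumes both are mounted, that the backup's mount point is the hypothetical /boot/efi2, and that the files are small enough to compare by reading them whole.

```python
import shutil
from pathlib import Path


def _remove(path):
    """Delete a file or a whole directory tree."""
    if path.is_dir():
        shutil.rmtree(path)
    else:
        path.unlink()


def mirror_tree(src, dst):
    """Make dst match src; return a list of the actions that were taken."""
    src, dst = Path(src), Path(dst)
    actions = []
    dst.mkdir(parents=True, exist_ok=True)
    wanted = {entry.name for entry in src.iterdir()}
    # Drop anything in the backup that no longer exists in the primary.
    for entry in list(dst.iterdir()):
        if entry.name not in wanted:
            _remove(entry)
            actions.append(f"removed {entry.name}")
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()
            actions.extend(mirror_tree(entry, target))
        else:
            if target.is_dir():
                shutil.rmtree(target)
            # ESP files are small, so compare whole contents directly.
            if not target.exists() or entry.read_bytes() != target.read_bytes():
                shutil.copyfile(entry, target)
                actions.append(f"copied {entry.name}")
    return actions


if __name__ == "__main__":
    # Hypothetical mount points; adjust to wherever the two ESPs are mounted.
    primary, backup = Path("/boot/efi"), Path("/boot/efi2")
    if primary.is_dir() and backup.is_dir():
        for action in mirror_tree(primary, backup):
            print(action)
```

This would have to be run after every package update that touches /boot/efi, which is exactly the sort of extra moving part that software RAID mirroring of /boot avoids.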
I've taken some lessons learned from this whole episode, ones that I hope to remember to apply the next time around, but they don't go in this entry (partly because it's already long enough). The big meta-lesson is that a lot of things go wrong when I'm rushed, and also if a machine is my main workstation I'm always going to be rushed when dealing with any issues with it.