An interesting (and alarming) Grub2 error and its cause
I upgraded my office workstation from Fedora 23 to Fedora 24 today,
following my usual procedure of doing a live upgrade.
Everything went smoothly, which is normal, and it was pretty fast,
which isn't my normal experience but was probably because my root
filesystem is now on SSDs. After the updates
finished, I ran the
grub2-install command that you're instructed
to do and rebooted. My machine made it into Grub's menu but trying
to start any kernel immediately halted with an error about the symbol
grub_efi_secure_boot not being found (as in this Fedora
24 bug report
or this old Ubuntu one).
This could politely be called somewhat alarming. Since it seemed to involve (U)EFI booting in some way, I went through the BIOS settings for my motherboard to see if I could turn that off and force a pure BIOS boot to make things work. Naturally I wound up looking through the boot options screen, at which point I noticed that the boot order looked a little odd. The BIOS's boot list didn't have enough room to display full names for drives, but the first and second drives had names that started with 'ST31000', and things called 'Samsung ...' were way down the list at the bottom.
At this point the penny dropped: my BIOS was still booting from my
older hard drives, from before I'd moved the root filesystem to the
SSDs. The SSDs were definitely considered
sdb by Linux
and they're on the first two SATA links, but the BIOS didn't care;
as far as booting went, it was sticking to its old disk ordering.
When I'd updated the Grub2 boot blocks with grub2-install, I'd
of course updated the SSD boot blocks because that's what I thought
I was booting from; I hadn't touched the HD boot blocks. As a result
the old Fedora 23 Grub boot blocks were trying to load Fedora 24
Grub modules, which apparently doesn't work very well
and is a classic cause of these Grub 'undefined symbol' errors.
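One way to match the truncated names in the BIOS boot list against Linux's own device names is to ask sysfs for each disk's model string. The following is a minimal sketch and not something I ran as part of this upgrade; it assumes the usual Linux layout where each disk exposes /sys/block/<name>/device/model, and the disk_models helper is a name made up for illustration.

```python
from pathlib import Path


def disk_models(sys_block="/sys/block"):
    """Return {device name: model string} for block devices that report a model.

    Assumption: disks expose their model at <sys_block>/<name>/device/model,
    as SATA drives normally do. Devices without that file (loop devices,
    software RAID arrays and so on) are skipped.
    """
    root = Path(sys_block)
    if not root.is_dir():
        # Not Linux, or sysfs is not mounted: nothing to report.
        return {}
    models = {}
    for dev in sorted(root.iterdir()):
        model_file = dev / "device" / "model"
        if model_file.is_file():
            models[dev.name] = model_file.read_text().strip()
    return models


if __name__ == "__main__":
    for name, model in disk_models().items():
        print(f"{name}: {model}")
```

Model strings that start with 'ST31000' or 'Samsung' can then be lined up with the entries in the BIOS's boot priority list, which shows what the BIOS is actually booting from in Linux terms.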
Once I realized this the fix was pleasantly simple; all I had to do was put the SSDs in their rightful place at the top of the (disk) boot priority list. Looking at the dates, this is the first Fedora version upgrade I've done since I added the SSDs, which explains why I didn't see it before now.
There's an argument that the BIOS's behavior here is sensible. If I'm correct about what's going on, it has essentially adopted a 'persistent boot order' in the same way that Linux (and other Unixes) are increasingly adopting persistent device names. I can certainly see people being very surprised if they add an extra SSD and suddenly their system fails to boot or boots oddly because the SSD is on a channel that the BIOS enumerates first. However, it's at least surprising for someone like me; I'm used to BIOSes cheerfully renumbering everything just because you stuck something into a previously unused SATA channel. A BIOS that doesn't do that for boot ordering is a bit novel.
(This may be especially likely on motherboards with a mix of 6G and 3G SATA ports. You probably want the 6G SATA ports enumerated first, and even if HDs live there for now, they're going to wind up being used for SSDs sooner or later.)
In the process of writing this entry I've also discovered that while
I moved my root filesystem over to the SSDs, I seem to never have
moved /boot; it's still a mirrored partition on the HDs. I'm not
sure if this was something I deliberately planned, if I was going
to move /boot later but forgot, or if I just plain overlooked the
issue. I have some notes from my transition planning, but they're
silent on this.
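A quick way to confirm which device actually holds /boot is to look it up in the kernel's mount table. This is a rough sketch, assuming the usual /proc/mounts format of whitespace-separated fields with the source first and the mount point second; mount_for is a hypothetical helper name, and the sketch ignores the octal escaping that /proc/mounts uses for mount points containing spaces.

```python
import os


def mount_for(path, mounts_text):
    """Return (source, mount_point) for the mount that contains path.

    The longest matching mount point wins; if the same mount point is
    listed more than once, the later entry is used. Returns None when
    nothing matches.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mount_point = fields[0], fields[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[1]):
                best = (source, mount_point)
    return best


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(mount_for("/boot", f.read()))
```

If the source that comes back is a software RAID device, the array's member disks still have to be checked separately to see which physical drives are involved.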
Since /boot is still on the HDs, I'm now uncertain both about
how the BIOS is actually numbering my drives and how Grub2 is finding
/boot. Maybe the Grub boot blocks (technically the core image)
have a hard-coded UUID for
/boot instead of looking at specific