An interesting (and alarming) Grub2 error and its cause

July 19, 2016

I upgraded my office workstation from Fedora 23 to Fedora 24 today, following my usual procedure of doing a live upgrade with dnf. Everything went smoothly, which is normal, and it was pretty fast, which isn't my normal experience but was probably because my root filesystem is now on SSDs. After the updates finished, I ran the grub2-install command that you're instructed to do and rebooted. My machine made it into Grub's menu but trying to start any kernel immediately halted with an error about the symbol grub_efi_secure_boot not being found (as in this Fedora 24 bug report or this old Ubuntu one).

This could politely be called somewhat alarming. Since it seemed to involve (U)EFI booting in some way, I went through the BIOS settings for my motherboard to see if I could turn that off and force a pure BIOS boot to make things work. Naturally I wound up looking through the boot options screen, at which point I noticed that the boot order looked a little odd. The BIOS's boot list didn't have enough room to display full names for drives, but the first and second drives had names that started with 'ST31000', and things called 'Samsung ...' were way down the list at the bottom.

At this point the penny dropped: my BIOS was still booting from my older hard drives, from before I'd moved the root filesystem to the SSDs. The SSDs were definitely considered sda and sdb by Linux and they're on the first two SATA links, but the BIOS didn't care; as far as booting went, it was sticking to its old disk ordering. When I'd updated the Grub2 boot blocks with grub2-install, I'd of course updated the SSD boot blocks because that's what I thought I was booting from; I hadn't touched the HD boot blocks. As a result the old Fedora 23 Grub boot blocks were trying to load Fedora 24 Grub modules, which apparently doesn't work very well and is a classic cause of these Grub 'undefined symbol' errors.

Once I realized this the fix was pleasantly simple; all I had to do was put the SSDs in their rightful place at the top of the (disk) boot priority list. Looking at the dates, this is the first Fedora version upgrade I've done since I added the SSDs, which explains why I didn't see it before now.

There's an argument that the BIOS's behavior here is sensible. If I'm correct about what's going on, it has essentially adopted a 'persistent boot order' in the same way that Linux (and other Unixes) are increasingly adopting persistent device names. I can certainly see people being very surprised if they add an extra SSD and suddenly their system fails to boot or boots oddly because the SSD is on a channel that the BIOS enumerates first. However, it's at least surprising for someone like me; I'm used to BIOSes cheerfully renumbering everything just because you stuck something into a previously unused SATA channel. A BIOS that doesn't do that for boot ordering is a bit novel.

(This may be especially likely on motherboards with a mix of 6G and 3G SATA ports. You probably want the 6G SATA ports enumerated first, and even if HDs live there for now, they're going to wind up being used for SSDs sooner or later.)

In the process of writing this entry I've also discovered that while I moved my root filesystem over to the SSDs, I seem to never have moved /boot; it's still a mirrored partition on the HDs. I'm not sure if this was something I deliberately planned, if I was going to move /boot later but forgot, or if I just plain overlooked the issue. I have some notes from my transition planning, but they're silent on this.

(Since /boot is still on the HDs, I'm now uncertain both about how the BIOS is actually numbering my drives and how Grub2 is finding /boot. Maybe the Grub boot blocks (technically the core image) have a hard-coded UUID for /boot instead of looking at specific BIOS disks.)


Comments on this page:

By Kalexan at 2018-06-19 09:41:01:

Hello.

Just found your post about the error you encountered. After I replaced a faulty disk at a CentOS7 server (RAID1 with 2 NVME disks), when I rebooted the system, after any kernel selection I got the same error as yours:

Symbol 'grub_efi_secure_boot' not found

Server has BIOS, not UEFI.

Only when I changed boot order from BIOS (selected the old drive to boot) the server started just fine. Note that after RAID rebuild, I installed grub on the new NVME disk to make it bootable with:

grub2-install /dev/nvme0n1

If the old disk will fail, how the system is going to boot, since tha new disk is not bootable?

By cks at 2018-06-23 17:45:09:

Belatedly: I'm afraid I don't have any idea how to fix this. It's possible that you need to reinstall the Grub boot blocks using a slightly different disk name for the new disk, but if it's not that I've got no clues.

Written on 19 July 2016.
« A good solution to our Unbound caching problem that sadly won't work
How not to set up your DNS (part 23) »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Tue Jul 19 00:02:06 2016
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.