BIOS MBR booting isn't always flawless and can be odd

February 15, 2023

I recently wrote about one reason I prefer BIOS MBR booting over UEFI, where I praised the predictability of BIOS MBR booting. That predictability is real, but as with everything else, sometimes BIOS makers get creative, and the results of their creativity are not as useful as you might want. Today's story is about one such BIOS, found on some of our servers.

The most important job of a BIOS in MBR booting is picking the 'boot disk'. The traditional way of doing this is that the BIOS has a fixed order of disk ports and, by default, the first disk it finds in one of them is the boot disk. Many BIOSes allow you to change this default, so that you order some disks in front of others. Most of the time, such a BIOS actually means 'the disk slot', not 'the disk itself'; whatever disk you put in a given slot takes that slot's position in the boot order.

Sometimes, though, a BIOS really means 'the disk', not the slot it's currently in. This is somewhat reasonable if the BIOS tells you about it and perhaps requires you to explicitly select disks instead of slots, because it means that wherever your boot drive gets moved to and however the BIOS enumerates your disks, you always boot off it. However, a consequence of this is that if you remove the old disk and replace it with a different one, the new disk is either not in the boot order at all or is at the end of it.

(How the BIOS identifies the disks may vary, but many will harvest at least the model number as part of probing all of the disk slots, partly so they can tell you something about what's in each slot when they offer you a one-time boot menu.)
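(As an aside, on a running Linux system you can see roughly the same model strings that firmware is likely to harvest. Here's a minimal sketch of doing that by reading sysfs; it's my own illustration of what identifiers your disks present, not anything a BIOS actually runs.)

```python
#!/usr/bin/env python3
# Illustrative sketch (not from the original post): print the model strings
# that Linux reports for each disk, which is roughly the identity information
# a BIOS is likely to harvest when it probes the disk slots.
import glob

for model_path in sorted(glob.glob("/sys/block/*/device/model")):
    disk = model_path.split("/")[3]          # e.g. 'sda'
    with open(model_path) as f:
        model = f.read().strip()
    print(f"{disk}: {model}")
```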

We have one set of servers where the BIOS always does the boot order based on disks, not slots (as far as we can tell). If nothing is set up, there's a default slot-based order, but once you put disks in the slots the BIOS memorizes the disks and their position is fixed. Where this goes badly is that these machines also have a bunch of non-boot data drives, which the BIOS also includes in the boot order. This has various failure modes.

The obvious failure mode is that if a system disk fails and you replace it, the replacement system disk is not in its proper place in the BIOS boot order; since it's a new disk, it's been put at the end, after all of the data drives. If you replaced the first system disk, you can still boot because you're actually booting off the other system disk.

The subtle failure mode is that if you build a new set of mirrored system disks in another chassis for a fast in-place upgrade, where you shut down the server, pull the old system disks, and put in the new ones, it won't work. When you pull both original system disks and replace them with new ones, the new disks go to the end of the BIOS boot order, after all of the data disks. Since your data disks aren't bootable, the server then hangs mysteriously on boot.
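To make this concrete, here's a toy model of a disk-keyed boot order of the sort these BIOSes appear to use. The function and disk names are my own invention, purely for illustration; the real firmware logic is opaque to us.

```python
#!/usr/bin/env python3
# Toy model (my illustration, not the actual firmware) of a disk-keyed boot
# order.  The BIOS remembers an ordered list of disk identities; disks it has
# never seen before get appended at the end.

def new_boot_order(remembered_order, present_disks):
    """Return the boot order the BIOS would use for the disks now present."""
    # Keep remembered disks that are still present, in their old order.
    order = [d for d in remembered_order if d in present_disks]
    # Append any disks the BIOS hasn't seen before, e.g. replacements.
    order += [d for d in present_disks if d not in remembered_order]
    return order

# Original setup: two system disks followed by data disks.
remembered = ["sys-A", "sys-B", "data-1", "data-2", "data-3"]

# In-place upgrade: both system disks are swapped for new ones.
now_present = ["new-sys-A", "new-sys-B", "data-1", "data-2", "data-3"]

print(new_boot_order(remembered, now_present))
# -> ['data-1', 'data-2', 'data-3', 'new-sys-A', 'new-sys-B']
# The (unbootable) data disks now come first, so the machine hangs on boot.
```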

At least in theory UEFI avoids any chance of this, because UEFI doesn't boot disks; at most it boots things that it finds sitting around on your EFI System Partitions. If you don't put ESPs on your disks, or don't put anything in those ESPs, the system's EFI firmware has no choice but to ignore them. The firmware could decide not to look at all of your disks for ESPs, but that would be a relatively peculiar decision.

(EFI boot entries normally, or perhaps always, specify the GUID of the ESP they refer to, which is a potential trap when replacing system disks. But if there are no boot entries, your EFI firmware should look around for ESPs with bootable things in them.)
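If you want to see which ESPs a system's firmware could find, and the partition GUIDs that boot entries would reference, here's a rough sketch for Linux using lsblk. It's my own assumption-laden illustration of the OS-side view, not anything from the firmware side.

```python
#!/usr/bin/env python3
# Illustrative sketch (assuming a Linux system with a JSON-capable lsblk):
# list EFI System Partitions by looking for the well-known ESP partition
# type GUID on GPT disks, along with each ESP's partition GUID.
import json
import subprocess

ESP_TYPE_GUID = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

out = subprocess.run(
    ["lsblk", "--json", "-o", "NAME,PARTTYPE,PARTUUID"],
    capture_output=True, text=True, check=True,
).stdout
devices = json.loads(out)["blockdevices"]

def walk(dev):
    # Partition type is only set on partitions, not whole disks.
    if (dev.get("parttype") or "").lower() == ESP_TYPE_GUID:
        print(f"ESP: /dev/{dev['name']}  partition GUID {dev['partuuid']}")
    for child in dev.get("children", []):
        walk(child)

for dev in devices:
    walk(dev)
```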
