How I went wrong in thinking about /boot mirroring

April 13, 2009

I've always seen how a mirrored /boot could work if the disks were absolutely identical up to the end of the /boot partition; if they were identical, the absolute block positions of everything that the boot blocks wanted to load would be identical, so nothing would care about which disk it was talking to. My concern was always what happened if the disks weren't quite so utterly identical, because back in the day it seemed to me like that required quite a lot of magic.

(I'm going to neglect here the magic necessary to see through the software RAID to get the block positions on the real physical devices.)

But how much magic it requires depends on how much you want out of mirroring /boot and a mirror-aware bootloader. The simple version is that /boot is just replicated and your boot blocks on each disk will only boot from that disk's copy of /boot; this lets you survive a total disk failure but not something that destroys the primary copy of /boot (well, not without turning it into a total disk failure). The most elaborate version is that if your boot block(s) and any version of /boot survive, the system will still boot.

(Back when I first started thinking about this, I set up LILO by hand so that it would in theory do the full version. Mind you, in the LILO days this was more of a real concern, since there were somewhat more ways to destroy the bootability of /boot.)

My impression is that GRUB's mirror awareness is limited to the first version. This is conceptually straightforward; each disk's boot blocks only have to know about the absolute block position of things on that disk, and the setup program just has to be aware of how to work that out. Alternatively, the setup program can be completely oblivious to mirroring and only work with the low-level 'below the mirror' view of the partitions. (I believe that the latter is what GRUB does, since you tell it what hard disk partition to look at when installing things.)
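As a rough illustration of the simple version, here is a minimal Python sketch of the arithmetic a setup program would do for each disk. Everything in it is hypothetical: the function name, the block numbers, and the partition start positions are made up, and it assumes that the mirror keeps its metadata out of the way so that a block's offset inside the partition is the same as its offset inside the mirrored /boot.

```python
def absolute_blocks(partition_start, relative_blocks):
    """Turn block numbers relative to the start of the /boot partition
    into absolute block numbers on one particular disk."""
    return [partition_start + rel for rel in relative_blocks]


# Hypothetical layout: the kernel occupies the same blocks inside each
# half of the mirror, but the partitions start at different places.
kernel_blocks = [120, 121, 122, 400]

disk_maps = {
    "disk0": absolute_blocks(63, kernel_blocks),
    "disk1": absolute_blocks(2048, kernel_blocks),
}
```

Each disk's boot block gets its own list, which is why it can only boot from its own copy of /boot.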

Setting up the full version would be much more complex and challenging. In the extreme case (where the /boot mirrors have different partition numbers), the boot block would have to be able to probe the disk, find the right partition, and then load things from there. Given the severe space limitations that boot blocks operate under, I suspect that no one does this.
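To give a sense of what 'probe the disk and find the right partition' involves, here is a small Python sketch that walks a classic MBR partition table. The MBR layout it uses is the standard one (four 16-byte entries starting at byte 446, a 0x55 0xAA signature at byte 510). The rest is an assumption for illustration: that the right partition can be recognized by the 'Linux RAID autodetect' type 0xFD, and the helper names `find_partition` and `make_mbr`. A real boot block would have to do the equivalent in a few hundred bytes of machine code.

```python
import struct

SECTOR_SIZE = 512
LINUX_RAID_TYPE = 0xFD  # assumption: /boot mirror halves carry this type


def find_partition(mbr, wanted_type=LINUX_RAID_TYPE):
    """Return (slot, start_lba, sector_count) for the first primary
    partition of the wanted type, or None if there is no such partition
    or the sector does not look like an MBR."""
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        return None
    for slot in range(4):
        entry = mbr[446 + 16 * slot:446 + 16 * (slot + 1)]
        ptype = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, count = struct.unpack_from("<II", entry, 8)
        if ptype == wanted_type and count > 0:
            return slot, start_lba, count
    return None


def make_mbr(entries):
    """Build a fake MBR for testing from (slot, type, start, count) tuples."""
    mbr = bytearray(SECTOR_SIZE)
    for slot, ptype, start, count in entries:
        offset = 446 + 16 * slot
        mbr[offset + 4] = ptype
        struct.pack_into("<II", mbr, offset + 8, start, count)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)
```

With two fake disks where the mirror half is in slot 0 on one and slot 2 on the other, `find_partition` returns a different slot and start position for each, which is exactly the information a fixed per-disk block list cannot carry over to another disk.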

(The full version is feasible if you assume that the absolute block positions are identical across all drives; then you can just try to load things from each drive in the system in turn until you either find one with a valid signature or you give up.)
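The 'try each drive in turn' approach is simple enough to sketch. This Python version treats each drive as a bytes object standing in for a disk image, and the `MAGIC` signature and the `load_next_stage` name are invented for the example; it is an illustration of the idea under the identical-positions assumption, not how any particular bootloader does it.

```python
SECTOR_SIZE = 512
MAGIC = b"BOOTSTG2"  # hypothetical signature at the start of the next stage


def load_next_stage(disks, start_block, block_count):
    """Read the same absolute block range from each disk in turn.

    Returns (disk_index, data) for the first disk whose data starts
    with the expected signature, or None if every disk fails.
    """
    offset = start_block * SECTOR_SIZE
    length = block_count * SECTOR_SIZE
    for index, disk in enumerate(disks):
        data = disk[offset:offset + length]
        # A short read stands in for a missing or truncated disk.
        if len(data) == length and data.startswith(MAGIC):
            return index, data
    return None
```

Because the block range is the same for every drive, the boot block needs only one block list plus the loop, which is what makes this variant plausible within boot block space limits.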
