Emergency repairs with GRUB

May 1, 2006

This server lost its boot drive today, which occasioned a certain amount of flailing. We use mirrored drives, but there were two problems: the dead disk wasn't actually dead, just puking, and the second disk hadn't been set up to be bootable (although it had a fully populated copy of the /boot partition). The first problem was dealt with by yanking the drive's power plug out, but the second one took more work.

Although GRUB people may hem and haw about it, GRUB needs boot blocks and other magic setup done just like LILO does (some more details are here). If you don't have them set up, it doesn't matter that all of the GRUB stuff is sitting in a /boot2 partition; your drive's not booting. To fix that, I needed to install the GRUB boot blocks, which meant that I first needed to boot the system so I could run the GRUB boot block installer.

First attempt: boot the FC2 CD in rescue mode. This failed to bring up the RAID-1 partitions on the remaining drive, so it couldn't find our install, so it went nowhere. (I don't know why it failed; possibly the rescue mode refuses to bring up incomplete RAID-1 mirrors.)

Second attempt: boot through a GRUB boot floppy. This is the one area where GRUB is a clear win over LILO; armed with a boot floppy, you can boot anything that is sitting on a readable partition. The easy way to make a boot floppy is with:

$ cd /usr/share/grub/i386*
$ cat stage1 stage2 | dd of=/dev/fd0

(The fine manual has a more complicated incantation with multiple dd commands that didn't work for me for some reason.)

This worked, once I had a working floppy. (Familiarity with kernel boot arguments is recommended.)
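
(For anyone who hasn't booted by hand from a GRUB prompt, the sequence looks roughly like the sketch below. It is not a transcript of this session: VERSION, the root= device, and the choice of (hd0,0) are placeholders that assume the boot partition is the first partition on the first BIOS drive and that the root filesystem is on /dev/md0. GRUB's tab completion on file names helps in finding the real kernel and initrd names.)

```
root (hd0,0)
kernel /vmlinuz-VERSION ro root=/dev/md0
initrd /initrd-VERSION.img
boot
```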

With the system at least booted, I could make the drive bootable. The important thing was to fix GRUB's idea of which Linux drive was which BIOS drive; since /dev/hdc was the only surviving drive, it was clearly hd0 (GRUB's name for the first BIOS drive), so I changed the surviving device.map file accordingly.
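
(A device map is plain text, one line per BIOS drive, pairing GRUB's name for the drive with the Linux device. As a minimal sketch of the one-line map described above: the helper name write_device_map and the scratch file path are made up for illustration, and the real file lives in the grub directory of the boot partition.)

```shell
# Hypothetical helper: write a one-entry GRUB device map.
#   $1 = file to write
#   $2 = Linux device that the BIOS sees as its first drive
write_device_map() {
    printf '(hd0)\t%s\n' "$2" > "$1"
}

# With /dev/hdc as the only surviving drive, it becomes hd0.
write_device_map /tmp/device.map.example /dev/hdc
cat /tmp/device.map.example
```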

The GRUB documentation will tell you to install GRUB with grub-install. Ignore it; grub-install makes a number of unwise assumptions that make it rather fragile, and most of the time it's easy enough to use the grub shell under Linux. The magic incantation I used was:

$ grub --device-map=/boot2/grub/device.map
root (hd0,N)
setup --prefix=/grub (hd0)
quit

Our boot partition copy was /boot2, with the usual layout. N is the number of that partition minus one; GRUB counts partitions from 0 instead of 1. Our custom is to make the boot partition the first partition on the drive, so for me it was (hd0,0).
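
(The off-by-one between Linux and GRUB partition numbers is easy to get wrong under pressure, so here is a small sketch that generates the three GRUB shell commands from the Linux partition number. The function name grub_setup_commands is invented for this example, and it assumes the target drive is hd0 and the GRUB files sit in /grub on that partition, as they did here.)

```shell
# Hypothetical helper: print the GRUB shell commands that install boot
# blocks on hd0, given the Linux (1-based) number of the boot partition.
grub_setup_commands() {
    n=$(( $1 - 1 ))    # GRUB counts partitions from 0, Linux from 1
    printf 'root (hd0,%d)\n' "$n"
    printf 'setup --prefix=/grub (hd0)\n'
    printf 'quit\n'
}

# Boot partition is the first partition on the drive (e.g. /dev/hdc1).
grub_setup_commands 1
```

For partition 1 this prints the same root (hd0,0), setup, and quit lines as above. The output could be piped into grub --batch with the same --device-map option as above instead of being typed interactively, although typing the commands by hand makes it easier to see any errors that setup reports.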

Fortunately the drive that died had started glitching out a few days earlier, so we had a replacement drive already on hand. Once the system was at least up (and the mail backlog had cleared), I swapped it in as /dev/hda; this led to another run of the GRUB shell to install boot blocks on it. (Being careful to change GRUB's device mapping so that /dev/hda would now be hd0.)

Apart from that, bringing the new drive into service was pretty much like the last time we had to do this.

(Department of belated corrections, November 14th: I've changed the setup line above to have the correct '(hd0)' instead of the 'hd0' that I originally wrote. I actually noticed the error some time back, when I had to do this again and the literal version didn't work, but I never got around to actually correcting it until now. Bad me.)

Written on 01 May 2006.

Last modified: Mon May 1 02:23:11 2006