2006-05-01
Emergency repairs with GRUB
This server lost its boot drive today, which occasioned a certain amount
of flailing. We use mirrored drives, but there were two problems: the
dead disk wasn't actually dead, just puking, and the second disk
hadn't been set up to be bootable (although it had a fully populated
copy of the /boot partition). The first problem was dealt with by
yanking the drive's power plug out, but the second one took more work.
Although GRUB people may hem and haw about it, GRUB needs boot blocks
and other magic setup done just like LILO does (some more details are
here). If you don't have them set up, it doesn't matter
that all of the GRUB stuff is sitting in a /boot2 partition; your
drive's not booting. To fix that, I needed to install the GRUB boot
stuff. Which meant that I needed to boot the system so I could run the
GRUB boot block installer.
First attempt: boot the FC2 CD in rescue mode. This failed to bring up the RAID-1 partitions on the remaining drive, so it couldn't find our install, so it went nowhere. (I don't know why it failed; possibly the rescue mode refuses to bring up incomplete RAID-1 mirrors.)
Second attempt: boot through a GRUB boot floppy. This is the one area where GRUB is a clear win over LILO; armed with a boot floppy, you can boot anything that is sitting on a readable partition. The easy way to make a boot floppy is with:
$ cd /usr/share/grub/i386*
$ cat stage1 stage2 | dd of=/dev/fd0
(The fine manual has a more complicated incantation with multiple dd
commands that didn't work for me for some reason.)
This worked, once I had a working floppy. (Familiarity with kernel boot arguments is recommended.)
With the system at least booted, I could make the drive bootable. The
important thing was to fix GRUB's idea of which Linux drive was which
BIOS drive; since /dev/hdc was the only surviving drive, it was clearly
hd0 (GRUB's name for the first BIOS drive), so I changed the surviving
device.map file accordingly.
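For illustration, here is roughly what the changed map amounts to. This is a sketch, not a copy of our file: it writes to a scratch file made with mktemp (the real file sat in the grub directory under /boot2), and the one-line contents are my reconstruction from the description above. Each line of a GRUB device map pairs a BIOS drive name in parentheses with a Linux device.

```shell
# Scratch file for illustration only; the real map lived under /boot2/grub/.
map=$(mktemp)

# The only surviving drive is /dev/hdc, so it becomes the first BIOS drive.
printf '(hd0)\t/dev/hdc\n' > "$map"

# Show the result so it can be checked by eye before GRUB reads it.
cat "$map"
```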
The GRUB documentation will tell you to install GRUB with grub-install.
Ignore it; grub-install makes a number of unwise assumptions that make
it rather fragile, and most of the time it's easy enough to use the grub
shell under Linux. The magic incantation I used was:
$ grub --device-map=/boot2/grub/device.map
root (hd0,N)
setup --prefix=/grub (hd0)
quit
Our boot partition copy was /boot2, with the usual layout. N is the
number of that partition minus one; GRUB counts partitions from 0
instead of 1. Our custom is to make the boot partition the first
partition on the drive, so for me it was (hd0,0).
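The off-by-one is easy to get wrong under pressure, so here is a small sketch of the arithmetic. The grub_part function is a made-up helper of mine, not part of GRUB; it takes the GRUB drive name (from the device map) and the Linux partition number, and prints the name to give to the root command.

```shell
# grub_part is a hypothetical helper, not a GRUB command.
# $1: GRUB drive name, e.g. hd0 (as mapped in device.map)
# $2: Linux partition number, counted from 1 (the 1 in /dev/hdc1)
grub_part() {
    # GRUB counts partitions from 0, so subtract one.
    printf '(%s,%d)\n' "$1" "$(( $2 - 1 ))"
}

# /dev/hdc1 with hdc mapped to hd0:
grub_part hd0 1
```

For our layout, with the boot partition first on the drive, this prints (hd0,0).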
Fortunately the drive that died had started glitching out a few days
earlier, so we had a replacement drive already on hand. Once the system
was at least up (and the mail backlog had cleared), I swapped it in
as /dev/hda; this led to another run of the GRUB shell to install
bootblocks on it. (Being careful to change GRUB's device mapping so
that /dev/hda would now be hd0.)
Apart from that, bringing the new drive into service was pretty much like the last time we had to do this.
(Department of belated corrections, November 14th: I've changed the
setup line above to have the correct '(hd0)' instead of the 'hd0'
that I originally wrote. I actually noticed the error some time back,
when I had to do this again and the literal version didn't work, but
I never got around to actually correcting it until now. Bad me.)