2009-05-14
Fixing your system after hitting the RAID growth gotcha
The easiest way to dig yourself out of the hole created by the RAID
growth gotcha is probably to use a live/rescue CD.
But let us suppose that you don't have one handy, which was the case
for me yesterday. Further let us suppose that you have /boot as a
separate filesystem, not as part of the root filesystem (if this is
not the case, you absolutely need a rescue CD; sorry).
The basic goal is to rebuild a version of your current initrd that has
an updated /etc/mdadm.conf that specifies that your root mirror has
the right number of devices. Since we can't boot the system normally,
we can't just bring it up, edit the real mdadm.conf, and regenerate
the initrd with the normal tools; instead, you need to boot the system
in a minimal mode and unpack, fix, and rebuild the initrd by hand.
First, you need some setup:
- you need to boot the system without the root mirror. Heed the cautions.
  Once booted, you'll probably want to mount /usr (read-only), if only
  so that you can read manpages.
- you need writeable scratch space to rebuild the initrd; I mounted
  a tmpfs on /tmp.
- you need /boot mounted read-write; if your /boot is mirrored, you'll
  have to assemble the software RAID first with appropriate mdadm
  invocations.
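As a rough sketch, the setup steps might look like the following. The
run and rescue_setup helpers are my own illustration, the md and
partition names in the usage comment are hypothetical, and using
'mount -n' (so that mount doesn't try to write /etc/mtab on a read-only
root) is an assumption about how you booted. By default it only prints
the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: helper names and device names are illustrative assumptions.

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: rescue_setup [MD_DEVICE COMPONENT...]
# Pass the md device and its component partitions only if /boot is
# itself a software RAID mirror; the right mdadm arguments depend on
# your setup and may need adjusting.
rescue_setup() {
    # /usr read-only, mostly so manpages are readable
    run mount -n -o ro /usr
    # writeable scratch space for rebuilding the initrd
    run mount -n -t tmpfs tmpfs /tmp
    if [ "$#" -gt 0 ]; then
        run mdadm --assemble "$@"
    fi
    # /boot read-write, using its /etc/fstab entry
    run mount -n /boot
}

# Example (hypothetical device names):
#   rescue_setup /dev/md1 /dev/sda1 /dev/sdb1
```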
After that, it is relatively simple:
- make a scratch directory in /tmp (or wherever) and unpack your
  current initrd to it. Initrds are compressed cpio images, so this is
  something like:
      cd /tmp/t; zcat </boot/initrd | cpio -di
- edit the now-unpacked etc/mdadm.conf to have the right num-devices
  value for the software RAID with your root filesystem. You don't need
  to update the numbers for the other software RAID devices; they
  aren't started in the initrd.
- reassemble the initrd. On Fedora, the 100% authentic way to do this,
  exactly duplicating what mkinitrd does, is:
      echo nash-find . | /sbin/nash --force --quiet | cpio -H newc --quiet -o | gzip -9 >/tmp/initrd
- rename your current initrd to something else as a backup, and then
  copy your newly generated initrd into /boot with the right name.
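The unpack, fix, and repack cycle above can be sketched as three small
shell functions. This is not what mkinitrd does verbatim: the function
names are made up, the repack step swaps plain find in for Fedora's
nash-find (an assumption that the two produce an equivalent file
list), and the sed edit assumes an ARRAY line that starts with the
device name and carries a num-devices= field. The paths, md device,
and count in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch only: function names, paths and counts are illustrative.

# Unpack a gzip-compressed cpio initrd into a scratch directory.
# usage: unpack_initrd INITRD SCRATCH_DIR   (INITRD as an absolute path)
unpack_initrd() {
    mkdir -p "$2" && ( cd "$2" && zcat <"$1" | cpio -di )
}

# Set num-devices=COUNT on the ARRAY line for one md device, leaving
# every other line of the file alone.
# usage: fix_num_devices MDADM_CONF MD_DEVICE COUNT
fix_num_devices() {
    conf=$1 dev=$2 count=$3
    addr="^ARRAY[[:space:]][[:space:]]*$dev[[:space:]]"
    sed "\|$addr|s|num-devices=[0-9][0-9]*|num-devices=$count|" \
        "$conf" >"$conf.new" && mv "$conf.new" "$conf"
}

# Repack the scratch directory as a gzip-compressed newc cpio image.
# Assumption: plain find(1) stands in for nash-find here.
# usage: repack_initrd SCRATCH_DIR NEW_INITRD   (NEW_INITRD outside SCRATCH_DIR)
repack_initrd() {
    ( cd "$1" && find . | cpio -H newc --quiet -o | gzip -9 ) >"$2"
}

# Example (placeholder names and count):
#   unpack_initrd /boot/initrd /tmp/t
#   fix_num_devices /tmp/t/etc/mdadm.conf /dev/md0 3
#   repack_initrd /tmp/t /tmp/initrd
```

If you have a working editor in your minimal environment, editing
etc/mdadm.conf by hand is just as good as the sed step.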
At this point you can reboot and your system should come up as far
as mounting the root filesystem. If you have other software RAID
mirrors it will then throw a fit because none of them will successfully
assemble either, and drop you into rescue mode. To fix this, edit
/etc/mdadm.conf to specify the right num-devices number for all
software RAID mirrors (including the root mirror). After you've
done this, reboot and things should work.
(You may need to make the root filesystem read-write first with
'fsck /' followed by 'mount -o remount,rw /'.)
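Before and after editing, it can help to see what /etc/mdadm.conf
currently claims for each array. Here is a small sketch that prints
each ARRAY entry's device and its num-devices value; the function name
is made up, and it assumes the device name is the second word on the
ARRAY line.

```shell
#!/bin/sh
# Sketch only: list_num_devices is an illustrative helper name.

# Print "DEVICE COUNT" for every ARRAY line in an mdadm.conf, or
# "DEVICE (unset)" if the line has no num-devices= field.
# usage: list_num_devices [MDADM_CONF]   (defaults to /etc/mdadm.conf)
list_num_devices() {
    awk '$1 == "ARRAY" {
        n = "(unset)"
        for (i = 3; i <= NF; i++)
            if ($i ~ /^num-devices=/)
                n = substr($i, 13)   # text after "num-devices="
        print $2, n
    }' "${1:-/etc/mdadm.conf}"
}
```

You can then compare these numbers against the 'Raid Devices' line
that 'mdadm --examine' reports for one of each array's component
partitions.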
If you have no other software RAID mirrors, you still need to update
/etc/mdadm.conf once the system has booted or you will get to go
through this all over again after the next kernel update.
(You can see why I suspect a rescue CD would be easier. With a rescue CD
you should be able to assemble and mount the root filesystem, chroot to
it and set any other necessary filesystems up, edit /etc/mdadm.conf,
and then run mkinitrd or the like.)
For safety, you probably want to rebuild your current initrd using
the official tools after doing the full mdadm.conf update.
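A minimal sketch of that final rebuild, assuming the Fedora convention
of naming the image /boot/initrd-<kernel version>.img. The helper name
is made up, and it only prints the mkinitrd command unless you set
DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: rebuild_initrd is an illustrative helper, and the image
# naming scheme is an assumption about your system.

# usage: rebuild_initrd [KERNEL_VERSION]   (defaults to the running kernel)
rebuild_initrd() {
    kver=${1:-$(uname -r)}
    img="/boot/initrd-$kver.img"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ mkinitrd -f $img $kver"
    else
        # -f lets mkinitrd overwrite the existing (hand-built) image
        mkinitrd -f "$img" "$kver"
    fi
}
```

Keeping your backup copy of the old initrd in /boot until the rebuilt
one has booted successfully seems prudent.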