Fixing your system after hitting the RAID growth gotcha

May 14, 2009

The easiest way to dig yourself out of the hole created by the RAID growth gotcha is probably to use a live/rescue CD. But let us suppose that you don't have one handy, which was the case for me yesterday. Further let us suppose that you have /boot as a separate filesystem, not as part of the root filesystem (if this is not the case, you absolutely need a rescue CD; sorry).

The basic goal is to rebuild a version of your current initrd that has an updated /etc/mdadm.conf that specifies that your root mirror has the right number of devices. Since we can't boot the system normally, we can't just bring it up, edit the real mdadm.conf, and regenerate the initrd with the normal tools; instead, you need to boot the system in a minimal mode and unpack, fix, and rebuild the initrd by hand.

First, you need some setup:

  • you need to boot the system without the root mirror. Heed the cautions.

    Once booted, you'll probably want to mount /usr (read-only), if only so that you can read manpages.

  • you need writeable scratch space to rebuild the initrd; I mounted a tmpfs /tmp.

  • you need /boot mounted read-write; if your /boot is mirrored, you'll have to assemble the software RAID first with appropriate mdadm invocations.
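
As a rough sketch, the setup might look like the following. This is not a tested recipe: the md device and partition names are made-up placeholders, the run and rescue_setup helpers are just for illustration, and I'm assuming that /usr and /boot are listed in /etc/fstab so that mount can find them by mount point. By default it only prints what it would do.

```shell
# Sketch of the setup steps. /dev/md1, /dev/sda1 and /dev/sdb1 are
# placeholders; substitute whatever your /boot mirror is really made of.
# By default this only prints the commands; set DRY_RUN=0 to run them.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

rescue_setup() {
    # -n skips updating /etc/mtab, since / is still read-only here.
    run mount -n -o ro /usr
    # Writeable scratch space for unpacking the initrd.
    run mount -n -t tmpfs tmpfs /tmp
    # Only needed if /boot is itself a software RAID mirror.
    run mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1
    run mount -n /boot
}
```

Calling rescue_setup prints the four commands; check them against your real device names before rerunning with DRY_RUN=0.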

After that, it is relatively simple:

  • make a scratch directory in /tmp (or wherever), such as /tmp/t, and unpack your current initrd to it. Initrds are compressed cpio images, so this is something like the following (substituting the real name of your initrd file):
    mkdir /tmp/t && cd /tmp/t && zcat </boot/initrd | cpio -di

  • edit the now-unpacked etc/mdadm.conf to have the right num-devices value for the software RAID with your root filesystem. You don't need to update the numbers for the other software RAID devices; they aren't started in the initrd.

  • reassemble the initrd. On Fedora, the 100% authentic way to do this, exactly duplicating what mkinitrd does, is:
    echo nash-find . | /sbin/nash --force --quiet | cpio -H newc --quiet -o | gzip -9 >/tmp/initrd

  • rename your current initrd to something else as a backup, and then copy your newly generated initrd into /boot with the right name.
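
If you would rather not fire up an editor in this environment, the mdadm.conf edit can be sketched with sed. The set_num_devices function below is a hypothetical helper, not part of mdadm or mkinitrd, and it assumes that each ARRAY entry sits on a single line (something like 'ARRAY /dev/md0 level=raid1 num-devices=2 UUID=...'). If your file splits entries across continuation lines, edit it by hand instead.

```shell
# Sketch: set num-devices for one array in an mdadm.conf file.
# Assumes one ARRAY entry per line; other arrays are left alone.
set_num_devices() {
    conf=$1    # path to the mdadm.conf to edit
    array=$2   # md device named on the ARRAY line, e.g. /dev/md0
    count=$3   # the new num-devices value
    # Use | as the sed delimiter so the slashes in the device name
    # need no escaping; only the matching ARRAY line is changed.
    sed "\\|^ARRAY[[:space:]][[:space:]]*${array}[[:space:]]|s|num-devices=[0-9][0-9]*|num-devices=${count}|" \
        "$conf" >"$conf.new" && mv "$conf.new" "$conf"
}
```

From inside /tmp/t this would be run as, say, 'set_num_devices etc/mdadm.conf /dev/md0 3', where /dev/md0 and 3 stand in for your root mirror and its real device count. Look at the resulting file before you rebuild the initrd.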

At this point you can reboot and your system should come up as far as mounting the root filesystem. If you have other software RAID mirrors it will then throw a fit because none of them will successfully assemble either, and drop you into rescue mode. To fix this, edit /etc/mdadm.conf to specify the right num-devices number for all software RAID mirrors (including the root mirror). After you've done this, reboot and things should work.

(You may need to make the root filesystem read-write first with 'fsck /' followed by 'mount -o remount,rw /'.)

If you have no other software RAID mirrors, you still need to update /etc/mdadm.conf once the system has booted or you will get to go through this all over again after the next kernel update.

(You can see why I suspect a rescue CD would be easier. With a rescue CD you should be able to assemble and mount the root filesystem, chroot to it and set any other necessary filesystems up, edit /etc/mdadm.conf, and then run mkinitrd or the like.)

For safety, you probably want to rebuild your current initrd using the official tools after doing the full mdadm.conf update.

Written on 14 May 2009.
Last modified: Thu May 14 00:35:47 2009