A serious gotcha with growing software RAID devices
Suppose that you have a fully mirrored system and you want to expand your available space by migrating to a pair of larger hard drives. The obvious way to do this is fairly straightforward, assuming that you can connect all of your drives at once: partition the new drives (with larger partitions for at least some of your software RAID devices), add the new partitions to the appropriate MD device, let things resync, and then detach the old disks (and eventually physically remove them).
(Hopefully the obvious way is also the right way.)
One of the things that you will probably have to do along the way is to
grow the number of devices in your mirrors. This is simple to do;
'mdadm -G -n 4 /dev/md0' will make it so that /dev/md0 can now be
a four-way mirror (instead of the two-way one it's probably set up as).
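As a rough sketch of that step, the function below only prints the commands involved instead of running them, since the real ones need root and change your arrays. The function name plan_mirror_grow and the partitions /dev/sdc1 and /dev/sdd1 are made up for illustration; substitute your own devices.

```shell
#!/bin/sh
# Print (not run) the commands for widening a mirror and adding new partitions.
# Usage: plan_mirror_grow MD_DEVICE NEW_COUNT NEW_PARTITION...
# plan_mirror_grow is a hypothetical helper, not part of mdadm.
plan_mirror_grow() {
    md=$1; count=$2; shift 2
    # Raise the number of devices in the mirror (long form of -G -n).
    echo "mdadm --grow $md --raid-devices=$count"
    # Add each new partition; the kernel then resyncs onto it.
    for part in "$@"; do
        echo "mdadm $md --add $part"
    done
    # Resync progress shows up in /proc/mdstat.
    echo "cat /proc/mdstat"
}

# /dev/sdc1 and /dev/sdd1 are placeholder names for the new partitions.
plan_mirror_grow /dev/md0 4 /dev/sdc1 /dev/sdd1
```

Review the printed commands and run them by hand as root once they match your actual layout.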
If you do this, your system will probably fail to come up the next time you reboot it.
(Specifically, it will report that it is unable to set up the mirror for your root filesystem.)
In the old days, the Linux kernel automatically assembled software
RAID devices itself. These days, this is done by your initrd running
mdadm to assemble the root mirror, using information from an
embedded copy of /etc/mdadm.conf. One of the pieces of information in
mdadm.conf is how many devices each mirror has.
(You can see where this is going.)
mdadm will refuse to assemble a mirror, like say the one for your root
filesystem, if the RAID superblocks on the disks say that the mirror
has a different number of devices than the configuration file says it
has, even if everything else matches. When you added more devices to
your root mirror you probably didn't update /etc/mdadm.conf, and you
especially probably didn't rebuild your initrd after doing that. And
who does? The whole point of modern Linux initrds is that you don't have
to rebuild them.
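One way to catch the mismatch before rebooting is to compare the device count recorded in the configuration file with what the running array reports. This is a minimal sketch, not a complete parser: the function names are invented here, and it assumes each ARRAY entry sits on a single line and carries a num-devices= keyword (entries without one print nothing).

```shell
#!/bin/sh
# Print the num-devices= value recorded for an array in an mdadm.conf-style file.
# Usage: conf_num_devices /dev/md0 /etc/mdadm.conf
# Assumption: the ARRAY entry is on one line (no continuation lines).
conf_num_devices() {
    awk -v dev="$1" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^num-devices=/) {
                    sub(/^num-devices=/, "", $i)
                    print $i
                }
        }' "$2"
}

# Pull the "Raid Devices : N" count out of `mdadm --detail` output on stdin.
detail_num_devices() {
    awk -F: '/Raid Devices/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example use (needs root and a real array, so it is left commented out):
#   want=$(conf_num_devices /dev/md0 /etc/mdadm.conf)
#   have=$(mdadm --detail /dev/md0 | detail_num_devices)
#   [ "$want" = "$have" ] || echo "mdadm.conf says $want devices, array has $have"
```

If the two numbers disagree, bring mdadm.conf up to date and then rebuild the initrd with whatever tool your distribution uses, so that the embedded copy matches as well.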
In my barely printable opinion, this behavior of mdadm is asinine,
especially in combination with an uninformative initrd environment;
it is robotically correct but practically useless. In an initrd
environment, the only things mdadm should care about are whether the
array UUID matches and the device can be assembled intact, and I am not
entirely convinced about the UUID.
(Disclaimer: I know this happens on Fedora 8. It is possible that
more recent versions of mdadm have fixed this, although I don't see
anything in the changelog.)