Ubuntu 12.04 can't reliably boot with software RAID (and why)
Recently one of my co-workers discovered, diagnosed, and worked around a significant issue with software RAID on Ubuntu 12.04. I'm writing it up here partly to get it all straight in my head and partly so we can help out anyone else with the same problem. The quick summary of the situation comes from my tweet:
Ubuntu 12.04 will not reliably boot a system with software RAID arrays due to races in the initramfs scripts.
(As you might guess, I am not happy.)
If you set up Ubuntu 12.04 with one or more software RAID arrays for
things other than the root filesystem, you will almost certainly find
that some of the time when you reboot, the system comes up with one or
more software RAID arrays in a degraded state, with one or more
component devices not added to the array. If you have set
bootdegraded=true as one of your boot options (eg on the kernel command
line), your system will boot fully (and you can hot-add the omitted
devices back to their arrays); if you haven't, the initramfs will pause
briefly to ask you if you want to continue booting anyway, time out on
the question, and drop you into an initramfs shell.
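For concreteness, the boot option and the subsequent repair look
something like the following. The array and device names here are
examples, not anything specific to your system; adjust them to match
what /proc/mdstat shows you.

```shell
# Make bootdegraded=true persistent via GRUB: in /etc/default/grub set
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet bootdegraded=true"
# and then regenerate the GRUB configuration:
update-grub

# After a degraded boot, see which arrays are missing components:
cat /proc/mdstat

# Hot-add an omitted component back into its array
# (/dev/md0 and /dev/sdb1 are example names):
mdadm --manage /dev/md0 --add /dev/sdb1
```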
This can happen whether or not your root filesystem is on a software
RAID array (although it doesn't happen to the root array itself, only
to other arrays), and even if the software RAID arrays are not
configured or used by your system in any way (not listed in
/etc/mdadm/mdadm.conf, not used in /etc/fstab, and so on); simply
having software RAID arrays on a disk attached to your system at boot
time is enough to trigger the problem. It doesn't require disks that
are slow to respond to the kernel; we've reproduced this in VMware,
where the disks aren't even physical and respond to kernel probes
essentially instantly.
Now let's talk about how this happens.
Like other modern systems, Ubuntu 12.04 handles device discovery with
udev, even during early boot in the initramfs. Part of udev's device
discovery is the assembly of RAID arrays from components. What this
means is that software RAID assembly is asynchronous; the initramfs
starts the udev daemon, the daemon ends up with a list of events to
process, and as it works through them the software RAID arrays start
to appear. In the meantime, the rest of the initramfs boot process
continues on and in short order sets itself up to mount the root
filesystem. As part of preparing to mount the root filesystem, the
initramfs code then checks to see if all visible arrays are fully
assembled and healthy without waiting for udev to have processed all
pending events. You know, the events that can include incrementally
assembling those arrays.
This is a race. If udev wins the race and fully assembles all visible software RAID arrays before the rest of the initramfs checks them, you win and your system boots. If udev loses the race, you lose; the check for degraded software RAID arrays will see some partially assembled arrays and throw up its hands.
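To make the race concrete, here is a toy pure-shell simulation; nothing in it touches real udev or mdadm, and all the names are made up for illustration. A background job stands in for udev's incremental assembly, and the check either looks immediately (as 12.04's initramfs does) or waits for the background work to finish first.

```shell
#!/bin/sh
# Toy simulation of the initramfs race. A file holds the "array state".
state=$(mktemp)
echo "degraded" > "$state"

# Stand-in for udev: finishes "assembling" the array after a delay.
( sleep 1; echo "clean" > "$state" ) &
assembler=$!

# What 12.04's initramfs does: check immediately, without waiting.
racy_result=$(cat "$state")

# The fix: wait for the assembler first (the analogue of
# 'udevadm settle'), then check.
wait "$assembler"
settled_result=$(cat "$state")

echo "check without waiting saw: $racy_result"
echo "check after settling saw:  $settled_result"
rm -f "$state"
```

In the real system the delay is tiny and variable, which is why the failure is intermittent rather than reliable.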
Our brute force solution is to modify the check for degraded software
RAID arrays to explicitly wait for the udev event queue to drain by
running 'udevadm settle'. This appears to work so far, but we haven't
tested it extensively; it's possible that there's still a race present,
but it's now small enough that we haven't managed to hit it yet.
This is unquestionably an Ubuntu bug and I hope that it will be fixed in some future update.
Sidebar: our fix in specific
(For the benefit of anyone with this problem who's doing Internet searches.)
Change /usr/share/initramfs-tools/scripts/mdadm-functions to add a
'udevadm settle' call at the start of degraded_arrays(), so that the
function reads:

    degraded_arrays()
    {
        udevadm settle
        mdadm --misc --scan --detail --test >/dev/null 2>&1
        return $((! $?))
    }

Then rebuild your current initramfs by running 'update-initramfs -u'.
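As an aside, the 'return $((! $?))' at the end of degraded_arrays is worth decoding: mdadm's --test mode exits zero when the arrays it examines are healthy and nonzero otherwise, and the arithmetic negation inverts that so the function returns success (0) exactly when something is degraded. A minimal pure-shell sketch of the same pattern, with 'true' and 'false' standing in for mdadm:

```shell
# invert_status runs a command and returns success (0) exactly when
# the command failed, mirroring degraded_arrays' use of $((! $?)).
invert_status() {
    "$@" >/dev/null 2>&1
    return $((! $?))
}

# 'false' stands in for mdadm reporting a degraded array (nonzero exit):
if invert_status false; then
    echo "degraded detected"
fi

# 'true' stands in for mdadm reporting healthy arrays (zero exit):
if ! invert_status true; then
    echo "all arrays healthy"
fi
```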
Since I suspect that mdadm-functions is not considered a configuration
file, you may want to put a dpkg hold on the Ubuntu mdadm package so
that an automatic upgrade doesn't wipe out your change.
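Putting the hold in place is a one-liner via dpkg's package selections (which should work on 12.04):

```shell
# Mark mdadm as held so upgrades won't replace it (and with it our
# modified mdadm-functions):
echo "mdadm hold" | dpkg --set-selections

# Verify the hold took effect:
dpkg --get-selections mdadm
```

Remember to remove the hold once Ubuntu ships a fixed mdadm package, or you'll quietly miss future updates.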
(This may not be the best and most Ubuntu-correct solution. It's just what we've done and tested right now.)
Sidebar: where the bits of this are on 12.04
/lib/udev/rules.d/85-mdadm.rules: the udev rule to incrementally
assemble software RAID arrays as components become available.
Various parts of the initramfs boot process are found (on a running
system) in /usr/share/initramfs-tools/scripts:
init-top/udev: the scriptlet that starts udev.

local-premount/mdadm: the scriptlet that checks for all arrays being
good; however, it just runs some functions from the next bit. (All of
local-premount is run by the local scriptlet, which is run by the
initramfs /init if the system is booting from a local disk.)

mdadm-functions: the code that does all the work of checking and
'handling' incomplete software RAID arrays.
Looking at this, I suspect that a better solution is to stick our own
script in local-premount, arranged to run before the mdadm script, and
have it run the 'udevadm settle'. That would avoid changing any
package-supplied scripts.
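A sketch of what such a scriptlet might look like; this is untested, the file name is made up, and the ordering relative to the stock mdadm script is an open question, since initramfs-tools orders scriptlets by their PREREQ declarations and we can't add ourselves to the stock script's prerequisites:

```shell
#!/bin/sh
# Hypothetical /etc/initramfs-tools/scripts/local-premount/mdadm-settle
PREREQ=""
prereqs() {
    echo "$PREREQ"
}
case "$1" in
    prereqs)
        prereqs
        exit 0
        ;;
esac

# Wait for udev to finish processing pending events (including
# incremental RAID assembly) before the degraded-array check runs.
udevadm settle
```

You'd make it executable and run 'update-initramfs -u' to get it into the initramfs.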
(Testing has shown that creating a local-top/mdadm-settle scriptlet
isn't good enough. It gets run, but too early. This probably means that
modifying the degraded_arrays function is the most reliable solution,
since it happens the closest to the actual check, and we just get to
live with modifying a package-supplied file and so on.)