2012-07-22
The history of booting Linux with software RAID
One of the broad developments in the Linux kernel's boot process over the past N years has been a steady move from having the kernel do things inside itself to having them done in user level code (which is typically run from an initramfs). The handling of software RAID arrays is no exception to this.
In the beginning, activating software RAID arrays at boot time was
handled inside the kernel. At boot time the kernel (specifically the
software RAID code) scanned all disk partitions of type fd ('Linux
raid') and automatically assembled and activated any software RAID
arrays that it found. Although there were a bunch of corner cases
that this didn't handle, it worked great in most normal situations and
meant that you could boot a 'root on software RAID' system without
an initramfs (well, back then it was an initrd). Since this process
happened entirely in the kernel, the contents of any mdadm.conf were
irrelevant; all that mattered was that the partitions had the right type
(and that they had valid RAID superblocks). In fact back in the old days
many systems with software RAID had no mdadm.conf at all.
(I don't remember specific kernel versions any more, but I believe that most or all of the 2.4 kernel series could work this way.)
The first step away from this was to have software RAID arrays assembled
in the initrd environment by explicitly running mdadm from the /init
script, using a copy of mdadm.conf that was embedded in the initrd
image. I believe that the disk partition type no longer mattered (since
mdadm would normally probe all devices for RAID superblocks). It was
possible to have explosive failures if your mdadm.conf did not
completely match the state of critical RAID arrays.
(I don't know if this stage would assemble RAID arrays not listed in
your mdadm.conf and I no longer have any systems I could use to check
this.)
The next stage of moving boot time handling of software RAID out of
the kernel is the situation we have today. As I described recently, a
modern Linux system does all assembly of software RAID arrays
asynchronously through udev (along with a great deal of other device
discovery and handling). In order to have all of this magical udev
device handling happen in the initramfs environment too, your initramfs
starts an instance of udev quite early on and this instance is used to
process boot-time device events and so on. This instance uses a subset
of the regular rules for processing events, generally only covering
what are considered important devices for booting your system. As we've
seen, this process of assembling software RAID arrays is generally
indifferent to whether or not the arrays are listed in mdadm.conf; I
believe (but have not tested) that it also doesn't care about the
partition type.
(I think that the udev process that the initramfs starts is later
terminated and replaced by a udev process started during real system
boot.)
A sleazy trick to capture debugging output from an initramfs
Suppose, not entirely hypothetically, that something in your system's initramfs is failing or that you just want to capture some debugging output or state information in general. The traditional way to do this when console output isn't good enough is to just dump the output into a file and read the file later, but this has a problem in the initramfs world; the file you write out will be in the initramfs, which means that it will quietly disappear when the boot process is finished and the initramfs goes away.
So we need two things. We need to preserve the initramfs or at least the bit of it that we care about, and then we need some way to get access to it. There is probably an official way to do this, but here is my sleazy trick.
We can preserve a file from the initramfs by starting a process in the initramfs (and then having it stay running) that has a file descriptor for the file. For example (on Ubuntu 12.04):
(udevadm monitor >/tmp/logfile 2>&1) &
(I believe that even something like 'sleep 16000 >/tmp/logfile &'
should do it. You can then have other commands append things to it with
'>>/tmp/logfile'.)
There are undoubtedly clever ways to preserve the initramfs or get
access to it, but once you have a preserved file descriptor there's a
simpler brute force way. Simply look at /proc/<pid>/fd/<N> (<N> is
often 1 or 2) and there's your debug file. You can now use whatever tool
you like (including a pager like less) to look at it.
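If you want to convince yourself that this works before you rebuild an initramfs, the whole trick can be tried out on a running Linux system. What follows is only a rough sketch under some assumptions: the temporary file from mktemp stands in for /tmp/logfile, the 'rm' stands in for the initramfs going away, and the 'holder started' marker line and the variable names are made up for the illustration. It needs /proc to be mounted.

```shell
#!/bin/sh
# Sketch: a background process keeps a deleted file reachable through
# its open file descriptor. Linux-specific (relies on /proc).

logfile=$(mktemp)

# The holder opens the log file as its stdout and stderr, writes one
# marker line, and then turns into a long sleep that keeps the
# descriptors open.
( exec >"$logfile" 2>&1; echo "holder started"; exec sleep 600 ) &
holder=$!

# Wait until the holder has really opened (and written to) the file,
# so that later appends can't be lost to its truncating open.
while [ ! -s "$logfile" ]; do sleep 1; done

# Other commands can append to the same file by name.
echo "debug line from another command" >>"$logfile"

# Stand-in for the initramfs being discarded: the name goes away,
# but the holder's open descriptor keeps the contents alive.
rm "$logfile"

# Recover the contents through the holder's fd 1.
recovered=$(cat "/proc/$holder/fd/1")
echo "$recovered"

kill "$holder"
```

In a real initramfs you would not kill the holder process or remove anything yourself; you would start the holder from an initramfs script, let the boot finish, and then find its PID (with ps, for example) and point your pager at /proc/<pid>/fd/1.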