Wandering Thoughts archives

2012-07-22

The history of booting Linux with software RAID

One of the broad developments in the Linux kernel's boot process over the past N years has been a steady move from having the kernel do things inside itself to having them done in user level code (which is typically run from an initramfs). The handling of software RAID arrays is no exception to this.

In the beginning, activating software RAID arrays at boot time was handled inside the kernel. At boot time the kernel (specifically the software RAID code) scanned all disk partitions of type fd ('Linux raid') and automatically assembled and activated any software RAID arrays that it found. Although there were a bunch of corner cases that this didn't handle, it worked great in most normal situations and meant that you could boot a 'root on software RAID' system without an initramfs (well, back then it was an initrd). Since this process happened entirely in the kernel, the contents of any mdadm.conf were irrelevant; all that mattered was that the partitions had the right type (and that they had valid RAID superblocks). In fact back in the old days many systems with software RAID had no mdadm.conf at all.

(I don't remember specific kernel versions any more, but I believe that most or all of the 2.4 kernel series could work this way.)

The first step away from this was to have software RAID arrays assembled in the initrd environment by explicitly running mdadm from the /init script, using a copy of mdadm.conf that was embedded in the initrd image. I believe that the disk partition type no longer mattered (since mdadm would normally probe all devices for RAID superblocks). It was possible to have explosive failures if your mdadm.conf did not completely match the state of critical RAID arrays.

(I don't know if this stage would assemble RAID arrays not listed in your mdadm.conf and I no longer have any systems I could use to check this.)

The next stage of moving boot time handling of software RAID out of the kernel is the situation we have today. As I described recently, a modern Linux system does all assembly of software RAID arrays asynchronously through udev (along with a great deal of other device discovery and handling). In order to have all of this magical udev device handling happen in the initramfs environment too, your initramfs starts an instance of udev quite early on and this instance is used to process boot-time device events and so on. This instance uses a subset of the regular rules for processing events, generally covering only the devices that are considered important for booting your system. As we've seen, this process of assembling software RAID arrays is generally indifferent to whether or not the arrays are listed in mdadm.conf; I believe (but have not tested) that it also doesn't care about the partition type.

(I think that the udev process that the initramfs starts is later terminated and replaced by a udev process started during real system boot.)

linux/SoftwareRaidBootHistory written at 22:41:26

A sleazy trick to capture debugging output from an initramfs

Suppose, not entirely hypothetically, that something in your system's initramfs is failing or that you just want to capture some debugging output or state information in general. The traditional way to do this when console output isn't good enough is to just dump the output into a file and read the file later, but this has a problem in the initramfs world; the file you write out will be in the initramfs, which means that it will quietly disappear when the boot process is finished and the initramfs goes away.

So we need two things. We need to preserve the initramfs or at least the bit of it that we care about, and then we need some way to get access to it. There is probably an official way to do this, but here is my sleazy trick.

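We can preserve a file from the initramfs by starting a process in the initramfs (and then having it stay running) that holds an open file descriptor for the file. As long as some process has the file open, the kernel keeps the file's contents around even after the file's name has gone away. For example (on Ubuntu 12.04):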

(udevadm monitor >/tmp/logfile 2>&1) &

(I believe that even something like 'sleep 16000 >/tmp/logfile &' should do it. You can then have other commands append things to it with '>>/tmp/logfile'.)

There are undoubtedly clever ways to preserve the initramfs or get access to it, but once you have a preserved file descriptor there's a simpler brute force way. Simply look at /proc/<pid>/fd/<N> (<N> is often 1 or 2) and there's your debug file. You can now use whatever tool you like (including a pager like less) to look at it.
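
If you want to convince yourself of this before touching a real initramfs, here is a minimal sketch that you can run as an ordinary user on a normal Linux system. It makes some assumptions that aren't part of the trick itself: a temporary file from mktemp stands in for /tmp/logfile, deleting that file stands in for the initramfs going away, and the variable names (logfile, holder, recovered) are purely illustrative.

```shell
#!/bin/sh
# Stand-in for /tmp/logfile inside the initramfs (an assumption for
# this demo; on a real system the file would live in the initramfs).
logfile=$(mktemp)

# Start a long-lived process whose stdout and stderr are the log file.
# It writes one line and then stays running so the descriptor stays open.
(echo "debug line from early boot"; exec sleep 30) >"$logfile" 2>&1 &
holder=$!

sleep 1              # crude wait for the line to be written
rm -f "$logfile"     # the name is gone, but the open file is not

# fd 1 of the holder process still refers to the deleted file.
recovered=$(cat "/proc/$holder/fd/1")
echo "$recovered"

# Clean up the holder process.
kill "$holder"
wait "$holder" 2>/dev/null || true
```

Once the holder process exits, the last reference to the file is gone and its contents are gone with it, so look at (or copy) the file while the process is still running.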

linux/CaptureInitramfsDebugging written at 00:53:53



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.