How modern Linux software RAID arrays are assembled on boot (and otherwise)

November 29, 2013

Here is a question that is periodically relevant: just how does a modern Linux system assemble and bring up your software RAID arrays (and other software-defined things, for that matter)?

I've written about the history of this before, so I'll summarize: in the very old days the kernel did it all for you, and in the not-as-old days it was done by a script in your initial ramdisk that ran mdadm, often using an embedded copy of your regular mdadm.conf.

The genesis of modern software RAID activation was udev and general support for dynamically appearing devices, including 'hotplug' disk devices (which was and is a good thing, to be clear). When disks can appear over time, simply running mdadm once at some arbitrary point is clearly not good enough. Instead the whole RAID assembly system was changed so that every time a disk appears, udev arranges to run mdadm in a special 'incremental' mode. As the manpage describes it:

Add a single device to an appropriate array. If the addition of the device makes the array runnable, the array will be started. This provides a convenient interface to a hot-plug system.
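Concretely, the udev rule winds up running something along these lines for each newly appeared block device (the exact rule text varies from distribution to distribution, and the device name here is purely illustrative):

```shell
# What udev effectively runs when a new block device such as /dev/sdb1
# appears (device name is an example, not anything specific):
mdadm --incremental /dev/sdb1

# mdadm reads the device's RAID superblock (and consults mdadm.conf)
# to work out which array it belongs to; once enough member devices
# have been fed in this way, the array is started automatically.
```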

A modern Linux system embeds a copy of udev (and the important udev rules and various supporting bits) in the initramfs and starts it early in the initramfs boot process. The kernel feeds this copy of udev events corresponding to all of the hardware that has been recognized so far and then udev starts kicking off more or less all of its usual processing, including handling newly appeared disk devices and thus incrementally assembling your software RAID arrays. Hopefully this process fully completes before you need the RAID arrays.

(I'm not sure when and how this incremental assembly process decides that a RAID array is ready to be started, given that ideally you'd want all of an array's devices to be present instead of just the minimum number. Note that the intelligence for this is in mdadm, not udev.)
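If you end up with arrays that have enough devices to run but were never started (say because some disks never appeared), mdadm does have what I believe is the relevant escape hatch; this is the invocation that initramfs environments tend to use after a timeout:

```shell
# Ask mdadm to start any partially-assembled arrays that have enough
# devices to run, even though not all of their members are present.
# (-I is --incremental, -R is --run, -s is --scan.)
mdadm -IRs
```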

The same general process is used to assemble and activate things like LVM physical volumes and volume groups; as devices appear, udev runs appropriate LVM commands to incrementally update the collection of known physical volumes and so on, activating anything that has become ready. This implies that one physical disk finally appearing can cause a cascade of subsequent events as the physical disk causes a software RAID device to be assembled, the new RAID device is reported back to udev and recognized as an LVM physical volume, and so on.
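On systems of roughly this era with lvmetad enabled, the LVM side of this is handled by a udev rule that runs something like the following (the exact flags and the rule text vary with your lvm2 version, and the device name is illustrative):

```shell
# Tell LVM to rescan this one device and update its cached metadata;
# '--activate ay' auto-activates any volume group that has just become
# complete as a result. /dev/md0 is an example device name.
pvscan --cache --activate ay /dev/md0
```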

Where exactly the udev rules for all of this live varies from distribution to distribution, so really you need to grep through /lib/udev/rules.d (or /usr/lib/udev/rules.d) to find and read everything that mentions mdadm. Then you can read your mdadm and mdadm.conf manpages to see what sort of control (if any) you can exert over this process.
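For instance, to find the relevant rule files on a given system, you can do something like:

```shell
# List every udev rules file that mentions mdadm; one or both of these
# directories may not exist on any particular distribution.
grep -l mdadm /lib/udev/rules.d/*.rules /usr/lib/udev/rules.d/*.rules 2>/dev/null
```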

The drawback of this process is that there is no longer a clear chain of scripts or the like that you can read to follow (or predict) the various actions that get taken. Instead everything is event-driven and thus much harder to trace (and much less obvious, and much more split up across many different files, and so on). A modern Linux system booting is a quite complicated asynchronous environment that is built from many separate little pieces. Generally it's not well documented.

One corollary of all of this is that it is remarkably hard to have a disk device appear and then be left alone. The moment the kernel sends the 'new device' event to udev (either during boot or when the system is running), udev will start kicking off all of its usual processing and so on. udevadm can be used to turn off event processing in general but that's a rather blunt hammer (and may have bad consequences if other important events happen during this). For that matter you probably don't want to totally turn off processing of the disk device's events given that udev is also responsible for creating the /dev entries for newly appearing disks.
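The blunt hammer in question is udevadm's exec queue control (this needs root, and leaving the queue stopped for any length of time is dangerous precisely because all device event processing stalls, not just that one disk's):

```shell
# Stop udev from running rule-driven programs; events queue up instead:
udevadm control --stop-exec-queue

# ... do whatever you need to with the disk undisturbed ...

# Resume processing; the queued-up events are then handled:
udevadm control --start-exec-queue
```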
