2013-11-29
How modern Linux software RAID arrays are assembled on boot (and otherwise)
Here is a question that is periodically relevant: just how does a modern Linux system assemble and bring up your software RAID arrays (and other software-defined things, for that matter)?
I've written about the history of this before, so I'll summarize: in the
very old days the kernel did it all for you, and in the not-as-old days it
was done by a script in your initial ramdisk that ran mdadm, often using an
embedded copy of your regular mdadm.conf.
The genesis of modern software RAID activation was udev and general support
for dynamically appearing devices, including 'hotplug' disk devices (which
was and is a good thing, to be clear). When disks can appear over time,
simply running mdadm once at some arbitrary point is clearly not good
enough. Instead the whole RAID assembly system was changed so that every
time a disk appears, udev arranges to run mdadm in a special 'incremental'
mode. As the manpage describes it:
Add a single device to an appropriate array. If the addition of the device makes the array runnable, the array will be started. This provides a convenient interface to a hot-plug system.
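By hand this is just 'mdadm --incremental <device>' (aka 'mdadm -I'). On the
udev side it is driven by a rule roughly along the following lines; this is a
paraphrased sketch rather than the literal rule any particular distribution
ships, and the file name it lives under varies:

    # sketch of an incremental-assembly udev rule (paraphrased):
    # when a new block device appears and has been identified as a
    # Linux RAID member, hand it to mdadm's incremental mode.
    SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
        RUN+="/sbin/mdadm --incremental $env{DEVNAME}"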
A modern Linux system embeds a copy of udev (and the important udev rules
and various supporting bits) in the initramfs and starts it early in the
initramfs boot process. The kernel feeds this copy of udev events
corresponding to all of the hardware that has been recognized so far, and
then udev starts kicking off more or less all of its usual processing,
including handling newly appeared disk devices and thus incrementally
assembling your software RAID arrays. Hopefully this process fully completes
before you need the RAID arrays.
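In shell terms the initramfs does something roughly like the following. This
is a simplified sketch; the real scripts in dracut, initramfs-tools, and so
on are more elaborate, and newer systems run systemd-udevd instead of a
standalone udevd:

    # start the udev daemon inside the initramfs
    udevd --daemon
    # replay 'add' events for all devices the kernel has already found,
    # so udev (and thus mdadm --incremental) gets to see them
    udevadm trigger --action=add
    # wait for the event queue to drain before trying to mount the root fs
    udevadm settle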
(I'm not sure when and how this incremental assembly process decides that a
RAID array is ready to be started, given that ideally you'd want all of an
array's devices to be present instead of just the minimum number. Note that
the intelligence for this is in mdadm, not udev.)
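If you want to watch this in action, arrays that have been partially
assembled but not yet started show up as 'inactive' in /proc/mdstat, and you
can start one by hand if you're willing to run it degraded. A sketch, with
/dev/md0 as a placeholder for whatever array you're looking at:

    # see which arrays exist and which are still inactive
    cat /proc/mdstat
    # show what mdadm knows about a partially assembled array
    mdadm --detail /dev/md0
    # force-start it even though not all of its devices are present yet
    mdadm --run /dev/md0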
The same general process is used to assemble and activate things like LVM
physical volumes and volume groups; as devices appear, udev runs appropriate
LVM commands to incrementally update the collection of known physical
volumes and so on, and to activate any that have become ready for it. This
implies that one physical disk finally appearing can cause a cascade of
subsequent events, as the physical disk causes a software RAID device to be
assembled, the new RAID device is reported back to udev and recognized as an
LVM physical volume, and so on.
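What this cascade looks like in terms of actual commands depends on your
distribution and LVM setup; on a system with lvmetad-style autoactivation it
is very roughly the following sequence (a sketch, with /dev/sdc1 and
/dev/md0 as placeholder device names):

    # a RAID member partition appears; udev hands it to mdadm
    mdadm --incremental /dev/sdc1
    # if that completed the array, the kernel announces /dev/md0 and udev
    # processes that new block device too, spotting it as an LVM PV
    pvscan --cache --activate ay /dev/md0
    # ... which may in turn activate logical volumes, creating yet more
    # block devices for udev to process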
Where exactly the udev rules for all of this live varies from distribution
to distribution, so really you need to grep through /lib/udev/rules.d (or
/usr/lib/udev/rules.d) to find and read everything that mentions mdadm. Then
you can read your mdadm and mdadm.conf manpages to see what sort of control
(if any) you can exert over this process.
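The grep side of this looks something like the following (adjust the
directories to wherever your distribution keeps its rules):

    # find the udev rules files that drive RAID and LVM assembly
    grep -l mdadm /lib/udev/rules.d/*.rules /usr/lib/udev/rules.d/*.rules 2>/dev/null
    grep -l 'lvm\|pvscan' /lib/udev/rules.d/*.rules /usr/lib/udev/rules.d/*.rules 2>/dev/null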
The drawback of this process is that there is no longer a clear chain of scripts or the like that you can read to follow (or predict) the various actions that get taken. Instead everything is event driven and thus much harder to trace (and much less obvious, and much more split up across many different files, and so on). A modern Linux system booting is a quite complicated asynchronous environment that is built from many separate little pieces. Generally it's not well documented.
One corollary of all of this is that it is remarkably hard to have a disk
device appear and then be left alone. The moment the kernel sends the 'new
device' event to udev (either during boot or when the system is running),
udev will start kicking off all of its usual processing and so on. udevadm
can be used to turn off event processing in general, but that's a rather
blunt hammer (and may have bad consequences if other important events happen
during this). For that matter, you probably don't want to totally turn off
processing of the disk device's events, given that udev is also responsible
for creating the /dev entries for newly appearing disks.
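For what it's worth, the blunt hammer looks like this; these udevadm
subcommands pause and resume processing of all udev events, not just the
ones for your disk:

    # stop udev from running any rule actions; new events just queue up
    udevadm control --stop-exec-queue
    # ... do whatever you need to do with the device left alone ...
    # resume processing; the queued events are then handled as usual
    udevadm control --start-exec-queue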