2012-02-08
Choosing the superblock format for Linux's software RAID
Linux's software RAID implementation stores metadata about the RAID
device in each physical device involved in the RAID, in what mdadm
calls 'RAID superblocks' by analogy to the filesystem superblocks that
describe filesystems. In modern versions of software RAID there are a
number of different formats for these RAID superblocks with different
tradeoffs involved in each one, and one of the decisions you need to
make when you create a software RAID array is what format you want to
use.
(Even if you don't actively make a decision, mdadm will pick a format
for you. Sometimes it will whine irritatingly at you about the situation,
which is how I discovered the whole issue.)
In my opinion, at the moment there are three sensible options to choose from: the 0.90 format and then two variants of the 'version-1' metadata format.
- 0.90 is the original metadata format, which is widely understood
and used. For most people, potentially the most important
limitation of 0.90 metadata is that component devices can't be
larger than 2 TB.
The 0.90 superblock goes at the end of the underlying partition.
- 1.0 puts the superblock at the end of the underlying partition.
- 1.2 puts the superblock 4 KiB from the start of the underlying partition.
It's more or less the default for modern versions of mdadm.
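As a rough illustration of where these superblocks land on a component device, here is a small Python sketch. The alignment rules in it (0.90 sits in the last full 64 KiB block; 1.0 sits at least 8 KiB back from the end, rounded down to a 4 KiB boundary) are my understanding of what the kernel's md driver does and should be treated as assumptions, and superblock_offset is a made-up name, not anything from mdadm.

```python
SECTOR = 512  # md works in 512-byte sectors


def superblock_offset(metadata: str, device_bytes: int) -> int:
    """Return the assumed byte offset of the RAID superblock on a component."""
    if metadata == "0.90":
        reserved = 64 * 1024
        # Round the device size down to a 64 KiB boundary,
        # then step back one 64 KiB block.
        return (device_bytes & ~(reserved - 1)) - reserved
    if metadata == "1.0":
        sectors = device_bytes // SECTOR
        # Go back 8 KiB (16 sectors) from the end,
        # then round down to a 4 KiB (8 sector) boundary.
        return ((sectors - 16) & ~7) * SECTOR
    if metadata == "1.2":
        # A fixed 4 KiB from the start of the device.
        return 4096
    raise ValueError(f"unhandled metadata format: {metadata!r}")


if __name__ == "__main__":
    one_gib = 1 << 30
    for fmt in ("0.90", "1.0", "1.2"):
        print(fmt, superblock_offset(fmt, one_gib))
```

The point to notice is that for 0.90 and 1.0 the result depends on the size of the device, while for 1.2 it doesn't; only the first two leave the start of the partition untouched.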
(You can see what format your current RAID arrays are using by looking
at /proc/mdstat. If an array doesn't say 'super <something>' it's
using 0.90 format metadata; otherwise, it's using whatever version it
says it is. Many relatively modern systems, such as Ubuntu 10.04, either
don't support anything past 0.90 or default to 0.90 in system setup.)
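If you have a lot of arrays, the /proc/mdstat rule above is easy to automate. What follows is a minimal Python sketch, not a robust parser: the function name is made up, the sample text is invented rather than taken from a real machine, and it assumes that array lines start with 'md' and that the 'super <something>' marker appears somewhere in the lines that follow.

```python
import re

SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      976630336 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""


def mdstat_metadata_versions(text: str) -> dict:
    """Map each md array name to its metadata version string."""
    versions = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            # No 'super <something>' means 0.90 format metadata.
            versions[current] = "0.90"
            continue
        marker = re.search(r"\bsuper\s+(\S+)", line)
        if current is not None and marker:
            versions[current] = marker.group(1)
    return versions


if __name__ == "__main__":
    # On a real system you would read /proc/mdstat instead of the sample.
    print(mdstat_metadata_versions(SAMPLE_MDSTAT))
```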
Where the superblock goes is potentially important for RAID-1 arrays. A RAID-1 array with the superblock at the end can relatively easily have whatever filesystem it contains mounted read-only without the RAID running, because the filesystem will start at the start of the underlying raw partitions; this can be important sometimes. A RAID-1 array with the superblock at or near the start of the underlying partitions can't have the raw partitions used this way, because you have to look somewhat beyond the start of the raw partition to see the filesystem.
(Some versions of mdadm will explicitly warn you about this or even
quiz you about it if you don't specify a format explicitly.)
If you want to use a modern format and are going to directly use the
RAID-1 array for a filesystem, I would use 1.0 format (this is what
I've done for my new / and /boot). For swap areas you might as well
use 1.2 format; if you ever need to use swap without software RAID, you
can just destroy the 1.2 superblocks with mkswap. For LVM physical
volumes you can argue back and forth either way; right now I've chosen
1.2 format because I really don't want to think about what it would take
to safely bring up an LVM physical volume without software RAID running.
(LVM physical volumes have their own metadata, which normally goes at
the start of the 'raw' partition that LVM is using but which can be
replicated to the end as well. See pvcreate's manpage.)
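To make the choices above concrete, here is a small Python sketch that turns them into an explicit mdadm command line using mdadm's --metadata option. It only builds and prints the command; it doesn't run anything. The use labels ('filesystem', 'swap', 'lvm'), the function name, and the device names are all hypothetical, and the mapping is just my own preference from this entry, not any sort of official recommendation.

```python
import shlex

# My choices from this entry, keyed by made-up use labels.
FORMAT_FOR_USE = {
    "filesystem": "1.0",  # superblock at the end, filesystem at the start
    "swap": "1.2",
    "lvm": "1.2",
}


def mdadm_create_command(md_device: str, use: str, components: list) -> str:
    """Build (but do not run) an mdadm command creating a RAID-1 array."""
    argv = [
        "mdadm", "--create", md_device,
        "--level=1",
        f"--raid-devices={len(components)}",
        f"--metadata={FORMAT_FOR_USE[use]}",
        *components,
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    print(mdadm_create_command("/dev/md0", "filesystem",
                               ["/dev/sda1", "/dev/sdb1"]))
```

Specifying the format explicitly like this also avoids having mdadm pick one for you.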
As far as I know you can't change the superblock format of an array after it has been created, at least not without destroying it and recreating it. You can sort of do this without an extra disk with sufficient work, but really you want to get it right at creation time.
PS: note that in theory you can use dmsetup to gain access to
filesystems or other sorts of data that don't begin at the start of
a raw partition, so you can get at a filesystem embedded inside the
raw partition of a RAID-1 array with 1.2 format metadata. However, this
requires user-level intervention, which means that you're going to need
a rescue environment or rescue disk of some sort.
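For the curious, the dmsetup approach comes down to a device-mapper 'linear' table that skips over the space before the data. Here is a hedged Python sketch that just builds such a table line. The function name, the device name, and the numbers are invented; on a real array the data offset and data size (in 512-byte sectors) would have to come from something like 'mdadm --examine' on the component device, and I haven't tested this in an actual rescue situation.

```python
def linear_table(device: str, data_offset_sectors: int,
                 data_sectors: int) -> str:
    """Build a device-mapper 'linear' table line for the data area.

    The format is: <start> <length> linear <device> <offset>,
    with everything measured in 512-byte sectors.
    """
    if data_offset_sectors < 0 or data_sectors <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"0 {data_sectors} linear {device} {data_offset_sectors}"


if __name__ == "__main__":
    # Hypothetical values: data starts 2048 sectors into /dev/sda2.
    print(linear_table("/dev/sda2", 2048, 1000000))
```

You would then feed the resulting line to 'dmsetup create' (with a name of your choosing and ideally read-only) and mount the new device-mapper device instead of the raw partition.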