Choosing the superblock format for Linux's software RAID

February 8, 2012

Linux's software RAID implementation stores metadata about the RAID device in each physical device involved in the RAID, in what mdadm calls 'RAID superblocks' by analogy to the filesystem superblocks that describe filesystems. In modern versions of software RAID there are a number of different formats for these RAID superblocks with different tradeoffs involved in each one, and one of the decisions you need to make when you create a software RAID array is what format you want to use.

(Even if you don't actively make a decision, mdadm will pick a format for you. Sometimes it will whine irritatingly at you about the situation, which is how I discovered the whole issue.)

In my opinion, at the moment there are three sensible options to choose from: the 0.90 format and then two variants of the 'version-1' metadata format.

  • 0.90 is the original metadata format, which is widely understood and used. For most people, the most potentially important limitation of 0.90 metadata is that component devices can't be larger than 2 TB.

    The 0.90 superblock goes at the end of the underlying partition.

  • 1.0 puts the superblock at the end of the underlying partition.
  • 1.2 puts the superblock 4 KB from the start of the underlying partition. It's more or less the default for modern versions of mdadm.
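To make the placement differences concrete, here is a small Python sketch of where each format's superblock lands on a component device. The arithmetic is my reading of the kernel's md_p.h for 0.90 (the device size is rounded down to a 64 KiB boundary and the last 64 KiB is reserved) and of mdadm's version-1 code for 1.0 (at least 8 KiB before the end, aligned down to 4 KiB). Treat those constants as assumptions to check against your own mdadm and kernel; the function name superblock_offset is hypothetical.

```python
SECTOR = 512  # md does its metadata arithmetic in 512-byte sectors


def superblock_offset(version: str, device_bytes: int) -> int:
    """Return the byte offset of the md superblock on a component device.

    A sketch of the placement rules for 0.90, 1.0 and 1.2 metadata.
    """
    sectors = device_bytes // SECTOR
    if version == "0.90":
        # Round the size down to a multiple of 128 sectors (64 KiB), then
        # the superblock area is the last 128 sectors of that.
        return ((sectors & ~127) - 128) * SECTOR
    if version == "1.0":
        # Back off 16 sectors (8 KiB) from the end, then align down to
        # an 8-sector (4 KiB) boundary.
        return ((sectors - 16) & ~7) * SECTOR
    if version == "1.2":
        # Fixed position: 4 KiB from the start of the device.
        return 4096
    raise ValueError(f"unsupported metadata version: {version}")


if __name__ == "__main__":
    one_gib = 1 << 30
    for v in ("0.90", "1.0", "1.2"):
        print(v, superblock_offset(v, one_gib))
```

For a 1 GiB partition this puts the 0.90 superblock 64 KiB before the end, the 1.0 superblock 8 KiB before the end, and the 1.2 superblock 4 KiB in from the start.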

(You can see what format your current RAID arrays are using by looking at /proc/mdstat. If an array doesn't say 'super <something>' it's using 0.90 format metadata; otherwise, it's using whatever version it says it is. Many relatively modern systems, such as Ubuntu 10.04, either don't support anything past 0.90 or default to 0.90 in system setup.)

Where the superblock goes is potentially important for RAID-1 arrays. A RAID-1 array with the superblock at the end can relatively easily have whatever filesystem it contains mounted read-only without the RAID running, because the filesystem will start at the start of the underlying raw partitions; this can matter in recovery situations. A RAID-1 array with the superblock at or near the start of the underlying partitions can't have the raw partitions used this way, because the filesystem starts some distance past the start of the raw partition and you have to look beyond that point to find it.
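How far past the start? Version-1 superblocks record this as a 'data offset' in 512-byte sectors, which mdadm --examine also reports as 'Data Offset'. The following Python sketch reads it directly from the first few KB of a component device. The field positions (magic 0xa92b4efc at byte 0, major version at byte 4, a 64-bit data_offset at byte 128) are my reading of struct mdp_superblock_1 in the kernel's md_p.h and should be treated as an assumption; the function name data_offset_bytes is hypothetical.

```python
import struct

MD_SB1_MAGIC = 0xA92B4EFC  # version-1 superblock magic, stored little-endian
SECTOR = 512


def data_offset_bytes(image: bytes, sb_offset: int = 4096) -> int:
    """Return where the array's data (e.g. a filesystem) begins inside a
    component device, given raw bytes from that device and the byte
    offset of its version-1 superblock (4096 for 1.2 metadata)."""
    magic, major = struct.unpack_from("<II", image, sb_offset)
    if magic != MD_SB1_MAGIC or major != 1:
        raise ValueError("no version-1 md superblock at that offset")
    # data_offset: little-endian 64-bit sector count, 128 bytes into the
    # superblock (right after the 128 bytes of array-wide information).
    (data_offset,) = struct.unpack_from("<Q", image, sb_offset + 128)
    return data_offset * SECTOR
```

In use you would read something like the first 8 KB of a 1.2-format component partition (opened in binary mode) and pass those bytes in; the result is how many bytes of the raw partition sit in front of the filesystem.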

(Some versions of mdadm will explicitly warn you about this or even quiz you about it if you don't specify a format explicitly.)

If you want to use a modern format and are going to directly use the RAID-1 array for a filesystem, I would use 1.0 format (this is what I've done for my new / and /boot). For swap areas you might as well use 1.2 format; if you ever need to use swap without software RAID, you can just destroy the 1.2 superblocks with mkswap. For LVM physical volumes you can argue back and forth either way; right now I've chosen 1.2 format because I really don't want to think about what it would take to safely bring up an LVM physical volume without software RAID running.

(LVM physical volumes have their own metadata, which normally goes at the start of the 'raw' partition that LVM is using but which can be replicated to the end as well. See pvcreate's manpage.)

As far as I know you can't change the superblock format of an array after it has been created, at least not without destroying it and recreating it. You can sort of do this without an extra disk with sufficient work, but really you want to get it right at creation time.

PS: note that in theory you can use dmsetup to gain access to filesystems or other sorts of data that don't begin at the start of a raw partition, so you can get at a filesystem embedded inside the raw partition of a RAID-1 array with 1.2 format metadata. However, this requires user-level intervention, which means that you're going to need a rescue environment or rescue disk of some sort.
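The device-mapper side of this is a single 'linear' table line of the form '<start> <length> linear <device> <offset>', with all numbers in 512-byte sectors. Here is a Python sketch that builds such a line. It assumes you already know the data offset and the length of the data area in sectors (for example from mdadm --examine output); the function name dm_linear_table and the device names below are hypothetical.

```python
def dm_linear_table(device: str, data_offset_sectors: int,
                    data_size_sectors: int) -> str:
    """Build a one-line device-mapper 'linear' table that exposes the data
    area of a RAID-1 component device, skipping everything before the
    data offset (including a 1.2 superblock)."""
    if data_offset_sectors < 0 or data_size_sectors <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # Map logical sectors 0..size-1 onto device sectors starting at offset.
    return f"0 {data_size_sectors} linear {device} {data_offset_sectors}"


if __name__ == "__main__":
    print(dm_linear_table("/dev/sdb1", 2048, 1048576))
```

You would then hand the resulting line to something like 'dmsetup create rescue --readonly --table "..."' and mount the resulting /dev/mapper device read-only. Creating the mapping read-only is deliberate, since writing through it behind md's back would leave the RAID-1 mirrors inconsistent.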

Written on 08 February 2012.
Last modified: Wed Feb 8 01:20:38 2012