Shifting a software RAID mirror from disk to disk in modern Linux

April 3, 2014

Suppose that you have a software RAID mirror and you want to migrate one side of the mirror from one disk to another to replace the old disk. The straightforward way is to remove the old disk, put in the new disk, and resync the mirror. However, this leaves you without a mirror at all for the duration of the resync, so if you can get all three disks online at once, what you'd like to do is add the new disk as a third mirror and then remove the old disk later. Modern Linux makes this a little bit complicated.

The core complication is that your software RAID devices know how many active mirrors they are supposed to have. If you add a device beyond that, it becomes a hot spare instead of being an active mirror. To activate it as a mirror you must add it then grow the number of active devices in the mirror. Then to properly deactivate the old disk you need to do the reverse.
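To see the difference between how many mirrors an array expects and how many it has, you can look at the '[n/m]' field in /proc/mdstat (n is the expected number of devices, m is how many are currently active). The following is only a rough sketch; the function name mdstat_counts is made up for illustration, and it assumes the usual /proc/mdstat layout where the '[n/m]' field is on a line after the 'mdNN : active ...' line.

```shell
# mdstat_counts ARRAY
# Read /proc/mdstat-format text on stdin and print "expected active"
# for the named array (e.g. "2 2" for a healthy two-way mirror).
# Hypothetical helper; assumes the usual "[n/m]" status field.
mdstat_counts() {
    awk -v name="$1" '
        # Start of the stanza for the array we care about.
        $1 == name && $2 == ":" { found = 1; next }
        # A blank line ends the stanza.
        found && NF == 0 { exit }
        # Pull n and m out of the first "[n/m]" field.
        found && match($0, /\[[0-9]+\/[0-9]+\]/) {
            s = substr($0, RSTART + 1, RLENGTH - 2)
            split(s, n, "/")
            print n[1], n[2]
            exit
        }
    '
}
```

You would run it as 'mdstat_counts md17 < /proc/mdstat'. A device added as a hot spare does not change either number; it only shows up with an '(S)' after it on the array's first line.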

Here are the actual commands (for my future use if nothing else):

  1. Hot-add the new device:
    mdadm -a /dev/md17 /dev/sdd7

    If you look at /proc/mdstat afterwards you'll see it marked as a spare.

  2. 'Grow' the number of active devices in the mirror:
    mdadm -G -n 3 /dev/md17

  3. Wait for the mirror to resync. You may want to run the new disk in parallel with the old disk for a few days to make sure that all is well with it; this is fine. You may want to be wary about reboots during this time.

  4. Take the old disk out by first manually failing it and then actually removing it:
    mdadm --fail /dev/md17 /dev/sdb7
    mdadm -r /dev/md17 /dev/sdb7

  5. Finally, shrink the number of active devices in the mirror down to two again:
    mdadm -G -n 2 /dev/md17
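For the waiting in step 3, 'mdadm --wait /dev/md17' should block until any resync or recovery on the array has finished. If you would rather poll /proc/mdstat yourself, something like the following might do as a starting point. This is a sketch, not something battle-tested; mdstat_busy is a name I made up, and it simply assumes that a resync, recovery, or reshape in progress shows up as one of those words somewhere in the array's stanza.

```shell
# mdstat_busy ARRAY
# Read /proc/mdstat-format text on stdin. Exit status 0 if the named
# array's stanza mentions a resync, recovery, or reshape; 1 otherwise.
# Hypothetical helper, matching on keywords only.
mdstat_busy() {
    awk -v name="$1" '
        # Start of the stanza for the array we care about.
        $1 == name && $2 == ":" { found = 1; next }
        # A blank line ends the stanza.
        found && NF == 0 { exit }
        found && /(resync|recovery|reshape)/ { busy = 1; exit }
        END { exit(busy ? 0 : 1) }
    '
}
```

A simple wait loop is then 'while mdstat_busy md17 < /proc/mdstat; do sleep 60; done'. Don't go on to step 4 until this reports the array as idle and /proc/mdstat shows all three devices as active.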

You really do want to explicitly shrink the number of active devices in the mirror. A mismatch between the number of actual devices and the number of expected devices can have various undesirable consequences. If a significant amount of time passed between steps three and four, make sure that your mdadm.conf still has the correct number of devices configured in it for all of the arrays (ie, two).
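One way to check what mdadm.conf says is to pull the num-devices= setting out of the array's ARRAY line. This is a minimal sketch under some assumptions: conf_num_devices is a hypothetical helper, it only looks at ARRAY lines that name the device exactly as given, and it ignores continuation lines (mdadm.conf allows an entry to continue on following lines that start with whitespace). If an ARRAY line has no num-devices= at all, it prints nothing.

```shell
# conf_num_devices DEVICE
# Read mdadm.conf-format text on stdin and print the num-devices=
# value from the ARRAY line for DEVICE, if there is one.
# Hypothetical helper; does not handle continuation lines.
conf_num_devices() {
    awk -v dev="$1" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^num-devices=/) {
                    sub(/^num-devices=/, "", $i)
                    print $i
                }
        }
    '
}
```

For example, 'conf_num_devices /dev/md17 < /etc/mdadm.conf' (your mdadm.conf may live somewhere else, such as /etc/mdadm/mdadm.conf). After step 5 this should report 2 for the array, if it reports anything.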

Unfortunately marking the old disk as failed will likely get you warning email from mdadm's status monitoring about a failed device. This is the drawback of mdadm not having a way to directly do 'remove an active device' as a single action. I can understand why mdadm doesn't have an operation for this, but it's still a bit annoying.

(Looking at this old entry makes it clear that I've run into the need to grow and shrink the number of active mirror devices before, but apparently I didn't consider it noteworthy at that point.)
