Shifting a software RAID mirror from disk to disk in modern Linux
Suppose that you have a software RAID mirror and you want to migrate one side of the mirror from one disk to another to replace the old disk. The straightforward way is to remove the old disk, put in the new disk, and resync the mirror. However, this leaves you without a mirror at all for the duration of the resync, so if you can get all three disks online at once, what you'd like to do is add the new disk as a third mirror and then remove the old disk later. Modern Linux makes this a little bit complicated.
The core complication is that your software RAID devices know how many active mirrors they are supposed to have. If you add a device beyond that, it becomes a hot spare instead of being an active mirror. To activate it as a mirror you must add it and then grow the number of active devices in the mirror. Then to properly deactivate the old disk you need to do the reverse.
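One way to see this distinction on a live array is to compare the 'Raid Devices' and 'Spare Devices' counts that `mdadm --detail` reports. What follows is only a minimal sketch: the helper name `md_counts` is made up for illustration, and it assumes the usual `Field : value` layout of `mdadm --detail` output.

```shell
#!/bin/sh
# md_counts: read `mdadm --detail` output on stdin and print how many
# devices the array expects to be active and how many are spares.
# (Hypothetical helper; assumes "Raid Devices : N" style lines.)
md_counts() {
    awk -F' *: *' '
        /^ *Raid Devices *:/  { raid = $2 + 0 }
        /^ *Spare Devices *:/ { spare = $2 + 0 }
        END { printf "raid=%d spare=%d\n", raid, spare }
    '
}

# Usage (needs root and a real array):
#   mdadm --detail /dev/md17 | md_counts
```

Right after a hot-add to a two-way mirror you would expect this to report two RAID devices and one spare; after the grow step the spare count should drop back to zero.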
Here are the actual commands (for my future use if nothing else):
- Hot-add the new device:
mdadm -a /dev/md17 /dev/sdd7
If you look at /proc/mdstat afterwards you'll see it marked as a spare.
- 'Grow' the number of active devices in the mirror:
mdadm -G -n 3 /dev/md17
- Wait for the mirror to resync. You may want to run the new disk in
parallel with the old disk for a few days to make sure that all is
well with it; this is fine. You may want to be wary about reboots
during this time.
- Take the old disk out by first manually failing it and then actually
removing it:
mdadm --fail /dev/md17 /dev/sdb7
mdadm -r /dev/md17 /dev/sdb7
- Finally, shrink the number of active devices in the mirror down to two
again:
mdadm -G -n 2 /dev/md17
You really do want to explicitly shrink the number of active devices
in the mirror. A mismatch between the number of actual devices and the
number of expected devices can have various undesirable consequences.
If a significant amount of time passed between steps three and four,
make sure that your mdadm.conf still has the correct number of devices
configured in it for all of the arrays (ie, two).
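Two of the checks in these steps can be done mechanically: confirming that the resync has finished before failing the old disk, and confirming what mdadm.conf says afterwards. The sketch below is one possible way to do that, not a tested tool. It assumes the kernel's md sysfs files `sync_action` and `degraded` (found under `/sys/block/md17/md` for this array) and a single-line `ARRAY` entry in mdadm.conf that carries a `num-devices=` setting; both function names are made up for illustration.

```shell
#!/bin/sh
# md_resync_done DIR: succeed only if the array whose md sysfs
# directory is DIR (e.g. /sys/block/md17/md) is idle and not degraded.
# Returns 2 if the expected sysfs files are not readable.
md_resync_done() {
    [ -r "$1/sync_action" ] && [ -r "$1/degraded" ] || return 2
    [ "$(cat "$1/sync_action")" = "idle" ] &&
        [ "$(cat "$1/degraded")" = "0" ]
}

# conf_num_devices FILE ARRAY: print the num-devices= value from the
# ARRAY line for ARRAY in FILE; prints nothing if there is none.
# (Sketch only: ignores continuation lines and abbreviated keywords.)
conf_num_devices() {
    awk -v dev="$2" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^num-devices=/) {
                    sub(/^num-devices=/, "", $i)
                    print $i
                }
        }
    ' "$1"
}

# Usage:
#   md_resync_done /sys/block/md17/md && echo "resync finished"
#   conf_num_devices /etc/mdadm.conf /dev/md17
```

If you would rather just block until the resync is over, `mdadm --wait /dev/md17` does that directly. The location of mdadm.conf varies by distribution, so the path in the usage comment is only an example.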
Unfortunately marking the old disk as failed will likely get you warning email from mdadm's status monitoring about a failed device. This is the drawback of mdadm not having a way to directly do 'remove an active device' as a single action. I can understand why mdadm doesn't have an operation for this, but it's still a bit annoying.
(Looking at this old entry makes it clear that I've run into the need to grow and shrink the number of active mirror devices before, but apparently I didn't consider it noteworthy at that point.)