Linux software RAID resync speed limits are too low for SSDs

May 8, 2020

When you add or replace a disk in Linux's software RAID, it has to be resynchronized with the rest of the RAID array. As very briefly covered in the RAID wiki's page on resync, this resync process has speed limits that are controlled by the kernel sysctls dev.raid.speed_limit_min and dev.raid.speed_limit_max (in KBytes a second). As covered in md(4), if there's no other relevant IO activity, resync will run up to the maximum speed; if there is other relevant IO activity, the resync speed will throttle down to the minimum (which many people would raise on the fly in order to make resyncs go faster).

(In current kernels, it appears that relevant IO activity is any IO activity to the underlying disks of the software RAID, whether or not it's through the array being resynced.)
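As a rough illustration of where these limits live, here is a minimal Python sketch that reads them. It assumes the sysctls are exposed in the usual place under /proc/sys/dev/raid; the function name read_resync_limits is made up for this example.

```python
from pathlib import Path

# Assumed location of the dev.raid.speed_limit_min/max sysctls.
RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def read_resync_limits(sysctl_dir: Path = RAID_SYSCTL_DIR) -> dict:
    """Return the current resync speed limits, in KB/sec, keyed by name."""
    limits = {}
    for name in ("speed_limit_min", "speed_limit_max"):
        # Each file holds a single integer followed by a newline.
        limits[name] = int((sysctl_dir / name).read_text().strip())
    return limits


if __name__ == "__main__":
    # Only try to read when the md sysctls are actually present.
    if RAID_SYSCTL_DIR.is_dir():
        for name, value in read_resync_limits().items():
            print(f"dev.raid.{name} = {value} KB/sec")
    else:
        print(f"{RAID_SYSCTL_DIR} not found (is software RAID available?)")
```

The directory is a parameter only so that the function can be pointed at a scratch directory for testing.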

If you look at your system, you will very likely see that the values for minimum and maximum speeds are 1,000 KB/sec and 200,000 KB/sec respectively; these have been the kernel defaults since at least 2.6.12-rc2 in 2005, when the Linux kernel git repository was started. These were fine defaults in 2005, in the era of hard drives that were relatively small and relatively slow; in particular, you were very unlikely to approach the maximum speed even on fast hard drives. Even fast hard drives generally only managed about 160 MBytes/sec of sustained write bandwidth, comfortably under the default and normal speed_limit_max.

This is no longer true in a world where SSDs are increasingly common (for example, all of our modern Linux servers with mirrored disks use SSDs). In theory SSDs can write at data rates well over 200 MBytes/sec; claimed data rates are typically around 500 MBytes/sec for sustained writes. In this world, the default software RAID speed_limit_max value is less than half the speed that you might be able to get, and so you should strongly consider raising it if you have SSDs.
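To put rough numbers on the speeds mentioned so far, here is a small back-of-the-envelope sketch. It assumes a hypothetical 1 TB disk, a resync that runs steadily at the given rate, and that the kernel's "KB" means 1024 bytes; none of these figures are measurements.

```python
KB = 1024  # assumption: the sysctl values are in units of 1024 bytes


def resync_days(disk_bytes: float, rate_kb_per_sec: float) -> float:
    """Days needed to write disk_bytes at a steady rate_kb_per_sec."""
    seconds = disk_bytes / (rate_kb_per_sec * KB)
    return seconds / 86400


DISK_BYTES = 1e12  # hypothetical 1 TB disk, for illustration only

for label, rate in [
    ("default speed_limit_min (1,000 KB/sec)", 1_000),
    ("default speed_limit_max (200,000 KB/sec)", 200_000),
    ("claimed SSD write rate (500 MBytes/sec)", 500e6 / KB),
]:
    hours = resync_days(DISK_BYTES, rate) * 24
    print(f"{label}: {hours:.1f} hours")
```

Under these assumptions, a full resync works out to a bit over 11 days at the default minimum, roughly 81 minutes at the default maximum, and roughly 33 minutes at a sustained 500 MBytes/sec.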

You should probably also raise speed_limit_min, whether or not you have SSDs, because the current minimum is effectively 'stop the resync when there's enough other IO activity' since modern disks are big enough that they will often take more than a week to resync at 1,000 KB/sec. You probably don't want to wait that long. If you have SSDs, you should probably raise it a lot, since SSDs don't really suffer from random IO slowing everything down the way HDs do.

(Raising both of these significantly will probably become part of our standard server install, now that this has occurred to me.)
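For what raising the limits on the fly might look like in a script, here is a minimal sketch. It assumes the sysctls live under /proc/sys/dev/raid and that the script runs as root; the function name and the example values in the comment are purely illustrative, not recommendations.

```python
from pathlib import Path

# Assumed location of the dev.raid.speed_limit_min/max sysctls.
RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def set_resync_limits(min_kb: int, max_kb: int,
                      sysctl_dir: Path = RAID_SYSCTL_DIR) -> None:
    """Write new resync speed limits (KB/sec); needs root on a real system."""
    if not 0 < min_kb <= max_kb:
        raise ValueError("expected 0 < min_kb <= max_kb")
    # Raise the ceiling first so the minimum never sits above the maximum.
    (sysctl_dir / "speed_limit_max").write_text(f"{max_kb}\n")
    (sysctl_dir / "speed_limit_min").write_text(f"{min_kb}\n")


# Example call (illustrative values only):
# set_resync_limits(50_000, 1_000_000)
```

Settings written this way do not survive a reboot; for a standard server install, the same two keys would more likely go into a sysctl configuration file.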

Unfortunately, depending on what SSDs you use, this may not do you as much good as you would like, because it seems that some SSDs can have very unimpressive sustained write speeds in practice over a large resync. We have a bunch of basic SanDisk 64 GB SSDs (the 'SDSSDP06') that we use in servers, and we lost one recently and had to do a resync on that machine. Despite basically no other IO load at the time (and 100% utilization of the new disk), the eventual sustained write rate we got was decidedly unimpressive (after an initial amount of quite good performance). The replacement SSD had been used before, so perhaps the poor SSD was busy frantically erasing flash blocks and so on as we were trying to push data down its throat.

(Our metrics system makes for interesting viewing during the resync. It appears that we wrote about 43 GB of the almost 64 GB to the new SSD at probably the software RAID speed limit before write bandwidth fell off a cliff. It's just that the remaining portion of about 16 GB of writes took several times as long as the first portion.)

Written on 08 May 2020.