An advantage for hardware RAID over software RAID

June 26, 2009

I am generally fairly negative on hardware RAID; I feel that both in theory and especially in practice, it is almost never a benefit. However, today I realized that there is one way that a hardware RAID card could have an advantage: avoiding PCI bandwidth limits during RAID reconstruction.

In the general case, RAID reconstruction has to read all of the remaining intact disks and then write back to the new disk. With software RAID, this data must cross the PCI bus, since it is the server's main CPU and RAM that do all of the work. With hardware RAID, nothing crosses the PCI bus; it all happens on the card.
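
To put rough numbers on this, here's a back-of-the-envelope sketch in Python; the array size and per-disk rate are illustrative assumptions, not measurements of any particular hardware:

    # Back-of-the-envelope accounting of host bus traffic during a
    # rebuild; the array size and per-disk rate are assumed numbers.
    disks = 12        # total disks in the array
    disk_rate = 105   # MBytes/sec sustained per disk

    # Software RAID: every surviving disk's data crosses the bus to
    # the CPU, and the reconstructed data crosses it again on the way
    # to the new disk.
    sw_bus_load = (disks - 1) * disk_rate + disk_rate

    # Hardware RAID: the reads and writes stay on the card, so the
    # host bus carries essentially nothing for the rebuild itself.
    hw_bus_load = 0

    print("software RAID bus load: %d MBytes/sec" % sw_bus_load)
    print("hardware RAID bus load: %d MBytes/sec" % hw_bus_load)

With those assumed numbers, a software RAID rebuild running at full speed would want 1260 MBytes/sec across the bus.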

But is the PCI bus going to be the limiting factor? I think that it's at least possible. There is some evidence that our iSCSI targets are PCI bandwidth limited for sequential reads with 1 TB disks; they can read from each individual disk at around 105 MBytes/sec, but if we try reading all 12 at once, we get only around 59 MBytes/sec from each disk (for an aggregate 708 MBytes/sec, much less than the theoretical 1260 MBytes/sec we'd get if we could drive each disk at full speed).
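
A minimal sketch of this sort of raw-device read test (not the exact tool we used; the device arguments and sizes are assumptions, it needs root, and it reads through the page cache instead of doing direct IO, so treat the numbers as approximate):

    # Read 1 GiB from each device given on the command line, all in
    # parallel, and report per-disk and aggregate rates. Run it with a
    # single device to get that disk's standalone rate for comparison.
    import sys
    import time
    from multiprocessing import Pool

    CHUNK = 1 << 20        # read in 1 MiB chunks
    TOTAL = 1 << 30        # read 1 GiB from each disk

    def read_rate(dev):
        start = time.time()
        done = 0
        with open(dev, "rb") as f:
            while done < TOTAL:
                buf = f.read(CHUNK)
                if not buf:
                    break
                done += len(buf)
        return done / (time.time() - start) / 1e6   # MBytes/sec

    if __name__ == "__main__":
        devs = sys.argv[1:]      # eg /dev/sdb /dev/sdc ...
        with Pool(len(devs)) as pool:
            rates = pool.map(read_rate, devs)
        for dev, rate in zip(devs, rates):
            print("%s: %.0f MBytes/sec" % (dev, rate))
        print("aggregate: %.0f MBytes/sec" % sum(rates))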

(We were reading from the raw devices, so there was no filesystem overhead. Which isn't to say that we weren't running into some other kernel performance limit instead of an intrinsic hardware one. And for that matter, the hardware limits may be in our eSATA controller cards instead of in the PCI bus, although at that point it doesn't make a difference for planning systems; if you can't get an eSATA controller that works fast enough but you can get a hardware RAID controller that does, you don't care too much about exactly why the eSATA controller isn't fast enough.)

One might say that RAID reconstruction is an obscure corner case that's not worth optimizing for. Well, yes, but on the other hand, when it happens people tend to care a great deal about how fast you can return your RAID to full protection (and they get upset if it's not pretty fast).


Comments on this page:

From 82.69.129.105 at 2009-06-26 21:06:29:

Chris,

You absolutely have to reconstruct your RAID regularly, in effect: you must verify that the data on each disk corresponds to its mirrors or checksums on the other disk(s).

As I'm sure you're aware, disks accumulate errors, and RAID can mask them because you may never read the corrupted sectors -- until one of your disks dies and you need to reconstruct the array from what's left. Then another failed read can break your array and lose you at least some data.

In operating systems like Debian, the mdadm package comes with a cron job to verify the disks once per month. Most RAID cards do the same in hardware (e.g. the 3ware cards have a "verify" task that you can schedule).
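
(A minimal sketch of kicking off such a check by hand through the kernel's md sysfs interface; the array name is an assumption and it needs root:)

    # Start a consistency check on one software RAID array by writing
    # "check" to its sync_action file, much as the cron job does.
    array = "md0"

    with open("/sys/block/%s/md/sync_action" % array, "w") as f:
        f.write("check")

    # Progress shows up in /proc/mdstat while the check runs.
    with open("/proc/mdstat") as f:
        print(f.read())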

What I've found is that Linux software RAID performs absolutely fine for serving data, but the verify task under moderate write load can take days and does impact performance. I can speed it up by letting it kill performance more, or I can slow it down so it takes weeks and isn't so noticeable.
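
(A minimal sketch of adjusting that tradeoff through the kernel's global RAID speed limits; the values here are assumed examples, in KB/sec, and it needs root:)

    # Throttle (or unleash) md check/resync bandwidth. Lowering
    # speed_limit_max makes a check less intrusive but slower;
    # raising it does the reverse.
    limits = {
        "/proc/sys/dev/raid/speed_limit_min": "1000",
        "/proc/sys/dev/raid/speed_limit_max": "20000",
    }
    for path, value in limits.items():
        with open(path, "w") as f:
            f.write(value)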

By contrast, the 3ware setups with the same disks and RAID configuration manage it in about 12 hours with no noticeable performance impact.

I can only assume that this is because the data doesn't need to come across the PCIe bus and through the kernel.

Cheers, Andy
