My view on software RAID and the RAID write hole

April 12, 2013

The old issue of software RAID versus hardware RAID came up recently on Twitter. That got Chris Cowley to write "Stop the hate on Software RAID", which in turn prompted a small discussion in which people pointed to the RAID 5 write hole as a reason to prefer hardware RAID over software RAID. I've written several entries about how I favour software RAID, but I've never talked about the write hole.

(For now let's ignore some other issues with RAID 5 or pretend that we're talking about RAID 6 instead, which also has this write hole issue.)
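For anyone who hasn't run into it, here is a toy Python sketch of what the write hole is. It assumes a simplified RAID 5 stripe with three data blocks and one XOR parity block, and the names (`xor_blocks`, `reconstruct`, the block contents) are all made up for illustration. Updating a block means writing new data and new parity to two different disks. If the system dies between those two writes, the stripe is left with stale parity, and nothing notices until a disk fails and you rebuild from that parity:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, the way RAID 5 computes parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, parity, missing):
    """Rebuild the block at index `missing` from the surviving blocks plus parity."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors + [parity])


# A hypothetical stripe: three data disks, parity consistent with them.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Rewrite block 0. The new data reaches disk 0...
stripe[0] = b"ZZZZ"
# ...and then the system crashes before the matching parity write happens,
# so `parity` still describes the old contents of the stripe.

# Later, disk 1 dies and its block is rebuilt from the stale parity.
rebuilt = reconstruct(stripe, parity, missing=1)
print(rebuilt == b"BBBB")  # block 1 was never written, yet it comes back wrong
```

The nasty part is that the damage lands on data you weren't even writing. Hardware RAID cards with a battery-backed cache can (in principle) hold on to the pending parity write across a power loss and finish it afterwards, which is where their advantage here comes from.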

I'll start by being honest even if it's painful: hardware RAID has an advantage here. Yes, you can (and should) put your software RAID system on a UPS (or two) and so on, but there are simply more parts that can fail abruptly when you're dealing with a full server than when you're dealing with an on-card battery. This doesn't mean that hardware RAID is risk-free (hardware RAID cards fail too) or that software RAID is particularly risky (abrupt crashes of this sort are extreme outliers in most environments), but it does mean that hardware RAID is less risky in this specific respect.

This is where we get into tradeoffs. Hardware RAID has both drawbacks and risks of its own (relative to software RAID). When building any real system you have to assess the relative importance and real-world chances of these risks (and how successfully you feel you can mitigate them), because real systems are almost always a balance between (potential) problems. My personal view is that, in general, abrupt system halts are vanishingly rare in properly designed systems. This makes the RAID write hole essentially a non-issue for software RAID.

(Of course there are all sorts of cautions here. For example, if you're operating enough systems the vanishingly rare can start happening more often than you want.)
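To put rough numbers on that caution, here is a small Python sketch. The one-in-a-thousand-per-year crash rate is purely hypothetical, and the function (`chance_of_at_least_one`, my own name) assumes that machines fail independently, which real machines sharing a power feed or a rack often don't:

```python
def chance_of_at_least_one(per_system_rate, systems):
    """Probability that at least one of `systems` machines has an abrupt halt
    in a year, assuming each fails independently with `per_system_rate`."""
    chance_none = (1 - per_system_rate) ** systems
    return 1 - chance_none


# Hypothetical rate: one abrupt halt per 1000 machine-years.
for n in (1, 10, 100, 1000):
    print(n, round(chance_of_at_least_one(0.001, n), 3))
```

At one machine this is a 0.1% chance a year; at a thousand machines it's better than even odds that something, somewhere, goes down hard this year.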

Thus my overall feeling is (and remains) that most people and most systems are better off with software RAID than with hardware RAID. In practice I think you are much more likely to get bitten by various issues with hardware RAID than you are to blow things up by hitting the software RAID write hole with a system crash or power loss event.

(By the way, if you're seriously worried about the RAID write hole you'll want to carefully verify that your disks actually write data when they tell you that they have. This is probably much less of a risk if you buy expensive 'enterprise' SAS drives, of course.)

Written on 12 April 2013.
Last modified: Fri Apr 12 00:24:19 2013