The advantages of open source software RAID

December 24, 2009

In light of a recent entry, I feel like singing the praises of software RAID, especially open source software RAID. For the moment, let's set aside the performance and cost issues of software RAID versus hardware RAID, because it's honestly not what I really care most about.

Software RAID in general has two major advantages:

  • much more of what is going on is exposed, instead of being locked away inside a black box. For example, you are guaranteed to see the raw disk status and error messages if something starts to go wrong, and often you can directly inspect system state with the tools of your choice.

  • it is not tied to having specific hardware. If you can bring your OS up on something that you can plug your disks into, you can get at your data again.
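On Linux with md software RAID, that first kind of inspection is a matter of a few standard commands. A minimal sketch (the device names here are placeholders; substitute your own array and disks):

```shell
# Overall array status as the kernel sees it:
cat /proc/mdstat

# Detailed state of one array: member disks, their status, sync progress:
mdadm --detail /dev/md0

# Raw SMART health of an underlying disk, via smartmontools:
smartctl -H /dev/sda
```

None of this is mediated by a vendor tool; you are looking directly at what the kernel and the drives report.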

An alternate phrasing of the second advantage is that hardware RAID is portable across operating systems while software RAID is portable across hardware. In practice, the latter is vastly more important to most people than the former.

The further advantage of open source software RAID is that you can actually look into things to find out what's going on, and you may be able to do something about any problems that you run into. (In short, it's even more open to inspection than ordinary software RAID.)

In theory, it's possible for hardware RAID to be almost as open and as inspectable as software RAID. In practice, I don't think I've ever seen hardware RAID come close; the degree of openness required seems to be foreign to hardware RAID companies, who often barely document the interface to their cards much less things like RAID storage formats.

All of this matters because every RAID implementation has historically had bugs; the only question is when you find out about them and how much you can do about it. Hardware RAID and closed software RAID are saying 'trust us, we got it right this time and all of our stuff just works'. The positive side of my Linux software RAID situation is that if I want to, I actually can instrument the kernel, get full reports about everything going on, and so on, which is a lot more than I'd get with hardware RAID.
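As a small illustration of that openness, the md driver exposes its internal state directly through sysfs, with no special tools required. A sketch, again with /dev/md0 as a placeholder:

```shell
# md driver messages in the kernel log:
dmesg | grep -i 'md0'

# Internal array state and the mismatch count from the last check:
cat /sys/block/md0/md/array_state
cat /sys/block/md0/md/mismatch_cnt

# Kick off a full consistency check of the array (as root):
echo check > /sys/block/md0/md/sync_action
```

Every one of these attributes is documented in the kernel source, so you can find out exactly what they mean rather than guessing at a vendor's status codes.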


Comments on this page:

From 82.69.129.105 at 2009-12-24 08:28:34:

Chris,

What's your take on the write caching problem?

Most of my servers are 4 disk 1U jobs with a 3ware RAID controller with battery backup unit, configured as RAID-10.

Without the 3ware and using Linux software RAID-10, performance is very good (often better), and I appreciate the open nature -- I follow the Linux-RAID list and am continually amazed at the disastrous situations people are bailed out of due to the documented layout of the data.

The problem is that having no battery-backed cache, I have to turn off the disk's write cache, which really hurts performance.
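[For reference, turning the on-disk write cache off is itself a one-liner; these are standard commands, with placeholder device names:

```shell
# ATA drives: show, then disable, the drive's write cache:
hdparm -W /dev/sda
hdparm -W0 /dev/sda

# SCSI/SAS drives: the same bit is the WCE flag in the caching mode page:
sdparm --get=WCE /dev/sda
sdparm --set=WCE=0 /dev/sda
```
]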

As far as I can see there is no such thing as a SAS controller with a BBU that just does JBOD. You have to buy a real RAID card and then it may not even use the BBU in JBOD mode anyway. I can't see one cheaper than about $500. A gap in the market?

Do you just turn off the disk write cache and spend the money saved on the RAID card on extra spindles?

Cheers,

Andy

From 24.97.127.154 at 2009-12-24 10:14:47:

I totally agree. After being raised on the "software RAID is crap" mantra by graybeards my whole SA life, I went with Linux MDRAID. When my server died and I was able to just chuck the disks into my tower and rebuild the array, I was sold. If performance isn't a factor, I can't think of any reason NOT to use this.
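[This is the portability advantage in its most concrete form: each member disk carries md metadata describing the array it belongs to, so any Linux box can reassemble it. A sketch, with placeholder device names:

```shell
# Scan all disks for md superblocks and assemble whatever arrays are found:
mdadm --assemble --scan

# Or assemble one array explicitly from named members:
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1

# See what metadata a member disk carries about its array:
mdadm --examine /dev/sdb1
```
]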

By cks at 2009-12-31 22:32:32:

The short version of my view on write caching is that I don't like it in general for reasons that don't fit in the margin of this comment. If we had this problem I would say that the software RAID answer is to treat the entire system as a 'RAID controller' and put it on a UPS, or to accelerate your system at a higher level by things like putting filesystem logs on an SSD.

(Adding a UPS is not quite the same thing as a controller with a BBU, because the whole system is subject to more failure modes, such as kernel panics.)
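(One way to do the 'filesystem log on an SSD' acceleration with ext3/ext4 is an external journal device. A sketch, with hypothetical partition names; /dev/ssd1 stands in for a partition on your SSD:

```shell
# Format the SSD partition as a dedicated external journal:
mke2fs -O journal_dev /dev/ssd1

# Create the main filesystem with its journal on that device:
mkfs.ext4 -J device=/dev/ssd1 /dev/sdb1
```
)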
