When you don't want RAID-5

September 6, 2007

Here's a paradox that we only realized recently: in some situations, using RAID-5 can be less reliable overall than using no RAID at all.

This comes about because while RAID-5 preserves your data over a single-drive failure, it loses all your data if there is ever a double drive failure.

Our specific case is a disk-based incremental backup system. Right now a day's backups take up about a third of a disk, and we have it set up so that each day's backups go to a different disk (eventually cycling around). The older a backup is, the less useful it is. If we lose the disk with yesterday's incrementals we will still have several previous days, so we are not too badly off even after the worst single-disk failure (and losing older disks is less damaging). If we lose two disks we are much better off than with RAID-5, since we still have all the remaining backups and thus can (at worst) restore from the backups of three days ago.

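And of course, not using RAID-5 gets us three more days of online incrementals, since the one disk's worth of space that RAID-5 would spend on parity holds about three days of backups.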

(This is not our only backup system; we do less frequent backups to tape. These have the full backups that serve as the baseline for the incrementals.)

What makes this situation work is that losing some of the data is not really a fatal thing while losing all of the data would be fairly alarming, combined with the fact that we can fit each 'unit' of data on a single disk.
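To put rough numbers on this trade-off, here is a minimal sketch that counts how many days of incrementals are online, and how many survive, as disks fail. The six-disk array is purely an assumption for illustration (the real disk count isn't given here), as are the function names; the three days per disk comes from a day's backups taking about a third of a disk.

```python
def days_online(num_disks, days_per_disk, raid5):
    """Days of incrementals that fit online.

    RAID-5 gives up one disk's worth of space to parity.
    """
    data_disks = num_disks - 1 if raid5 else num_disks
    return data_disks * days_per_disk


def days_surviving(num_disks, days_per_disk, failed_disks, raid5):
    """Days of incrementals still readable after some disks fail."""
    if raid5:
        # RAID-5 rides out one failure intact and loses everything on two.
        if failed_disks <= 1:
            return days_online(num_disks, days_per_disk, raid5=True)
        return 0
    # Independent disks: each failure costs only the days on that disk.
    return max(num_disks - failed_disks, 0) * days_per_disk


if __name__ == "__main__":
    NUM_DISKS = 6      # hypothetical array size, not from the post
    DAYS_PER_DISK = 3  # a day's backups take about a third of a disk

    for failed in range(3):
        plain = days_surviving(NUM_DISKS, DAYS_PER_DISK, failed, raid5=False)
        raid = days_surviving(NUM_DISKS, DAYS_PER_DISK, failed, raid5=True)
        print(f"{failed} failed disk(s): independent={plain} days, RAID-5={raid} days")
```

Under these assumptions the two setups hold the same number of days after one failure (although with independent disks there are gaps where the dead disk's days used to be), and they diverge sharply at two failures.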


Comments on this page:

From 121.72.82.232 at 2007-09-13 05:12:12:

I had this problem on a client's server some years ago running Red Hat Linux. The RAID system was set up as drive mirroring. The first drive failed at some point, but the system kept working until the second drive failed. At that stage the server failed completely (as it would). I was able to resurrect the system, but it left me skeptical of RAID systems, especially in systems where there is no on-site system administrator to regularly check the system.

From 124.170.73.31 at 2007-09-13 08:53:41:

That's why you set up automatic email notification of RAID drive failure!

RAID5 is bad news; stay away from it. If you want to keep your data moderately reliably, use RAID1 and an incremental backup scheme (offsite, on-net, anything other than on hard disk on the same machine as the RAID).

From 199.72.20.100 at 2007-09-13 12:37:17:

Are you forgetting integrated hot-swap RAID 5E and 5EE? If you can lose two drives before your data gives up the ghost, then you can configure proper notifications and replace your dead FRU without worry. Also, have you considered redundant arrays configured as software RAID 1? If you use a remote iSCSI device as your mirror drive you can keep a complete copy of your data colo'ed away from your burned out crust of a data center. RAID 5 isn't the problem; badly designed storage solutions are the problem.

From 12.158.220.130 at 2007-09-13 13:18:41:

RAID5 is a good thing.

A distinction that I've heard elsewhere applies here:

   RAID is a *CONTINUITY* strategy, not a backup strategy.

RAID shrinks recovery time down to nothing in many situations. Backups do not.

Backups provide "deep sh*t" recovery.

The two purposes can overlap.

By cks at 2007-09-13 16:33:16:

RAID always has a trade-off of cost (or overhead) versus the amount of protection you get. Picking RAID-5 over RAID-1 should mean in part that you have decided that you cannot afford (or do not need) that much protection and you are willing to be exposed to two-disk total data loss.

(It is not just a straight RAID-1 versus RAID-5 choice, either; you can choose how many RAID-5 groups you split your disks into, trading off overhead against how much data you will lose if you have two disks fail in the same group and so on.)
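As a rough illustration of that group-size trade-off, the sketch below computes three numbers for a pool of disks split into equal RAID-5 groups: the fraction of raw space spent on parity, the chance that two failed disks land in the same group, and the fraction of all data lost when that happens. It assumes equal-sized groups and that the two failed disks are picked uniformly at random, which real failures may not be; the function name and the 12-disk pool are hypothetical.

```python
from fractions import Fraction


def raid5_group_tradeoff(total_disks, groups):
    """Return (parity_overhead, p_same_group, data_lost_fraction).

    Assumes equal-sized RAID-5 groups and exactly two failed disks
    chosen uniformly at random from the whole pool.
    """
    if groups < 1 or total_disks % groups != 0:
        raise ValueError("disks must split evenly into groups")
    size = total_disks // groups
    if size < 3:
        raise ValueError("RAID-5 needs at least three disks per group")

    # One disk's worth of parity per group.
    parity_overhead = Fraction(groups, total_disks)
    # Given one failed disk, size - 1 of the other total_disks - 1 disks
    # share its group.
    p_same_group = Fraction(size - 1, total_disks - 1)
    # Losing one group loses that group's share of the data.
    data_lost_fraction = Fraction(1, groups)
    return parity_overhead, p_same_group, data_lost_fraction


if __name__ == "__main__":
    TOTAL_DISKS = 12  # hypothetical pool size
    for groups in (1, 2, 3, 4):
        overhead, p_same, lost = raid5_group_tradeoff(TOTAL_DISKS, groups)
        print(f"{groups} group(s): overhead={overhead}, "
              f"P(same group)={p_same}, data lost={lost}")
```

More groups cost more disks in parity but make a fatal pair of failures both less likely and less damaging.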

From 12.109.229.8 at 2007-09-13 17:23:41:

This is a rather short-sighted approach and assumes you will always be able to fit a complete backup onto one disk. Most decent software (and hardware) RAID solutions offer the option to have a hot spare drive. This will reduce your exposure to total loss to the time it takes to resync your array onto the spare. RAID 5 also offers a much faster read time than your single-drive implementation.

From 24.207.191.29 at 2007-09-14 22:23:34:

What about RAID-6 (double parity), such as that used by Network Appliance?

By cks at 2007-09-15 00:28:05:

Both a hot spare and RAID 6 lose an extra disk to overhead. Also, hot spare re-sync on a large RAID-5 SATA array is not exactly fast these days, so you can be exposed for a not insignificant amount of time.

Read times faster than raw single disk IO are not exactly a priority for this use; in fact, most of this data will never be read after it is written, because most of the time we hope to never need our backups. That we can read from this setup faster and more conveniently than going to tape is good enough.

No system design is static. If our storage and incremental backup sizes grow enough to invalidate this design (which would have to be a lot of growth), we would have to re-evaluate this approach and perhaps switch to something else. In the meantime, we can gain the benefits, including more days of conveniently accessible online incremental backups in the common case.

From 124.170.16.103 at 2007-09-15 10:58:34:

If you lose a disk in RAID5, even if you have a hot swap disk ready and activate it immediately, there's still a significant amount of time spent writing data to that new disk. The danger is losing a second disk while your repair disk is still being brought online. The repair time for RAID5 is much greater than for RAID1 because there's a lot more data to read.

If the disks are getting old, the probability of a second drive failure increases once the first drive has failed. And if your hot swap is just as old as the RAID5 drives, you may be replacing a failed old device with another which is just as old.

I'd want more redundancy, e.g. the ability to recover from a 2-drive loss.

Written on 06 September 2007.


Last modified: Thu Sep 6 23:13:14 2007