Wandering Thoughts archives

2006-02-28

Practical RAID-1 read balancing

If you're doing performance analysis of a RAID-1 setup, one of the interesting questions is 'which drive gets read from when?'

(Measuring how well balanced the current IO load is can be useful, but it doesn't tell you how a different load will be balanced.)

Since seeks are the expensive thing on modern drives, you want your RAID-1 system to deal in requests, not in individual blocks, and to send a run of sequential reads to the same disk. If your IO isn't strictly sequential, ideally the system keeps track of the last known head position for each disk (which writes move as well as reads) and sends each read to whichever disk offers the best combination of being positioned closest and being least loaded.
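To make the ideal policy concrete, here is a minimal sketch of such a read balancer. It is not how any particular RAID implementation works; the class names, the `load_weight` knob, and the simple "seek distance plus weighted queue depth" cost are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    head: int = 0         # last known head position, as a block number
    outstanding: int = 0  # requests issued to this disk but not yet completed


class Raid1ReadBalancer:
    """Hypothetical RAID-1 read balancer; a sketch, not any real driver's policy."""

    def __init__(self, disks, load_weight=1000):
        self.disks = disks
        # Assumed tuning knob: how many blocks of seek distance one queued
        # request is considered to be "worth".
        self.load_weight = load_weight

    def note_write(self, start, length):
        # Writes go to every side of the mirror, so they move every head.
        for d in self.disks:
            d.head = start + length

    def pick(self, start, length):
        # Keep a sequential stream on the disk whose head is already there.
        for d in self.disks:
            if d.head == start:
                chosen = d
                break
        else:
            # Otherwise trade off seek distance against current queue depth.
            chosen = min(
                self.disks,
                key=lambda d: abs(d.head - start) + self.load_weight * d.outstanding,
            )
        chosen.head = start + length
        chosen.outstanding += 1
        return chosen

    def complete(self, disk):
        # Call when a request issued by pick() finishes.
        disk.outstanding -= 1
```

In this sketch a sequential stream wins outright even if its disk is busy; a real implementation might cap how long one disk keeps a stream before letting load decide.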

But all of this is theory. What you really need to do is measure, because you can never be sure just what a RAID-1 system is doing. Sometimes it can really surprise you.

For example, I once dealt with a reasonably fancy hardware RAID-10 controller where the disk a read went to was determined statically by where the data was, not by load or anything else. The controller divided the RAID-10 array into slices (64K at the time); reads from odd slices went to the first disk in a mirror pair and reads from even slices went to the second. In extreme cases, half the array could be completely idle while the other half was melting down, for instance if all of the active data happened to sit in odd slices. (Presumably that didn't happen too often.)
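A rough model of that static mapping shows how easily it skews. This sketch assumes slices are numbered from 0 and uses hypothetical function names; it only reproduces the odd/even rule described above, not anything else the controller did.

```python
from collections import Counter

SLICE_SIZE = 64 * 1024  # 64K slices, as on the controller described above


def mirror_side_for_read(offset):
    """Which disk of a mirror pair serves a read at byte `offset`.

    Assumes slices are numbered from 0; odd slices go to the first disk (0)
    and even slices to the second disk (1). Load plays no part at all.
    """
    slice_no = offset // SLICE_SIZE
    return 0 if slice_no % 2 == 1 else 1


def read_distribution(offsets):
    """Count how many reads each side of the mirror pair would get."""
    return Counter(mirror_side_for_read(o) for o in offsets)


# A scan across 100 slices splits evenly between the two disks...
print(read_distribution(range(0, 100 * SLICE_SIZE, 4096)))
# ...but reads that only touch odd slices all land on the first disk.
print(read_distribution(range(SLICE_SIZE, 101 * SLICE_SIZE, 2 * SLICE_SIZE)))
```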

We only found this out by running a test program against an otherwise idle test array and watching the front panel disk lights, as part of trying to sort out how the array distributed IO over the drives in general. Much to our surprise, in one test half the disks went active and the other half didn't. All I can say is thank goodness for front panel lights.

Sidebar: RAID-10 versus RAID-01

Because I always get them confused: RAID-10 is striped mirrors, RAID-0 on top of RAID-1. RAID-01 is the reverse, mirrored stripes (RAID-1 on top of RAID-0), and is much less resilient against the failure of two or more disks: RAID-01 dies unless all the failed drives are in the same stripe, whereas RAID-10 survives unless both sides of a single mirror die.
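The difference is easy to check by brute force. This sketch assumes an even number of disks, with RAID-10 built from adjacent pairs and RAID-01 from two equal stripe sets; that layout and the function names are illustrative assumptions.

```python
from itertools import combinations


def raid10_survives(failed, pairs):
    # RAID-10: stripe over mirror pairs; dead only if some pair lost both disks.
    return not any(set(pair) <= failed for pair in pairs)


def raid01_survives(failed, stripes):
    # RAID-01: mirror of two stripe sets; alive only if some stripe set is untouched.
    return any(not (set(s) & failed) for s in stripes)


def count_survivals(n_disks, n_failed):
    """Return (RAID-10 survivals, RAID-01 survivals, total failure combinations)."""
    pairs = [(i, i + 1) for i in range(0, n_disks, 2)]
    half = n_disks // 2
    stripes = [tuple(range(half)), tuple(range(half, n_disks))]
    r10 = r01 = total = 0
    for failed in combinations(range(n_disks), n_failed):
        f = set(failed)
        total += 1
        r10 += raid10_survives(f, pairs)
        r01 += raid01_survives(f, stripes)
    return r10, r01, total


print(count_survivals(4, 2))
print(count_survivals(6, 2))
```

With four disks and two failures, RAID-10 survives 4 of the 6 possible combinations while RAID-01 survives only 2; with six disks it is 12 of 15 against 6 of 15.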

RAID1ReadBalancing written at 23:10:30

A surprising effect of RAID-1 resynchronization

Today I got to run into an interesting performance impact of having a RAID-1 mirror resync running on a big partition of a live system.

An important system was having performance problems today, so we were poking around it. When we watched the disk statistics, we noticed that only the first disk was seeing read traffic; the second disk was loafing along with just occasional bursts of writes. Looking more closely we noticed that a RAID-1 resync of a big partition was in progress; because the system was loaded, the resync's IO bandwidth had been choked and it hadn't gotten very far, only 5% or so in a 100G partition.
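On Linux, one way to see this kind of per-disk imbalance is to sample the kernel's per-device counters twice and compare. This is a minimal sketch assuming the standard /proc/diskstats layout and hypothetical device names such as sda and sdb; it is not necessarily how the statistics above were gathered.

```python
import time


def parse_diskstats(text, devices):
    """Map device name -> (reads completed, writes completed).

    Expects Linux /proc/diskstats format: major, minor, device name, then
    reads completed, reads merged, sectors read, ms reading, writes completed, ...
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 8 and parts[2] in devices:
            stats[parts[2]] = (int(parts[3]), int(parts[7]))
    return stats


def read_split(before, after):
    """Fraction of the reads in the interval that each device completed."""
    deltas = {dev: after[dev][0] - before[dev][0] for dev in after}
    total = sum(deltas.values())
    return {dev: (d / total if total else 0.0) for dev, d in deltas.items()}


def sample(devices, interval=10):
    # Take two snapshots `interval` seconds apart (Linux only).
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read(), devices)
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read(), devices)
    return read_split(before, after)
```

Calling `sample({"sda", "sdb"})` on a system in the state described above would be expected to show nearly all reads on the first disk.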

Then the light dawned. Normally, reads are distributed over both sides of a RAID-1 mirror. However, at the moment only the resynced 5% of the second disk held valid data; a read for anything in the remaining 95% could only be done by the first disk. No wonder the first disk was running hot and the second disk was seeing virtually no reads.
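The arithmetic can be sketched as a toy model. It assumes (hypothetically) that the resync proceeds in order from the start of the partition and that the RAID layer alternates reads between sides whenever both hold valid data; real RAID code is more elaborate.

```python
def readable_sides(block, resync_watermark):
    """Which sides of a two-disk mirror can serve a read of `block`.

    Assumes side 0 is the up-to-date source and side 1 is being rebuilt in
    order from block 0, so only blocks below the watermark are valid on side 1.
    """
    return [0, 1] if block < resync_watermark else [0]


def read_counts(blocks, resync_watermark):
    """Round-robin each read over whichever sides are currently valid."""
    counts = [0, 0]
    for turn, block in enumerate(blocks):
        sides = readable_sides(block, resync_watermark)
        counts[sides[turn % len(sides)]] += 1
    return counts


# 1000 reads spread evenly over the partition, with the resync 5% done.
print(read_counts(range(1000), resync_watermark=50))
```

With reads spread evenly, the second disk gets only 25 of the 1000 reads (2.5%) instead of half.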

Like everybody, I already knew about the direct IO impact of a RAID-1 resync. But the choking effect of not being able to read from both disks for most of the filesystem hadn't previously occurred to me.

Sidebar: what's a RAID-1 resync?

A RAID-1 resync is what happens when the two disks in a RAID-1 mirror cease to be identical copies of each other, usually due to some calamity (power loss, system crash, disk failure). When this happens, one of the mirrors is identified as the most up-to-date and its data is copied to the other disk to bring them back into sync.

The obvious effect of a RAID-1 resync is that it adds extra IO to the system: reads on the first disk, writes on the second disk. However, any decent RAID system has some way to throttle this IO so that it happens mostly when the disks are idle and doesn't steal IO bandwidth from real work.
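One simple throttling scheme uses a floor and a ceiling on resync speed. Linux's md driver, for example, has /proc/sys/dev/raid/speed_limit_min and speed_limit_max knobs along these lines. The sketch below is a simplified model with assumed limits, not md's actual algorithm; it also shows why a resync on a steadily loaded system "hadn't gotten very far".

```python
MIN_KBS = 1_000    # assumed floor: keep resyncing at least this fast under load
MAX_KBS = 200_000  # assumed ceiling when the disks are otherwise idle


def resync_rate(foreground_busy):
    """Hypothetical throttle: crawl while real work is running, else go flat out."""
    return MIN_KBS if foreground_busy else MAX_KBS


def seconds_to_resync(size_kb, busy_by_second):
    """Simulate a resync one second at a time.

    busy_by_second yields True for each second the system is busy with real
    IO; once it runs out, the disks are assumed idle.
    """
    done = 0
    seconds = 0
    busy = iter(busy_by_second)
    while done < size_kb:
        done += resync_rate(next(busy, False))
        seconds += 1
    return seconds


# With these assumed limits and continuous load, a 100G partition gets
# only about 3.4% resynced per hour.
hour_kb = 3600 * MIN_KBS
print(f"{hour_kb / (100 * 1024 * 1024):.1%}")
```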

RAID1ResyncSurprise written at 00:33:05


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.