The impact of single-disk slow writes on mirrors (and other RAID arrays)

April 13, 2010

We had a performance problem today that we fairly rapidly determined came from our mail spool filesystem somehow being too slow. This was pretty impressive, because our mail spool sits on a four-way mirror on our fileservers and we could easily see that our IO rate on its fileserver wasn't particularly large, well below what we knew things could manage.

(Using NFS with a single interface makes IO rate easy to measure directly, no matter how complex your IO system; just look at the network traffic volume.)

Intuitively, I have traditionally expected RAID mirrors to perform well even if one disk is busy or slower to respond than the others, so that as long as you had enough mirrors you could expect decent performance more or less no matter what happened. While this is true for reads, it is not true for writes; mirror writes wait on the slowest disk, because in a conventional mirror setup a write must hit all disks before it's considered complete. This means that a single disk with slow writes can drag down the (write) performance of an entire mirror array, no matter how many ways you mirror, as eventually all of your write traffic piles up waiting for that single disk to write things out.

(This happens immediately with synchronous write IO but can also happen with asynchronous writeback IO if your sustained write bandwidth is above what the slow disk can handle. You can also be hit with read slowdowns in some situations.)
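The core of this can be sketched in a few lines of Python (a toy model with made-up latencies, not anything from our actual systems): a mirror write completes only when every member has acknowledged it, so its latency is the maximum of the per-disk latencies, while a read can be served by any one member.

```python
# Toy model: why one slow disk dominates mirror write latency.
# The latency numbers are invented for illustration.

def mirror_write_latency(disk_latencies_ms):
    """A write must hit all mirror members before it completes."""
    return max(disk_latencies_ms)

def mirror_read_latency(disk_latencies_ms):
    """A read can (at best) be satisfied by the fastest member."""
    return min(disk_latencies_ms)

# A four-way mirror where one disk is busy with IO from elsewhere:
disks = [5, 6, 5, 80]   # per-IO latency in milliseconds
print(mirror_write_latency(disks))  # 80 -- writes see the slow disk
print(mirror_read_latency(disks))   # 5  -- reads can still be fast
```

This is also why adding more mirror members never helps write latency: the max over a larger set can only stay the same or get worse.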

This is what happened to us today. Various programs make lockfiles in our mail spool, creating a modest amount of write traffic, and one particular disk was being hit by a significant write IO load from another source. It got just slow enough to start backlogging write traffic, and eventually everyone piled up on it in a massive (and surprising) traffic jam.
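The backlog dynamic here is simple arithmetic, which a small sketch makes concrete (invented numbers, purely illustrative): once the sustained write rate exceeds what the slow disk can retire, its queue of pending writes grows without bound, and everything eventually stalls behind it.

```python
# Toy model: pending-write backlog on a single disk over time.
# If incoming writes arrive faster than the disk retires them,
# the backlog grows every second; otherwise it stays drained.

def backlog_over_time(incoming_mb_s, disk_mb_s, seconds):
    """Return the backlog (MB) on the disk after each second."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog = max(0.0, backlog + incoming_mb_s - disk_mb_s)
        history.append(backlog)
    return history

fast = backlog_over_time(incoming_mb_s=20, disk_mb_s=60, seconds=10)
slow = backlog_over_time(incoming_mb_s=20, disk_mb_s=15, seconds=10)
print(fast[-1])  # 0.0  -- a fast disk keeps up; no backlog
print(slow[-1])  # 50.0 -- the slow disk falls 5 MB further behind every second
```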

(This also happens with RAID 5 and RAID 6 arrays, which have to write to all of the disks involved in a stripe, data and parity alike, in order to keep the stripe parity correct.)

This isn't the first time that this has happened to us, and it can be relatively subtle; we've had filesystems that felt kind of slow, and the culprit turned out to be that one of the disks that they were mirrored to wasn't quite up to the write load that it was being asked to carry. Clearly this is something that I need to remember the next time we have an oddly slow mirror or RAID set.

(The problem can come and go, too, based on how much write load you're putting on the filesystem at the time.)


Comments on this page:

From 92.236.77.29 at 2010-04-14 18:01:21:

This might be a silly question, but do you often have different disks in mirrors/RAID setups of various kinds, or perhaps disks with other partitions on them as well as the RAID?

By cks at 2010-05-04 12:30:44:

Department of belated replies: the short answer is yes, and this is actually not an uncommon thing if you have a SAN. A common SAN setup is backend RAID controllers doing RAID 5 or RAID 6 and exporting chunks of space to servers, which put separate filesystems on each chunk. At this point each filesystem is exposed to IO from other filesystems, and it gets worse if a server mirrors a filesystem across controllers for reliability.

Our setup is somewhat different but results in the same net effect; backend disks are divided into multiple chunks so we can be more or less indifferent to disk sizes, and different chunks will be used by different ZFS pools.

(It would be better if we could use multiple chunks from the same disk in the same ZFS pool, but this turns out to have a very bad effect on ZFS performance.)
