A good reason to use write-intent bitmaps

December 23, 2013

Modern versions of Linux's software RAID support what the documentation calls 'write-intent bitmaps' (see also the md(4) manpage). To summarize the documentation, this bitmap keeps track of which parts of the RAID array are potentially 'dirty', i.e. where different components of the array may be out of sync with each other. An array with this information can often be resynchronized much faster after a crash or after one component drops out temporarily, because only the dirty regions need to be resynced. The drawback of a write-intent bitmap is some extra synchronous writes that will probably require seeks, since the relevant portion of the bitmap must be marked as dirty and flushed to disk before the real write happens.
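To make the mechanism concrete, here is a toy Python model of the idea. It is only a sketch and not how the md driver is implemented: the class name, the 64 MiB chunk size, and the 'clear every bit at once' step are all assumptions made for illustration.

```python
class ToyBitmapArray:
    """Toy model of a mirrored array with a write-intent bitmap (hypothetical)."""

    def __init__(self, size_bytes, chunk_bytes):
        self.chunk_bytes = chunk_bytes
        self.num_chunks = -(-size_bytes // chunk_bytes)  # ceiling division
        self.dirty = set()        # indexes of chunks currently marked dirty
        self.bitmap_flushes = 0   # extra synchronous bitmap writes so far

    def write(self, offset, length):
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        new_bits = set(range(first, last + 1)) - self.dirty
        if new_bits:
            # The bitmap must be updated and flushed BEFORE the data write.
            self.dirty |= new_bits
            self.bitmap_flushes += 1
        # ... the real data write to every mirror would happen here ...

    def all_writes_settled(self):
        # Simplification: clear every bit once all mirrors are in sync.
        self.dirty.clear()

    def resync_bytes_after_crash(self):
        # Only chunks still marked dirty need to be resynced.
        return len(self.dirty) * self.chunk_bytes


MiB = 1024 * 1024
array = ToyBitmapArray(size_bytes=1024 * 1024 * MiB, chunk_bytes=64 * MiB)
array.write(0, 4096)                 # first write to chunk 0: bitmap flush
array.write(8192, 4096)              # chunk 0 already dirty: no extra flush
array.write(10 * 1024 * MiB, 4096)   # a different chunk: another bitmap flush
print(array.bitmap_flushes, array.resync_bytes_after_crash() // MiB, "MiB")
```

In this model a crash after those three writes means resyncing two chunks instead of the whole 1 TiB array, at the cost of two extra bitmap flushes.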

Now, here's a serious question: how many writes do your systems typically do to their arrays, and how performance-critical are those writes? On most of our systems the answers are 'not many' and 'not very', because we're using RAID-1 to make sure systems can ride out a single disk failure.

As far as I can tell, the great advantage of write-intent bitmaps is this: if your system abruptly crashes while it's active (so your arrays have to resync), you have a much better chance of surviving a read error (whether latent or live) on the 'good' source drive for the resync. In an ideal world Linux's software RAID would be smart enough in this situation to pick the data from another drive, since it's probably good; however, I'm not sure if it is that smart right now and I don't want to have to trust its smarts. Write-intent bitmaps improve this situation because you're resyncing much less data; instead of a read error anywhere on your entire disk being really dangerous, now you only have to dread a read error on the hopefully small amount being resynced.
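A back-of-the-envelope calculation shows how much the amount of data read matters. The sketch below assumes an unrecoverable read error rate of one per 10^14 bits (a figure commonly quoted on consumer drive spec sheets, not something measured here) and assumes errors are independent, which real drives do not promise. The function name and the sizes are purely illustrative.

```python
import math


def prob_read_error(bytes_read, errors_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading bytes_read.

    Assumes independent per-bit errors at an assumed spec-sheet rate.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-errors_per_bit))


full_resync = prob_read_error(10**12)              # read the whole 1 TB source
bitmap_resync = prob_read_error(128 * 1024 ** 2)   # read only 128 MiB of dirty chunks

print(f"full resync:   {full_resync:.4%}")
print(f"bitmap resync: {bitmap_resync:.6%}")
```

Under these assumptions the full resync has a several-percent chance of hitting a read error, while the bitmap-limited resync is orders of magnitude safer. The exact numbers matter much less than the ratio between them.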

Based on this logic I'm going to be turning on write-intent bitmaps on most of our systems, because frankly very few of them are write-intensive. Even on my desktop(s) I think I'd rather live with some write overhead in order to cut down the risk of catastrophic data loss.
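Before changing anything, it helps to know which arrays already have a bitmap. Arrays with one show a 'bitmap:' line in /proc/mdstat, and a bitmap can be added to an existing array with 'mdadm --grow --bitmap=internal /dev/mdN'. The Python sketch below parses mdstat-style text; the function name and the sample text are made up for illustration, and the parsing assumes the usual mdstat layout.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout; not from a real system.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md1 : active raid1 sdb2[1] sda2[0]
      1048512 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def arrays_with_bitmaps(mdstat_text):
    """Map each md array name to True if its stanza has a 'bitmap:' line."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\S+)\s*:", line)
        if match:
            current = match.group(1)
            result[current] = False
        elif current and line.strip().startswith("bitmap:"):
            result[current] = True
        elif not line.strip():
            current = None  # a blank line ends the current array's stanza
    return result


for name, has_bitmap in arrays_with_bitmaps(SAMPLE_MDSTAT).items():
    print(name, "has a bitmap" if has_bitmap else "has no bitmap")
```

On a real machine you would pass in the contents of /proc/mdstat instead of the sample text.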

(All of this came to mind after a recent power failure at home that required resyncing a more-than-a-TB array. I was suddenly nervous when I started to think about it.)

PS: yes, I know about ZFS on Linux and it would definitely do this much better. But it's not ready yet.

Sidebar: about that overhead

How much the write-intent bitmap overhead affects you depends on IO patterns and how much disk space a given bit in the bitmap covers. The best case is probably repeatedly rewriting the same area of the array, where the bits will get set and then just sit there. The worst case is writing once to spots all over the array, requiring one bit to be marked for every real write you do.
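The two extremes can be counted with a few lines of Python. This is a simplified sketch: it assumes a 64 MiB chunk per bit and that bits are never cleared during the run, and the function name is hypothetical.

```python
def count_bitmap_updates(write_offsets, chunk_bytes):
    """Count writes that must first set a new bit in the bitmap."""
    dirty = set()
    updates = 0
    for offset in write_offsets:
        chunk = offset // chunk_bytes
        if chunk not in dirty:
            dirty.add(chunk)  # bit must be set and flushed before the data write
            updates += 1
    return updates


CHUNK = 64 * 1024 * 1024  # assumed amount of array space covered by one bit

# Best case: 1000 writes that keep rewriting the same small area.
same_area = [4096 * (i % 16) for i in range(1000)]
# Worst case: 1000 writes that each land in a different chunk.
scattered = [i * CHUNK for i in range(1000)]

print(count_bitmap_updates(same_area, CHUNK))   # 1
print(count_bitmap_updates(scattered, CHUNK))   # 1000
```

A larger chunk size makes it more likely that a write lands in an already-dirty chunk, but each dirty bit then represents more data to resync after a crash.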

Write-intent bitmaps on SSD-based arrays are much cheaper but probably not totally free, since they still require an extra disk cache flush.
