How Linux software RAID is making me grumpy right now

December 16, 2009

This weekend, one of my machines sent me email to report:

WARNING: mismatch_cnt is not 0 on /dev/md0
WARNING: mismatch_cnt is not 0 on /dev/md3

What this means (as opposed to what it says) is that a software RAID data scrub has detected some number of inconsistencies between the mirrors for two of my software RAID devices.

(I believe that the kernel also notices this under some other circumstances, but I can't follow the code well enough to be sure or to tell what they are. The mismatch_cnt it is talking about is the one found in /sys/block/mdN/md; you can read the full discussion about it in Documentation/md.txt in the kernel source.)

Let me inventory the obvious failures here.

  • Fedora's raid-check script doesn't bother to tell you what the value of mismatch_cnt is, apart from 'not zero'. Since this value is both volatile (it's only in kernel memory, so it gets reset on reboot) and a measure of how much inconsistency was found, sysadmins would kind of like to have it recorded for posterity. Speaking for myself, I would really like to know if my arrays are progressively getting more and more inconsistent every week, or if it seems to have happened once and then stopped.

  • the software RAID code does not log any messages when it detects inconsistencies. If you do not know to look at mismatch_cnt and naively just watch syslog or the kernel messages, you are out of luck.

  • worse, the software RAID code doesn't tell you where the errors are. What do they affect? You have no way of finding out short of duplicating the work yourself in order to actually find out the sector numbers.

    (I have read of people who shut down the software RAID device, directly mount each side's filesystem read-only, and diff -r them. People with LVM on software RAID are plain out of luck.)
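
As a stopgap for the first complaint, recording the actual counts yourself is not hard. What follows is a minimal sketch, not anything Fedora ships: it assumes Python 3 and that each array exposes its count at /sys/block/mdN/md/mismatch_cnt as described above. The function names and the log line format are made up for illustration.

```python
import glob
import os
import time


def read_mismatch_counts(sys_block="/sys/block"):
    """Return {array name: mismatch_cnt} for every md array under sys_block."""
    counts = {}
    pattern = os.path.join(sys_block, "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        # path looks like .../md0/md/mismatch_cnt; the array name is 3 from the end
        name = path.split(os.sep)[-3]
        with open(path) as f:
            counts[name] = int(f.read().strip())
    return counts


def format_record(counts, now=None):
    """Build one timestamped log line (hypothetical format) from the counts."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{name}={count}" for name, count in sorted(counts.items()))
    return f"{stamp} mismatch_cnt {fields}"


if __name__ == "__main__":
    # Append this line to a file from cron after each scrub to keep a history.
    print(format_record(read_mismatch_counts()))
```

Run after every scrub and appended to a file, this would at least show whether the numbers grow from week to week or stay put.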

The lack of information about where the errors are is extremely bad, because there is no actual repair process for this problem. The software RAID 'repair' operation is not a repair, it is a resync; if there is an inconsistency, it picks one side of the mirror (somehow) and force-updates the other to match it. There is no certainty that it will pick the right one.
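
'Duplicating the work yourself' amounts to reading both halves of the mirror and comparing them sector by sector. Here is a rough sketch of that idea, under some loud assumptions: Python 3, 512-byte sectors, an array that is stopped (or at least not being written to), and RAID data that starts at the same offset on both components. The md superblock area holds per-device information, so differences there may be legitimate. The device paths in the usage line are hypothetical.

```python
import sys

SECTOR = 512  # assumed sector size in bytes


def differing_sectors(path_a, path_b, chunk_sectors=2048):
    """Yield sector numbers where two files or block devices differ.

    Only the common length of the two inputs is compared.
    """
    chunk = chunk_sectors * SECTOR
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        base = 0  # sector number of the start of the current chunk
        while True:
            data_a = a.read(chunk)
            data_b = b.read(chunk)
            n = min(len(data_a), len(data_b))
            if n == 0:
                break
            data_a, data_b = data_a[:n], data_b[:n]
            if data_a != data_b:
                # Only walk sector by sector inside chunks that differ.
                for off in range(0, n, SECTOR):
                    if data_a[off:off + SECTOR] != data_b[off:off + SECTOR]:
                        yield base + off // SECTOR
            base += n // SECTOR


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 mirrordiff.py /dev/sda1 /dev/sdb1  (hypothetical devices)
    for sector in differing_sectors(sys.argv[1], sys.argv[2]):
        print(sector)
```

This only gives you sector numbers relative to the start of each component. Mapping them back to files (or to LVM extents) is still your problem, and it does nothing to tell you which side has the right data.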

Therefore, if this happens to you, you are best off doing nothing until you can specifically identify what was damaged (if anything) and then either try to recover data from the other mirror or restore things from backups. I foresee a very long downtime with a live CD in my future. Or some kernel hacks. Or both.

The final failure is what may have caused this inconsistency. According to Neil Brown (in a message quoted here), under some circumstances the software RAID code can write inconsistent data to the two sides of the mirror because it allows the page to be changed between when it is written to one side and when it is written to the other. According to his message, this should be harmless because the newly-dirty page will be rewritten at some point. Other reports suggest strongly that this is not the case and that the inconsistencies can persist in real files.

I am frankly dumbfounded that any software RAID implementation allows inconsistent data to be written to different sides of its mirrors. It strikes me as an utterly basic correctness invariant that a RAID-1 pair is always in sync (apart from in-flight writes, etc etc) in the absence of disk errors and abnormal shutdowns.


Comments on this page:

From 65.38.42.251 at 2009-12-16 09:24:17:

Wow...that's just awful. Making a mental note to stay away from software RAID in Linux.

-- Saint Aardvark the Carpeted

Written on 16 December 2009.


Last modified: Wed Dec 16 01:58:15 2009