How Linux software RAID is making me grumpy right now
This weekend, one of my machines sent me email to report:
WARNING: mismatch_cnt is not 0 on /dev/md0
WARNING: mismatch_cnt is not 0 on /dev/md3
What this means (as opposed to what it says) is that a software RAID data scrub has detected some number of inconsistencies between the mirrors for two of my software RAID devices.
(I believe that the kernel also notices this under some other circumstances, but I can't follow the code well enough to be sure or tell what they are.)
(The mismatch_cnt it is talking about is the one found in /sys/block/mdN/md. You can read the full discussion about it in Documentation/md.txt.)
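For anyone who wants to look at the number directly, here is a minimal sketch that reads it for every array, assuming the file is /sys/block/mdN/md/mismatch_cnt. The function name read_mismatch_counts and the sysfs_root parameter are my own inventions; the parameter exists only so the code can be pointed at a test directory.

```python
import glob
import os


def read_mismatch_counts(sysfs_root="/sys/block"):
    """Return {array name: mismatch_cnt} for every md array found.

    sysfs_root is a hypothetical parameter for testing; on a real
    system the default should be what you want.
    """
    counts = {}
    pattern = os.path.join(sysfs_root, "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        # path looks like <root>/md0/md/mismatch_cnt; pull out 'md0'.
        name = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            counts[name] = int(f.read().strip())
    return counts


if __name__ == "__main__":
    for name, count in read_mismatch_counts().items():
        print(f"/dev/{name}: mismatch_cnt = {count}")
```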
Let me inventory the obvious failures here.
- the raid-check script doesn't bother to tell you what mismatch_cnt is, apart from 'not zero'. Since this is both volatile (it's only in kernel memory so it gets reset on reboot) and a measure of how much inconsistency was found, sysadmins would kind of like to have it recorded for posterity. Speaking for myself, I would really like to know if my arrays are progressively getting more and more inconsistent every week, or if it seems to have happened once and then stopped.
- the software RAID code does not log any messages when it detects
inconsistencies. If you do not know to look at
mismatch_cnt and naively just watch syslog or the kernel messages, you are out of luck.
- worse, the software RAID code doesn't tell you where the errors
are. What do they affect? You have no way of finding out short
of duplicating the work yourself in order to actually find out
the sector numbers.
(I have read of people who shut down the software RAID device, directly mount each side's filesystem read-only, and
diff -r them. People with LVM on software RAID are plain out of luck.)
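Recording the count for posterity is at least something you can do yourself. Below is a rough sketch that appends a timestamped line per array to a log file, so that week-to-week changes become visible. The function name record_mismatch_counts, the log line format, and the sysfs_root and now parameters are all my own assumptions, not anything the existing script provides.

```python
import glob
import os
import time


def record_mismatch_counts(log_path, sysfs_root="/sys/block", now=None):
    """Append one 'timestamp array count' line per md array to log_path.

    Returns the lines that were written. now is an optional Unix
    timestamp (hypothetical parameter, mostly useful for testing).
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    pattern = os.path.join(sysfs_root, "md*", "md", "mismatch_cnt")
    lines = []
    for path in sorted(glob.glob(pattern)):
        # path looks like <root>/md0/md/mismatch_cnt; pull out 'md0'.
        name = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            count = int(f.read().strip())
        lines.append(f"{stamp} {name} {count}\n")
    # Append rather than overwrite, so the history survives reboots
    # even though the kernel's own counter does not.
    with open(log_path, "a") as log:
        log.writelines(lines)
    return lines
```

Run from cron shortly after each scrub finishes, this would give you a history that you can compare from one week to the next.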
The lack of information about where the errors are is extremely bad, because there is no actual repair process for this problem. The software RAID 'repair' operation is not a repair, it is a resync; if there is an inconsistency, it picks one side of the mirror (somehow) and force-updates the other to match it. There is no certainty that it will pick the right one.
Therefore, if this happens to you, you are best off doing nothing until you can specifically identify what was damaged (if anything) and then either try to recover data from the other mirror or restore things from backups. I foresee a very long downtime with a live CD in my future. Or some kernel hacks. Or both.
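Duplicating the kernel's work by hand might look something like the sketch below, which reads two mirror members side by side and reports the sector numbers where they differ. It makes several assumptions: 512-byte sectors, that the array is stopped (or at least idle) while you read, and that both members keep their data at the same offset. The md superblock area on each member will legitimately differ, so expect some noise there. The numbers are raw offsets into the member devices and still have to be mapped to files by some other means. The name differing_sectors is mine.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def differing_sectors(path_a, path_b, chunk_sectors=2048):
    """Yield the sector numbers at which two devices (or files) differ.

    Comparison stops at the end of the shorter of the two.
    """
    chunk_size = chunk_sectors * SECTOR_SIZE
    sector = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            buf_a = a.read(chunk_size)
            buf_b = b.read(chunk_size)
            n = min(len(buf_a), len(buf_b))
            if n == 0:
                break
            buf_a, buf_b = buf_a[:n], buf_b[:n]
            if buf_a != buf_b:
                # Only walk sector by sector inside chunks that differ.
                for off in range(0, n, SECTOR_SIZE):
                    end = off + SECTOR_SIZE
                    if buf_a[off:end] != buf_b[off:end]:
                        yield sector + off // SECTOR_SIZE
            sector += n // SECTOR_SIZE
```

The paths would be the two member partitions, opened read-only as root; the actual device names depend entirely on your setup.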
The final failure is what may have caused this inconsistency. According to Neil Brown (in a message quoted here), under some circumstances the software RAID code can write inconsistent data to the two sides of the mirror because it allows the page to be changed between when it is written to one side and when it is written to the other. According to his message, this should be harmless because the newly-dirty page will be rewritten at some point. Other reports suggest strongly that this is not the case and that the inconsistencies can persist in real files.
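To make the failure mode concrete, here is a toy model of it in user-space Python. This is emphatically not the kernel's code, just an illustration of the sequence described above: the page is copied to one mirror, modified, and then copied to the other. All of the names (mirrored_write, scribble, side_a, side_b) are made up for the example.

```python
def mirrored_write(page, mirrors, modify_between=None):
    """Write page to each mirror in turn, copying it at write time.

    modify_between, if given, is called after the first mirror has
    been written and before the rest, to mimic the page being
    changed while the mirrored write is still in progress.
    """
    for i, mirror in enumerate(mirrors):
        mirror[:] = page  # snapshot of the page as it is *now*
        if i == 0 and modify_between is not None:
            modify_between(page)


def scribble(p):
    # Stand-in for whatever dirties the page mid-write.
    p[0:3] = b"NEW"


page = bytearray(b"old data")
side_a, side_b = bytearray(8), bytearray(8)

mirrored_write(page, [side_a, side_b], modify_between=scribble)
print(side_a == side_b)  # prints False: the two sides now disagree
```

In this toy model the two sides only converge again if the newly modified page is written out a second time, which is exactly the assumption that the reports of persistent inconsistencies call into question.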
I am frankly dumbfounded that any software RAID implementation allows inconsistent data to be written to different sides of its mirrors. It strikes me as an utterly basic correctness invariant that a RAID-1 pair is always in sync (apart from in-flight writes, etc etc) in the absence of disk errors and abnormal shutdowns.