How Linux software RAID is making me grumpy right now

This weekend, one of my machines sent me email to report:
What this means (as opposed to what it says) is that a software RAID data scrub has detected some number of inconsistencies between the mirrors for two of my software RAID devices. (I believe that the kernel also notices this under some other circumstances, but I can't follow the code well enough to be sure or tell what they are.)

Let me inventory the obvious failures here.
The lack of information about where the errors are is extremely bad, because there is no actual repair process for this problem. The software RAID 'repair' operation is not a repair, it is a resync; if there is an inconsistency, it picks one side of the mirror (somehow) and force-updates the other to match it. There is no certainty that it will pick the right one. Therefore, if this happens to you, you are best off doing nothing until you can specifically identify what was damaged (if anything) and then either try to recover data from the other mirror or restore things from backups.

I foresee a very long downtime with a live CD in my future. Or some kernel hacks. Or both.

The final failure is what may have caused this inconsistency. According to Neil Brown (in a message quoted here), under some circumstances the software RAID code can write inconsistent data to the two sides of the mirror because it allows the page to be changed between when it is written to one side and when it is written to the other. According to his message, this should be harmless because the newly-dirty page will be rewritten at some point. Other reports suggest strongly that this is not the case and that the inconsistencies can persist in real files.

I am frankly dumbfounded that any software RAID implementation allows inconsistent data to be written to different sides of its mirrors. It strikes me as an utterly basic correctness invariant that a RAID-1 pair is always in sync (apart from in-flight writes, etc etc) in the absence of disk errors and abnormal shutdowns.
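Since the kernel does not say where the mismatches are, one way to find them yourself (from that live CD, with the array stopped) is to read both halves of the mirror and compare them chunk by chunk. What follows is only a rough sketch of that idea, not anything the software RAID code provides. The function name `find_mismatched_chunks`, the 4096-byte chunk size, and the demo image files are all my own assumptions for illustration. It also assumes that both members store their data at the same offset, and the area holding the RAID superblock may well differ between members even on a healthy array, so mismatches there would not mean anything.

```python
import os
import tempfile


def find_mismatched_chunks(path_a, path_b, chunk_size=4096, limit=None):
    """Return the byte offsets of chunks that differ between two files or devices.

    path_a and path_b would be the two halves of the mirror (hypothetical
    paths; ordinary files work too). Only reads are performed.
    """
    mismatches = []
    offset = 0
    with open(path_a, "rb") as side_a, open(path_b, "rb") as side_b:
        while True:
            block_a = side_a.read(chunk_size)
            block_b = side_b.read(chunk_size)
            if not block_a and not block_b:
                break  # both sides are exhausted
            if block_a != block_b:
                mismatches.append(offset)
                if limit is not None and len(mismatches) >= limit:
                    break  # stop early once enough offsets are collected
            offset += chunk_size
    return mismatches


def demo():
    """Build two small 'mirror' images that differ in one byte and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        side_a = os.path.join(tmp, "side_a.img")
        side_b = os.path.join(tmp, "side_b.img")
        data = bytearray(os.urandom(16384))
        with open(side_a, "wb") as f:
            f.write(data)
        data[9000] ^= 0xFF  # corrupt one byte inside the third 4096-byte chunk
        with open(side_b, "wb") as f:
            f.write(data)
        return find_mismatched_chunks(side_a, side_b)


if __name__ == "__main__":
    print(demo())
```

This only tells you which offsets differ, not which side is right. Mapping an offset back to a file (if it belongs to one at all) and deciding which copy to trust is still manual work.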
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.