More on mismatched sectors on Linux software RAID mirrors

January 18, 2010

Some brief followups from my first entry on this.

First, the mismatch_cnt numbers are reset from scratch every time you re-run a check (and probably every time there is a mirror resync). On many current systems, this means that they will be reset every week. This makes sense and is even implied by the documentation (in the usual Unix fashion of reading between the lines), but it would have been nice to have it explicitly documented.

(I'm aware that I'm grumpy about this, but sysadmins really care about having clear and explicit documentation about what error messages mean. Sadly we rarely get either clear error messages or clear documentation about them.)

Second, I have seen the numbers go up and down from week to week, sometimes significantly, and I've even seen the problem go away for one of my software RAID devices (the smaller one, in both size and error counts) and then come back worse. I can't say that this makes me even more unhappy, because I was pretty unhappy from the start, but it does mean that whatever has started causing my problems with this is an ongoing problem, not a one-time event.
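(Since the count is reset on every check, the only way to see this sort of week-to-week movement is to save the number yourself after each check finishes. Here is a minimal sketch of doing that in Python. It assumes that the count is exposed in sysfs as /sys/block/mdN/md/mismatch_cnt; the function names and the log format are made up for illustration, and I make no claims that this is what anyone's distribution actually does.)

```python
import glob
import os
import time


def read_mismatch_counts(sys_block="/sys/block"):
    """Return {md device name: current mismatch_cnt} for each md array found.

    Assumes the sysfs layout <sys_block>/mdN/md/mismatch_cnt.
    """
    counts = {}
    pattern = os.path.join(sys_block, "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        # path looks like <sys_block>/md0/md/mismatch_cnt; pick out 'md0'.
        device = path.split(os.sep)[-3]
        with open(path) as f:
            counts[device] = int(f.read().strip())
    return counts


def append_log(counts, logfile, now=None):
    """Append one timestamped line per device to logfile (hypothetical format)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(logfile, "a") as log:
        for device, count in sorted(counts.items()):
            log.write(f"{stamp} {device} mismatch_cnt={count}\n")
```

Calling append_log(read_mismatch_counts(), "/var/tmp/md-mismatch.log") from cron some time after the weekly check is done (and before the next one starts) would build up a history that survives the resets. The log file location is just an example.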

Unfortunately I have no practical alternative to software RAID in Linux at the current time. However, the urge to add some sort of real error logging to the kernel code for this is getting stronger and stronger.

(Please do not suggest hardware RAID; it isn't practical for various reasons, and I still believe that software RAID is better.)

Written on 18 January 2010.

Last modified: Mon Jan 18 02:18:59 2010