Clearing disk errors (or SMART complaints) for Linux software RAID arrays

I've written in the past about clearing SMART disk complaints for ZFS pools, which involved an intricate dance with hdparm and various other things. Because I have a probably failing HD on my home machine, I've now had a chance to deal with much the same issue for a software RAID mirror (with LVM on top). It turns out that fixing up disk errors for software RAID levels with redundancy is embarrassingly simple: to try to fix disk read errors, you have software RAID check the array. Because this reads all of every component of the array, it will hopefully hit any read errors, and software RAID will then automatically try to fix them by rewriting the affected sectors from the redundant copy.

(This doesn't happen with ZFS's self-checks, because ZFS optimizes to only read and check used space. If the disk read errors are in currently unused space, they don't get hit.)

To start a check of the array, you write either 'repair' or 'check' to /sys/block/md<N>/md/sync_action. Based on the description in the Linux kernel's software RAID documentation, using 'repair' is better.
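
In shell terms, starting a scan is a single write to sysfs. Here is a minimal sketch, assuming you run it as root. The md_start_scan function and the MD_SYSFS_ROOT variable are names I've made up for this example (the variable only exists so the function can be pointed at a scratch directory); neither is part of the kernel or mdadm.

```shell
#!/bin/sh
# Start a 'check' or 'repair' pass on one software RAID array.
# MD_SYSFS_ROOT is a hypothetical override for this sketch; it defaults
# to the real sysfs location.
md_start_scan() {
    array=$1             # array name, e.g. md53
    action=${2:-repair}  # 'check' or 'repair'
    case $action in
        check|repair) ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    mddir=${MD_SYSFS_ROOT:-/sys/block}/$array/md
    # Only start if the array is idle; a resync or another scan may
    # already be running.
    state=$(cat "$mddir/sync_action") || return 1
    if [ "$state" != idle ]; then
        echo "$array is busy: $state" >&2
        return 1
    fi
    echo "$action" > "$mddir/sync_action"
}
```

You would call this as 'md_start_scan md53 repair' and then watch /proc/mdstat for progress.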

Sometimes you know where the error is, for example because the kernel has told you with a message like:

md/raid1:md53: read error corrected (8 sectors at 3416896 on sdd4)
md/raid1:md53: redirecting sector 3416896 to other mirror: sdd4
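
If you want to pull the numbers out of such messages mechanically, something like the following sed expression should do it. This is only a sketch that matches the exact 'read error corrected' format shown above for RAID-1; md_parse_corrected is a made-up name, and other RAID levels or kernel versions may word the message differently.

```shell
#!/bin/sh
# Read kernel log lines on stdin and, for each RAID-1 'read error corrected'
# message, print: <array> <sector count> <start sector> <component>.
# Lines that don't match (including the 'redirecting' message) are dropped.
md_parse_corrected() {
    sed -n 's|^.*md/raid1:\(md[0-9]*\): read error corrected (\([0-9]*\) sectors at \([0-9]*\) on \([^)]*\)).*$|\1 \2 \3 \4|p'
}
```

For example, 'dmesg | md_parse_corrected' would turn the first message above into 'md53 8 3416896 sdd4'.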

If you're using mirrored RAID and you want to speed up the repair process (or not take the IO performance hit of re-scanning your entire array while you're trying to do other things), you can limit the portion of the array that 'repair' scans by writing limits to the sync_min and sync_max files in /sys/block/md<N>/md. These are normally '0' and 'max' respectively, which you'll want to remember because you need to reset them to those values after your repair is done.

As the documentation more or less covers, the process to do this is:

  1. Echo appropriate values to sync_min and sync_max, perhaps 100 sectors before and after the value reported in the kernel messages.
  2. Start the repair by echoing 'repair' to sync_action.
  3. Watch sync_completed until it says that the repair has reached your sync_max value.
  4. Echo 'idle' to sync_action to officially stop the repair.
  5. Reset sync_min and sync_max to their defaults of '0' and 'max'.
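
The five steps above can be sketched as a small shell function. This is an illustration under assumptions rather than a tested tool: it needs root, md_repair_window and md_reset_range are names I've invented, and MD_SYSFS_ROOT, POLL_SECONDS and MAX_POLLS are hypothetical knobs (the first lets you aim it at a scratch directory, the others bound the polling loop). It also assumes that sync_completed reads as '<done> / <total>' while a scan is active and as a non-numeric word such as 'none' otherwise.

```shell
#!/bin/sh
# Put sync_min and sync_max back to their defaults.
md_reset_range() {
    echo 0 > "$1/sync_min"
    echo max > "$1/sync_max"
}

# Repair only a window of sectors around a known problem spot.
# Usage: md_repair_window <array> <sector> [margin]
md_repair_window() {
    array=$1 center=$2 margin=${3:-100}
    mddir=${MD_SYSFS_ROOT:-/sys/block}/$array/md
    lo=$((center - margin)); [ "$lo" -lt 0 ] && lo=0
    hi=$((center + margin))

    # Steps 1 and 2: restrict the range, then start the repair.
    echo "$lo" > "$mddir/sync_min" &&
    echo "$hi" > "$mddir/sync_max" &&
    echo repair > "$mddir/sync_action" ||
        { md_reset_range "$mddir"; return 1; }

    # Step 3: poll sync_completed until the repair reaches sync_max.
    tries=0 started=0
    while [ "$tries" -lt "${MAX_POLLS:-600}" ]; do
        read -r done_sectors _rest < "$mddir/sync_completed"
        case $done_sectors in
            ''|*[!0-9]*)
                # Not a number (e.g. 'none'): either the repair has not
                # started yet or it has already finished.
                [ "$started" -eq 1 ] && break ;;
            *)
                started=1
                [ "$done_sectors" -ge "$hi" ] && break ;;
        esac
        tries=$((tries + 1))
        sleep "${POLL_SECONDS:-1}"
    done

    # Steps 4 and 5: stop the repair and restore the defaults.
    echo idle > "$mddir/sync_action"
    md_reset_range "$mddir"
}
```

With the kernel messages from earlier, this would be run as 'md_repair_window md53 3416896 100'. The loop gives up after MAX_POLLS tries so that a repair that never gets going can't leave the script waiting forever; either way it finishes by resetting the range.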

If you also have kernel messages logged that report the raw HD sector numbers of your problem sectors, you can use 'hdparm --read-sector' afterward to verify that those sectors no longer have read errors and that their contents are identical to the copies on the good drives.
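
A rough sketch of the read half of that verification follows. It walks a run of raw sectors on a whole-disk device and reports any that still fail to read. The md_verify_sectors name and the HDPARM override variable are made up for this example, and it leans on the assumption that hdparm exits with a non-zero status when the sector read fails, which you should confirm against your version of hdparm. It doesn't attempt the comparison against the good drive.

```shell
#!/bin/sh
# Try to read <count> raw sectors starting at <first> on a whole-disk device
# and report the ones that fail. Returns success only if all reads worked.
# HDPARM is a hypothetical override for the command to run.
md_verify_sectors() {
    disk=$1 first=$2 count=${3:-8}
    hdparm_cmd=${HDPARM:-hdparm}
    bad=0
    sector=$first
    last=$((first + count - 1))
    while [ "$sector" -le "$last" ]; do
        # Assumption: a failed read makes hdparm exit non-zero.
        if ! "$hdparm_cmd" --read-sector "$sector" "$disk" > /dev/null 2>&1; then
            echo "sector $sector on $disk: read FAILED"
            bad=$((bad + 1))
        fi
        sector=$((sector + 1))
    done
    [ "$bad" -eq 0 ]
}
```

The arguments are the disk, the first raw sector from your kernel messages, and how many sectors to try (8 by default, matching the '8 sectors' in the message above).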

Of course, all of this is a good reason to make sure that your system automatically does a 'check' (or 'repair') of all of your software RAID arrays on a regular basis. I believe that most current Linux distributions have this already set up, but sometimes these things can get turned off.

Note that you should never clear read errors on software RAID array components by using 'hdparm --write-sector', which overwrites the sector with zeros. With software RAID, it's absolutely crucial that either the sector has the correct contents or that it returns an error. If the sector reads successfully but returns different data than the other copies, you have a software RAID inconsistency and may corrupt your data.

PS: As we found out the hard way once, you should keep an eye on how many read errors your software RAID arrays are seeing on their components. Unfortunately this information doesn't seem to currently be captured by things like the Prometheus host agent, so we're probably going to add a script for it. You may also want to keep an eye on the count of sector content mismatches in /sys/block/md<N>/md/mismatch_cnt. Although this is potentially too noisy to alert on and gets reset periodically, it's useful and important enough to be tracked somehow.
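
As a starting point for such a script, here is a sketch that prints the per-component read error counts and the mismatch count for every array. It assumes the per-component counts are the 'errors' files under /sys/block/md<N>/md/dev-<component>/; the md_error_report name, the MD_SYSFS_ROOT override and the output format are all invented for this example, so turning it into real metrics is left to you.

```shell
#!/bin/sh
# Print one line per array for mismatch_cnt and one line per component
# device for its read error count.
# MD_SYSFS_ROOT is a hypothetical override for this sketch.
md_error_report() {
    root=${MD_SYSFS_ROOT:-/sys/block}
    for mddir in "$root"/md*/md; do
        [ -d "$mddir" ] || continue
        array=${mddir%/md}
        array=${array##*/}
        # Not every array type has mismatch_cnt, so skip it if absent.
        if [ -r "$mddir/mismatch_cnt" ]; then
            echo "$array mismatch_cnt $(cat "$mddir/mismatch_cnt")"
        fi
        for errfile in "$mddir"/dev-*/errors; do
            [ -r "$errfile" ] || continue
            dev=${errfile%/errors}
            dev=${dev##*/dev-}
            echo "$array $dev read_errors $(cat "$errfile")"
        done
    done
}
```

Running md_error_report from cron and comparing its output to the previous run would be one simple way to notice when the numbers start climbing.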

