The better way to clear SMART disk complaints, with safety provided by ZFS
A couple of months ago I wrote about clearing SMART complaints
about one of my disks by very carefully
overwriting sectors on it, and how ZFS made this kind of safe. In
a comment, Christian Neukirchen suggested using
hdparm --write-sector to overwrite sectors with
read errors instead of the complicated dance with
dd that I used
in my entry. As it happens, that disk
coughed up a hairball of
smartd complaints today, so I got a
chance to go through my procedures again and the advice is spot on.
hdparm makes things much simpler.
So my revised steps are:
- Scrub my ZFS pool in the hopes that this will make the problem go
away. It didn't, which means that any read errors in the partition
for the ZFS pool are in space that ZFS shouldn't care about.
- Use dd to read all of the ZFS partition. I did this with
'dd if=/dev/sdc7 of=/dev/null bs=512k conv=noerror iflag=direct'. This hit several bad spots, each of which produced kernel errors that included a line like this:
blk_update_request: I/O error, dev sdc, sector 1748083315
- Use hdparm --read-sector to verify that this is indeed the bad sector:
hdparm --read-sector 1748083315 /dev/sdc
If this is the correct sector,
hdparm will report a read error and the kernel will log a failed SATA command. Note that this is not a normal disk read, as
hdparm is issuing a low-level read, so you don't get a normal message; instead you get something like this:
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000001
ata3.00: failed command: READ SECTOR(S) EXT
ata3.00: cmd 24/00:01:73:a2:31/00:00:68:00:00/e0 tag 3 pio 512 in
         res 51/40:00:73:a2:31/00:00:68:00:00/00 Emask 0x9 (media error)
[...]
The important thing to notice here is that you don't get the sector reported (at least not in decoded form), so you have to rely on getting the sector number correct in the
hdparm command instead of being able to cross check it against earlier kernel logs.
(Sector 1748083315 is 0x6831a273 in hex. All the bytes are there in the
cmd part of the message, but clearly shuffled around.)
- Use hdparm --write-sector to overwrite the sector, forcing it to be spared out:
hdparm --write-sector 1748083315 <magic option> /dev/sdc
hdparm will tell you what the hidden magic option you need is when you use --write-sector without it.
- Scrub my ZFS pool again and then re-run the
dd to make sure that I got all of the problems.
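The remark in the steps above about the sector's bytes being shuffled around in the cmd part of the message can be made concrete. This is a minimal sketch, assuming the field order that I believe the kernel's libata error report uses (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). That layout is my assumption rather than something the entry states, and lba48_from_cmd is a made-up helper name.

```python
import re

# ASSUMPTION: the cmd field is laid out as
#   command/feature:nsect:lbal:lbam:lbah/
#   hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# which is my reading of libata's error report, not something from the entry.
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def lba48_from_cmd(line):
    """Return the 48-bit LBA encoded in a libata 'cmd ...' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    low = [int(b, 16) for b in m.group(2).split(":")]
    high = [int(b, 16) for b in m.group(3).split(":")]
    # low[2:5]  holds lbal, lbam, lbah             (LBA bits 0-23)
    # high[2:5] holds hob_lbal, hob_lbam, hob_lbah (LBA bits 24-47)
    lba_bytes = low[2:5] + high[2:5]  # least significant byte first
    return int.from_bytes(bytes(lba_bytes), "little")


line = "ata3.00: cmd 24/00:01:73:a2:31/00:00:68:00:00/e0 tag 3 pio 512 in"
print(lba48_from_cmd(line))  # 1748083315
```

If the assumed layout is right, this gives a way to cross check the sector number from the low-level error after all, instead of trusting that the hdparm command line was typed correctly.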
I was pretty sure I'd gotten everything even before the re-scrub
and the re-dd scan, because
smartd reported that there were no
more currently unreadable (pending) sectors or offline uncorrectable
sectors, both of which it had been complaining about before.
This was a lot easier and more straightforward to go through than
my previous procedure, partly because I can directly reuse the
sector numbers from the kernel error messages without problems and
hdparm does exactly what I want.
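Since the sector numbers in the kernel messages can be reused directly, collecting them can be automated. The following is a rough sketch rather than part of my actual procedure: it pulls sector numbers out of log text shaped like the blk_update_request line quoted earlier and prints the matching read-only hdparm checks. The bad_sectors helper is a hypothetical name, and nothing here runs hdparm itself.

```python
import re

# Matches the kernel's block-layer error lines, for example:
#   blk_update_request: I/O error, dev sdc, sector 1748083315
SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def bad_sectors(log_text, device):
    """Return the sorted, de-duplicated sector numbers reported for device."""
    found = set()
    for dev, sector in SECTOR_RE.findall(log_text):
        if dev == device:
            found.add(int(sector))
    return sorted(found)


log = "blk_update_request: I/O error, dev sdc, sector 1748083315\n"
for sector in bad_sectors(log, "sdc"):
    # Only print the read-only verification command; nothing is executed.
    print(f"hdparm --read-sector {sector} /dev/sdc")
```

Printing the commands instead of running them is deliberate. The write step is destructive, so each sector is still worth checking by hand first.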
There's probably a better way to scan the hard drive for read
errors than dd. I'm a little bit nervous about my 512 KB block
size here potentially hiding a second bad sector that's sufficiently
close to the first, but especially with direct IO I think it's a
tradeoff between speed and thoroughness. Possibly I should explore
how well the
badblocks program works here, since it's the obvious candidate.
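One way to reason about the block size worry is to work out which sectors share a dd block with a reported bad sector, so that they could be checked individually with hdparm --read-sector. This is a sketch under several assumptions: 512-byte sectors, dd blocks aligned to the start of the partition being read, and a partition start sector that you look up yourself (for instance from /sys/block/sdc/sdc7/start). The part_start value below is invented purely for illustration, and dd_block_range is a hypothetical helper.

```python
SECTOR_SIZE = 512  # ASSUMPTION: 512-byte logical sectors


def dd_block_range(bad_sector, part_start, block_bytes=512 * 1024):
    """Return (first, last) whole-disk sectors of the dd block holding bad_sector.

    bad_sector and part_start are whole-disk sector numbers. dd's blocks are
    assumed to be aligned to the start of the partition it was reading.
    """
    per_block = block_bytes // SECTOR_SIZE
    index = (bad_sector - part_start) // per_block
    first = part_start + index * per_block
    return first, first + per_block - 1


# part_start=2048 is a made-up value, not the real start of sdc7.
first, last = dd_block_range(1748083315, part_start=2048)
print(first, last, last - first + 1)
```

With bs=512k each block covers 1024 sectors, so a single failed read can stand in for up to 1023 other sectors that were never individually checked.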
(These days I force
dd to use direct IO when talking to disks
because that way
dd does much less damage to the machine's overall performance.)
(This is the kind of entry that I write because I just looked up my first entry for how to do it again, so clearly I'm pretty likely to wind up doing this a third time. I could just replace the drive, but at this point I don't have enough drive bay slots in my work machine's case to do this easily. Also, I'm a peculiar combination of stubborn and lazy when it comes to hardware.)