A ZFS feature wish: rewriting read errors

June 2, 2010

Today's missing ZFS feature is most easily described by telling you about the problem. Suppose that you have a redundant pool (okay, a redundant vdev) and one of the disks in it develops some bad sectors that can't be read. My current understanding is that, unlike write errors, this is not a 'replace disk immediately' sign; it can happen on otherwise healthy and usable disks.

(A persistent write error is a 'replace disk immediately' sign because it means that the disk has run out of spare sectors to remap bad sectors to. Modern disks have quite a lot of spare sectors, so seeing an actual write error means that the disk has already had quite a lot of silent write errors that it's fixed up for you.)

Now, you'd like to fix the problem. At the hard drive level, the way to do this is to rewrite the sector so that the hard drive recognizes it as bad and spares it out. Because your pool is redundant, ZFS can recreate the data that should be there and thus it could rewrite the bad sector with the correct data; in fact, if you had a checksum error instead of a read error ZFS would already have done this.

(The hard drive itself can't silently spare out bad sectors on read because it cannot recreate the data that should be in them.)
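To make the idea concrete, here is a toy sketch of what rewrite-on-read-error could look like for a simple mirror. This is not how ZFS works internally; `ToyDisk`, `mirror_read`, and the spare-sector bookkeeping are all hypothetical names and simplifications I've made up for illustration. The only behaviour it assumes is what's described above: a disk can't fix a bad sector on read, but a write to that sector lets it spare the sector out.

```python
class ToyDisk:
    """Hypothetical disk model: writing to an unreadable sector remaps it to a spare."""

    def __init__(self, sectors, spares=2):
        self.data = list(sectors)
        self.bad = set()       # sector numbers that currently fail on read
        self.spares = spares   # remaining spare sectors (made-up count)

    def read(self, lba):
        # The disk cannot recreate the data, so all it can do is report an error.
        if lba in self.bad:
            raise OSError(f"unreadable sector {lba}")
        return self.data[lba]

    def write(self, lba, payload):
        if lba in self.bad:
            if self.spares == 0:
                # Out of spares: this is the persistent write error case.
                raise OSError(f"persistent write error at sector {lba}")
            self.spares -= 1
            self.bad.discard(lba)  # sector is now backed by a spare
        self.data[lba] = payload


def mirror_read(disks, lba):
    """Read one sector from a toy mirror, rewriting any copy that failed to read."""
    failed = []
    good = None
    for disk in disks:
        try:
            good = disk.read(lba)
            break
        except OSError:
            failed.append(disk)
    if good is None:
        raise OSError(f"no readable copy of sector {lba}")
    # The redundancy gives us the correct data, so rewrite it over the bad copies.
    for disk in failed:
        disk.write(lba, good)
    return good


a = ToyDisk([b"x", b"y"])
b = ToyDisk([b"x", b"y"])
a.bad.add(1)                 # simulate a bad sector on the first disk
data = mirror_read([a, b], 1)
```

After the call, the read has been satisfied from the second disk and the first disk's sector 1 is readable again, at the cost of one spare sector and a single-sector write instead of a full resilver.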

If ZFS supported doing this rewriting itself, it could fix the problem rapidly and with minimal impact on IO load and pool redundancy. Without ZFS support for rewriting on read errors, you have to fix the problem by hand and the only ZFS-level way to do this (that I know of) is by forcing a full resilver of the device. At a minimum this has a significant IO impact.

(Disclaimer: it's possible that I'm wrong about the danger level of read errors on modern SATA disks. And yes, always immediately replacing disks that report any visible errors may be the cautiously safe approach, but in our environment it has various drawbacks that make us avoid it when possible, including user-visible performance issues as things resilver.)


Last modified: Wed Jun 2 23:32:13 2010