One example of why I like ZFS on Linux

January 21, 2016

Yesterday evening, my office workstation blared notifications at me of SMART errors on one of my HDs. The disk is old enough by now for this not to be too surprising (and it's a 1 TB Seagate, a model we've had some problems with), but still, a disk issue is never exactly welcome, even if all of the data on it is mirrored. Since we've seen SMART errors turn out to be not really a problem before, I did the obvious and easy thing to test the situation: I started a scrub of the ZFS pool that takes up most of the disk. This scrub turned up actual read errors (as reported by the disk to the kernel), but it also caused ZFS to say that it was repairing the issue. After the scrub finished without any reported errors (but with 256 KB reported repaired), I did a second scrub; this one did not report any problems and didn't cause the disk to report any read errors.

(Specifically the HD reported '3 currently unreadable (pending) sectors' and '3 offline uncorrectable sectors'. The SMART daemon reported that this condition had cleared somewhat after the first pool scrub and repair finished.)
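As a rough illustration of checking those two counters by hand instead of waiting for smartd, here is a minimal Python sketch that pulls them out of `smartctl -A` style output. The sample text is made up to match the counts above (it is not the actual output from my disk), the helper name `pending_counts` is hypothetical, and the sketch assumes the usual ten-column attribute table with a plain integer in the raw value column.

```python
# Illustrative excerpt laid out like `smartctl -A` output; the values are
# invented to match the counts described in the text, not real disk output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""

# The two attributes that correspond to the smartd complaints.
WATCHED = {"Current_Pending_Sector", "Offline_Uncorrectable"}


def pending_counts(text):
    """Return {attribute name: raw value} for the watched attributes."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns;
        # the header row starts with "ID#" and is skipped by isdigit().
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            counts[fields[1]] = int(fields[9])
    return counts


if __name__ == "__main__":
    print(pending_counts(SAMPLE))
```

Running the same function before and after a scrub would show whether the pending sector count actually dropped.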

So, what seems to have happened here is that ZFS scanned most of the disk, found some bad sectors, and quietly rewrote them in place. When it did the rewrite, the normal operation of the HD caused the bad sectors to be spared out and replaced by good ones. My HD is, for the moment, back to being healthy and doesn't need to be replaced. And if it does need to be replaced, the ZFS scrub gives me pretty good confidence that the data on the other side of the mirror is fully intact and there are no latent read errors that are going to cause me heartburn.
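The "scrub, then look at what got repaired" check can be sketched in Python by parsing the `scan:` line of `zpool status`. This is only a sketch under assumptions: the sample line imitates the format that ZFS on Linux printed around this time, its duration and timestamp are invented, the exact wording differs between ZFS versions, and `scrub_summary` is a hypothetical helper name.

```python
import re

# Illustrative `scan:` line; the duration and timestamp are made up, and
# other ZFS versions phrase this line differently.
SAMPLE_SCAN = "  scan: scrub repaired 256K in 1h12m with 0 errors on Wed Jan 20 21:30:00 2016"

# Capture the repaired amount and the error count from a finished scrub.
SCAN_RE = re.compile(r"scan: scrub repaired (\S+) in .*? with (\d+) errors")


def scrub_summary(status_text):
    """Return the repaired amount and error count, or None if no
    completed-scrub line is found in the text."""
    match = SCAN_RE.search(status_text)
    if match is None:
        return None
    return {"repaired": match.group(1), "errors": int(match.group(2))}


if __name__ == "__main__":
    print(scrub_summary(SAMPLE_SCAN))
```

In real use the text would come from running `zpool status` on the pool after `zpool scrub` has finished. A first scrub that reports some data repaired with zero errors, followed by a second that reports nothing repaired, is the pattern described above.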

This is not a big save by ZFS of the kind I've had on other systems. But I consider it a midsized save; the ZFS scrub turned an alarming and uncertain situation into a much more certain one that may even be fully fixed.

None of this is exceptional for ZFS, and parts of it are normal for anyone with mirrored storage (which has saved me from abrupt disk failure at home). But the whole reassuring, simple, and pain-free experience is unusual for Linux. And that in a nutshell is one of the big benefits of ZFS on Linux and a good part of why I like it so much. Dealing with potentially failing drives and uncertain read error locations and so on would be much more of a hassle with basically any other setup, and hassle is exactly the thing I don't want when I'm already jumpy enough because smartd is alarming me.

(There are other reasons to like ZFS, of course. And you can get this sort of scan, checksum verify, and repair experience with btrfs too as far as I know, assuming that you're willing to use btrfs in its current state.)

Written on 21 January 2016.