An alarming ZFS status message and what is usually going on with it
Suppose that you have a ZFS pool with redundancy (mirroring or ZFS's
version of RAID 5 or RAID 6), and that someday you run 'zpool status'
and see the alarming output:
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
(This has been re-linewrapped for my convenience.)
The rest of the 'zpool status' output should have one or more disks with non-zero CKSUM fields and a final line that reports 'errors: No known data errors'.
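To pick out which disks are affected, one option is a small script. The following is only a minimal sketch in Python, assuming the usual five-column NAME/STATE/READ/WRITE/CKSUM layout of the config table. The function name, the pool name 'tank', the device names, and the sample output are all made up for illustration and are not taken from a real system.

```python
# Hypothetical sample of 'zpool status' output, for illustration only.
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     3
            c1t1d0  ONLINE       0     0     0

errors: No known data errors
"""


def devices_with_checksum_errors(status_text):
    """Return {name: cksum} for config rows whose CKSUM column is not '0'.

    Counts are kept as strings, since this sketch does not assume they
    are always plain integers.
    """
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            # The header row marks the start of the device table.
            in_config = True
            continue
        if not in_config:
            continue
        if not fields:
            # A blank line ends the device table.
            in_config = False
            continue
        # Assumed row layout: NAME STATE READ WRITE CKSUM
        if len(fields) >= 5 and fields[4] != "0":
            flagged[fields[0]] = fields[4]
    return flagged


if __name__ == "__main__":
    print(devices_with_checksum_errors(SAMPLE_STATUS))
```

On the sample text this reports only c1t0d0, the one row with a non-zero CKSUM count. In practice you would feed it the captured output of 'zpool status' for your own pool.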
What this generally means is something like this:
ZFS has detected repairable checksum errors and has repaired them by rewriting the affected disk blocks. If the errors are from a slowly failing disk, replace the disk with 'zpool replace'; if they are instead from temporary problems in the storage system, clear this message and the error counts with 'zpool clear'. You may wish to check this pool for other latent errors with 'zpool scrub'.
(I have to admit that Sun's own error explanation page for this is pretty good, too. This is unfortunately somewhat novel, which explains why I didn't look at it before now.)
I assume that ZFS throws up this alarming status message even though it automatically handled the issue because it doesn't want to hide from you that a problem happened. While the problem might just be a temporary glitch (we've seen this a few times on our iSCSI based fileservers), it might instead be an indication of a more serious issue that you should look into, so at least you need to know that something happened.
(And even temporary glitches shouldn't happen all that often, or ideally at all; if they do, you have a problem somewhere.)
Sidebar: Our experience with these errors
We've seen a few of these temporary glitches with our iSCSI based
fileservers. So far our procedure to deal with
this is to note down at least which disk had the checksum errors
(sometimes we save the full 'zpool status' output for the pool),
'zpool clear' the errors on that specific disk, and then 'zpool scrub'
the pool. This should normally turn up a clean bill of health;
if it doesn't, I would re-clear and re-scrub and then panic if the
second scrub did not come back clean. (Okay, I wouldn't panic, but I
would replace the disk as fast as possible.)
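The procedure above can be sketched as a short Python helper. This is an illustration under stated assumptions, not our actual tooling: the function names, the dry-run default, and the pool and device names in the usage line are hypothetical. The only commands used are 'zpool status', 'zpool clear' and 'zpool scrub'.

```python
import subprocess


def clear_and_scrub_commands(pool, device):
    """Build the command sequence for one disk with checksum errors."""
    return [
        ["zpool", "status", pool],         # record which disk had errors
        ["zpool", "clear", pool, device],  # reset error counts on that disk only
        ["zpool", "scrub", pool],          # start re-verifying the whole pool
    ]


def run_procedure(pool, device, dry_run=True):
    """Print the commands (dry run) or actually execute them in order."""
    for cmd in clear_and_scrub_commands(pool, device):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if any command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical pool and device names; prints the commands only.
    run_procedure("tank", "c1t0d0")
```

Note that 'zpool scrub' only starts the scrub and returns; you still have to look at 'zpool status' later to see whether it came back clean.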
On our fileservers, my suspicions are on the hardware or driver for the
onboard nVidia Ethernet ports. The fileservers periodically report that
they lost and then immediately regained the link on
nge0, which is one
of the iSCSI networks, and usually report
at the same time. Unfortunately, the ever so verbose Solaris fault
manager system does not log when the ZFS checksum errors are detected,
so we can't correlate them to
nge0 link resets.
(As contributing evidence, the Linux iSCSI backends, running on very similar hardware, also had problems with their onboard nVidia Ethernet ports under sufficient load.)
Why btrfs was inevitable: a corollary to (not) getting ZFS in Linux
One of the things I said before about ZFS in Linux boils down to this: it's a lot of hard work to get outside code into the Linux kernel. This has an important corollary.
This sort of hard work, of modifying foreign code so that it fits into the Linux kernel, takes people who are pretty skilled Linux kernel programmers. These people are skilled enough to have a choice of what to do, so they can either work on the tedious grinding job of adopting other people's code into the kernel or they can write something new of their own (and get it into the kernel if it's kernel code).
It should not surprise anyone that most programmers, facing a choice between grinding maintenance work and writing something new and interesting, will pick writing something new and interesting. My impression is that Linux kernel hackers are not an exception to this, and that pretty much all of the Linux kernel hackers who are competent to get outside code into the Linux kernel would rather write something new, and generally they do.
Thus, even without other issues we would almost certainly get btrfs instead of an adoption of ZFS; it's simply more interesting for the people doing the work. Arguments that Linux kernel programmers should choose the boring work anyways are missing the point in several ways, including that people simply don't behave that way no matter what you would like.
(The one exception to this is, of course, when someone is paying kernel hackers to do the grinding work. I don't think it's a coincidence that the integration of things like IBM's JFS, SGI's XFS, and even Reiserfs has mostly been driven by employees of their respective companies.)