An unpleasant surprise about ZFS scrubbing in Solaris 10 U6

July 10, 2009

Here is something that we discovered recently: ZFS will refuse to scrub a pool that is DEGRADED, even if the degraded state is harmless to actual pool redundancy and there is no resilvering going on. In the usual ZFS manner, it doesn't give you any actual errors; it just doesn't do anything when you ask for a pool scrub.
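(As a concrete illustration, with a hypothetical pool name of 'tank': you ask for a scrub and then have to check zpool status yourself to see whether one actually started, because the scrub command itself exits without complaint.

    zpool scrub tank       # exits quietly, with no error message
    zpool status tank      # check the 'scrub:' line to see if a scrub is actually running

This is only a sketch; the pool name is invented.)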

(Now, I can't be completely and utterly sure that it was the DEGRADED state that blocked the scrub and not coincidence or something unrelated. But I do know that the moment we zpool detach'd the faulted device, restoring its vdev and thus the entire pool to the normal ONLINE state, we could start a scrub that did something.)
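(In sketch form, with an invented device name for the faulted side of the mirror, the sequence that finally got us a working scrub was roughly:

    zpool detach tank c1t2d0    # drop the faulted device from its mirror vdev
    zpool status tank           # the vdev and the pool now show ONLINE
    zpool scrub tank            # and now this scrub actually does something

The device name is made up for illustration; the point is the order of operations.)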

Regardless of what exactly is causing this, this behavior is bad (and a number of other words). When your pool is experiencing problems is exactly when you most want to scrub it, so that you have the best information possible about how bad the problem is (and where it is) and don't take hasty actions that actually make your problems worse.

I don't know what you could do if you couldn't detach the device. It's possible that ZFS somehow thought that the pool was still resilvering, in which case either exporting and importing the pool or rebooting the server might have fixed the problem (both of these often reset information about scrubs in zpool status output).

(Neither was an option on a production fileserver, so we didn't try them; this is pure speculation.)
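(For completeness, the untried export and import would have looked something like this, again with a hypothetical pool name; both it and a reboot interrupt service to the pool, which is why they were off the table:

    zpool export tank    # unmount the pool's filesystems and deactivate the pool
    zpool import tank    # bring it back, hopefully with the stale resilver state cleared

Whether this would actually have cleared things up is, as noted, pure speculation.)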

Sidebar: exactly what happened

Last week, one side of one mirror in a pool on one of our ZFS fileservers started reporting read errors (and the iSCSI backend started reporting drive errors to go with it). Since we were shorthanded due to vacations, we opted not to immediately replace the disk; instead we added a third device to that vdev to make it a three-way mirror (sketched below), so that we would still have a two-way mirror even if the disk failed completely. That evening, the disk started throwing up enough read errors that ZFS declared it degraded, pulled in one of the configured spares for that pool, and reconstructed the mirror, exactly as it was supposed to.
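(The 'add a third device' step is zpool attach, which always mirrors the new device against an existing one. A sketch with invented device names, where c1t1d0 is the healthy side of the mirror:

    zpool attach tank c1t1d0 c2t3d0    # attach c2t3d0 alongside c1t1d0, making a three-way mirror
    zpool status tank                  # watch the new side resilver

Again, the pool and device names here are made up.)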

However, this put that vdev and thus the entire pool into a DEGRADED state, which is sort of reasonably logical from the right perspective (the pool is still fully redundant, but it has degraded from the configuration you set up). And, as mentioned, we couldn't scrub the pool; attempts to do so did nothing except change the nominal completion time of the 'add-the-spare' resilver to the current time in zpool status output.
