== What is going on with faulted ZFS spares

Suppose that you have several pools with a shared spare disk. One day you reboot your machine, and suddenly '_zpool status_' for most of your pools starts reporting that your spare is faulted:

	# zpool status tank
	  pool: tank
	[...]
	config:

	        NAME        STATE     READ WRITE CKSUM
	        [...]
	        spares
	          cXtYdZ    FAULTED   corrupted data

Unless you are running a very recent version of OpenSolaris, you will probably be unable to _zpool remove_ these faulted spares. If this happens to you, _do not_ try re-adding the spare(s) to your pools.

What has probably happened to you is a [[ZFS disk GUID ZFSGuids]] mismatch problem, where the pool configuration claims that the spare should have one GUID but the actual device has a valid ZFS label with another GUID. When the kernel discovers this situation, such as when it brings pools up during boot, it throws up its hands and declares the spare faulted.

Fortunately, the problem turns out to be easy to cure; all you have to do is _zpool export_ and then _zpool import_ each affected pool, because _zpool import_ rewrites the pool's spare configuration to have the spare device's current GUID during the import process. (If a spare device has no valid ZFS disk labels, _zpool import_ will fix that too. It's really a helpful command, perhaps a bit too helpful in a SAN environment.) There are rough command sketches of both the fix and the artificial reproduction in a sidebar at the end of this entry.

Our theory about how this situation can happen naturally is that there is some sort of race when adding the same spare to multiple pools in close succession, such as if you do it from a script. (You can induce the problem artificially by adding a spare to one pool, destroying its [[ZFS disk labels ZFSGuids]] with _dd_, and then adding it to another pool, which will create new disk labels with a different GUID.)

The two ways we have seen the kernel notice and choke on this situation are rebooting the machine and adding another spare device to the pools (the latter apparently causes the kernel to re-check all spare devices, at which point it notices the inconsistency and faults the affected spares). If you don't reboot your machine or add more spares, a system with this problem can run for months without anything noticing it (which is what happened to us).

=== Sidebar: what re-adding the spares does

If you try to re-add the spares, you will get two unpleasant surprises: first, it will work, and second, _zpool import_ will no longer fix your problem.

Remember how I said that [[ZFS really identifies disks by GUIDs ZFSGuids]]? Well, if you re-added the faulted spare, you're seeing a really vivid illustration of this in action. As far as ZFS is concerned, you now have two completely separate spares, with different GUIDs, that just happen to think they should be found on the same device. Once this happens, _zpool import_ won't rewrite the GUID any more, presumably because that would create a situation where there's a duplicate spare.
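=== Sidebar: command sketches

For concreteness, here is a rough sketch of the export/import cycle that clears a faulted spare. The pool name _tank_ and the spare device names are placeholders; your names will differ, and exporting a pool takes it offline, so pick a quiet moment to do this.

	# See which pools are complaining; the faulted spares show up here.
	zpool status -x

	# Export and re-import each affected pool.  'zpool import' rewrites
	# the pool's spare configuration with the spare device's current
	# GUID, which is what clears the FAULTED state.
	zpool export tank
	zpool import tank

	# The spare should now be listed as AVAIL again.
	zpool status tank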
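And here is a sketch of how you can artificially reproduce the GUID mismatch on a scratch machine, following the recipe above. The pool names _pool1_ and _pool2_ and the device c9t9d9 are made up, and the _dd_ step destroys data on the device, so keep this well away from anything you care about.

	# Adding the disk as a spare to the first pool writes ZFS labels
	# (including a GUID) onto it and records that GUID in pool1's
	# configuration.
	zpool add pool1 spare c9t9d9

	# Destroy the on-disk labels.  This wipes the two labels at the
	# front of the device; ZFS keeps two more at the end, so you may
	# need to wipe those too (or simply zero the whole scratch disk).
	dd if=/dev/zero of=/dev/rdsk/c9t9d9s0 bs=1024k count=10

	# Adding the same device to a second pool writes fresh labels with
	# a new GUID, so pool1's configuration now claims one GUID while
	# the device actually carries another.  Use -f if zpool still
	# objects that the device looks in use.
	zpool add -f pool2 spare c9t9d9

	# Nothing complains at this point; the kernel only notices the
	# mismatch when it re-checks spares (on reboot or when another
	# spare is added) and then faults the spare in pool1.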