The different ways that you can lose a ZFS pool
There are at least three different general ways that you can lose a ZFS pool.
The straightforward way that everyone knows about is to lose a top level vdev, ie to lose a non-redundant disk, all of the disks in a mirror set, or enough disks in a raidzN vdev (two disks for a raidz1, three for a raidz2, and so on). Losing a chunk of striped or concatenated storage is essentially instant death for basically any RAID system, and ZFS is no exception here.
(I don't know and haven't tested if you can recover your pool should you return enough of the missing disks to service, or if ZFS immediately 'poisons' the remaining good disks the moment it notices the problem. I would hope the former, but ZFS has let me down before.)
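To make those loss thresholds concrete, here is a rough Python sketch of when losing disks becomes fatal. This is purely illustrative and not how ZFS itself decides anything; the vdev kinds and counts are made up for the example.

    # Rough sketch (not ZFS code) of when disk losses become fatal.
    # The pool dies as soon as any single top-level vdev can no longer
    # serve data, because data is spread across all top-level vdevs.

    def vdev_survives(kind, disks, failed):
        # Can a top-level vdev of this kind still serve data?
        if kind == "disk":            # plain non-redundant disk
            return failed == 0
        if kind == "mirror":          # need at least one surviving side
            return failed < disks
        if kind.startswith("raidz"):  # raidz1/2/3 tolerate 1/2/3 lost disks
            parity = int(kind[-1])
            return failed <= parity
        raise ValueError("unknown vdev kind: " + kind)

    def pool_survives(vdevs):
        # A pool survives only if every one of its top-level vdevs survives.
        return all(vdev_survives(k, d, f) for (k, d, f) in vdevs)

    # A pool of one 4-disk raidz1 plus one 2-way mirror:
    print(pool_survives([("raidz1", 4, 1), ("mirror", 2, 1)]))  # True
    print(pool_survives([("raidz1", 4, 2), ("mirror", 2, 0)]))  # False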
The second way, which may have been fixed by now, is to lose a vdev that was a separate ZIL log device (and I think perhaps an L2ARC device) and then reboot or export the pool before removing the dead vdev from the pool configuration. This failure mode comes from how ZFS validates that it has found the full and correct pool configuration without storing a copy of that configuration in the uberblock. Basically, each vdev in the pool has a ZFS GUID and the uberblock has a checksum of all of them together. If you try to assemble a pool with an incomplete set of vdevs, the checksum of their GUIDs will not match the checksum recorded in the uberblock and the ZFS code rejects the attempted pool configuration. This is all well and good until you lose a vdev that holds no actual pool data (such as a ZIL log device) but that ZFS still includes in the uberblock's vdev GUID checksum.
(One unfortunate aspect of this design decision is that ZFS doesn't necessarily know which pieces of your pool are missing. All it knows is that you have an incomplete configuration because the GUID checksums don't match.)
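A minimal sketch of this GUID check, in Python, might look like the following. I'm assuming for illustration that the uberblock value is a plain 64-bit sum of the vdev GUIDs; the real ZFS details differ, but the consequence is the same.

    # Sketch (not ZFS code) of validating a pool configuration against the
    # GUID checksum recorded in the uberblock.

    def guid_sum(vdev_guids):
        # Assumed: a 64-bit wrapping sum of every vdev's GUID, including
        # vdevs with no data on them, like a separate ZIL log device.
        return sum(vdev_guids) & 0xFFFFFFFFFFFFFFFF

    def can_assemble(recorded_sum, found_vdev_guids):
        # The configuration is only accepted if the GUIDs of the vdevs
        # actually found add up to the recorded value.
        return guid_sum(found_vdev_guids) == recorded_sum

    # Two data disks and a separate ZIL log device (made-up GUIDs):
    data1, data2, zil_log = 0x1111, 0x2222, 0x3333
    recorded = guid_sum([data1, data2, zil_log])

    # Lose just the log device and the sums no longer match, even though
    # all of the actual data is present -- and nothing here tells ZFS
    # which vdev is missing, only that something is.
    print(can_assemble(recorded, [data1, data2]))           # False
    print(can_assemble(recorded, [data1, data2, zil_log]))  # True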
The third way is to have corrupted metadata at the top of the pool. There are a number of ways that this can happen, but probably the most common one is running into a ZFS bug that causes it to write incorrect or bad data to disk (you can also accidentally misuse ZFS). I believe that ZFS can recover from a certain amount of damaged metadata that is relatively low down the ZFS metadata and filesystem tree; you'll lose access to some of your files, but the pool will stay intact. However, if there's damage to something sufficiently close to the root of the ZFS pool metadata, that's it; ZFS throws up its hands (and sometimes panics your machine), despite most of your data being intact and often relatively findable.
(Roughly speaking, there are two sorts of metadata damage: destroyed metadata and corrupted metadata. Destroyed metadata has a bad ZFS block checksum; corrupted metadata checksums correctly but has contents that ZFS chokes on, often with kernel panics.)
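As an illustration of the difference, here is a hypothetical read path in Python; the checksum algorithm and the parsing step are stand-ins for the example, not what ZFS actually does.

    # Sketch of the two failure modes for a metadata block.
    import hashlib, json

    def read_metadata(raw_bytes, expected_checksum):
        if hashlib.sha256(raw_bytes).digest() != expected_checksum:
            # 'Destroyed' metadata: the block fails its checksum outright,
            # so ZFS knows it is bad and can report a checksum error.
            raise IOError("metadata block fails checksum")
        # 'Corrupted' metadata: the checksum matches, so the block is
        # trusted, but its contents are garbage and the code interpreting
        # them can misbehave badly (historically, sometimes a kernel panic).
        return json.loads(raw_bytes)   # stand-in for real metadata parsing

    good = b'{"root": 1}'
    read_metadata(good, hashlib.sha256(good).digest())  # parses fine
    # read_metadata(good, b'\x00' * 32)        # 'destroyed': bad checksum
    # read_metadata(b'garbage', hashlib.sha256(b'garbage').digest())
    #                                          # 'corrupted': parsing blows up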
Update: what I said here about the leading causes of corrupted metadata is probably wrong. See ZFSLosingPoolsWaysII.
These days, ZFS has a recovery method for certain sorts of metadata corruption; how it works and what its limitations are is beyond the scope of this entry.