What ZFS messages about 'permanent errors in <0x95>:<0x0>' mean

June 9, 2018

If you use ZFS long enough (or are unlucky enough), one of the things you may run into is a report in zpool status -v of permanent errors in something (we've had that happen to us despite redundancy). If you're reasonably lucky, the error message will have a path in it. If you're unlucky, the error message will say something like:

errors: Permanent errors have been detected in the following files:

        <0x95>:<0x0>

This is a mysterious and frustrating message. On the ZFS on Linux mailing list, Richard Elling recently shared some extremely useful information about what the two hex numbers in such a message mean.

The short answer, to quote him directly:

The first number is the dataset id (index) and the second is the object id. For filesystems, the object id can be the same as the file's "inode" as shown by "ls -i". But a few object ids exist for all datasets. Object id 0 is the DMU dnode.

The dataset here may be a ZFS filesystem, a snapshot, or I believe a few other things. I believe that if the dataset still exists, you'll normally get at least its name and perhaps the full path to the object. When it doesn't exist any more (perhaps you deleted the snapshot or the whole filesystem in question after the scrub detected the error), you get this hex ID and there's also no information about the path.
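As a small illustration of reading the two numbers, here is a minimal Python sketch that turns the hex form into a pair of integers. The function name parse_error_pair is my own invention for this example, not anything from ZFS, and it only handles the fully unresolved <0x...>:<0x...> form.

```python
import re

# Matches only the fully unresolved form, e.g. "<0x95>:<0x0>".
_PAIR = re.compile(r"^<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>$")

def parse_error_pair(entry):
    """Return (dataset_id, object_id) as integers, or None if the
    entry is not in the <0x...>:<0x...> form.

    parse_error_pair is a hypothetical helper for illustration only.
    """
    m = _PAIR.match(entry.strip())
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)

dataset_id, object_id = parse_error_pair("<0x95>:<0x0>")
print(dataset_id, object_id)  # prints "149 0"
```

The decimal form is handy because "ls -i" reports inode numbers in decimal, so for a filesystem object you would compare against the decimal object id rather than the hex one.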

The reason the information is presented this way is that what the ZFS code in the kernel saves and returns to the zpool command is actually just the dataset ID and the object ID. It's up to zpool to turn both of these into names, which it does by calling back into the kernel to find out what they're currently called, if the kernel knows. Inspecting the relevant ZFS code shows that there are five cases:

  • <metadata>:<0x...> means corruption in some object in the pool's overall metadata object set.

  • <0x...>:<0x...> means that the dataset involved can't be identified (and thus ZFS has no hope of identifying the thing inside the dataset).

  • /some/path/name means you have a corrupted filesystem object (a file, a directory, etc) in a currently mounted dataset and this is its full current path.

    (I think that ZFS's determination of the path name for a given ZFS object is pretty reliable; if I'm reading the code right, it appears to be able to scan upward in the filesystem hierarchy starting with the object itself.)

  • dsname:/some/path means that the dataset is called dsname but it's not currently mounted, and /some/path is the path within it. I think this happens for snapshots.

  • dsname:<0x...> means that it's in the given dataset dsname (which may or may not be mounted), but the ZFS object in question can't have its path identified for various reasons (including that it's already been deleted).

Only things in ZFS filesystems (and snapshots and so on) have path names, so an error in a ZVOL will always be reported without the path. I'm not sure what the reported dataset names are for ZVOLs, since I don't use ZVOLs.
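To make the five forms listed above concrete, here is a rough Python sketch that sorts an entry from the 'zpool status -v' error list into one of them based purely on its shape. This is my own heuristic and not how zpool itself works; the name classify_entry and the returned labels are made up for the example. Since both dataset names and file paths can contain ':', the split between dataset name and path is a guess in unusual cases.

```python
import re

HEX = r"<0x[0-9a-fA-F]+>"

def classify_entry(entry):
    """Guess which of the five reported forms an error entry is in.

    classify_entry is a hypothetical helper; it looks only at the text
    of the entry and never consults ZFS.
    """
    entry = entry.strip()
    if re.fullmatch(rf"<metadata>:{HEX}", entry):
        return "pool metadata object"
    if re.fullmatch(rf"{HEX}:{HEX}", entry):
        return "unidentified dataset"
    if entry.startswith("/"):
        return "path in mounted dataset"
    if re.fullmatch(rf".+:{HEX}", entry):
        return "known dataset, unknown path"
    if ":/" in entry:
        # Heuristic: treat everything before the first ":/" as the dataset name.
        return "path in unmounted dataset"
    return "unrecognized"

for e in ["<metadata>:<0x1b>", "<0x95>:<0x0>", "/some/path/name",
          "dsname:/some/path", "dsname:<0x2a>"]:
    print(e, "->", classify_entry(e))
```

Checking the two hex forms before anything else keeps them from being mistaken for a dataset name followed by an object ID.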

The final detail is that you may see this error status in 'zpool status -v' even after you've cleaned it up. To quote Richard Elling again:

Finally, the error buffer for "zpool status" contains information for two scan passes: the current and previous scans. So it is possible to delete an object (eg file) and still see it listed in the error buffer. It takes two scans to completely update the error buffer. This is important if you go looking for a dataset+object tuple with zdb and don't find anything...
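The "two scans" behaviour in that quote can be illustrated with a toy model. This is only a sketch of the description above, assuming the reported list is simply the errors from the previous completed scan plus the errors from the current one; the class ToyErrorBuffer and its methods are invented for the example and are not ZFS's actual data structures.

```python
class ToyErrorBuffer:
    """Toy model of an error list that remembers two scan passes.

    Assumption (not taken from the ZFS code): what gets reported is the
    union of the previous scan's errors and the current scan's errors.
    """

    def __init__(self):
        self.previous = set()
        self.current = set()

    def complete_scan(self, errors_found):
        # The old "current" results become "previous"; new results replace them.
        self.previous = self.current
        self.current = set(errors_found)

    def reported(self):
        return self.previous | self.current

buf = ToyErrorBuffer()
buf.complete_scan({(0x95, 0x0)})   # first scan finds the error
# ... the damaged object is deleted here ...
buf.complete_scan(set())           # second scan finds nothing
after_one_clean_scan = buf.reported()
buf.complete_scan(set())           # third scan finds nothing either
after_two_clean_scans = buf.reported()
print(after_one_clean_scan, after_two_clean_scans)  # prints "{(149, 0)} set()"
```

In this model the entry is still listed after one clean scan and only disappears after the second, which matches the behaviour described in the quote.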

PS: There are some cases where <xattrdir> will appear in the file path. If I'm reading the code correctly, this happens when the problem is in an extended attribute instead of the filesystem object itself.


PPS: Richard Elling's message was on the ZFS on Linux mailing list and about an issue someone was having with a ZoL system, but as far as I can see the core code is basically the same in Illumos and I would expect in FreeBSD as well, so this bit of ZFS wisdom should be cross-platform.
