What ZFS messages about 'permanent errors in <0x95>:<0x0>' mean
If you use ZFS long enough (or are unlucky enough), one of the things you may run into is a report in zpool status -v of permanent errors in something (we've had that happen to us despite redundancy). If you're reasonably lucky, the error message will have a path in it. If you're unlucky, the error message will say something like:
errors: Permanent errors have been detected in the following files:
        <0x95>:<0x0>
This is a mysterious and frustrating message. On the ZFS on Linux mailing list, Richard Elling recently shared some extremely useful information about what the two parts of this message mean.
The short answer is, to quote directly:
The first number is the dataset id (index) and the second is the object id. For filesystems, the object id can be the same as the file's "inode" as shown by "ls -i" But a few object ids exist for all datasets. Object id 0 is the DMU dnode.
The dataset here may be a ZFS filesystem, a snapshot, or I believe a few other things. I believe that if it's still in existence, you'll normally get at least its name and perhaps the full path to the object. When it's not in existence any more (perhaps you deleted the snapshot or the whole filesystem in question since the scrub detected it), you get this hex ID and there's also no information about the path.
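As a small illustration of the quote above, here is a minimal Python sketch that turns the hex pair into decimal numbers and, for a dataset that is still mounted, walks a directory tree looking for a file whose inode number matches the object ID. The function names (parse_error_pair, find_by_inode) are made up for this example, and the inode search only helps in the case Richard Elling describes, where the object ID happens to match what "ls -i" shows.

```python
import os
import re

# Matches the "<0x95>:<0x0>" form that zpool status -v prints.
_PAIR_RE = re.compile(r"^<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>$")


def parse_error_pair(text):
    """Return (dataset_id, object_id) as integers, or None if text is not a hex pair."""
    m = _PAIR_RE.match(text.strip())
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def find_by_inode(root, inode):
    """Hypothetical helper: list paths under root whose inode number equals inode.

    Entries that live on a different filesystem than root are skipped, since
    inode numbers are only meaningful within one filesystem.
    """
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_dev != root_dev:
                # Something else is mounted here; don't descend into it.
                if name in dirnames:
                    dirnames.remove(name)
                continue
            if st.st_ino == inode:
                matches.append(path)
    return matches


dataset_id, object_id = parse_error_pair("<0x95>:<0x0>")
print(dataset_id, object_id)  # 149 0
```

Object ID 0 will never turn up in such a search, since per the quote it is the DMU dnode rather than a file.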
The reason the information is presented this way is that what the ZFS code in the kernel saves and returns to the zpool command is actually just the dataset and object ID. It's up to zpool to turn both of these into names, which it actually does by calling back into the kernel to find out what they're currently called, if the kernel knows. Inspecting the relevant ZFS code says that there are five cases:
<metadata>:<0x...>
means corruption in some object in the pool's overall metadata object set.

<0x...>:<0x...>
means that the dataset involved can't be identified (and thus ZFS has no hope of identifying the thing inside the dataset).

/some/path/name
means you have a corrupted filesystem object (a file, a directory, etc) in a currently mounted dataset and this is its full current path. (I think that ZFS's determination of the path name for a given ZFS object is pretty reliable; if I'm reading the code right, it appears to be able to scan upward in the filesystem hierarchy starting with the object itself.)

dsname:/some/path
means that the dataset is called dsname but it's not currently mounted, and /some/path is the path within it. I think this happens for snapshots.

dsname:<0x...>
means that it's in the given dataset dsname (which may or may not be mounted), but the ZFS object in question can't have its path identified for various reasons (including that it's already been deleted).
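The five forms above can be told apart mechanically. The following is a rough Python sketch of a classifier for a single entry taken from the 'zpool status -v' error list; the function name and the returned labels are invented for illustration, and the splitting of 'dsname:/some/path' is a heuristic (dataset names may themselves contain ':', so an unusual name could be split in the wrong place).

```python
import re

# One "<0x...>" hex field as printed by zpool status -v.
HEX = r"<0x[0-9a-fA-F]+>"


def classify_error_entry(entry):
    """Return a short label saying which of the five forms an entry is in."""
    entry = entry.strip()
    if re.fullmatch(r"<metadata>:" + HEX, entry):
        return "pool metadata object"
    if re.fullmatch(HEX + ":" + HEX, entry):
        return "unidentified dataset"
    if entry.startswith("/"):
        return "path in mounted dataset"
    # Greedy match, so the dataset name is everything before the last ':'.
    if re.fullmatch(r".+:" + HEX, entry):
        return "named dataset, unknown path"
    # Heuristic: treat the first ':/' as the dataset/path separator.
    if entry.find(":/") > 0:
        return "named dataset, not mounted"
    return "unrecognized"


for example in ("<metadata>:<0x1b>", "<0x95>:<0x0>", "/tank/home/file",
                "tank/home@monday:/file", "tank/home:<0x2a>"):
    print(example, "->", classify_error_entry(example))
```

The example entries are made up; only their shapes matter here.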
Only things in ZFS filesystems (and snapshots and so on) have path names, so an error in a ZVOL will always be reported without the path. I'm not sure what the reported dataset names are for ZVOLs, since I don't use ZVOLs.
The final detail is that you may see this error status in 'zpool status -v' even after you've cleaned it up. To quote Richard Elling again:
Finally, the error buffer for "zpool status" contains information for two scan passes: the current and previous scans. So it is possible to delete an object (eg file) and still see it listed in the error buffer. It takes two scans to completely update the error buffer. This is important if you go looking for a dataset+object tuple with zdb and don't find anything...
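To make the two-scan behavior concrete, here is a toy Python model of it. This is only a sketch of the behavior described in the quote, not the actual ZFS data structures or code; the ErrorLog class and its method names are invented for this example.

```python
class ErrorLog:
    """Toy model only: remembers error entries from the current and previous scan."""

    def __init__(self):
        self.previous = set()
        self.current = set()

    def start_scan(self):
        # The last scan's entries become "previous"; anything older drops out.
        self.previous = self.current
        self.current = set()

    def record(self, dataset_id, object_id):
        self.current.add((dataset_id, object_id))

    def reported(self):
        # What a status report would list: both scans' entries together.
        return self.previous | self.current


log = ErrorLog()
log.start_scan()
log.record(0x95, 0x0)          # first scan finds the damaged object
# ... the damaged object is deleted here ...
log.start_scan()               # second scan finds nothing new
print(sorted(log.reported()))  # [(149, 0)]  still listed from the previous scan
log.start_scan()               # third scan
print(sorted(log.reported()))  # []
```

In this model the entry only disappears once two scans have run after the cleanup, which matches "it takes two scans to completely update the error buffer".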
PS: There are some cases where <xattrdir> will appear in the file path. If I'm reading the code correctly, this happens when the problem is in an extended attribute instead of the filesystem object itself.
(See also this, this, and this.)
PPS: Richard Elling's message was on the ZFS on Linux mailing list and about an issue someone was having with a ZoL system, but as far as I can see the core code is basically the same in Illumos and I would expect in FreeBSD as well, so this bit of ZFS wisdom should be cross-platform.