You can sort of use zdb as a substitute for a proper ZFS fsck
One of the things said repeatedly and correctly about ZFS is that
it has no equivalent of fsck. ZFS scrubs will check that all of
the blocks in your pool checksum correctly and that the pool's
metadata is generally intact, but that's it (as covered yesterday). A ZFS scrub will detect and repair damaged
on-disk data, but it will not do anything about mistakes and accidents
inside ZFS itself, including ACL attributes that are internally
inconsistent. These things are not
supposed to happen but they do, partly because the ZFS code has
bugs (see, for example, Illumos issue 9847).
If you look at a conventional filesystem's fsck from the right
angle, it does two things: it finds corrupted portions of your
filesystem and tells you about them, then it fixes them for you as
much as it can or at least recovers as much data as possible. ZFS
doesn't have something that does the 'repair' portion of that and
probably never will, but it does have something that does at least
part of the first job, that of scanning your ZFS pool and finding
things wrong with how it is put together. That thing is ZDB.
ZDB started out life as a deliberately undocumented internal (Open)Solaris tool. Back in the days of Solaris 10, the only way to learn how to use it was to either run it to get vague help messages or read the source code (doing both was recommended). I'm not sure exactly when it gained an actual manual page. It was after we started using Solaris 10 on our first generation of ZFS NFS fileservers, but the manpage has apparently been there for a while now; some spelunking suggests that it may have shown up in early 2012 through Illumos issue 2088. By itself that was welcome, because ZDB is really your only tool to introspect the details of any oddities in your ZFS pools.
However, these days it has become more than just an internal debugging
tool. As suggested by the second paragraph of Illumos issue 9847
and also the ZDB manpage itself, ZDB has become the place that
people put at least some (meta)data consistency checking for ZFS
pools. Right now this appears to just be looking for space leaks
under the right circumstances (as part of 'zdb -b' or 'zdb -c').
In the future, though, it's possible that ZDB may do more consistency
checking if asked, because there's at least the camel's nose in the
tent and ZDB is not a bad place for it.
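
As a rough illustration of the space leak check above, here is a minimal Python sketch that runs 'zdb -b' on a pool and picks out any output lines that mention leaked space. The function names are mine, and the assumption that leak reports contain the word 'leaked' comes from my reading of the ZDB source; the exact wording may differ between ZFS versions, so treat this as a starting point rather than a reliable check.

```python
import subprocess


def find_leak_lines(zdb_output):
    """Return the lines of ZDB output that mention leaked space.

    Assumption: leak reports contain the word 'leaked'. This is not a
    documented, stable output format.
    """
    return [
        line.strip()
        for line in zdb_output.splitlines()
        if "leaked" in line.lower()
    ]


def check_pool(pool):
    """Run 'zdb -b' on a pool and return any lines that report leaks."""
    # 'zdb -b' traverses the pool's blocks; leak detection is on unless
    # it is disabled with -L. check=False because we want to look at the
    # output even if ZDB exits with a non-zero status.
    result = subprocess.run(
        ["zdb", "-b", pool],
        capture_output=True,
        text=True,
        check=False,
    )
    return find_leak_lines(result.stdout + result.stderr)
```

Something like check_pool('tank') (with 'tank' standing in for your pool's name) would return an empty list if ZDB reported no leaks in this form. Since ZDB has to traverse every block in the pool, expect this to take a long time on a large pool.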
When I started writing this entry I was optimistically hoping that I'd find various sorts of consistency checking in ZDB. Unfortunately I was wrong, although I think that you could use ZDB with some add-on tooling to do things like verify that all directory entries in a filesystem refer to live dnodes (since I believe you can dump all dnodes in a ZFS filesystem, including showing the ZAPs for directories; then you could post-process the dump). Possibly the ZFS developers feel that additional offline tools are the best choice for various reasons.
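
To make the directory entry idea a bit more concrete, here is a small Python sketch of the post-processing step only. It assumes you have already parsed a ZDB dump into two hypothetical structures: a set of live object (dnode) numbers, and a mapping from each directory's object number to its entries as plain name-to-object-number pairs. Parsing the dump itself is deliberately left out, because ZDB's output format is not a stable interface and I would only be guessing at it.

```python
def find_dangling_entries(live_objects, directories):
    """Find directory entries that point at objects that don't exist.

    live_objects: set of object numbers present in the filesystem.
    directories: dict of directory object number -> {name: object number}.
    Both are hypothetical structures built by your own dump parser.

    Returns a sorted list of (directory object, name, target object).
    """
    dangling = []
    for dir_obj, entries in sorted(directories.items()):
        for name, target in sorted(entries.items()):
            if target not in live_objects:
                dangling.append((dir_obj, name, target))
    return dangling
```

This only checks one direction (every directory entry points to a live dnode). Checking the reverse, that every live file dnode is reachable from some directory, would need more information from the dump than this sketch assumes.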
PS: As far as I know ZDB can't be used to repair space leaks and the like, but if you use it to discover a big one at least you know it's time to back up the pool, destroy it, and start over from scratch.
PPS: I continue to strongly believe that ZFS should have something that at least scans your pool for all sorts of correctness and consistency issues, because things keep happening in ZFS code that result in damaged filesystems. But so far no one considers this a high enough priority to develop tools for it, and I suppose I can't blame them; the large system solution to 'my filesystem is corrupted' is 'restore from last night's backups'. Certainly it would be our solution here.