Unfortunately, damaged ZFS filesystems can be more or less unrepairable

December 1, 2021

An unfortunate piece of recent ZFS news is that Ubuntu 21.10 shipped with a serious ZFS bug that created corrupted ZFS filesystems (see the 21.10 release notes). This sort of ZFS bug happens from time to time and has likely happened as far back as Solaris ZFS, and there are two unfortunate aspects of such bugs.

(For an example of Solaris ZFS corruption, Solaris ZFS could write ACL data that was bad in a way that it ignored but modern ZFS environments care about. This sort of ZFS issue is not specific to Ubuntu or modern OpenZFS development, although you can certainly blame Ubuntu for this particular case of it and for shipping Ubuntu 21.10 with it.)

The first unfortunate aspect is that many of these bugs normally panic your kernel. At one level it's great that ZFS is loaded with internal integrity and consistency checks that try to make sure the ZFS objects it's dealing with haven't been corrupted. At another level it's not so great that the error handling for integrity problems is generally to panic. Modern versions of OpenZFS have made some progress on letting the system continue past some of these problems instead of panicking, but there are still a lot left.

The second unfortunate aspect is that generally you can't repair this damage the way you can in more conventional filesystems. Because of ZFS's immutability and checksums, once something makes it to disk with a valid checksum, it's forever. If what made it to disk was broken or corrupted, it stays broken or corrupted; there's no way to fix it in place and no mechanism in ZFS to quietly fix it in a new version. Instead, the only way to get rid of the problem is to delete the corrupted data in some way, generally after copying out as much of the rest of your data as you can (and need to). If you're lucky, you can delete the affected file; if you're somewhat unfortunate, you're going to have to destroy the filesystem; if you're really unlucky, the entire pool needs to be recreated.

This creates two reasons to make regular backups (and not with 'zfs send', because that may well just copy the damage to your backups). The first reason is of course so that you have the backup to restore from. The second reason is that making a backup with tar, rsync, or another user level tool of your choice will read everything in your ZFS filesystems, which gives you regular assurance that everything is free of corruption.

(ZFS scrubs don't check enough to find this sort of thing.)

PS: Even if you don't make regular backups, perhaps it's a good idea just to read all of your ZFS filesystems every so often by tar'ing them to /dev/null or similar things. I should probably do this on my home machine, which I am really bad at backing up.
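(As a rough illustration of the "just read everything" idea, here is a minimal Python sketch. It assumes that opening and reading every regular file with an ordinary user level program is enough to trip over this sort of damage; the function name read_tree and its return format are made up for this example and aren't part of any ZFS tooling. It only reads file contents and the basic metadata that a directory walk touches, not extended attributes or ACLs, and if the damage is the kind that panics the kernel, the script won't get a chance to report anything; the panic is your report.)

```python
import os


def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns (files_read, bytes_read, errors), where errors is a list of
    (path, message) pairs for anything that could not be listed or read.
    """
    files_read = 0
    bytes_read = 0
    errors = []

    def on_walk_error(exc):
        # os.walk() reports directories it cannot list through this hook.
        errors.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks, FIFOs, sockets and devices; opening a
                # FIFO could block forever.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
                files_read += 1
            except OSError as exc:
                # An I/O error here is what you are looking for.
                errors.append((path, str(exc)))
    return files_read, bytes_read, errors
```

(You would call it as something like read_tree("/tank/home"), with the path being a placeholder for wherever your filesystem is mounted, and then look at whether the errors list is empty. Note that os.walk() will happily cross into other filesystems mounted underneath the starting directory.)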


Comments on this page:

If something causes your pool to panic on import and zpool import -Fn fails, it's always worth rootcausing with zdb -eL[mmmmm] or zdb -Ce or zdb -eu after you've installed the debugging symbols and filing an issue about it, as there's always a chance that some bug is the cause, and without rootcausing, the bug might not be encountered by anyone else.

By loreb at 2021-12-06 16:21:37:

Just a small note: GNU tar has dedicated code to check if it's writing to /dev/null, and if so they skip reading the input.

By robert at 2022-01-05 15:22:22:

ZFS should have some sort of fsck, because otherwise we need to tar everything, which could be TBs of data, in order to fix one or two damaged files as in the issue described.

Very unfortunate.

The absolute minimum is to delete the damaged file.

Written on 01 December 2021.
Last modified: Wed Dec 1 22:58:50 2021