What checksums in your filesystem are usually actually doing

March 27, 2013

The usual way to talk about the modern trend of filesystems with inherent checksums (such as ZFS and btrfs) is to say that the checksums exist to detect data corruption in your files (and in the filesystem as a whole). In an environment with a certain amount of random bit flips, decaying media, periodic hardware glitches, and other sources of damage, it's no longer good enough to imagine that if you wrote it to disk you're sure to read it back perfectly (or to get a disk error). Filesystems with checksums are sentinels, standing on guard for you and letting you know when this has happened to your data.

But this is not quite what they do in practice (generally), because they perform this sentinel duty by denying you access to your data. In doing this they implicitly prioritize integrity over availability; it's better to give you no data at all than to give you data that so much as seems damaged. The same is true, but even more so, when it's filesystem metadata that seems damaged.

(This is similar to the tradeoff disk encryption makes for you.)
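
To make the tradeoff concrete, here is a minimal sketch in C of the verify-on-read pattern that checksumming filesystems follow (this is nothing like ZFS's or btrfs's real code, and the Fletcher-style checksum here is a made-up stand-in for whatever they actually use): the filesystem recomputes the checksum over the block it just read, compares it against the checksum recorded in the metadata, and on a mismatch returns EIO instead of the bytes.

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* A simple Fletcher-style checksum, standing in for whatever the
       filesystem really uses (ZFS defaults to fletcher4, for example). */
    static uint64_t block_checksum(const unsigned char *buf, size_t len)
    {
        uint64_t a = 0, b = 0;
        for (size_t i = 0; i < len; i++) {
            a += buf[i];
            b += a;
        }
        return (b << 32) ^ a;
    }

    /* Hand the block to the caller only if it still matches the checksum
       recorded in the metadata; otherwise return EIO and withhold it. */
    static int read_verified_block(const unsigned char *ondisk, size_t len,
                                   uint64_t stored_cksum, unsigned char *out)
    {
        if (block_checksum(ondisk, len) != stored_cksum)
            return EIO;   /* integrity wins; the caller never sees the bytes */
        memcpy(out, ondisk, len);
        return 0;
    }

    int main(void)
    {
        unsigned char block[512] = "some file contents";
        unsigned char out[512];
        uint64_t cksum = block_checksum(block, sizeof(block));

        block[7] ^= 0x01;   /* simulate a bit flip after the checksum was stored */
        if (read_verified_block(block, sizeof(block), cksum, out) == EIO)
            printf("checksum mismatch: data withheld\n");
        return 0;
    }

The point is the first branch: the damaged bytes exist and have been read off the disk, but the caller is never shown them.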

You may not be exactly happy with this tradeoff. Yes, it's nice to know when you're reading corrupt data, but sometimes you really want to see that data anyway, just to see if you can reconstruct something from it. This goes even more for filesystem metadata, especially core metadata; it's not hard to get into a situation where almost all of your data is intact and probably recoverable but the filesystem won't give it to you.

Old filesystems went the other way, and not just by not having any sort of checksums; they often came with quite elaborate recovery tools that would do almost everything they could to get something back. The results might be scattered in little incoherent bits all over the filesystem, but if you cared enough (ie it was important enough), you had a shot at assembling what you could.

(This is still theoretically possible with modern checksumming filesystems but at least some of them are very strongly of the opinion that the answer here is 'restore from backups (of course you have backups)' and so they don't supply any real sort of tools to help you out.)

My opinion is that filesystems ought to support an interface that lets you get access even to data that fails its checksums (perhaps through a special 'no error on checksum error' flag for open()). This wouldn't fix all of the problems (since it wouldn't help in the face of many metadata issues), but it would at least be something, and a gesture toward agreeing that integrity is not always the most important thing.
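
As a sketch of what that interface might look like, assuming a hypothetical O_IGNORE_CHECKSUMS flag (no real kernel or filesystem defines such a flag; the value below is invented purely for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical: no kernel defines this flag today and the value is
       made up; a real implementation would have to pick and honour one. */
    #define O_IGNORE_CHECKSUMS 0x40000000

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
        /* Ask the filesystem to hand back blocks even when their checksums
           fail, instead of turning the read() into an EIO error. */
        int fd = open(argv[1], O_RDONLY | O_IGNORE_CHECKSUMS);
        if (fd < 0) {
            perror(argv[1]);
            return 1;
        }
        char buf[65536];
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
            if (write(STDOUT_FILENO, buf, n) != n) {
                perror("write");
                return 1;
            }
        }
        close(fd);
        return 0;
    }

On a current kernel this flag would simply be ignored or rejected; the point is only the shape of the interface, where a program explicitly opts out of the integrity guarantee for one open file.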


Comments on this page:

From 65.111.70.130 at 2013-03-27 23:35:57:

What about ZFS's multiple-copies feature, and auto-scrubbing if the checksum fails?

Wouldn't that be a good trade-off?

-- Goozbach

From 71.80.128.33 at 2013-03-27 23:58:28:

I don't even know what you're talking about. Checksumming is almost always combined with some sort of redundancy, to enable recovering an undamaged copy. With ZFS it just transparently recovers the file data from the other disks in the raidz set. Optical media has error-correcting codes, and besides that, the drive can just keep re-reading until it gets some different data (or the user cleans the disk or swaps to another drive).

And yes, no data IS better than bad data. A file system doesn't really have any guaranteed way to pass messages to the upper layers of an operating system so the user is sure to see them... If your file isn't missing entirely, you're sure to assume it's full and complete (and then assume it's safe to delete your old backups of said file), no matter how many lines of noise and warnings are being spewed out to the log files.

By cks at 2013-03-28 01:05:42:

I'm looking at the situation where all redundancy and other measures have failed (all copies of the data are corrupt, perhaps because they were corrupted during the initial write, perhaps for other reasons; this has been known to happen). Given the limited IO interfaces we have, failing on checksum errors in the normal case is reasonable, but as I mentioned, I think there should be a way around this. That way you could have both integrity and availability (and make whatever tradeoffs you need). Some people would restore from backups. Some people would carefully piece together what they could.

From 84.112.126.145 at 2013-03-28 15:20:18:

There is zdb.

From 72.200.82.252 at 2013-03-31 12:45:07:

Two possible workarounds that could be developed:

1. A ddrescue-like tool that would let you read the file while telling you which blocks failed their checksums (see the sketch after this list).

2. A version-compare tool where, if a previous version of the file exists at the same path in ZFS (for example in a snapshot), it lets you get at both and diff them, although copy-on-write would probably break this unless the unreadable block was part of a recent modification.
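
A minimal sketch of the ddrescue-like idea in point 1, assuming that a checksum failure surfaces as EIO from pread() (as it does on ZFS), that reads line up with the filesystem's record size, and with simplified end-of-file handling (a bad final partial block comes out padded with zeros):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Assume reads line up with the filesystem record size (ZFS's default
       recordsize is 128K), so one bad record fails one pread() call. */
    #define BLKSZ 131072

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s infile outfile\n", argv[0]);
            return 1;
        }
        int in = open(argv[1], O_RDONLY);
        if (in < 0) {
            perror(argv[1]);
            return 1;
        }
        int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (out < 0) {
            perror(argv[2]);
            return 1;
        }
        static char buf[BLKSZ];
        off_t off = 0;
        for (;;) {
            ssize_t n = pread(in, buf, BLKSZ, off);
            if (n == 0)
                break;                /* end of file */
            if (n < 0) {
                if (errno != EIO) {
                    perror("pread");
                    return 1;
                }
                /* Checksum failure: report the offset and emit zeros instead. */
                fprintf(stderr, "bad block at offset %lld\n", (long long)off);
                memset(buf, 0, BLKSZ);
                n = BLKSZ;
            }
            if (write(out, buf, n) != n) {
                perror("write");
                return 1;
            }
            off += n;
        }
        close(in);
        close(out);
        return 0;
    }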
