Why filesystems need to be where data is checksummed

January 9, 2015

Allegedly (and I say this because I have not looked for primary sources) some existing Linux filesystems are adding metadata checksums and then excusing their lack of data checksums by saying that if applications care about data integrity, they will do the checksumming themselves. Having metadata checksums is better than having nothing, and adding data checksums to existing filesystems is likely difficult, but none of this excuses that view of who should do what with checksums.

There are at least two reasons why filesystems should do data checksums. The first is that data checksums exist not merely to tell applications (and ultimately the user) when data becomes corrupt, but also to do extremely important things like telling which side of a RAID mirror is the correct side. Applications definitely do not have access to low-level details of things like RAID data, but the filesystem is at least in the right general area to be asking the RAID system 'do you happen to have any other copies of this logical block?' or the like.
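To make this concrete, here is a deliberately simplified sketch in Python (not modeled on any real filesystem's code) of what a data checksum lets the read path do: only a layer that can see both the stored checksum and every mirror's copy of a block can pick out the good side.

    import hashlib

    def read_block(mirror_copies, stored_checksum):
        # Try each mirror's version of the same logical block and return
        # the first one whose checksum matches what the filesystem recorded
        # when the block was written.
        for copy in mirror_copies:
            if hashlib.sha256(copy).digest() == stored_checksum:
                return copy
        raise IOError("no copy of this block verifies")

    # One mirror side has a flipped bit; the checksum picks out the good side.
    stored = hashlib.sha256(b"hello world").digest()
    print(read_block([b"hellX world", b"hello world"], stored))

An application never sees more than the single buffer the kernel hands back, so it has no way to do this comparison itself.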

The second reason is that a great many programs would never be rewritten to verify checksums. Not only would this require a massive amount of coding, it would require a central standard so that applications can interoperate in generating and checking these checksums, finding them, and so on and so forth. On Unix, for example, this would need support not just from applications like Firefox, OpenOffice, and Apache but also from common programs like grep, awk, perl, and gcc. The net result would be that a great deal of file IO on Unix would not be protected by checksums.
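To illustrate the scale of the problem, here is a purely hypothetical sketch of what application-level checksumming might look like if everyone agreed on, say, an imagined '.sha256' sidecar file convention (which is not a real standard):

    import hashlib, pathlib

    def checked_read(path):
        # Read a file, then verify it against an imagined '<name>.sha256'
        # sidecar file holding the hex digest of the expected contents.
        data = pathlib.Path(path).read_bytes()
        expected = pathlib.Path(path + ".sha256").read_text().strip()
        if hashlib.sha256(data).hexdigest() != expected:
            raise IOError(path + ": contents do not match the recorded checksum")
        return data

Every consumer of files (grep, awk, gcc, Firefox, and so on) would need its own equivalent of this, all agreeing on the same convention, before most file IO was actually covered.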

(Let's skip lightly over any desire to verify that executables and shared libraries are intact before you start executing code from them, because you just can't do that without the kernel being very closely involved.)

When you are looking at a core service that should cover absolutely everything that performs some common set of operations, the right place to put that service is a central one, so that it's implemented once and then used by everyone. The central place here is the kernel (where all IO passes through one spot), which in practice means the filesystem.

(Perhaps this is already obvious to everyone; I'd certainly like to think that it is. But if there are filesystem developers out there who are seriously saying that data checksums are the job of applications instead of the filesystem, well, I don't know what to say. Note that I consider 'sorry, we can't feasibly add data checksums to our existing filesystem' to be a perfectly good reason for not doing so.)


Comments on this page:

There are at least two reasons why filesystems should do data checksums. The first is that data checksums exist not merely to tell applications (and ultimately the user) when data becomes corrupt, but also to do extremely important things like telling which side of a RAID mirror is the correct side.

Here you are assuming that the filesystem is also doing volume management. This is not the case for XFS and ext4. Regardless of whether or not volume management inside the filesystem is the right thing to do (I think it is, and you seem to agree given your posts about ZFS), you started this post with "existing Linux filesystems". Am I nitpicking? A little bit. :)

You are certainly right that most applications will never be checksum-aware. Some never can be. For example, grep deals with binary streams. It doesn't understand how to parse the contents of the file.

XFS takes the approach of "block management" - it doesn't pretend to solve all problems; rather, it views itself as a high-performance layer that manages the blocks on a block device. To do so safely, it uses CRCs to checksum its metadata. One of the major use cases for XFS is in high performance computing, where the applications optimize data layout for all sorts of weird things, and they are in the best place to decide what needs to be checksummed and how strong the checksum should be.

By cks at 2015-01-09 11:41:33:

I don't think the filesystem has to be doing volume management to repair RAID mirrors; all it needs is some interface to the RAID layer to say 'give me all alternate copies of this block that you might have' or the like. A checksum-aware filesystem can then go through all of the copies to try to find one that verifies and then write it back out to resync all of the RAID copies. This find and repair sort of interface is of course useful for metadata as well as data, since metadata can get corrupted on one side of a RAID mirror just as much as data can.

(This sort of reconstruction gets more complicated with RAID 5+, but I'm waving my hands.)
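As a rough sketch of the flow I mean (in Python, with an imagined RAID interface; no real Linux RAID layer exposes exactly these calls):

    import hashlib

    def repair_block(raid, block_no, stored_checksum):
        # Ask the RAID layer for every mirror's version of the block, find
        # one that verifies, and write it back out to resync the other copies.
        for copy in raid.all_copies_of(block_no):
            if hashlib.sha256(copy).digest() == stored_checksum:
                raid.rewrite(block_no, copy)
                return copy
        raise IOError("block %d: no mirror copy verifies" % block_no)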

As for grep et al: they can certainly be 'checksum-aware' in that they should both verify and create checksums in an ideal world. Certainly users want them to be. If I'm grep'ing a bunch of files or compiling a bunch of code or whatever, I certainly want grep, gcc, et al to react if some bits have gotten silently corrupted on the disk.
