Why filesystems need to be where data is checksummed
Allegedly (and I say this because I have not looked for primary sources) some existing Linux filesystems are adding metadata checksums, and their developers are excusing the lack of data checksums by saying that applications that care about data integrity will do the checksumming themselves. Having metadata checksums is better than having nothing, and adding data checksums to an existing filesystem is likely difficult, but neither of these justifies that view of who should do what with checksums.
There are at least two reasons why filesystems should do data checksums. The first is that data checksums exist not merely to tell applications (and ultimately the user) when data becomes corrupt, but also to do extremely important things like telling which side of a RAID mirror is the correct side. Applications definitely do not have access to low-level details of things like RAID data, but the filesystem is at least in the right general area to be asking the RAID system 'do you happen to have any other copies of this logical block?' or the like.
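To make the mirror case concrete, here is a minimal sketch in Python of what a checksumming filesystem can do that an application cannot. Everything in it is an illustrative assumption rather than how any real filesystem works: the mirror is modeled as plain lists of blocks, the checksum is CRC32 from the standard `zlib` module, and `read_block` is a hypothetical name. The point is only that the layer holding both the expected checksum and access to every copy can pick the good side and repair the bad one.

```python
import zlib


def checksum(block: bytes) -> int:
    # CRC32 is a stand-in here; a real filesystem would pick its own checksum.
    return zlib.crc32(block)


def read_block(mirrors, blockno, expected):
    """Return the data for blockno from whichever mirror side matches
    the stored checksum, and overwrite any side that does not match.

    mirrors  -- list of mirror sides, each a mutable list of bytes blocks
    expected -- the checksum the filesystem recorded when the block was written
    """
    good = None
    for side in mirrors:
        if checksum(side[blockno]) == expected:
            good = side[blockno]
            break
    if good is None:
        # Every copy is damaged; all we can do is report the error.
        raise OSError(f"block {blockno}: no mirror side matches its checksum")

    # Self-heal: rewrite any side whose copy fails the checksum.
    for side in mirrors:
        if checksum(side[blockno]) != expected:
            side[blockno] = good
    return good


# Hypothetical two-way mirror in which one side has silently gone bad.
side_a = [b"hello", b"world"]
side_b = [b"hello", b"wor1d"]           # block 1 is corrupt on this side
stored = [checksum(b) for b in side_a]  # checksums recorded at write time

data = read_block([side_b, side_a], 1, stored[1])
```

Without the stored checksum, a plain mirror only knows that its two sides disagree, not which one is right. An application sitting above the mirror is in an even worse position, since it is handed whichever copy the RAID layer happened to read and has no way to ask for the other one.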
The second reason is that a great many programs would never be rewritten to verify checksums. Not only would this require a massive amount of coding, it would require a central standard so that applications can interoperate in generating and checking these checksums, finding them, and so on and so forth. On Unix, for example, this would need support not just from applications like Firefox, OpenOffice, and Apache but also common programs like gcc. The net result would be that a great deal of file IO on Unix would not be protected by checksums.
(Let's skip lightly over any desire to verify that executables and shared libraries are intact before you start executing code from them, because you just can't do that without the kernel being very closely involved.)
When you are looking at a core service that should touch absolutely everything that does some common set of operations, the right place to put it is somewhere central, so that it is implemented once and then used by everyone. The central place here is the kernel (where all IO passes through one spot), which in practice means the filesystem.
(Perhaps this is already obvious to everyone; I'd certainly like to think that it is. But if there are filesystem developers out there who are seriously saying that data checksums are the job of applications instead of the filesystem, well, I don't know what to say. Note that I consider 'sorry, we can't feasibly add data checksums to our existing filesystem' to be a perfectly good reason for not doing so.)