Wandering Thoughts archives


Choices filesystems make about checksums

If you are designing integrity checksums into a new filesystem or trying to add them to an existing one, there are some broad choices you have to make about them. These choices determine both how easy it is to add checksums (especially to existing filesystems) and how much good your checksums do. Unfortunately these two things pull in opposite directions.

Two big choices are whether you checksum just filesystem metadata or both data and metadata, and whether your checksums are 'internal' (stored with the object that they are a checksum of) or 'external' (stored not with the object but with references to it). I suppose you could also checksum just data and not metadata, but I don't think anyone does that yet (partly because in most filesystems the metadata is effectively data too, since it holds things like names and access permissions without which your raw bits make much less sense).

The best option is to checksum everything and to use external checksums. The appeal of checksumming everything is hopefully obvious. The advantage of external checksums is that they tell you more than internal checksums do. Internal checksums cover 'this object has been corrupted after being written', while external checksums also cover 'this is the wrong object'; that is, they let you check and verify the structure of your filesystem. With internal checksums you know that you are looking at, say, an intact directory, but you don't know if it's actually the directory you think you're looking at.
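To make the internal versus external distinction concrete, here is a minimal Python sketch of a toy 'disk' (a dict standing in for block storage, with CRC32 from `zlib` as the checksum). The names `write_block`, `read_internal`, and `read_external` are all hypothetical. It assumes some bug has left a reference pointing at the wrong block, and shows that the internal check happily accepts the intact but wrong directory, while checking against the checksum stored in the reference catches it.

```python
import zlib

def crc(data: bytes) -> int:
    # CRC32 purely as an example checksum; real filesystems may use others
    return zlib.crc32(data) & 0xFFFFFFFF

# Hypothetical toy disk: block number -> (payload, internal checksum stored with it)
disk = {}

def write_block(blockno: int, payload: bytes) -> None:
    # Internal checksum: computed over the payload and stored alongside it
    disk[blockno] = (payload, crc(payload))

def read_internal(blockno: int) -> bytes:
    # Verifies only that this block is intact, not that it is the block we wanted
    payload, stored = disk[blockno]
    if crc(payload) != stored:
        raise ValueError(f"block {blockno}: contents corrupted")
    return payload

def read_external(blockno: int, expected: int) -> bytes:
    # External checksum: the expected value comes from the reference, not the block
    payload, _ = disk[blockno]
    if crc(payload) != expected:
        raise ValueError(f"block {blockno}: not the object the reference expects")
    return payload

dir_a = b"directory A: a.txt b.txt"
dir_b = b"directory B: x.txt y.txt"
write_block(3, dir_a)
write_block(7, dir_b)

# A reference to directory A carries a block number and (for external checksums)
# A's checksum. Simulate a bug that leaves it pointing at block 7 instead of 3.
ref_block, ref_sum = 7, crc(dir_a)

print(read_internal(ref_block) == dir_b)  # internal check passes on the wrong directory
try:
    read_external(ref_block, ref_sum)
except ValueError as err:
    print("external check caught it:", err)
```

Nothing here models any real filesystem's on-disk format; the only point is where the expected checksum comes from.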

On the other hand, the easiest option to add to an existing filesystem is internal checksums of metadata only. All you need to do is either find or claim some unused space for a single checksum in existing metadata structures, such as directory disk blocks, or add a checksum to the end of them as a new revision. You can sometimes arrange this so that almost no existing code cares and no existing on-disk data is invalidated. Doing only metadata is simpler because internal checksums present a problem for on-disk data: there simply isn't any spare room in existing data blocks, since they're all full of, well, user file data. In general, adding internal checksums to data blocks means that, say, 4K of user data may no longer fit in a single on-disk data block, which in practice will perturb a lot of assumptions made by user code.

(Almost all user code assumes that writing data in some power-of-two size is basically optimal and as a result does so all over the place. All sorts of bad things happen if this stops being true.)
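The data block problem is easy to put numbers on. This sketch assumes 4096-byte blocks and a 4-byte checksum stored inside each data block (both sizes chosen for illustration, not taken from any particular filesystem) and counts how many blocks a given amount of user data needs.

```python
BLOCK_SIZE = 4096    # assumed on-disk block size
CHECKSUM_SIZE = 4    # assumed 32-bit checksum stored inside each data block

def usable_bytes(internal_checksum: bool) -> int:
    # An internal checksum takes its space out of the block itself
    return BLOCK_SIZE - (CHECKSUM_SIZE if internal_checksum else 0)

def blocks_needed(nbytes: int, internal_checksum: bool) -> int:
    usable = usable_bytes(internal_checksum)
    return -(-nbytes // usable)   # ceiling division

for size in (4096, 65536):
    print(size, blocks_needed(size, False), blocks_needed(size, True))
# 4096 1 2
# 65536 16 17
```

A 4096-byte write goes from one block to two, and 64 KiB goes from 16 blocks to 17, so the power-of-two alignment that user code counts on no longer lines up with the disk.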

There are two problems with external checksums that will give you big heartburn if you try to add them to existing filesystems. The first is that you have to store a lot more checksums. As an example, consider a disk block of directory entries that is part of a whole directory. With internal checksums this disk block needs a single checksum for itself, while with external checksums it needs one checksum per directory entry it contains (to let you validate that the inode each directory entry points to is the file you think it is).

(Another way to put this is that any time a chunk of metadata points to multiple sub-objects, external checksums require you to find room for one checksum per sub-object, while internal checksums just require you to find room for one, for the chunk of metadata itself. It's extremely common for a single chunk of metadata to point to multiple sub-objects because this is an efficient use of space: directory blocks contain multiple directory entries per block, files have indirect blocks that point to multiple data blocks, and so on.)
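Here is a rough sketch of the space cost for one directory block, under the simplifying and hypothetical assumption of fixed-size 32-byte directory entries in a 4096-byte block; `dir_block_layout` is an illustrative name, not any filesystem's API.

```python
def dir_block_layout(block_size: int, entry_size: int, checksum_size: int, external: bool):
    """Return (entries per block, bytes spent on checksums) for a toy fixed-size layout."""
    if external:
        # Each entry also carries the checksum of the inode it points to
        entries = block_size // (entry_size + checksum_size)
        return entries, entries * checksum_size
    # Internal: one checksum for the whole block, taken from the block's own space
    entries = (block_size - checksum_size) // entry_size
    return entries, checksum_size

for csum in (4, 32):   # e.g. a CRC32-sized versus a SHA-256-sized checksum
    print(csum,
          dir_block_layout(4096, 32, csum, external=False),
          dir_block_layout(4096, 32, csum, external=True))
```

With a 4-byte checksum this works out to 127 entries and 4 bytes of checksum per block for an internal checksum, versus 113 entries and 452 bytes for external ones. With a 32-byte checksum the external case drops to 64 entries, with half the block spent on checksums.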

The second problem is that you have to update more checksums when things change. With external checksums, any time an object changes, every reference to it needs its checksum updated to the object's new value; then all references to those references probably need their checksums updated in turn, and so on until you reach the top of the tree. External checksums are a natural fit for copy-on-write filesystems (which are already changing all references up the tree) and probably a terrible fit for any filesystem that does in-place updates. And unfortunately (for checksums), most common filesystems today do in-place updates for various reasons.
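This update cascade is the same thing a Merkle tree does. The following Python sketch uses a toy tree with hypothetical `Node`, `link`, `build`, and `update` helpers and SHA-256 from `hashlib`. It assumes each node's checksum covers its own contents plus the checksums it stores for its children, and counts how many checksums must be recomputed when one data block changes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    name: str
    content: bytes
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    checksum: bytes = b""   # the value the parent's reference stores for this node

def link(parent: Node, child: Node) -> None:
    parent.children.append(child)
    child.parent = parent

def compute(node: Node) -> bytes:
    # A node's bytes include the checksums of everything it points to
    h = hashlib.sha256(node.content)
    for child in node.children:
        h.update(child.checksum)
    return h.digest()

def build(node: Node) -> None:
    # Compute checksums bottom-up so children are ready before their parent
    for child in node.children:
        build(child)
    node.checksum = compute(node)

def update(node: Node, new_content: bytes) -> int:
    """Change one node, then walk up to the root; return how many checksums changed."""
    node.content = new_content
    updated = 0
    current: Optional[Node] = node
    while current is not None:
        current.checksum = compute(current)
        updated += 1
        current = current.parent
    return updated

root = Node("root directory", b"root")
subdir = Node("subdirectory", b"subdir")
indirect = Node("indirect block", b"indirect")
data = Node("data block", b"file data v1")
link(root, subdir)
link(subdir, indirect)
link(indirect, data)
build(root)

old_root = root.checksum
print(update(data, b"file data v2"))   # 4: data, indirect, subdir, root
print(root.checksum != old_root)        # True
```

In a copy-on-write filesystem those four blocks are being rewritten anyway; in an in-place filesystem, each of them is an extra block to rewrite for what used to be a single in-place update.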

PS: the upshot of this is that on the one hand I sympathize a fair bit with filesystems like ext4 and XFS that are apparently adding metadata checksums (which sound like internal ones), because they have a really hard job and it's better than nothing, but on the other hand I still want more.

tech/FilesystemChecksumOptions written at 01:01:45

