Consistency and durability in the context of filesystems

September 4, 2015

Here's something that I've seen trip people up more than once when they talk about filesystems. When we talk about what guarantees a filesystem provides to programs that write data to it, we can be talking about two different things, and the difference between them can be important.

Durability means that once you've written something or otherwise changed the filesystem, the change is still there after the system crashes or loses power unexpectedly. Durability is what you need at a high level to say 'your email has been received' or 'your file has been saved'. As everyone hopefully knows, almost no filesystem provides durability by default for data that you write to files, and many don't provide it for things like removing or renaming files.
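To make 'forcing durability' concrete, here is a minimal sketch of what a program typically has to do before it can honestly say 'your file has been saved'. It assumes a POSIX-style system where a directory can be opened and passed to fsync(); the function name durable_write is made up for illustration, and whether fsync() really reaches stable storage still depends on the filesystem and the hardware underneath.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to push it to stable storage.

    Hypothetical helper for illustration; assumes a POSIX-style system.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write less than asked, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Without this, the data may sit in memory for a while.
        os.fsync(fd)
    finally:
        os.close(fd)

    # The file's directory entry is a separate change to the filesystem,
    # so flush the containing directory as well.
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Only after durable_write() returns without an exception would it be reasonable to report success to whoever handed you the data.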

What I'll call consistency is basically that the filesystem preserves the ordering of changes after a crash. If you wrote one thing then wrote a second thing and then had the system crash, you have consistency if the system will never wind up in a state where it still has the second thing but not the first. As everyone also hopefully knows, most filesystems do not provide data consistency by default; if you write data, they normally write bits of it to disk whenever they find it convenient without preserving your order. Some but not all filesystems provide metadata consistency by default.

(Note that metadata consistency without data consistency can give you odd results that make you unhappy. Consider 'create new file A, write data to A, remove old file B'; with metadata consistency and no data consistency or forced durability, you can wind up with an empty new file A and no file B.)
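As a sketch of how an application might avoid that outcome, the code below forces the new file's data to disk before it removes the old file. The names fsync_dir and replace_old_with_new are hypothetical, and it assumes a POSIX-style system where directories can be fsync()'d.

```python
import os

def fsync_dir(dirname: str) -> None:
    """Flush a directory so that creations and removals in it are on disk."""
    fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def replace_old_with_new(new_path: str, old_path: str, data: bytes) -> None:
    """Create new_path with data, and only then remove old_path.

    Hypothetical helper: the point is the ordering of the flushes.
    """
    # Step 1: create file A and force its *data* to disk.
    with open(new_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # push the OS's buffer to the disk
    fsync_dir(os.path.dirname(os.path.abspath(new_path)))

    # Step 2: only now is it safe to drop file B. If we crash before
    # this point, B is still there; after it, A already has its data.
    os.remove(old_path)
    fsync_dir(os.path.dirname(os.path.abspath(old_path)))
```

If you skip the fsync() calls in step 1, you are back to relying on whatever ordering the filesystem happens to provide.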

Durability and consistency are connected but one does not necessarily require the other except in the extreme case of total durability (which necessarily implies total consistency). In particular, it's entirely possible to have a filesystem that has total consistency but no durability at all. Such a filesystem may rewind time underneath applications after a crash, but it will never present you with an impossible situation that didn't exist at some pre-crash point; in the 'write A, write B, crash' case, you may wind up with nothing, A only, or A and B, but you will never wind up with just B and no A.
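One way to picture this is to list the states a crash is allowed to leave behind. The toy model below is purely illustrative (it touches no real filesystem, and both function names are made up): under total consistency without durability you can get any prefix of your writes, while with no ordering guarantee you can get any subset.

```python
from itertools import combinations

def consistent_states(writes):
    """Post-crash states allowed under total consistency: prefixes only."""
    return [tuple(writes[:n]) for n in range(len(writes) + 1)]

def unordered_states(writes):
    """Post-crash states possible with no ordering guarantee: any subset."""
    return [c for n in range(len(writes) + 1)
            for c in combinations(writes, n)]

writes = ["A", "B"]
print(consistent_states(writes))  # [(), ('A',), ('A', 'B')]
print(unordered_states(writes))   # [(), ('A',), ('B',), ('A', 'B')]
```

The only difference between the two lists is ('B',), which is exactly the 'B but no A' state that consistency rules out.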

(Because of the performance impact, most filesystems do not make selective durability of portions of the filesystem impose any sort of consistency outside of those portions. In other words, if you force-flush some files in some order, you're guaranteed that your changes to those files will be consistent with each other, but there's no consistency between them and anything else going on.)

Applications not infrequently use forced flushes to create either or both of durability (the DB committed the data it told you it did) and consistency (the DB's write log reflects all changes in the DB data files because it was flushed first). In some environments, turning off durability but retaining or creating consistency is an acceptable tradeoff for speed.
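Here is a rough sketch of the 'log flushed first' pattern, using a forced flush purely to create ordering. This is not how any particular database does it; append_and_sync and logged_update are hypothetical names, and the record format is left entirely to the caller.

```python
import os

def append_and_sync(path: str, record: bytes) -> None:
    """Append a record to a file and force it to disk before returning."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()              # Python buffer -> OS
        os.fsync(f.fileno())   # OS -> disk
    # Note: if this call just created the file, its directory entry has
    # not been flushed; a real program would fsync the directory too.

def logged_update(log_path: str, data_path: str, record: bytes) -> None:
    """Record a change in the log, then apply it to the data file."""
    # The log is flushed first, so after a crash the log reflects every
    # change that could possibly have reached the data file.
    append_and_sync(log_path, record)

    # The data file is deliberately not fsync()'d here. Its contents may
    # lag behind after a crash, but they can be rebuilt from the log.
    with open(data_path, "ab") as f:
        f.write(record)
```

Whether the caller also gets durability depends on when you report success: reporting it only after append_and_sync() returns gives you both, while flushing the log less often trades durability away for speed.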

(And some environments don't care about either, because the fix procedure in the face of an extremely rare system crash is 'delete everything and restart from scratch'.)

Note that journaled filesystems always maintain consistent internal data structures but do not necessarily guarantee that consistency for what you see, even for metadata operations. A journaled filesystem will not explode because of a crash, but it may still partially apply your file creations, renames, deletions and so on out of order (or at least out of what you consider the order). However, it's reasonably common for journaled filesystems to have fully consistent metadata operations, partly because that's usually the easiest approach.

(This has some consequences for developers, along the same lines as the SSD problem but more so since it's generally hard to test against system crashes or spot oversights.)

Written on 04 September 2015.

Last modified: Fri Sep 4 01:10:32 2015