Keeping track of filesystem consistency

May 3, 2010

In light of my last entry, here is an interesting question: when do you know that a filesystem is consistent, and how much work does it take for the system to keep track of this?

First off, there are some easy cases, namely filesystems with journaling, strong ordering guarantees, or copy-on-write properties.

In general, copy-on-write and journaling filesystems are supposed to be consistent all of the time, unless the kernel has detected that something is wrong and flagged the filesystem as damaged. Instead of taking either of those approaches, some otherwise ordinary filesystems carefully order their updates so that the on-disk state is always consistent, or at least sufficiently close to it (the so-called 'soft updates' approach). In all of these cases, keeping track of consistency is essentially free; the operating system mostly needs a flag in the filesystem to say that errors have been detected, and this will rarely be updated.

(Technically, journaling filesystems are only consistent if you replay the journal after any crash; if you look just at the filesystem state and ignore the journal, it may be inconsistent. This sometimes causes problems. The drawback of soft updates is their complexity, and also the need to clean up leaked space at some point, although there are promising ways around that.)

Once you get to regular traditional filesystems, things are much more difficult. The semi-traditional Unix view has been that filesystems are inherently inconsistent if they are mounted read-write; they are only (potentially) consistent if they were cleanly unmounted or are mounted read-only. This has the virtue of being easy for the system to maintain.
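As a minimal sketch of why this is cheap to maintain (in Python as a simulation, not real kernel code; the class and method names here are my own illustrative inventions), the traditional scheme touches the on-disk flag exactly twice per mount lifetime, once at read-write mount and once at clean unmount. A crash leaves the flag cleared, which is what tells fsck it has work to do:

```python
# Simulation of the traditional 'clean' flag: cleared when the filesystem
# is mounted read-write, set again only on a clean unmount.
class TraditionalFS:
    def __init__(self):
        self.clean = True        # freshly made filesystems start clean

    def mount(self, readonly=False):
        if not readonly:
            self.clean = False   # one flag write, at mount time

    def unmount(self):
        self.clean = True        # one flag write, at clean unmount

    def needs_fsck(self):
        # After a crash, unmount() never ran, so the flag is still
        # cleared and the filesystem must be checked.
        return not self.clean

fs = TraditionalFS()
fs.mount()                       # mounted read-write: now 'dirty'
# (crash here would leave fs.needs_fsck() == True)
fs.unmount()                     # clean unmount: 'clean' again
```

The point is that no per-operation bookkeeping is required at all; the flag changes only at mount and unmount.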

You can do better than this, but it takes more work and in particular it takes more IO. The simple approach is to maintain a 'filesystem is consistent' flag in the filesystem; the operating system unsets this flag before it begins filesystem-changing IO and sets it again afterwards, once things are quiet. However, this is going to happen a lot, and each unset-and-set cycle adds two seeks to your IO operations. The first one is especially costly: if the filesystem is currently marked consistent, you absolutely must mark it inconsistent and flush that to disk before you do any other write IO. This is a not insignificant amount of extra work in both code and IO, and it adds latency in some situations, which is one reason why I don't believe that any Unix systems have ever tried to do this.
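To make the cost concrete, here is a minimal sketch of this per-burst flag protocol (again in Python as a simulation; the names are illustrative, and the simulated 'disk' just counts synchronous flag flushes, each of which would be a seek plus a write on a real disk). Note the ordering requirement: the 'inconsistent' flag must be durable on disk before the first modifying write is issued.

```python
CLEAN, DIRTY = 1, 0

class SimDisk:
    """Simulated disk: a superblock flag plus a count of flag flushes."""
    def __init__(self):
        self.flag = CLEAN
        self.flushes = 0

    def flush_flag(self, value):
        # Each flag update is a synchronous seek + write on a real disk.
        self.flag = value
        self.flushes += 1

class FlaggedFS:
    def __init__(self, disk):
        self.disk = disk
        self.pending = 0         # in-flight filesystem-changing operations

    def begin_write(self):
        # Ordering is critical: DIRTY must be flushed to disk *before*
        # any other write IO goes out, so this sits on the critical path.
        if self.pending == 0 and self.disk.flag == CLEAN:
            self.disk.flush_flag(DIRTY)
        self.pending += 1

    def end_write(self):
        self.pending -= 1
        if self.pending == 0:
            # Things are quiet again; mark the filesystem consistent.
            self.disk.flush_flag(CLEAN)

fs = FlaggedFS(SimDisk())
for _ in range(3):               # one burst of three overlapping writes
    fs.begin_write()
for _ in range(3):
    fs.end_write()
# Every burst of write activity costs two extra flag flushes,
# no matter how many operations it contains.
```

A real implementation would also want to delay the final set-clean flush for a while, since another write burst arriving immediately would otherwise double the flag traffic; but that only reduces the cost, it doesn't eliminate it.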

(I don't know if other operating systems have tried such schemes. These days I'd expect everyone to just implement a journaling filesystem.)
