Journaling filesystems and the fsync() problem
Consider your ordinary journaling filesystem. For simplicity and reliability you have a single, global log in which you put transactions for all of your filesystem activity, instead of anything more complicated. One useful consequence of this global log is that you have now created a filesystem-wide global order of all filesystem events (sometimes called a 'total order'), which will be preserved even if you crash and restart.
(You implicitly had a total order before, but it didn't necessarily survive crashes.)
This sounds great until someone does an fsync() to ensure that changes
to their particular file are fully stable. Because you have a global
log, changes to their file are intermixed with other changes; your
log's total order means that you have to commit everything up to the
last modification point of their file, regardless of what any particular
change modifies.
On a sufficiently busy system, almost all of the changes in the journal
log will not be to the file being fsync()'d. Flushing and committing
all of these unrelated changes is overhead that just serves to slow down
the fsync(), sometimes by quite a lot.
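To make the cost concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the GlobalJournal class, its log_change() and fsync() methods, and the file names are all hypothetical, and no real filesystem's journal is this simple. It just counts how many unrelated records have to be committed when one file is fsync()'d on a busy system.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalJournal:
    """Toy model of a single filesystem-wide journal (hypothetical, not a real design)."""
    records: list = field(default_factory=list)  # (sequence number, file name), in total order
    committed: int = 0  # how many records, counted from the start, are already stable

    def log_change(self, filename: str) -> None:
        # Every change to every file is appended to the same log.
        self.records.append((len(self.records), filename))

    def fsync(self, filename: str) -> tuple:
        """Commit up to the last change to `filename`; return (own, unrelated) counts flushed."""
        # Find the most recent record that touches this file.
        last = max((seq for seq, name in self.records if name == filename), default=-1)
        if last < self.committed:
            return (0, 0)  # nothing new for this file needs to be made stable
        # The total order forces us to commit the whole prefix up to that record,
        # not just the records that belong to this file.
        flushed = self.records[self.committed:last + 1]
        self.committed = last + 1
        own = sum(1 for _, name in flushed if name == filename)
        return (own, len(flushed) - own)


journal = GlobalJournal()
for i in range(1000):
    # Assumed workload: 99 out of every 100 changes go to other files.
    journal.log_change("mail.db" if i % 100 == 99 else f"other-{i % 7}")

own, unrelated = journal.fsync("mail.db")
print(f"fsync flushed {own} records for mail.db and {unrelated} unrelated records")
# prints: fsync flushed 10 records for mail.db and 990 unrelated records
```

In this made-up workload the fsync() only cares about 10 records but has to wait for 1,000 to be committed. A real journal batches and writes its records very differently, but the ordering constraint is the same.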
You can get around this, but it generally requires a significantly more
complicated filesystem and journal design, which may or may not be
considered worth it in general. (Not that many applications actually use
fsync(), and many of them are not all that speed sensitive. On the
other hand, the exceptions tend to be pretty important.)