2019-10-21
Filesystem size limits and the complication of when errors are detected
One of the practical things that further complicates limiting the size of things in filesystems is the issue of when people find out that they've hit a limit, or rather how this interacts with another very desirable property of modern filesystems.
For practical usability, you want people (by which I mean programs) to be notified about all 'out of space' errors synchronously, when they submit their write IOs, or more exactly when they get back the results of submitting one (although this is effectively the same thing unless you have an asynchronous write API and a program is using it). Common APIs such as the Unix API theoretically allow you to signal write errors later (for example, on close()), but actually doing so causes practical problems both for straightforward programs that just write files out and are done (such as editors) and for more complicated programs that do ongoing IO.
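To make this concrete, here's a minimal C sketch of the straightforward 'write it all out and be done' pattern, roughly as an editor might use it (save_file and its arguments are made up for illustration). The write() loop is where you want all errors delivered; an error that only surfaces at the final close() is the awkward deferred case:

    #include <fcntl.h>
    #include <unistd.h>

    /* A sketch of the 'write a file out and be done' pattern.
       Names and error handling are simplified for illustration. */
    int save_file(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0)
            return -1;
        while (len > 0) {
            ssize_t n = write(fd, data, len);
            if (n < 0) {
                /* The good case: ENOSPC is reported synchronously,
                   tied to this specific write(). */
                close(fd);
                return -1;
            }
            data += n;
            len -= (size_t)n;
        }
        /* The awkward case: an error that only shows up here is
           not tied to any particular write we made. */
        if (close(fd) < 0)
            return -1;
        return 0;
    }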
Beyond simple carelessness about checking for errors, the problem is that write errors that aren't tied to a specific write IO leave programs in the dark about what exactly went wrong. If your program makes five write() calls at various offsets in the file and then gets an error later, the file could be in pretty much any state as far as it knows; it has no idea which writes succeeded, if any, and which failed.
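Here's a hypothetical sketch of this ambiguity, assuming a filesystem that accepts all of the writes and only reports 'out of space' when the data is flushed (update_regions is made up for this example):

    #include <sys/types.h>
    #include <unistd.h>

    /* Five writes at various offsets, then a flush. If every pwrite()
       reports success but fsync() then fails with ENOSPC, any subset
       of the five writes may or may not actually be on disk. */
    int update_regions(int fd, const char *buf, size_t len,
                       const off_t offsets[5])
    {
        for (int i = 0; i < 5; i++) {
            if (pwrite(fd, buf, len, offsets[i]) != (ssize_t)len)
                return -1;  /* error (or short write) tied to one IO */
        }
        if (fsync(fd) < 0)
            return -1;      /* error tied to... which write, exactly? */
        return 0;
    }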
Some write errors can't be detected until the IO is actually done and have to be delivered asynchronously, but delivering as many as possible as early as possible is quite desirable. And while 'the disk exploded' write errors are uncommon and unpredictable, 'you're out of space' is both predictable (in theory) and common, so you very much want to deliver it to programs immediately if it's going to happen at all.
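One existing Unix mechanism in this spirit is posix_fallocate(), which turns 'will there be space for this file?' into a question that's answered synchronously, up front (the preallocate() wrapper here is made up for illustration):

    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /* Ask the filesystem to commit space for the whole file now, so
       that ENOSPC is delivered immediately instead of (possibly) at
       writeback time. Note that posix_fallocate() returns an error
       number directly rather than setting errno. */
    int preallocate(int fd, off_t size)
    {
        int err = posix_fallocate(fd, 0, size);
        if (err != 0) {
            fprintf(stderr, "preallocate: %s\n", strerror(err));
            return -1;
        }
        return 0;
    }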
By itself this is no problem, but then we get into the other issue. Modern filesystems have discovered that they very much want to delay block allocation until as late as possible, because delaying and aggregating it across a bunch of pending writes gives you various performance improvements and good features (see eg the Wikipedia article). The natural place to detect and report being out of various sorts of space is during block allocation (because that's when you actually use space), but if block allocation is asynchronous and significantly delayed, you don't have prompt reporting of out of space issues to programs. If you try to delay block allocation but perform size limit checking early, you face a multi-way tradeoff: an early check that's completely accurate (so that there's little or no chance of a delayed failure during the later block allocation) basically duplicates the work of block allocation, while an early check that's cheap and fast has to be an approximation that can sometimes be wrong.
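One common shape for this compromise is to reserve a conservative estimate of blocks synchronously at write() time and do the real allocation later. Here's a sketch with made-up types and names, not any particular filesystem's code:

    #include <stdbool.h>

    /* Made-up bookkeeping for 'check early, allocate late'. */
    struct fs_space {
        long free_blocks;      /* blocks not yet allocated */
        long reserved_blocks;  /* reserved for pending delayed writes */
    };

    /* Called synchronously from write(): if a conservative estimate
       of the blocks needed doesn't fit, fail with ENOSPC right away. */
    bool reserve_blocks(struct fs_space *fs, long estimate)
    {
        if (fs->free_blocks - fs->reserved_blocks < estimate)
            return false;
        fs->reserved_blocks += estimate;
        return true;
    }

    /* Called later, from delayed allocation: convert the reservation
       into real blocks. If the estimate was genuinely conservative,
       this step can't fail for lack of space; making the estimate
       both cheap and exact is the hard part of the tradeoff. */
    void commit_blocks(struct fs_space *fs, long estimate, long actual)
    {
        fs->reserved_blocks -= estimate;
        fs->free_blocks -= actual;
    }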
In theory, the best solution to this problem is probably genuinely asynchronous write APIs that delay returning their results until your data has actually been written to disk. In practice, asynchronous APIs leave you with state machines in your programs, and state machines are often hard to deal with (this is the event loop problem).
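For illustration, here's roughly what the asynchronous version looks like with POSIX AIO. AIO by itself doesn't wait for data to reach disk (you'd need O_DSYNC or aio_fsync() for that), so take this as a sketch of the API shape and of why the state machine problem appears, not a full solution:

    #include <aio.h>
    #include <errno.h>
    #include <string.h>

    /* Submit one asynchronous write and (for illustration only)
       busy-wait for its result. A real program would go off and do
       other work instead, which is where the state machine comes in:
       it has to remember every outstanding aiocb and what to do when
       each one completes or fails. */
    int async_write_example(int fd, char *buf, size_t len)
    {
        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = len;
        cb.aio_offset = 0;

        if (aio_write(&cb) < 0)
            return -1;          /* submission itself failed */

        while (aio_error(&cb) == EINPROGRESS)
            ;                   /* poll; illustration only */

        if (aio_return(&cb) < 0)
            return -1;          /* eg a delayed ENOSPC lands here */
        return 0;
    }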