Filesystem size limits and the complication of when errors are detected

October 21, 2019

One of the practical things that further complicates limiting the size of things in filesystems is the issue of when people find out that they've hit a limit, or rather how this interacts with another very desirable property in modern filesystems.

For practical usability, you want people (by which I mean programs) to be notified about all 'out of space' errors synchronously, when they submit their write IOs, or more exactly when they get back the results of submitting one (although this is effectively the same thing unless you have an asynchronous write API and a program is using it). Common APIs such as the Unix API theoretically allow you to signal write errors later (for example on close() in Unix), but actually doing so will cause practical problems both for straightforward programs that just write files out and are done (such as editors) and more complicated programs that do ongoing IO. Beyond programs that carelessly never check for such late errors, the problem is that write errors that aren't tied to a specific write IO leave programs in the dark about what exactly went wrong. If your program makes five write() calls at various offsets in the file and then gets an error later, the file could be in pretty much any state as far as it knows; it has no idea which writes succeeded, if any, and which failed. Some write errors can't be detected until the IO is actually done and have to be delivered asynchronously, but delivering as many as possible as early as possible is quite desirable. And while 'the disk exploded' write errors are uncommon and unpredictable, 'you're out of space' is both predictable (in theory) and common, so you very much want to deliver it to programs immediately if it's going to happen at all.
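To make the difference concrete, here is a minimal Python sketch of a program that writes several pieces at various offsets and keeps track of where an error showed up. The helper names (write_all_at, write_pieces) are made up for illustration; the os.pwrite(), os.fsync() and os.close() calls are the standard Unix wrappers. An error raised by a specific pwrite() tells the program which piece failed, while an error that only appears at fsync() or close() time tells it nothing about which of the earlier writes were affected.

```python
import errno
import os


def write_all_at(fd, data, offset):
    """pwrite() all of data at offset, looping over short writes."""
    view = memoryview(data)
    while view:
        # Raises OSError (for example ENOSPC or EDQUOT) if the kernel
        # reports the problem synchronously for this particular write.
        n = os.pwrite(fd, view, offset)
        if n == 0:
            raise OSError(errno.EIO, "pwrite() made no progress")
        view = view[n:]
        offset += n


def write_pieces(path, pieces):
    """Write (offset, data) pieces to path.

    Returns (failed, late_errno). 'failed' is None or a tuple of
    (piece index, errno) for an error tied to one specific write.
    'late_errno' is None or the errno of an error that only showed up
    at fsync() or close(), where we can't tell which write it was.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    failed = None
    late_errno = None
    try:
        for i, (offset, data) in enumerate(pieces):
            try:
                write_all_at(fd, data, offset)
            except OSError as e:
                failed = (i, e.errno)
                break
        try:
            os.fsync(fd)
        except OSError as e:
            late_errno = e.errno
    finally:
        try:
            os.close(fd)
        except OSError as e:
            if late_errno is None:
                late_errno = e.errno
    return failed, late_errno
```

With a synchronous error the program can stop at (or retry) exactly the piece that failed. With only a late error, about all it can safely do is treat the whole file as suspect.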

By itself this is no problem, but then we get into the other issue. Modern filesystems have discovered that they very much want to delay block allocation until as late as possible, because delaying and aggregating it together across a bunch of pending writes gives you various performance improvements and good features (see eg the Wikipedia article). The natural place to detect and report being out of various sorts of space is during block allocation (because that's when you actually use space), but if this is asynchronous and significantly delayed, you don't have prompt reporting of out of space issues to programs. If you try to delay block allocation but perform size limit checking early, there's a multi-way tradeoff between basically duplicating block allocation, being completely accurate (so that there's little or no chance of a delayed failure during the later block allocation), and being fast.
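One way to picture the tradeoff is a pessimistic reservation scheme, sketched below in Python. This is a toy model and not how any particular filesystem does it; the SpaceAccount class, its methods, and the 4096-byte block size are all assumptions made for illustration. Space is reserved synchronously at write() time using a cheap worst-case estimate, and the reservation is trued up later when the delayed block allocation actually happens.

```python
import errno

BLOCK_SIZE = 4096  # assumed block size for this toy model


class SpaceAccount:
    """Toy accounting for one size limit, counted in blocks."""

    def __init__(self, limit_blocks):
        self.limit = limit_blocks
        self.allocated = 0  # blocks really allocated on disk
        self.reserved = 0   # blocks promised to pending writes

    def reserve(self, nbytes):
        """Called at write() time; fails synchronously if over the limit.

        The estimate is deliberately pessimistic: round up to whole
        blocks and add one in case the write straddles a block boundary.
        """
        blocks = -(-nbytes // BLOCK_SIZE) + 1
        if self.allocated + self.reserved + blocks > self.limit:
            raise OSError(errno.ENOSPC, "reservation would exceed limit")
        self.reserved += blocks
        return blocks

    def commit(self, reserved_blocks, used_blocks):
        """Called later, when delayed allocation really happens."""
        if used_blocks > reserved_blocks:
            # The early estimate was too small; this is the delayed
            # failure that an accurate reservation is meant to rule out.
            raise ValueError("allocation exceeded its reservation")
        self.reserved -= reserved_blocks
        self.allocated += used_blocks
```

The cheap estimate keeps reserve() fast and makes commit() safe, but it isn't accurate: a program can get an 'out of space' error at write() time even though the space would have been there once the pending allocations were done. Making the estimate tighter means doing more of the real block allocation work up front.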

In theory, the best solution to this problem is probably genuinely asynchronous write APIs that delay returning their results until your data has been written to disk. In practice asynchronous APIs leave you with state machines in your programs, and state machines are often hard to deal with (this is the event loop problem).

Comments on this page:

By sam at 2019-10-22 10:15:38:

There's a large, useful middle ground where the filesystem is able to quickly approximate the amount of free space (or whatever resource is relevant) by some admissible heuristic and then determine that the operations requested can't possibly go over any limits, so it can batch up the operations without having to care about returning useful quota errors (because there won't be any). This isn't a panacea: the slow path might be really slow, relatively speaking, and that could put systems into a limping state even before the limit is breached. Filesystems also still should be reporting I/O errors individually rather than signalling an error on fsync or close (though of course the latter is what they do in practice, because it would be too convenient otherwise).



Last modified: Mon Oct 21 22:14:06 2019