Filesystems and the problems of exposing their internal features

July 5, 2025

Modern filesystems often have a variety of sophisticated features that go well beyond standard POSIX-style IO, such as transactional journals of (all) changes and storing data in compressed form. For certain use cases, it could be nice to get direct access to those features. For example, your web server could potentially serve static files directly in their compressed form, without having the kernel uncompress them and then the web server re-compress them (let's assume we can make all of the details work out in this sort of situation, which isn't a given). But filesystems only very rarely expose this sort of thing to programs, even through private interfaces that don't have to be standardized by the operating system.

One reason filesystems don't do this is that they don't want to turn what are currently internal filesystem details into an API (it's not quite right to call them only 'implementation details', because often the filesystem has to support the resulting on-disk structures more or less forever). Another issue is that the implementation inside the kernel is often not even written so that the necessary information could be provided to a user-level program, especially efficiently.

Even when exposing a feature doesn't necessarily require providing programs with internal information from the filesystem, filesystems may not want to make promises to user space about what they do and when they do it. One place this comes up is the periodic request that filesystems like ZFS expose some sort of 'transaction' feature, where the filesystem promises that either all of a certain set of operations are visible or none of them are. Supporting such a feature doesn't just require ZFS or some other filesystem to promise to tell you when all of the things are durably on disk; it also requires the filesystem to not make any of them visible early, despite things like memory pressure or the filesystem's other natural activities.

Sidebar: Filesystem compression versus program compression

When you start looking, how ZFS does compression (and probably how other filesystems do it) is quite different from how programs want to handle compressed data. A program such as a web server needs a compressed stream of data that the recipient can uncompress as a single (streaming) thing, but this is probably not what the filesystem does. To use ZFS as an example of filesystem behavior, ZFS compresses blocks independently and separately (typically in 128 Kbyte blocks), may use different compression schemes for different blocks, and may not compress a block at all. Since ZFS reads and writes blocks independently and has metadata for each of them, this is perfectly fine for ZFS, but it is obviously somewhat messy for a program to deal with.
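To make the mismatch concrete, here is a small toy model in Python. It is only a sketch under loud assumptions: zlib stands in for whatever compressor the filesystem really uses, the ("zlib" or "raw", payload) tuples are a made-up stand-in for per-block metadata, and nothing here reflects ZFS's actual on-disk format or any real interface. The names store_blocks, read_blocks and naive_stream_decode are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 128 * 1024  # the typical 128 Kbyte block size mentioned above


def store_blocks(data, block_size=BLOCK_SIZE):
    """Toy model: compress each block on its own, keeping it raw if that is smaller."""
    blocks = []
    for off in range(0, len(data), block_size):
        raw = data[off:off + block_size]
        packed = zlib.compress(raw)
        if len(packed) < len(raw):
            blocks.append(("zlib", packed))  # hypothetical per-block metadata
        else:
            blocks.append(("raw", raw))      # compression did not help
    return blocks


def read_blocks(blocks):
    """How the toy 'filesystem' reads: check each block's metadata, decode it separately."""
    return b"".join(
        zlib.decompress(payload) if scheme == "zlib" else payload
        for scheme, payload in blocks
    )


def naive_stream_decode(blocks):
    """What a recipient expecting one compressed stream gets from the stored bytes."""
    wire = b"".join(payload for _, payload in blocks)
    d = zlib.decompressobj()
    decoded = d.decompress(wire)
    # Anything after the end of the first compressed stream is left undecoded.
    return decoded, d.unused_data


# One compressible block, one incompressible block, and a short compressible tail.
text = (b"static file contents\n" * 10000)[:BLOCK_SIZE]
noise = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(BLOCK_SIZE // 32)
)
data = text + noise + text[:1000]

blocks = store_blocks(data)
print([scheme for scheme, _ in blocks])

decoded, leftover = naive_stream_decode(blocks)
print("block-aware read recovers the file:", read_blocks(blocks) == data)
print("single-stream decode recovers the file:", decoded == data)
print("bytes the stream decoder could not use:", len(leftover))
```

In this model the block-aware reader gets the whole file back, while a recipient treating the stored bytes as one stream stops at the end of the first block and is left holding bytes it can't interpret. A program that wanted to serve the stored form directly would need both the per-block metadata and a recipient that understands it, which is exactly the messy part.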

Written on 05 July 2025.