Some ways to implement /dev/fd in Unix kernels

January 4, 2022

The idea of /dev/fd, which gives filesystem names to file descriptors, is the core of the modern implementation of process substitution. There are several ways to implement this idea in the Unix kernel, ranging from an old, simple, brute force method to the modern methods, which generally use some form of virtual filesystem for reasons that we'll get to.

The simple but brute force way to implement /dev/fd is with a real directory containing a bunch of miscellaneous character devices, somewhat similar to /dev/null. Inside the kernel, the device driver for these miscellaneous devices can arrange to do the necessary magic when they're opened, including failing to open if your process doesn't have that particular file descriptor. This implementation has been possible for a very long time (since before V7 Unix), but it has two drawbacks. First, the /dev/fd directory has to contain character device inodes for all of the potentially available file descriptors, regardless of whether or not the current process has those file descriptors available. Second, you potentially need a lot of minor device numbers, since you need one minor device number for every potential file descriptor number.
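(As a rough user-space model of this, here is a Python sketch. The names static_devfd_nodes, driver_open, and MAX_FDS are made up for illustration, and I'm assuming that opening /dev/fd/N acts like dup(N), which not every system does. The point is that the set of nodes is fixed in advance while the open-time check depends on the calling process.)

```python
import os

MAX_FDS = 64  # hypothetical per-process descriptor limit, for illustration only


def static_devfd_nodes(max_fds=MAX_FDS):
    """Names a real /dev/fd directory would need: one node per possible fd,
    whether or not any given process has that fd open."""
    return [str(minor) for minor in range(max_fds)]


def driver_open(minor):
    """Model of the driver's open routine, where minor number N means fd N.

    Assumes dup()-like semantics. os.dup() raises OSError (EBADF) if the
    process doesn't hold that descriptor, which stands in for the open
    failing.
    """
    return os.dup(minor)
```

Here driver_open() on a descriptor you hold hands back a new descriptor for the same open file, and on one you don't hold it raises OSError, even though static_devfd_nodes() lists a name for it either way.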

Together, these two issues generally made this brute force approach unpopular and, I believe, pretty much never implemented in Unix. The closest people came was /dev/stdin, /dev/stdout, and /dev/stderr, which were sometimes implemented this way. Having only these three common file descriptors available wasn't anywhere near as useful, but it could be a lot more feasible.

The second possible approach is to have /dev/fd be a virtual filesystem but the nodes in the filesystem be miscellaneous character devices. Modern Unixes generally allow really large minor device numbers, so that side's not a problem, and as a virtual filesystem /dev/fd can materialize only the file descriptors that the current process actually has. I'm not certain if anyone actually implements /dev/fd this way. Although FreeBSD can sometimes have character devices appear in /dev/fd, I think that FreeBSD's fdescfs is implemented differently and the character device stat() result is basically an illusion.
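(To make "materialize only the file descriptors that the current process actually has" concrete, here is a small Python sketch of what such a filesystem's directory listing amounts to. The function name and the probe_limit cap are my own inventions; a real kernel would walk the process's descriptor table directly instead of probing with fstat().)

```python
import os


def materialized_entries(probe_limit=256):
    """Return the names a virtual /dev/fd would show for this process.

    Probes descriptors 0..probe_limit-1 with fstat() and keeps the ones
    that are actually open. probe_limit is an arbitrary cap for this sketch.
    """
    entries = []
    for fd in range(probe_limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # not open in this process, so no directory entry
        entries.append(str(fd))
    return entries
```

Unlike a real directory, the result changes as the process opens and closes descriptors, and two processes can see different contents at the same moment.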

(For FreeBSD fdescfs, see fdesc_vnops.c.)

The third approach is to have both /dev/fd and /dev/fd/N be completely virtual, as a full virtual filesystem or as part of one. Modern Linux effectively works this way; /dev/fd is a symbolic link to /proc/self/fd, which is a procfs directory with magical contents. Linux makes this very magical; the files in /proc/[pid]/fd are nominally symbolic links (which is what lstat() and 'ls -l' will report), but when you open them they have special behavior instead of being followed as normal symlinks would be.
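(You can see the Linux magic from user space. The following Python sketch is Linux-only and assumes /proc is mounted; the function names are mine. For a pipe, the link text is something like 'pipe:[12345]', which isn't a path that exists anywhere, yet opening the /proc/self/fd entry still works.)

```python
import os
import stat


def describe_proc_fd(fd):
    """Report how /proc/self/fd/<fd> looks to lstat(), readlink() and stat()."""
    path = f"/proc/self/fd/{fd}"
    return {
        # lstat() looks at the entry itself, which claims to be a symlink.
        "lstat_is_symlink": stat.S_ISLNK(os.lstat(path).st_mode),
        # The link text need not name anything that exists in the filesystem.
        "link_text": os.readlink(path),
        # stat() goes through the entry to the underlying open file.
        "stat_is_fifo": stat.S_ISFIFO(os.stat(path).st_mode),
    }


def reopen_via_proc(fd, flags=os.O_RDONLY | os.O_NONBLOCK):
    """Open /proc/self/fd/<fd>; O_NONBLOCK keeps a FIFO open from blocking."""
    return os.open(f"/proc/self/fd/{fd}", flags)
```

If you call describe_proc_fd() on the read end of a pipe from os.pipe(), lstat() sees a symlink while stat() sees a FIFO, and reopen_via_proc() gives you a working descriptor on the same pipe despite the link text being unresolvable.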

(We'll wave our hands about how the virtual filesystem reaches into the depths of the kernel to get access to your process's file descriptors. Let's just assume that the kernel developers make it all work.)

Since both of the good approaches to /dev/fd need some sort of virtual filesystem, both of them had to wait for the idea of the virtual filesystem switch to be invented. Before the days of the VFS, the only possible implementation of /dev/fd was the unattractive brute force one of a real directory with a lot of character devices in it.

Comments on this page:

Interestingly, on macOS, /dev/fd appears to be implemented with character special devices, but they all have the same major/minor number:

> file /dev/fd/0 /dev/fd/1 /dev/fd/2
/dev/fd/0: character special (16/43)
/dev/fd/1: character special (16/43)
/dev/fd/2: character special (16/43)

I guess there must be some magic in devfs that routes them to different places.

By Alexander E. Patrakov at 2022-01-07 22:47:03:

Another possible approach is a pure user-space implementation in libc, so that opening /dev/fd/N is internally mapped not to the open() syscall, but to dup(). Not sure if anyone (possibly besides Cygwin) does this.
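(A minimal sketch of that user-space idea, in Python instead of libc's C. The wrapper name open_with_devfd is hypothetical, and this version ignores the requested flags when it takes the dup() path, which a real libc would probably need to check against the descriptor's access mode.)

```python
import os
import re

# Only exact /dev/fd/<decimal number> paths are intercepted.
_DEV_FD = re.compile(r"/dev/fd/([0-9]+)")


def open_with_devfd(path, flags, mode=0o777):
    """open()-like wrapper that maps /dev/fd/N to dup(N) in user space.

    Any other path is passed through to the real open().
    """
    match = _DEV_FD.fullmatch(path)
    if match:
        return os.dup(int(match.group(1)))
    return os.open(path, flags, mode)
```

With this, open_with_devfd("/dev/fd/1", os.O_WRONLY) returns a duplicate of standard output without the kernel ever seeing the /dev/fd path.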

Written on 04 January 2022.

Last modified: Tue Jan 4 21:57:31 2022