Unix's design issue of device numbers being in stat() results for files

June 24, 2020

Sometimes you will hear the view that Unix's design is without significant issues, especially the 'pure' design of Research Unix (before people who didn't really understand Unix, like Berkeley and corporate AT&T, got their hands on it). Unfortunately that is not the case, and there are some areas where Research Unix made decisions that still haunt us to this day. For reasons beyond the scope of this entry, today's example is that part of the file attributes you get from the stat() system call and its friends is the 'device number' of the filesystem the file is on.

(To be specific, this is the st_dev field of the struct stat that stat() returns, which has been there since V7's stat.h. The V6 stat() was even more explicit about what it was returning.)

In Unix, the user level file attributes you get back need some kind of locally unique identifier for the filesystem that the file is on, so the presence of some identifier is not a mistake. The identifier being different between two files is how you detect that you're at a filesystem mount point, that you can't use link() between them, or that two otherwise identical looking files are not actually hardlinked together because they're on different filesystems. It's also useful to have an identifier that can be matched up with things like a list of mounted filesystems.
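As an illustration of these uses, here is a minimal Python sketch that relies only on st_dev and st_ino from os.stat() and os.lstat(). The function names are my own, and the mount point check is the classic heuristic; it won't notice things like a bind mount of a directory from the same filesystem.

```python
import os

def same_filesystem(path_a, path_b):
    """True if both paths report the same st_dev (so link() is at least possible)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

def are_hardlinked(path_a, path_b):
    """True if both names refer to the same inode on the same filesystem."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def is_mount_point(directory):
    """Heuristic: a directory is a mount point if its st_dev differs from
    its parent's st_dev, or if it is its own parent (the root directory)."""
    here = os.lstat(directory)
    parent = os.lstat(os.path.join(directory, ".."))
    if here.st_dev != parent.st_dev:
        return True
    # Same filesystem; only the root directory has '..' pointing at itself.
    return here.st_ino == parent.st_ino
```

Note that matching inode numbers alone prove nothing; two unrelated files on different filesystems can easily have the same st_ino, which is why the filesystem identifier has to be part of the comparison.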

However, early Unixes didn't make this merely some identifier, they made this specifically the device number of the underlying disk device that the filesystem was mounted from (hence its name as 'st_dev'). This had the unfortunate consequence of permanently joining two logically separate identifier namespaces, the namespace of (mounted) filesystems and the namespace of block devices.
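You can see the joined namespaces from user level. The following is a rough Python sketch (the function names are mine) that splits an st_dev value into major and minor numbers with os.major() and os.minor(), and then looks through /dev for block device nodes whose own device number (st_rdev) matches a file's st_dev. It assumes the conventional /dev layout, and it is expected to come up empty for filesystems that have no single underlying block device.

```python
import os
import stat

def split_device_number(dev):
    """Split a device number into its (major, minor) parts."""
    return os.major(dev), os.minor(dev)

def backing_block_devices(path, dev_dir="/dev"):
    """Return the names in dev_dir that are block devices whose device
    number (st_rdev) equals the st_dev reported for path."""
    target = os.stat(path).st_dev
    matches = []
    try:
        entries = list(os.scandir(dev_dir))
    except OSError:
        # No readable /dev (or equivalent); nothing to match against.
        return matches
    for entry in entries:
        try:
            st = os.stat(entry.path)
        except OSError:
            continue
        if stat.S_ISBLK(st.st_mode) and st.st_rdev == target:
            matches.append(entry.path)
    return matches
```

For example, `split_device_number(os.stat("/").st_dev)` gives you the numbers to compare against what ls -l shows for the block devices in /dev.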

Now, 40-odd years later, we have plenty of Unix filesystems that don't have underlying block devices (especially singular ones). Anything mounted using one of these filesystems needs to somehow make up a 'device number' for itself, and this device number can't be the same as that of any real block device. This generally requires Unixes to carve out a section of their overall block device number space that's reserved for filesystems to use this way, in other words for things that aren't actually block devices. Fortunately modern Unixes have generally made the namespace of device numbers much larger than it used to be.

(Then, because device numbers for block devices are generally stable, a certain amount of software expects the 'device number' returned as part of file attributes to also be stable, for any arbitrary filesystem. When the kernel and a filesystem have to make this number up on the fly, this is not always the case.)
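As one concrete case of such a carved-out range, Linux documents major number 0 as being for 'unnamed' devices, which is where made-up device numbers for filesystems without a block device generally come from. A small sketch of checking for this follows; the function name is mine, the check is a Linux-specific heuristic, and other Unixes reserve their ranges differently.

```python
import os

def looks_made_up(dev):
    """Linux-specific heuristic: treat a device number with major 0 as one
    the kernel invented for a filesystem, not as a real block device."""
    return os.major(dev) == 0
```

You would use this as, for example, `looks_made_up(os.stat("/proc").st_dev)`. Since these numbers are handed out as filesystems are mounted, the minor number you see for a given filesystem may well be different after a reboot or a remount.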

At the same time, this was a good design for V7 itself, in its time and context. V7 and its kernel were intended to be a small system, and in a small system you don't want to go doing extra work unless you absolutely have to, especially in the kernel. V7 could reuse the device number as the filesystem identifier essentially for free, so that's what it did.

(V7's kernel took any number of shortcuts in the interests of having a simple implementation. For instance, a lot of things were stored in small fixed-sized arrays, because you would never have more than a modest number of processes, open files, or so on.)

Written on 24 June 2020.

Last modified: Wed Jun 24 22:04:07 2020