The history of file type information being available in Unix directories

August 25, 2018

The two things that Unix directory entries absolutely have to have are the name of the directory entry and its 'inode', by which we generically mean some stable kernel identifier for the file that will persist if it gets renamed, linked to other directories, and so on. Unsurprisingly, directory entries have had these since the days when you read the raw bytes of directories with read(), and for a long time that was all they had; if you wanted more than the name and the inode number, you had to stat() the file, not just read the directory. Then, well, I'll quote myself from an old entry on a find optimization:

[...], Unix filesystem developers realized that it was very common for programs reading directories to need to know a bit more about directory entries than just their names, especially their file types (find is the obvious case, but also consider things like 'ls -F'). Given that the type of an active inode never changes, it's possible to embed this information straight in the directory entry and then return this to user level, and that's what developers did; on some systems, readdir(3) will now return directory entries with an additional d_type field that has the directory entry's type.

On Twitter, I recently grumbled about Illumos not having this d_type field. The ensuing conversation wound up with me curious about exactly where d_type came from and how far back it went. The answer turns out to be a bit surprising due to there being two sides of d_type.

On the kernel side, d_type appears to have shown up in 4.4 BSD. The 4.4 BSD /usr/src/sys/dirent.h has a struct dirent that has a d_type field, but the field isn't documented in either the comments in the file or in the getdirentries(2) manpage; both of those admit only to the traditional BSD dirent fields. This 4.4 BSD d_type was carried through to things that inherited from 4.4 BSD (Lite), specifically FreeBSD, but it continued to be undocumented for at least a while.

(In FreeBSD, the most convenient history I can find is here, and the d_type field is present in sys/dirent.h as far back as FreeBSD 2.0, which seems to be as far as the repo goes for releases.)

Documentation for d_type appeared in the getdirentries(2) manpage in FreeBSD 2.2.0, where the manpage itself claims to have been updated on May 3rd 1995 (cf). In FreeBSD, this appears to have been part of merging 4.4 BSD 'Lite2', which seems to have been done in 1997. I stumbled over a repo of UCB BSD commit history, and in it the documentation appears in this May 3rd 1995 change, which at least has the same date. It appears that FreeBSD 2.2.0 was released some time in 1997, which is when this would have appeared in an official release.

In Linux, it seems that a dirent structure with a d_type member appeared only just before 2.4.0, which was released at the start of 2001. Linux took this long because the d_type field only appeared in the 64-bit 'large file support' version of the dirent structure, and so was only return by the new 64-bit getdents64() system call. This would have been a few years after FreeBSD officially documented d_type, and probably many years after it was actually available if you peeked at the structure definition.

(See here for an overview of where to get ancient Linux kernel history from.)

As far as I can tell, d_type is present on Linux, FreeBSD, OpenBSD, NetBSD, Dragonfly BSD, and Darwin (aka MacOS or OS X). It's not present on Solaris and thus Illumos. As far as other commercial Unixes go, you're on your own; all the links to manpages for things like AIX from my old entry on the remaining Unixes appear to have rotted away.

Sidebar: The filesystem also matters on modern Unixes

Even if your Unix supports d_type in directory entries, it doesn't mean that it's supported by the filesystem of any specific directory. As far as I know, every Unix with d_type support has support for it in their normal local filesystems, but it's not guaranteed to be in all filesystems, especially non-Unix ones like FAT32. Your code should always be prepared to deal with a file type of DT_UNKNOWN.

(Filesystems can implement support for file type information in directory entries in a number of different ways. The actual on disk format of directory entries is filesystem specific.)

It's also possible to have things the other way around, where you have a filesystem with support for file type information in directories that's on a Unix that doesn't support it. There are a number of plausible reasons for this to happen, but they're either obvious or beyond the scope of this entry.

Written on 25 August 2018.
« Incremental development in Python versus actual tests
How ZFS maintains file type information in directories »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Aug 25 00:31:39 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.