A brief history of fiddling with Unix directories

March 19, 2015

In the beginning (say V7 Unix), Unix directories were remarkably non-special. They were basically files that the kernel knew a bit about. In particular, there was no mkdir(2) system call and the . and .. entries in each directory were real directory entries (and real hardlinks), created by hand by the mkdir program. Similarly there was no rmdir() system call and rmdir directly called unlink() on dir/.., dir/., and dir itself. To avoid the possibility of users accidentally damaging the directory tree in various ways, calling link(2) and unlink(2) on directories was restricted to the superuser.
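As a rough illustration of what "created by hand" means, here is a minimal Python sketch of the sequence a V7-style mkdir program would go through. It is not the V7 source. The function names are hypothetical, the use of mknod to make the bare directory is my assumption about how the first step was done, and the permission checks the real program did are left out. On a modern kernel the first step should simply be refused, so the wrapper reports the error instead of a new directory.

```python
import errno
import os
import stat
import tempfile


def v7_style_mkdir(path):
    """Hypothetical sketch: build a directory by hand, the way the text describes."""
    parent = os.path.dirname(os.path.abspath(path))
    # Step 1 (assumed): create a bare directory node with no '.' or '..' in it.
    os.mknod(path, stat.S_IFDIR | 0o777)
    # Step 2: '.' is an ordinary hard link to the new directory itself.
    os.link(path, os.path.join(path, "."))
    # Step 3: '..' is an ordinary hard link to the parent directory.
    os.link(parent, os.path.join(path, ".."))


def try_v7_style_mkdir(path):
    """Return None on success, or the errno name of the step that was refused."""
    try:
        v7_style_mkdir(path)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_v7_style_mkdir(os.path.join(tmp, "newdir")))
```

The point of the sketch is that all three steps are separate operations from user code, so anything that interrupts the program between them leaves a half-made directory behind.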

(In part to save the superuser from themselves, commands like ln and rm then generally refused to operate on directories at all, explicitly checking for 'is this a directory' and erroring out if it was. V7 rm would remove directories with 'rm -r', but it deferred to rmdir to do the actual work. Only V7 mv had special handling for directories; it knew how to actually rename them by manipulating hardlinks to them, although this only worked when mv was run by the superuser.)

It took until 4.1 BSD or so for the kernel to take over the work of creating and deleting directories, with real mkdir() and rmdir() system calls. The kernel also picked up a rename() system call at the same time, instead of requiring mv to do the work with link(2) and unlink(2) calls; this rename() also worked on directories. This was the point, not coincidentally, where BSD directories themselves became more complicated. Interestingly, even in 4.2 BSD link(2) and unlink(2) would work on directories if you were root and mknod(2) could still be used to create them (again, if you were root), although I suspect no user level programs made use of this (and certainly rm still rejected directories as before).

(As a surprising bit of trivia, it appears that the 4.2 BSD ln lacked a specific 'is the source a directory' guard and so a superuser probably could accidentally use it to make extra hardlinks to a directory, thereby doing bad things to directory tree integrity.)

To my further surprise, raw link(2) and unlink(2) continued to work on directories as late as 4.4 BSD; it was left for other Unixes to reject this outright. Since the early Linux kernel source is relatively simple to read, I can say that Linux rejected it from very early on. About other Unixes, I have no idea. (I assume but don't know for sure that modern *BSD derived Unixes do reject this at the kernel level.)
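If you want to see what your own system does, a small probe is enough. The following is a minimal sketch, with a hypothetical function name, that tries a raw link and a raw unlink on a scratch directory and reports either "allowed" or the errno name the kernel returned. I'd expect any current Unix to refuse both, but the exact error codes can differ between systems, so the sketch doesn't assume particular ones.

```python
import errno
import os
import tempfile


def probe_directory_link_unlink():
    """Try link() and unlink() on a directory; report what the kernel says."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "dir")
        os.mkdir(target)
        attempts = (
            # Try to make a second hard link to the directory.
            ("link", lambda: os.link(target, os.path.join(tmp, "alias"))),
            # Try to remove the directory with unlink() instead of rmdir().
            ("unlink", lambda: os.unlink(target)),
        )
        for name, call in attempts:
            try:
                call()
                results[name] = "allowed"
            except OSError as e:
                results[name] = errno.errorcode.get(e.errno, str(e.errno))
    return results


if __name__ == "__main__":
    print(probe_directory_link_unlink())
```

Running it as an ordinary user and then as root is the interesting comparison, since the historical behaviour described above only ever applied to the superuser.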

(I've written other entries on aspects of Unix directories and their history.)

PS: Yes, this does mean that V7 mkdir and rmdir were setuid root, as far as I know. They did do their own permission checking in a perfectly V7-appropriate way, but in general, well, you really don't want to think too hard about V7, directory creation and deletion, and concurrency races.

In general and despite what I say about it sometimes, V7 made decisions that were appropriate for its time and its job of being a minimal system on a relatively small machine that was being operated in what was ultimately a friendly environment. Delegating proper maintenance of a core filesystem property like directory tree integrity to user code may sound very wrong to us now but I'm sure it made sense at the time (and it did things like reduce the kernel size a bit).

Written on 19 March 2015.