Digging into BSD's choice of Unix group for new directories and files

May 5, 2017

I have to eat some humble pie here. In comments on my entry on an interesting chmod failure, Greg A. Woods pointed out that FreeBSD's behavior of creating everything inside a directory with the group of the directory is actually traditional BSD behavior (it dates all the way back to the 1980s), not some odd new invention by FreeBSD. Since it's traditional behavior, it makes sense that the standards explicitly allow it, but I've also come to think that it makes sense in context and in general. To see this, we need some background about the problem facing BSD.

In the beginning, two things were true in Unix: there was no mkdir() system call, and processes could only be in one group at a time. With processes being in only one group, the choice of the group for a newly created filesystem object was easy; it was your current group. This was felt to be sufficiently obvious behavior that the V7 creat(2) manpage doesn't even mention it.

(The actual behavior is implemented in the kernel in maknode() in iget.c.)

Now things get interesting. 4.1c BSD seems to be where mkdir(2) is introduced and where creat() stops being a system call and becomes an option to open(2). It's also where processes can be in multiple groups for the first time. The 4.1c BSD open(2) manpage is silent about the group of newly created files, while the mkdir(2) manpage specifically claims that new directories will have your effective group (ie, the V7 behavior). This is actually wrong. In both mkdir() in sys_directory.c and maknode() in ufs_syscalls.c, the group of the newly created object is set to the group of the parent directory. Then finally in the 4.2 BSD mkdir(2) manpage the group of the new directory is correctly documented (the 4.2 BSD open(2) manpage continues to say nothing about this). So BSD's traditional behavior was introduced at the same time as processes being in multiple groups, and we can guess that it was introduced as part of that change.
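To make the difference concrete, here is a minimal Python model of the two rules: the V7 one, where a new object takes the creating process's group, and the one 4.1c/4.2 BSD actually implemented, where it takes the parent directory's group. The `Process` and `Directory` classes and both function names are hypothetical illustrations, not a transcription of the real maknode() or mkdir() kernel code.

```python
from dataclasses import dataclass


@dataclass
class Process:
    uid: int
    gid: int  # effective GID; under V7 this was the process's only group


@dataclass
class Directory:
    uid: int
    gid: int


def v7_new_owner(proc: Process, parent: Directory) -> tuple[int, int]:
    # V7 rule: a new object gets the creator's UID and its single GID;
    # the parent directory's group plays no part.
    return (proc.uid, proc.gid)


def bsd_new_owner(proc: Process, parent: Directory) -> tuple[int, int]:
    # BSD rule: the UID still comes from the creator, but the group is
    # copied from the directory the object is being created in.
    return (proc.uid, parent.gid)


me = Process(uid=1000, gid=100)     # hypothetical login group 100
shared = Directory(uid=0, gid=200)  # hypothetical collaborative area, group 200

print(v7_new_owner(me, shared))   # (1000, 100)
print(bsd_new_owner(me, shared))  # (1000, 200)
```

The only difference is which argument supplies the group, which is why the BSD rule could be slipped in without changing how ownership by UID works.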

When your process can only be in a single group, as in V7, it makes perfect sense to create new filesystem objects with that as their group. It's basically the same case as making new filesystem objects be owned by you; just as they get your UID, they also get your GID. When your process can be in multiple groups, things get less clear. A filesystem object can only be in one group, so which of your several groups should a new filesystem object be owned by, and how can you most conveniently change that choice?

One option is to have some notion of a 'primary group' and then provide ways to shuffle around which of your groups is the primary group. One problem with this is that it's awkward and error-prone to work in different areas of the filesystem where you want your new files and directories to be in different groups; every time you cd around, you may have to remember to change your primary group. If you move into a collaborative directory, you'd better switch your shell to that group; if you cd back to $HOME, or simply want to write a new file in $HOME, you'd better remember to switch back.
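A small Python model shows how the primary group scheme goes wrong when you forget to switch back. Everything here is hypothetical: `switch_primary()` stands in for something in the spirit of newgrp(1), and the model's rule that you can only switch to groups you're already in is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Process:
    uid: int
    groups: set[int]  # every group this process belongs to
    primary: int      # the single group used for newly created objects

    def switch_primary(self, gid: int) -> None:
        # Hypothetical switch; this model only allows groups the process
        # is already a member of.
        if gid not in self.groups:
            raise PermissionError(f"not a member of group {gid}")
        self.primary = gid


def create_file(proc: Process, files: dict[str, int], path: str) -> None:
    # The new file's group is whatever the primary group is right now;
    # where the file is being created plays no part.
    files[path] = proc.primary


files: dict[str, int] = {}
me = Process(uid=1000, groups={100, 200}, primary=100)

me.switch_primary(200)                       # entering the shared area
create_file(me, files, "/shared/notes.txt")  # group 200, as intended
create_file(me, files, "/home/me/todo.txt")  # forgot to switch back: also 200
print(files)
```

The file in $HOME ends up in the collaborative group purely because of what the process did earlier, not because of where the file lives.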

Another option is the BSD choice of inheriting the group from context. By far the most common case is that you want your new files and directories to be created in the 'context', ie the group, of the surrounding directory. If you're working in $HOME, this is your primary login group; if you're working in a collaborative area, this is the group being used for collaboration. Arguably it's a feature that you don't even have to be in that group (if directory permissions allow you to make new files). Since you can chgrp directories that you own (to any group you're a member of), this option also gives you a relatively easy and persistent way to change which group is chosen for any particular area.
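Here is a minimal Python sketch of the inheritance rule, showing both that the creator's groups are never consulted and that a chgrp on a directory persistently changes the group for everything created under it afterward. The `Node`, `create()`, and `chgrp()` names are hypothetical, and the model deliberately skips permission checks (real chgrp requires you to own the directory and be in the target group, and creating entries requires write permission on the directory).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    uid: int
    gid: int
    children: dict = field(default_factory=dict)


def create(parent: Node, name: str, creator_uid: int) -> Node:
    # BSD rule: the new object's group is copied from the parent directory.
    # The creator's own group memberships are never looked at.
    child = Node(name, creator_uid, parent.gid)
    parent.children[name] = child
    return child


def chgrp(node: Node, gid: int) -> None:
    # Changing a directory's group changes what *future* objects created
    # in it will get; existing children keep their current group.
    node.gid = gid


proj = Node("proj", uid=1000, gid=100)
a = create(proj, "a.txt", creator_uid=1000)
chgrp(proj, 200)
b = create(proj, "b.txt", creator_uid=1001)  # a different user, same result
sub = create(proj, "sub", creator_uid=1000)
c = create(sub, "c.txt", creator_uid=1000)   # inherited one level further down
print(a.gid, b.gid, sub.gid, c.gid)          # 100 200 200 200
```

Because new subdirectories pick up the group too, one chgrp near the top of an area is enough to set the group for new work throughout it.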

If you fully embrace the idea of Unix processes being in multiple groups, not just having one primary group and then some number of secondary groups, then the BSD choice makes a lot of sense. And for all its faults, BSD tended to embrace its changes fairly thoroughly (though not totally, perhaps partly because it had backward compatibility issues to consider). While it leads to some odd issues, such as the one I ran into, pretty much any choice here is going to have some oddities. It's also probably the more usable choice in general if you expect much collaboration between different people (well, different Unix logins), partly because it mostly doesn't require people to remember to do things.

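(I know that on our systems, a lot of directories intended for collaborative work tend to end up being setgid specifically to get this behavior, since on Linux a setgid directory makes new files and subdirectories inside it take the directory's group.)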
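If you want to see the setgid directory behavior on a real system, here is a small Python sketch using only the standard library. The function name `setgid_dir_demo()` is hypothetical; it creates a throwaway directory, turns on its setgid bit, and reports the groups of a new file and subdirectory created inside it. It assumes you can chmod a directory you just created, which normally holds because its group is one you're in.

```python
import os
import stat
import tempfile


def setgid_dir_demo() -> tuple[int, int, int, bool]:
    """Return (dir gid, file gid, subdir gid, subdir has setgid bit)."""
    with tempfile.TemporaryDirectory() as base:
        shared = os.path.join(base, "shared")
        os.mkdir(shared)
        # mkdir() may ignore the setgid bit in its mode argument,
        # so set it explicitly with chmod().
        mode = stat.S_IMODE(os.stat(shared).st_mode)
        os.chmod(shared, mode | stat.S_ISGID)

        path = os.path.join(shared, "notes.txt")
        with open(path, "w") as f:
            f.write("hello\n")
        sub = os.path.join(shared, "sub")
        os.mkdir(sub)

        dir_st = os.stat(shared)
        sub_st = os.stat(sub)
        return (dir_st.st_gid, os.stat(path).st_gid, sub_st.st_gid,
                bool(sub_st.st_mode & stat.S_ISGID))


print(setgid_dir_demo())
```

On Linux the three GIDs should match and the new subdirectory should carry the setgid bit too, which is how the behavior propagates down a tree. On FreeBSD, new objects get the directory's group whether or not the bit is set, which is the traditional BSD behavior this entry is about. Running this as a user with only one group won't show any visible difference, since the directory's group and your own will already be the same.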

Written on 05 May 2017.
Last modified: Fri May 5 01:00:53 2017