Wandering Thoughts archives

2015-09-21

When chroot() started to confine processes inside the new root

Writing about the somewhat surprising history of chroot() did leave me with one question: when did chroot() start to confine processes inside the new root directory hierarchy? This is an interesting moment because it marks the point where chroot() stops being a little hack to help emulation and instead turns into a security feature.

(The first use of chroot() as a security feature seems to be in the 4.2BSD ftpd, as covered in the first entry. I can't be completely sure of this because I can't find an easily searchable version of the tuhs.org 4.1c BSD tree.)

Early versions of chroot() appear to be trivially escapable by things like 'cd /; cd ..', which puts you in the parent of the nominal root directory. A version of the chroot() system call that did not allow this appears in 4.1c BSD; you can see the code in namei(). Unlike the 4BSD version of the same code, this code specifically checks to see if you are trying to look up '..' at the chroot root directory, and remaps the result if you are.
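To make the '..' check concrete, here is a minimal sketch in Python. It is a toy model of component-by-component path lookup, not the 4.1c BSD namei() code; the Dir class, the lookup() function, and the confine flag are all names invented for this illustration.

```python
# Toy model of path lookup. NOT the historical kernel code; every name
# here (Dir, lookup, confine) is hypothetical and exists only for illustration.
class Dir:
    def __init__(self, name, parent=None):
        self.name = name
        # As on Unix, the real root directory is its own parent.
        self.parent = parent if parent is not None else self
        self.children = {}

    def mkdir(self, name):
        child = Dir(name, self)
        self.children[name] = child
        return child


def lookup(path, root, cwd, confine):
    """Resolve path one component at a time, starting at root or cwd."""
    cur = root if path.startswith("/") else cwd
    for comp in path.split("/"):
        if comp in ("", "."):
            continue
        if comp == "..":
            if confine and cur is root:
                # The added check: '..' at the process's root stays put.
                continue
            cur = cur.parent
        else:
            cur = cur.children[comp]
    return cur


real_root = Dir("/")
jail = real_root.mkdir("jail")   # the directory we pretend to chroot() into
jail.mkdir("pub")

# The equivalent of 'cd /; cd ..' for a process whose root is 'jail'.
def cd_root_then_dotdot(confine):
    cwd = lookup("/", root=jail, cwd=jail, confine=confine)
    return lookup("..", root=jail, cwd=cwd, confine=confine)

escaped = cd_root_then_dotdot(confine=False)
confined = cd_root_then_dotdot(confine=True)
```

Without the check, the lookup walks up into the parent of the nominal root; with it, '..' at the root resolves to the root itself.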

I don't know for sure why this change appeared in 4.1c BSD, but it's possible to speculate. The 4BSD namei() is essentially the same as the V7 namei(), but the 4.1c BSD namei() is significantly changed in several ways (for example, it has a lot more comments). 4.1c BSD is the first appearance of two significant changes related to namei(); it's when BSD introduced both a rename() system call and the BSD FFS. It also seems to have seen a significant reorganization of the kernel source code away from its previous V7-like appearance. So I suspect that when the BSD people were changing namei() around anyways because of other changes, they noticed and fixed the chroot escape. With the chroot escape fixed, it was then used as a security feature in the 4.2BSD ftpd.

(The history portion of the Wikipedia page on chroot is no help, because it's clearly wrong unless you creatively reinterpret what it's saying. chroot() was not 'added' to BSD at any point, because BSD inherited it from V7 from the start. This bit of history appears to come from the references section of FreeBSD's 'Jails: Confining the omnipotent root' from 2000, and may refer either to the addition of a chroot(2) manpage or to the namei() changes.)

Sidebar: The peculiar history of chroot() documentation

In V7, as I discovered, chroot() is documented in the chdir() manpage. However, while 32V, 3BSD, and 4BSD all still have the chroot() system call, documentation for it has disappeared from their chdir() manpages. A chroot() manpage (re)appears only in 4.1c BSD.

The 32V chdir() manpage seems to be the V7 manpage with the chroot() documentation removed (and it definitely isn't the V6 chdir() manpage). It may be that the chroot() stuff was removed because the 32V people thought it was a hack that was better off not being documented, or maybe 32V got their manpages from an earlier version of V7 that didn't have the chroot() addition.

ChrootHistoryII written at 02:16:10

2015-09-12

How NFS deals with the pending delete problem

The pending delete problem is that in Unix it's valid to unlink() a file that you (or someone) has open(). If you do this, the processes with the file open must not lose access to it but the file also needs to vanish from the filesystem. If you ignore some issues this is easily handled in the kernel for local filesystems, but when I originally talked about this, I said that Sun had had to come up with a different solution for NFS. So let's talk about that.
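The local-filesystem behaviour is easy to see for yourself. The following is a small sketch, assuming a POSIX system and a temporary directory on a local filesystem; read_after_unlink() is a name made up for this example.

```python
import os
import tempfile


def read_after_unlink(payload: bytes):
    """Write payload to a temp file, unlink it while open, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                       # the name goes away immediately...
        name_gone = not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))      # ...but the open descriptor still works
    finally:
        os.close(fd)                          # the last close lets the kernel free the file
    return name_gone, data


name_gone, data = read_after_unlink(b"still here")
```

The name vanishes from the directory as soon as unlink() returns, while the process holding the descriptor keeps full access to the data until it closes the file.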

The problem with pending deletes on NFS is that NFS is a stateless protocol. The server deliberately doesn't keep track of or even know whether or not clients have a file open; all it sees is a stream of requests for a NFS filehandle. This means that if you tell the server to delete the file, well, it's going to do that; it has no idea whether or not your client still has the file open and expects it to keep working. At the same time, clients can't simply refuse to make the file go away when they're told to; users and programs that do 'unlink(fname)' are going to get peeved if it fails with 'file is in use' or if the file doesn't actually go away.

The solution to this conflict is what sometimes gets called 'NFS silly renames'. When a NFS client is asked to unlink() a NFS file that it knows is still in active use, it doesn't tell the server to delete the file but instead renames it to .nfs<random>. When the last process closes the last file descriptor to the theoretically deleted file, the client kernel finally tells the NFS server to actually delete the lingering .nfs* file. This works surprisingly well and does most of what people expect when they unlink() an actively used file.
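The client-side bookkeeping can be sketched roughly as follows. This is a toy model, not any real NFS client: a local directory stands in for the server, and ToyNFSClient, OpenFile, the method names, and the use of a UUID for the random suffix are all assumptions made for illustration.

```python
import os
import tempfile
import uuid


class OpenFile:
    """Client-side record of a file that some process has open (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.refs = 0        # number of open descriptors on this client
        self.silly = False   # True once the file has been silly-renamed


class ToyNFSClient:
    """Toy model of silly renames; export_dir plays the role of the server."""
    def __init__(self, export_dir):
        self.export_dir = export_dir
        self.open_files = {}   # current name -> OpenFile

    def _path(self, name):
        return os.path.join(self.export_dir, name)

    def open_file(self, name):
        f = self.open_files.setdefault(name, OpenFile(name))
        f.refs += 1
        return f

    def read_file(self, f):
        with open(self._path(f.name), "rb") as fh:
            return fh.read()

    def unlink_file(self, name):
        f = self.open_files.pop(name, None)
        if f is None:
            os.unlink(self._path(name))   # not open here: really delete it
            return
        silly = ".nfs" + uuid.uuid4().hex  # suffix format is an assumption
        os.rename(self._path(name), self._path(silly))
        f.name, f.silly = silly, True
        self.open_files[silly] = f

    def close_file(self, f):
        f.refs -= 1
        if f.refs == 0:
            del self.open_files[f.name]
            if f.silly:
                os.unlink(self._path(f.name))  # last close: now really delete


with tempfile.TemporaryDirectory() as export:
    with open(os.path.join(export, "data.txt"), "wb") as fh:
        fh.write(b"payload")
    client = ToyNFSClient(export)
    handle = client.open_file("data.txt")
    client.unlink_file("data.txt")
    after_unlink = sorted(os.listdir(export))
    still_readable = client.read_file(handle)
    client.close_file(handle)
    after_close = os.listdir(export)
```

Note that all of the state lives in the client. The stand-in server only ever sees a rename followed, eventually, by a delete.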

(One sign that it works very well is that most people who use NFS have never noticed this going on behind the scenes.)

Of course any number of things can go wrong with this scheme in corner cases (or not so corner cases). The obvious one is that if the client kernel crashes during this process there's nothing left to clean up the .nfs* files. As a result, many NFS servers come with scripts that run find on your filesystems to spot any lingering .nfs* files that are too old and delete them. Another problem is that this only works when everything is on the same client; if you have the file open() on one client and unlink() it on another, well, the second client is just going to tell the server to delete the file and now the first client has a stale filehandle. Such is life with a stateless network filesystem; people have learned to live with it.
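A cleanup pass of that sort might look something like the sketch below. This is not any particular vendor's script; stale_silly_files() is a hypothetical name, and both the choice of modification time as the age measure and the one-week threshold in the usage note are assumptions.

```python
import os
import time


def stale_silly_files(top, max_age_seconds, now=None):
    """Return paths of .nfs* files under top whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for fname in filenames:
            if not fname.startswith(".nfs"):
                continue
            path = os.path.join(dirpath, fname)
            try:
                mtime = os.lstat(path).st_mtime
            except FileNotFoundError:
                continue   # a client cleaned it up while we were walking
            if now - mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

A caller would then do something like 'for p in stale_silly_files("/export/home", 7 * 86400): os.unlink(p)', where the path is a placeholder. The function only reports candidates so that the deletion policy stays in the caller's hands.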

(Before people get too down on NFS over this issue, I want to say that in general NFS is a remarkably good and successful Unix network filesystem. That it has minor drawbacks in no way detracts from its major successes.)

NFSPendingDeletes written at 00:38:46



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.