How NFS deals with the pending delete problem

September 12, 2015

The pending delete problem is that in Unix it's valid to unlink() a file that you (or someone else) have open(). If you do this, the processes with the file open must not lose access to it, but the file also needs to vanish from the filesystem. If you ignore some issues, this is easy to handle in the kernel for local filesystems: unlink() removes the directory entry right away, while the inode and its data stick around until the last open file descriptor is closed. But when I originally talked about this, I said that Sun had had to come up with a different solution for NFS. So let's talk about that.
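For the local filesystem case, a small Python sketch shows both halves of the required behavior: after unlink() the name no longer resolves, yet the open file descriptor can still read back what was written. This assumes a Unix system whose temporary directory is on an ordinary local filesystem; `demo_pending_delete` is a hypothetical helper name.

```python
import os
import tempfile


def demo_pending_delete():
    """Unlink a file while a descriptor is still open on it.

    Returns (name_still_exists, data_read_back_through_the_fd).
    """
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "victim")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, b"still here")
            os.unlink(path)                # the directory entry goes away now
            exists = os.path.exists(path)  # so the name no longer resolves
            os.lseek(fd, 0, os.SEEK_SET)
            data = os.read(fd, 100)        # but the open fd still works
        finally:
            os.close(fd)                   # last close: the kernel can free the inode
    return exists, data


print(demo_pending_delete())  # (False, b'still here')
```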

The problem with pending deletes on NFS is that NFS is a stateless protocol. The server deliberately doesn't keep track of, or even know, whether clients have a file open; all it sees is a stream of requests that name NFS filehandles. This means that if you tell the server to delete the file, well, it's going to do that; it has no idea whether your client still has the file open and expects it to keep working. At the same time, clients can't simply refuse to make the file go away when told to; users and programs that do 'unlink(fname)' are going to get peeved if it fails with 'file is in use' or if the file doesn't actually go away.
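To make 'stateless' concrete, here is a toy model in Python. It is not real NFS: `StatelessServer` and its methods are hypothetical, and a single flat directory stands in for a filesystem. The point is what's missing. There is no open or close operation, every read names a filehandle, and a remove simply deletes, so a client still holding the old filehandle gets ESTALE.

```python
import errno


class StatelessServer:
    """Toy NFS-style server that keeps no record of which clients have
    what open. Hypothetical model with a single flat directory."""

    def __init__(self):
        self._next_fh = 1
        self._names = {}  # file name -> filehandle
        self._files = {}  # filehandle -> file contents

    def create(self, name, data=b""):
        fh = self._next_fh
        self._next_fh += 1
        self._names[name] = fh
        self._files[fh] = data
        return fh

    def lookup(self, name):
        return self._names[name]

    def read(self, fh):
        # Every request carries the filehandle; nothing was "opened" first.
        if fh not in self._files:
            raise OSError(errno.ESTALE, "stale file handle")
        return self._files[fh]

    def remove(self, name):
        # There is no open-file state to consult, so the delete just happens.
        fh = self._names.pop(name)
        del self._files[fh]


server = StatelessServer()
server.create("data.txt", b"hello")
fh = server.lookup("data.txt")  # client A looks the file up and reads it
print(server.read(fh))          # b'hello'
server.remove("data.txt")       # a delete request arrives
try:
    server.read(fh)             # client A's next READ
except OSError as e:
    print(e.strerror)           # stale file handle
```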

The solution to this conflict is what is sometimes called 'NFS silly renames'. When an NFS client is asked to unlink() an NFS file that it knows is still in active use, it doesn't tell the server to delete the file; instead it renames the file to .nfs<random> in the same directory. When the last process closes the last file descriptor for the theoretically deleted file, the client kernel finally tells the NFS server to actually delete the lingering .nfs* file. This works surprisingly well and does most of what people expect when they unlink() an actively used file.
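The client-side bookkeeping can be sketched in user-space Python. This is a hypothetical model, not how any real kernel is written: a local directory plays the server, every open(), close(), and unlink() goes through a `SillyRenameClient` object so it can count open descriptors, and files are keyed by path string rather than by filehandle. The random hex suffix stands in for whatever name a real client picks.

```python
import os
import secrets
import tempfile


class SillyRenameClient:
    """Hypothetical sketch of an NFS client's silly-rename bookkeeping."""

    def __init__(self):
        self._open_count = {}  # current path -> number of open descriptors
        self._fd_path = {}     # descriptor -> current path of its file
        self._renamed = set()  # hidden .nfs paths waiting for their last close

    def open(self, path, flags=os.O_RDONLY):
        fd = os.open(path, flags)
        self._fd_path[fd] = path
        self._open_count[path] = self._open_count.get(path, 0) + 1
        return fd

    def unlink(self, path):
        if self._open_count.get(path, 0) == 0:
            os.unlink(path)  # not in use here: send a real REMOVE
            return
        # Still in use: rename to a hidden name in the same directory instead.
        hidden = os.path.join(os.path.dirname(path), ".nfs" + secrets.token_hex(8))
        os.rename(path, hidden)
        self._open_count[hidden] = self._open_count.pop(path)
        for fd, p in self._fd_path.items():
            if p == path:
                self._fd_path[fd] = hidden  # open fds now refer to the hidden name
        self._renamed.add(hidden)

    def close(self, fd):
        path = self._fd_path.pop(fd)
        os.close(fd)
        self._open_count[path] -= 1
        if self._open_count[path] == 0:
            del self._open_count[path]
            if path in self._renamed:
                self._renamed.remove(path)
                os.unlink(path)  # last close: now really delete the .nfs file


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "log")
    with open(path, "w") as f:
        f.write("data")
    client = SillyRenameClient()
    fd = client.open(path)
    client.unlink(path)
    print(os.listdir(d))    # a single '.nfs...' entry; 'log' is gone
    print(os.read(fd, 10))  # b'data'
    client.close(fd)
    print(os.listdir(d))    # []
```

Because the open counts live only inside one client object, a second client working on the same directory knows nothing about them; the rename only protects processes on the client that did the unlink().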

(One sign that it works very well is that most people who use NFS have never noticed this going on behind the scenes.)

Of course, any number of things can go wrong with this scheme in corner cases (or not so corner cases). The obvious one is that if the client crashes or reboots while a silly-renamed file is still open, nothing is left to clean up its .nfs* files. As a result, many NFS servers come with scripts that run find over your filesystems to spot lingering .nfs* files that are too old and delete them.

Another problem is that this only works when everything happens on the same client. If you have the file open() on one client and unlink() it on another, the second client has no idea the file is in use elsewhere, so it just tells the server to delete the file, and now the first client has a stale filehandle (its next operation on the file fails with ESTALE). Such is life with a stateless network filesystem; people have learned to live with it.
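The server-side cleanup of leftover .nfs* files can be sketched in Python as roughly the equivalent of `find root -name '.nfs*' -type f -mtime +7`. The function names and the 7-day threshold are assumptions for illustration; real scripts vary in what counts as 'too old'. The sketch defaults to a dry run, since deleting a .nfs* file that some long-running client process still has open leaves that process with a stale filehandle.

```python
import os
import stat
import time


def find_stale_nfs_files(root, max_age_days=7.0, now=None):
    """Yield .nfs* regular files under root last modified over max_age_days ago."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in filenames:
            if not name.startswith(".nfs"):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed since the directory was listed
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path


def remove_stale_nfs_files(root, max_age_days=7.0, dry_run=True):
    """Return the stale paths found; delete them only when dry_run is False."""
    stale = list(find_stale_nfs_files(root, max_age_days))
    if not dry_run:
        for path in stale:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass  # a client finished with it and removed it first
    return stale
```

Keying on st_mtime mirrors the usual `find -mtime` approach. Since many filesystems update a file's ctime (but not its mtime) when it is renamed, st_ctime would be the more conservative choice: it is then no older than the silly rename itself.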

(Before people get too down on NFS over this issue, I want to say that in general NFS is a remarkably good and successful Unix network filesystem. That it has minor drawbacks in no way detracts from its major successes.)

Written on 12 September 2015.

Last modified: Sat Sep 12 00:38:46 2015