How NFS deals with the pending delete problem
The pending delete problem is that in Unix it's valid to unlink()
a file that you (or someone) has open(). If you do this, the
processes with the file open must not
lose access to it but the file also needs to vanish from the
filesystem. If you ignore some issues this is easily handled in the
kernel for local filesystems, but when I originally talked about
this, I said that Sun had had to come
up with a different solution for NFS. So let's talk about that.
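To make the local case concrete, here is a minimal Python sketch, assuming a Unix system and a temporary directory on a local filesystem. The function name read_after_unlink is made up for illustration; it unlinks a file while a descriptor is still open and reports what the process can still see.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Unlink a file while it is open.

    Returns (name_still_exists, data_read_back).
    """
    fd, path = tempfile.mkstemp()      # creates the file and opens it
    try:
        os.write(fd, payload)
        os.unlink(path)                # the name vanishes from the filesystem
        name_still_exists = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)   # rewind the still-open descriptor
        data = os.read(fd, len(payload))
        return name_still_exists, data
    finally:
        os.close(fd)                   # last close; the kernel can free the file
```

On a local Unix filesystem, read_after_unlink(b"hello") should report that the name is gone while the data is still readable through the descriptor.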
The problem with pending deletes on NFS is that NFS is a stateless
protocol. The server deliberately doesn't keep track of or even
know whether or not clients have a file open; all it sees is a
stream of requests for a NFS filehandle.
This means that if you tell the server to delete the file, well,
it's going to do that; it has no idea whether or not your client
still has the file open and expects it to keep working. At the same
time the clients can't not make the file go away when they get told
to; users and programs that do 'unlink(fname)' are going to get
peeved if it fails with 'file is in use' or if it doesn't actually
go away.
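The server side of this can be sketched with a toy model. This is not the NFS protocol, just an in-memory stand-in, and ToyStatelessServer and StaleFileHandle are names invented for the illustration. The point is that remove() has no record of open files to consult, so a later request with the old filehandle simply fails.

```python
class StaleFileHandle(Exception):
    """Stand-in for the stale filehandle error a client would see."""


class ToyStatelessServer:
    """Toy model: the server knows names and filehandles, not who has what open."""

    def __init__(self):
        self._names = {}    # name -> filehandle
        self._data = {}     # filehandle -> file contents
        self._next_fh = 1

    def create(self, name, contents):
        fh = self._next_fh
        self._next_fh += 1
        self._names[name] = fh
        self._data[fh] = contents
        return fh

    def read(self, fh):
        # Each request stands alone; all the server can check is the filehandle.
        if fh not in self._data:
            raise StaleFileHandle(fh)
        return self._data[fh]

    def remove(self, name):
        # There is no open-file state to look at, so the file just goes away.
        fh = self._names.pop(name)
        del self._data[fh]
```

A client that called create() and kept the filehandle can read() happily until anyone calls remove() on the name; after that, the same read() raises StaleFileHandle.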
The solution to this conflict is what sometimes gets called 'NFS
silly renames'. When a NFS client is asked to unlink() a NFS file
that it knows is still in active use, it doesn't tell the server
to delete the file but instead renames it to .nfs<random>. When
the last process closes the last file descriptor to the theoretically
deleted file, the client kernel finally tells the NFS server to
actually delete the lingering .nfs* file. This works surprisingly
well and does most of what people expect when they unlink() an
actively used file.
(One sign that it works very well is that most people who use NFS have never noticed this going on behind the scenes.)
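The client-side bookkeeping can be sketched roughly as follows. This is a toy model that runs against a local directory, not real NFS client code; the class name ToySillyRenameClient, the per-path tracking, and the random hex suffix are all assumptions made for the illustration, and real clients pick their .nfs names in their own ways.

```python
import os
import secrets


class _Entry:
    def __init__(self, path):
        self.path = path              # current name of the file
        self.opens = 0                # open descriptors this client holds
        self.pending_delete = False   # True once it has been silly-renamed


class ToySillyRenameClient:
    """Toy model of silly renames, applied to a local directory."""

    def __init__(self):
        self._by_path = {}   # path -> _Entry for files currently open
        self._by_fd = {}     # fd -> _Entry

    def open(self, path):
        fd = os.open(path, os.O_RDONLY)
        entry = self._by_path.setdefault(path, _Entry(path))
        entry.opens += 1
        self._by_fd[fd] = entry
        return fd

    def unlink(self, path):
        entry = self._by_path.pop(path, None)
        if entry is None:
            os.unlink(path)          # not open here: really delete it
            return
        # Still in use: rename instead of deleting, and remember to clean up.
        silly = os.path.join(os.path.dirname(path),
                             ".nfs" + secrets.token_hex(8))
        os.rename(path, silly)
        entry.path = silly
        entry.pending_delete = True

    def close(self, fd):
        entry = self._by_fd.pop(fd)
        os.close(fd)
        entry.opens -= 1
        if entry.opens == 0:
            if entry.pending_delete:
                os.unlink(entry.path)   # last close: delete the .nfs* file
            else:
                self._by_path.pop(entry.path, None)
```

With this model, opening a file, unlinking it, and listing the directory shows a single .nfs* entry; the open descriptor keeps reading normally, and the entry disappears when the descriptor is closed.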
Of course any number of things can go wrong with this scheme in
corner cases (or not so corner cases). The obvious one is that if
the client kernel crashes during this process there's nothing left
to clean up the .nfs* files. As a result, many NFS servers come
with scripts that run find on your filesystems to spot any lingering
.nfs* files that are too old and delete them. Another problem is
that this only works when everything is on the same client; if you
have the file open() on one client and unlink() it on another,
well, the second client is just going to tell the server to delete
the file and now the first client has a stale filehandle. Such is life with a stateless
network filesystem; people have learned to live with it.
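As a rough idea of what such a cleanup pass might look like, here is a small sketch. It is not any particular vendor's script; the function name find_stale_silly_files, the use of modification time as the age, and the caller-supplied age limit are all assumptions. It only reports candidates and leaves the actual deletion to the caller.

```python
import os
import time


def find_stale_silly_files(root, max_age_seconds, now=None):
    """Return .nfs* files under root whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith(".nfs"):
                continue
            path = os.path.join(dirpath, name)
            try:
                age = now - os.lstat(path).st_mtime
            except FileNotFoundError:
                continue   # a client cleaned it up while we were walking
            if age > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

Something like find_stale_silly_files("/export/home", 7 * 24 * 3600) would list week-old leftovers under a hypothetical export path. A generous age limit matters, because a .nfs* file that is merely old may still be in use by a long-running process on some client.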
(Before people get too down on NFS over this issue, I want to say that in general NFS is a remarkably good and successful Unix network filesystem. That it has minor drawbacks in no way detracts from its major successes.)