2023-09-20
Restarting nfs-server on a Linux NFS (v3) server isn't transparent
A while back I wrote an article on enabling NFS v4 on an Ubuntu
22.04 fileserver (instead of just NFS v3),
where one of the final steps was to restart 'nfsd', the NFS server
daemon (sort of), with 'systemctl restart nfs-server'. In that
article I said that as far as I could tell this entire process was
transparent to NFS v3 clients that were talking to the NFS server.
Unfortunately I have to take that back. Restarting 'nfs-server'
will cause the NFS server to discard locks obtained by NFS v3
clients, without telling the NFS v3 clients anything about this.
This results in the NFS v3 clients thinking that they hold locks
while the NFS server believes that everything is unlocked and so
will allow another client to lock the same file.
(What happens with NFS v4 clients is more uncertain to me; they may more or less ride through things.)
On Linux, the NFS server is in the kernel and runs as kernel
processes, generally visible in process lists as '[nfsd]'. You
might wonder how these processes are started and stopped, and the
answer is through a little user-level shim, rpc.nfsd. What this
program actually does is write to some files in /proc/fs/nfsd that control
the portlist, the NFS versions offered, and the number of kernel
nfsd threads that are running. To restart (kernel) NFS service, the
nfs-server.service unit first stops it with 'rpc.nfsd 0', telling
the kernel to run '0' nfsd threads, and then starts it again by
writing some appropriate number of threads into place, which starts
NFS service. The nfs-server.service systemd unit also does some
other things.
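
As a rough illustration of the stop-then-start sequence described above, here is a minimal Python sketch that writes '0' and then a thread count to the threads file. The function names are made up for this example, it only mimics the thread-count part (not the other things nfs-server.service does), and it assumes that reading /proc/fs/nfsd/threads gives back the current thread count. It needs root on a real server, and running it there will have the lock-dropping effect this entry is about.

```python
from pathlib import Path

# Path named in the article; everything else here is illustrative.
NFSD_THREADS = Path("/proc/fs/nfsd/threads")


def read_thread_count(path=NFSD_THREADS):
    """Return the thread count that the threads file reports."""
    return int(Path(path).read_text().split()[0])


def bounce_nfsd(path=NFSD_THREADS, count=None):
    """Write 0, then a thread count, to the threads file.

    If count is None, the count read before stopping is reused.
    Returns the count that was written back.
    """
    path = Path(path)
    if count is None:
        count = read_thread_count(path)
    path.write_text("0\n")         # stop NFS service: zero nfsd threads
    path.write_text(f"{count}\n")  # start it again with `count` threads
    return count
```

Taking the path as a parameter means the sketch can be tried against an ordinary scratch file before it is pointed at the real /proc file.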
(As a side note, you can see what NFS versions your NFS server is currently supporting by looking at /proc/fs/nfsd/versions. Sadly this can't be changed while there are NFS server threads running.)
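
If you want to check the offered versions from a script, something like the following Python sketch may do. It assumes the file holds whitespace-separated tokens such as '-2 +3 +4 +4.1 +4.2', with '+' meaning the version is offered and '-' meaning it is disabled; the function names are invented for this example.

```python
def parse_nfsd_versions(text):
    """Map each version token to True (offered) or False (disabled)."""
    versions = {}
    for token in text.split():
        if token[0] not in "+-":
            continue  # skip anything not in the assumed +N / -N form
        versions[token[1:]] = token[0] == "+"
    return versions


def current_nfsd_versions(path="/proc/fs/nfsd/versions"):
    """Read and parse the versions file (path as given in the article)."""
    with open(path) as f:
        return parse_nfsd_versions(f.read())
```

On an NFS server, calling current_nfsd_versions() should then return a dictionary keyed by version string.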
If you restart the kernel NFS server either with 'systemctl restart
nfs-server' or by hand by writing '0' and then some number to
/proc/fs/nfsd/threads, the kernel will completely drop knowledge
of all locks from NFS v3 clients. Unfortunately running 'sm-notify'
doesn't seem to recover them; they're just gone.
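
One way to watch this happen is to compare the server's /proc/locks before and after the restart. The Python sketch below is only an illustration: it assumes that locks held on behalf of NFS v3 clients show up in the server's /proc/locks, and that each line has the usual form of 'N: TYPE MODE ACCESS PID MAJOR:MINOR:INODE START END' (with an extra '->' after the number for blocked waiters). All of the names in it are made up for this example.

```python
from collections import namedtuple

Lock = namedtuple("Lock", "kind mode access pid file start end blocked")


def parse_proc_locks(text):
    """Parse the text of /proc/locks into a list of Lock tuples."""
    locks = []
    for line in text.splitlines():
        fields = line.split()[1:]  # drop the leading "N:" counter
        blocked = bool(fields) and fields[0] == "->"
        if blocked:
            fields = fields[1:]    # drop the blocked-waiter marker
        if len(fields) < 7:
            continue               # not a line in the assumed format
        kind, mode, access, pid, file_id, start, end = fields[:7]
        locks.append(Lock(kind, mode, access, pid, file_id, start, end, blocked))
    return locks


def vanished_locks(before, after):
    """Return locks in `before` with no counterpart in `after`.

    Locks are matched on type, access, file and byte range; the PID is
    ignored because it may differ after a restart.
    """
    def key(lock):
        return (lock.kind, lock.access, lock.file, lock.start, lock.end)

    remaining = {key(lock) for lock in after}
    return [lock for lock in before if key(lock) not in remaining]
```

The idea would be to save the contents of /proc/locks, restart nfs-server, read the file again, and pass both parsed lists to vanished_locks() to see which locks the server no longer knows about.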
Locks from NFS v4 clients suffer a somewhat less predictable and
less certain fate. If the NFS v4 client is actively doing NFS operations
to the server, its locks will generally be preserved over a 'systemctl
restart nfs-server'. If the client isn't actively doing NFS
operations and doesn't do any for a while, I'm not certain that its
locks will be preserved, and certainly they aren't immediately there
(they seem to only come back when the NFS v4 client re-attaches to
the server).
Looked at from the right angle, this makes sense. The kernel has to release locks from NFS clients when it stops being an NFS server, and a sensible signal that it's no longer an NFS server is when it's told to run zero NFS threads. However, it does seem to lead to an unfortunate result for at least NFS v3 clients.