Restarting nfs-server on a Linux NFS (v3) server isn't transparent

A while back I wrote an article on enabling NFS v4 on an Ubuntu 22.04 fileserver (instead of just NFS v3), where one of the final steps was to restart 'nfsd', the NFS server daemon (sort of), with 'systemctl restart nfs-server'. In that article I said that as far as I could tell, this entire process was transparent to NFS v3 clients that were talking to the NFS server. Unfortunately I have to take that back. Restarting 'nfs-server' causes the NFS server to discard locks obtained by NFS v3 clients, without telling those clients anything about it. The result is that the NFS v3 clients think they still hold their locks while the NFS server believes that everything is unlocked, so it will allow another client to lock the same files.

(What happens with NFS v4 clients is more uncertain to me; they may more or less ride through things.)

On Linux, the NFS server is in the kernel and runs as kernel processes, generally visible in process lists as '[nfsd]'. You might wonder how these processes are started and stopped, and the answer is through a little user-level shim, rpc.nfsd. What this program actually does is write to some files in /proc/fs/nfsd that control the port list, the NFS versions offered, and the number of kernel nfsd threads that are running. To restart the (kernel) NFS service, the nfs-server.service unit first stops it with 'rpc.nfsd 0', which tells the kernel to run zero nfsd threads, and then starts it again by writing an appropriate number of threads back into place, which starts NFS service. The nfs-server.service systemd unit also does some other things.
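To make the stop-then-start sequence concrete, here is a minimal Python sketch of it. The function names are hypothetical and this is not what rpc.nfsd itself does internally; it only assumes that the thread count is exposed as a single number in /proc/fs/nfsd/threads. Running it against the real file needs root and, as described in this entry, will drop NFS v3 locks, so the path is a parameter and you can try it on a scratch file first.

```python
from pathlib import Path

# Assumption: the kernel exposes the nfsd thread count in this file.
THREADS_FILE = Path("/proc/fs/nfsd/threads")


def read_thread_count(path: Path = THREADS_FILE) -> int:
    """Return the thread count stored as the first number in the file."""
    return int(path.read_text().split()[0])


def bounce_nfsd_threads(path: Path = THREADS_FILE) -> int:
    """Write 0 and then the previous thread count back to the file.

    On a real server this stops and restarts kernel NFS service,
    which discards NFS v3 client locks. Returns the restored count.
    """
    previous = read_thread_count(path)
    path.write_text("0\n")            # stop: ask for zero nfsd threads
    path.write_text(f"{previous}\n")  # start: restore the old count
    return previous
```

The sketch restores whatever count it found rather than taking a number as an argument, so it can't accidentally change how many threads the server runs.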

(As a side note, you can see what NFS versions your NFS server is currently supporting by looking at /proc/fs/nfsd/versions. Sadly this can't be changed while there are NFS server threads running.)
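If you want to check the offered versions from a script, something like the following sketch may help. It assumes the file holds whitespace-separated tokens such as '+3' or '-2', where '+' means the version is enabled and '-' means it's disabled; that format and the function name are my assumptions here, so check the output of 'cat /proc/fs/nfsd/versions' on your own server first.

```python
def parse_nfsd_versions(text: str) -> dict[str, bool]:
    """Map each version string to True (enabled) or False (disabled).

    Assumes tokens look like '+4.1' or '-2'; anything else is skipped.
    """
    versions = {}
    for token in text.split():
        if token[0] in "+-" and len(token) > 1:
            versions[token[1:]] = token[0] == "+"
    return versions


if __name__ == "__main__":
    # Reading this file may require root on some systems.
    with open("/proc/fs/nfsd/versions") as f:
        print(parse_nfsd_versions(f.read()))
```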

If you restart the kernel NFS server, either with 'systemctl restart nfs-server' or by hand by writing '0' and then some number to /proc/fs/nfsd/threads, the kernel will completely drop its knowledge of all locks from NFS v3 clients. Unfortunately running 'sm-notify' doesn't seem to recover them; they're just gone. Locks from NFS v4 clients suffer a less predictable fate. If the NFS v4 client is actively doing NFS operations to the server, its locks will generally be preserved over a 'systemctl restart nfs-server'. If the client isn't actively doing NFS operations and doesn't do any for a while, I'm not certain that its locks will be preserved, and they certainly aren't there immediately (they seem to only come back when the NFS v4 client re-attaches to the server).
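One way to watch this happen is to count the locks on the server before and after the restart. The sketch below summarizes text in the format of /proc/locks, where each line looks roughly like '1: POSIX  ADVISORY  WRITE 3568 fd:00:2531452 0 EOF'. It assumes that locks held on behalf of NFS clients show up in the server's /proc/locks, which you should verify on your own system, and the function name is hypothetical.

```python
from collections import Counter


def count_locks(text: str) -> Counter:
    """Count locks by (lock class, access type), e.g. ('POSIX', 'WRITE').

    Lines with '->' in the second field describe blocked waiters
    rather than held locks, so they are skipped.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] == "->":
            continue
        counts[(fields[1], fields[3])] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/locks") as f:
        for key, n in sorted(count_locks(f.read()).items()):
            print(key, n)
```

Run it once before the restart and once after; if the counts for POSIX locks fall while clients still believe they hold locks, you're seeing the problem described here.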

Looked at from the right angle, this makes sense. The kernel has to release locks from NFS clients when it stops being an NFS server, and a sensible signal that it's no longer an NFS server is when it's told to run zero NFS threads. However, it does seem to lead to an unfortunate result for at least NFS v3 clients.

