2023-08-22
We have a NFS v3 locking problem on our Ubuntu 22.04 servers
I've recently written about things like finding who owns NFS v3 locks on a Linux server, breaking NFS locks on 22.04, and experimenting with NFS v4, where I mentioned in an aside that NFS v4 seemed better regarded for file locking. All of this work has been quietly motivated by it becoming obvious to us that we have some sort of NFS (v3) file locking problem on our Ubuntu 22.04 ZFS fileservers.
Specifically, what we're seeing is stuck NFS v3 locks, where the NFS fileserver thinks that a NFS client holds a lock but the NFS client's kernel disagrees. This problem is new in Ubuntu 22.04 (we didn't see it on 18.04), and seems to occur mostly for our IMAP server as it accesses people's home directories. When it happens, our fileservers will claim that the IMAP server has a lock on some mailbox in someone's home directory, but the IMAP machine has no idea of it. At this point all further attempts to access the mailbox in question get nowhere, because Dovecot tries to take a lock first and it can never get one.
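As a rough illustration of checking the client's side of this disagreement, here is a minimal Python sketch that parses text in the format of Linux's /proc/locks and reports which inodes have POSIX locks held on them. The function names and the sample lines are invented for illustration. On a real NFS client you would feed it the contents of /proc/locks, and whether the inode numbers there line up with what the fileserver reports is an assumption you would have to check.

```python
from typing import NamedTuple, Optional, Set


class LockEntry(NamedTuple):
    lock_class: str  # POSIX, FLOCK, OFDLCK, LEASE, ...
    mode: str        # ADVISORY, MANDATORY, ...
    access: str      # READ or WRITE
    pid: int
    device: str      # "major:minor" exactly as printed
    inode: int
    blocked: bool    # True for "->" lines, which are waiters and not holders


def parse_lock_line(line: str) -> Optional[LockEntry]:
    """Parse one /proc/locks style line; return None if it doesn't fit."""
    fields = line.split()
    if len(fields) < 8:
        return None
    fields = fields[1:]  # drop the leading "N:" ordinal
    blocked = fields[0] == "->"
    if blocked:
        fields = fields[1:]
    if len(fields) < 7:
        return None
    lock_class, mode, access, pid, dev_inode = fields[:5]
    try:
        major, minor, inode = dev_inode.split(":")
        return LockEntry(lock_class, mode, access, int(pid),
                         f"{major}:{minor}", int(inode), blocked)
    except ValueError:
        return None


def held_posix_inodes(text: str) -> Set[int]:
    """Inodes that have at least one held (not merely waited-for) POSIX lock."""
    inodes = set()
    for line in text.splitlines():
        entry = parse_lock_line(line)
        if entry and entry.lock_class == "POSIX" and not entry.blocked:
            inodes.add(entry.inode)
    return inodes


# Made-up sample in /proc/locks format, not real output from our machines.
SAMPLE = """\
1: POSIX  ADVISORY  WRITE 2143 00:35:5242881 0 EOF
2: FLOCK  ADVISORY  WRITE 901 08:02:131 0 EOF
2: -> FLOCK  ADVISORY  WRITE 950 08:02:131 0 EOF
3: POSIX  ADVISORY  READ 2143 00:35:77 128 255
"""

if __name__ == "__main__":
    print(sorted(held_posix_inodes(SAMPLE)))
```

If the fileserver claims a lock on a file whose inode doesn't appear in the client's set, that is the kind of mismatch described above.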
(It's possible that other NFS clients are also seeing this issue but the symptoms are less obvious on them. On the other hand, I believe most of our NFS clients do very little NFS locking, and presumably the volume of lock activity is a factor in triggering this.)
Our habit with our NFS fileservers is to freeze their kernel version unless there's a compelling reason to go through the risks of an upgrade, so they're all behind the current 22.04 kernels; these days, the same is true of our IMAP server. Upgrading the kernels would be the obvious thing to try, but the Ubuntu kernel source doesn't seem to have any changes to the relevant sections of code since the kernel versions we're running, and I didn't see anything relevant in the changelogs. If upgrading the kernel fails to resolve the problem (and I suspect that it won't help), then the only other option I can see is moving to NFS v4 in the hope that its locking won't have the same issues. This is a rather bigger change and correspondingly riskier, but at some point we may have no real choice.
There are no kernel messages being logged on either the IMAP machine or the ZFS fileservers. It's probably possible to use kernel instrumentation to trace NFS lock and unlock operations on both the server and clients in order to try to spot the point where an unlock either fails or isn't done, but since very few lock operations go wrong this would be a very high volume activity with relatively little signal.
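To make the signal-to-noise problem a bit more concrete, here is a sketch of the sort of post-processing such tracing would need. It assumes a hypothetical, already-collected stream of lock and unlock events. The event format and all of the names are invented, and it ignores byte ranges and lock owners, which real NFS v3 locks have.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LockEvent(NamedTuple):
    timestamp: float   # seconds, from whatever clock the tracer used
    client: str        # hypothetical client identifier
    filehandle: str    # hypothetical stand-in for the NFS file handle
    op: str            # "lock" or "unlock"


def find_unreleased(events: Iterable[LockEvent], now: float,
                    max_age: float) -> List[Tuple[str, str]]:
    """Return (client, filehandle) pairs locked for longer than max_age
    with no later unlock seen in the trace."""
    held: Dict[Tuple[str, str], float] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.client, ev.filehandle)
        if ev.op == "lock":
            # Remember the earliest outstanding lock time for this pair.
            held.setdefault(key, ev.timestamp)
        elif ev.op == "unlock":
            held.pop(key, None)
    return sorted(key for key, ts in held.items() if now - ts > max_age)


if __name__ == "__main__":
    trace = [
        LockEvent(1.0, "imap", "fh-a", "lock"),
        LockEvent(1.5, "imap", "fh-a", "unlock"),
        LockEvent(2.0, "imap", "fh-b", "lock"),     # never unlocked
        LockEvent(100.0, "imap", "fh-c", "lock"),   # still recent
    ]
    print(find_unreleased(trace, now=101.0, max_age=60.0))
```

The filtering is the easy part. Capturing every lock and unlock on both ends to feed it would be the expensive part, and almost all of the captured events would be matched pairs that get thrown away.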
(And in general NFS v3 locks aren't very inspectable on the NFS server; you have to resort to diving into the kernel internals to get what should be straightforward system management information. It's rather easier to get information about NFS v4 locks on the fileserver.)
(Our use of ZFS may be a contributing factor here, per the potential risks of using (Open)ZFS on Linux.)