Basic NFS v4 seems to just work (so far) on Ubuntu 22.04

July 6, 2023

I've been saying grumpy things about NFS v4 for a fairly long time now, and in response for a while people have been telling me that these days NFS v4 can look basically just like NFS v3. You can have your traditional Unix permissions model (the NFS without Kerberos one) and you don't have to reorganize your exports and so on. Recently I decided to give it a try on some scratch virtual machines running our standard Ubuntu 22.04 LTS setup, and to my pleasant surprise it does seem to just work.

To test, I installed Ubuntu's NFS server package, made a scratch directory in the same place we'd use for a real ZFS filesystem on a fileserver (which is not under /exports), put in exactly the same export options and permissions in /etc/exports.d/<file>.exports (including 'sec=sys'), and NFS mounted it on a test NFS client. Then I used it on the client as both a regular user and as 'root', testing with root squashing on (our normal setup) and off (used for some filesystems). All of this worked, with none of the various glitches that have happened to us in the past when we tried this sort of thing.
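As an illustration only (the filesystem path and client hostname here are hypothetical, not our real ones), an export entry of the sort described above might look like this:

```
# /etc/exports.d/scratch.exports -- hypothetical path and client
# sec=sys keeps the traditional Unix permissions model; root_squash is
# the normal root-squashing setup (drop it for filesystems that need
# client root to stay root).
/local/scratch  nfstest.example.org(rw,sec=sys,root_squash,no_subtree_check)
```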

Part of the reason it worked this transparently is that the client and the server both had our standard /etc/resolv.conf and had their hostnames in a standard format (and have fully qualified domain names in the same subdomain). My understanding is that this matters because for 'sec=sys', NFS v4 clients and servers need to agree on a NFS v4 domain name to ensure that login 'fred' on the client is the same as login 'fred' on the server. This 'domain name' can be set explicitly in idmapd.conf(5), but if you don't do this it's derived from the DNS domain names of the hosts involved. In a production deployment, we'd probably want to set this specifically in idmapd.conf just to avoid problems.
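Setting it is a one-line change on both clients and servers; a sketch, with a placeholder domain value:

```
# /etc/idmapd.conf (client and server must agree on this value)
[General]
# Hypothetical domain; without this line it's derived from the host's
# DNS domain name.
Domain = cs.example.edu
```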

I suspect that there are other traps in actual use. One thing I've already noticed is that the kernel client code doesn't appear to log any messages if a NFS v4 server stops responding, unlike with NFS v3. These messages are useful to us for tracking NFS server problems and seeing when they start to go away. Possibly there are other signals we can tap into.

My interest is because NFS v4 seems to be better regarded in general and especially for file locking (which is integrated into the protocol in NFS v4 but is a separate thing in NFS v3). My impression is that the Linux kernel NFS people would rather you use NFS v4, and so NFS v4 is likely to get more bugs fixed and so on in the future. (Possibly this is incorrect.)


Comments on this page:

From 193.219.181.219 at 2023-07-07 03:53:35:

My interest is because NFS v4 seems to be better regarded in general and especially for file locking (which is integrated into the protocol in NFS v4 but is a separate thing in NFS v3).

A bit more than that; the reboot lock recovery is completely re-done (the client loses locks as soon as it disconnects, not when it notifies about a reboot, so there shouldn't be any more stuck locks).

Depending on client version you might want to make sure clients mount with vers=4.2, as e.g. Debian 11 still defaulted to version 4.0 even though it had support for 4.2. Personally I want 4.2 for reflinks (OpenZFS now practically has reflinks with "block cloning"), server-side copy, and better sparse-file support; in your case the "Courteous Server" feature might be relevant.
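For instance, pinning the version in /etc/fstab might look like this (server name and paths are placeholders):

```
# /etc/fstab: ask for NFS v4.2 explicitly instead of the client default
fileserver:/local/scratch  /mnt/scratch  nfs  vers=4.2,sec=sys  0  0
```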

(which is not under /exports)

For reference, it's nfs-utils 1.2.2 and Linux 2.6.33 (~2010) that removed the requirement to define a 'fsid=root' export for NFSv4 (by implementing a virtual pseudo root).
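To illustrate the difference (paths and client are hypothetical): before that change you had to declare an fsid=0 pseudo-root and place all exports under it, while afterwards you can export real paths directly:

```
# Old style: explicit NFSv4 pseudo-root, everything exported under it
/exports          client.example.org(rw,fsid=0,no_subtree_check)
/exports/scratch  client.example.org(rw,no_subtree_check)

# Modern style: export the real path; the kernel constructs the
# pseudo-root automatically
/local/scratch    client.example.org(rw,no_subtree_check)
```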

My understanding is that this matters because for 'sec=sys', NFS v4 clients and servers need to agree on a NFS v4 domain name to insure that login 'fred' on the client is the same as login 'fred' on the server.

Not exactly for 'sec=sys' – that's still UID-based as before – but for the reverse; it affects how stat() results (i.e. `ls -l`) are reported back from servers to clients. (And I suppose that means it also affects chown/chgrp? I am not sure.) That is, it affects NFS but not the underlying SunRPC.

I believe the intent of NFSv4 adding idmapping was to help non-'sec=sys' authentication (Kerberos and the now-dead SPKM), i.e. when the UIDs aren't guaranteed to match but when clients still want to see something sensible in ls -l output.

(NFSv4 used to have more non-sys mechanisms than Kerberos – namely SPKM that was supposed to be a simple public-key based auth method, in the same way that SSH is, without requiring the infrastructure that Kerberos does; but somehow it never succeeded.)

One thing I've already noticed is that the kernel client code doesn't appear to log any messages if a NFS v4 server stops responding, unlike with NFS v3

Those messages come from the SunRPC layer, so they should not disappear entirely... but my wild guess is that you were using NFSv3 via UDP before, whereas NFSv4 requires TCP, and the default timeo=/retrans= timeouts are much higher when TCP is in use (as far as I understand, it's because TCP handles retransmissions on its own so RPC doesn't need to). So I suppose you only see the messages when TCP gives up after a good ~10-15 minutes.
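As a rough sketch of that arithmetic (assuming nfs(5)'s documented TCP defaults of timeo=600 deciseconds and retrans=2, with linear backoff; the real kernel behaviour may differ):

```python
# Rough estimate of how long before an NFS-over-TCP request hits its
# major timeout (the point where "server not responding" can be logged),
# assuming nfs(5)'s TCP defaults: timeo=600 deciseconds, retrans=2, and
# linear backoff (each successive wait grows by one timeo interval).
def major_timeout_seconds(timeo_ds: int = 600, retrans: int = 2) -> float:
    # Initial attempt plus 'retrans' retransmissions; the n-th wait
    # (0-indexed) lasts (n + 1) * timeo deciseconds.
    waits_ds = [(n + 1) * timeo_ds for n in range(retrans + 1)]
    return sum(waits_ds) / 10  # convert deciseconds to seconds

print(major_timeout_seconds())  # -> 360.0, i.e. six minutes
```

Six minutes with the defaults, so somewhat quicker than my 10-15 minute guess; the exact figure depends on kernel version and which layer's timer actually fires, so treat this as an order-of-magnitude estimate.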

By cks at 2023-07-07 13:43:50:

Our NFS v3 mounts are also using TCP. I'll do some more testing to see about the 'NFS server not responding' messages, since my initial test environment may have been a bit odd.

