Linux NFS clients (normally) make only one TCP connection to each fileserver

October 8, 2022

Suppose that you have a Linux NFSv3 client that mounts a large number of filesystems from a much smaller number of fileservers, which naturally means it mounts a bunch of filesystems from each fileserver. Modern NFS is TCP based, and TCP requires one or more connections in order to make things go. In this situation, you might wonder how many connections the kernel makes; for example, is it one connection per filesystem, one connection per N filesystems, or something else?

As I found out some years ago when I looked at 'xprt:' data for NFS mounts in /proc/self/mountstats, and as I confirmed now with ss on an Ubuntu 22.04 system, the answer is that Linux normally makes only one TCP connection to each fileserver and multiplexes all NFS IO to all filesystems over it. This implicitly serializes all NFS IO to a fileserver through this single TCP connection, although the client can submit multiple NFS requests at once (conceptually) and the fileserver may answer them in any order.
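If you want to check this on your own client, one approach is to count established TCP connections to the NFS port per fileserver in the output of 'ss -tn'. What follows is a minimal sketch of that, not what I actually ran. It assumes your fileservers listen on the standard NFS port 2049 and that ss prints its usual column layout (state, queues, local address, peer address). The function name and the sample addresses are made up for illustration.

```python
from collections import Counter

NFS_PORT = "2049"  # assumption: fileservers use the standard NFS port


def nfs_connections_per_server(ss_output: str) -> Counter:
    """Count established TCP connections to the NFS port, per peer address.

    Expects text in the layout printed by 'ss -tn':
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line, blank lines, and non-established sockets.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # Split on the last ':' so bracketed IPv6 addresses also work.
        peer_addr, _, peer_port = fields[4].rpartition(":")
        if peer_port == NFS_PORT:
            counts[peer_addr] += 1
    return counts


# Made-up sample output: two fileservers plus an unrelated SSH connection.
SAMPLE = """\
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
ESTAB  0       0       10.0.0.5:877        10.0.0.21:2049
ESTAB  0       0       10.0.0.5:703        10.0.0.22:2049
ESTAB  0       0       10.0.0.5:51234      10.0.0.9:22
"""

if __name__ == "__main__":
    for server, count in sorted(nfs_connections_per_server(SAMPLE).items()):
        print(server, count)
```

On a real client you would feed it the actual output, for example from subprocess.run(["ss", "-tn"], capture_output=True, text=True).stdout. With the default mount options, each fileserver should show up with a count of one no matter how many filesystems are mounted from it.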

Does this TCP level serialization matter? I don't know. Since multiple NFS requests and replies can be packed into a single TCP packet and this NFS TCP connection will normally have a large TCP window size, I suspect that, short of really exceptional situations, a single TCP connection won't limit how many NFS requests a single client can send to a fileserver at once or within a short period of time. On the other hand, I believe that this means that the initial TCP level handling of incoming NFS requests (or replies) all happens serially on a single CPU. I would hope that the kernel NFS and 'sunrpc' code can then fan out NFS RPC requests across multiple CPUs, but I don't know for sure.

(This also only matters if you're concerned about a single very active NFS client that issues a lot of requests against a lot of filesystems, such as an NFS client that's an active IMAP server.)

Of course, how much it matters how many CPUs your fileserver uses to handle NFS requests depends on how fast the rest of the process of handling them is. Once upon a time that was often a slow thing as it involved 'spinning rust' HDDs, but these days some NFS fileservers may be using NVMe SSDs that can handle truly huge IO rates if fed enough IO requests fast enough. And an NFS client needs to be making requests that can be done in parallel in the first place, likely ones involving different filesystems and different disks.

(I've vaguely known about the single connection per fileserver for a while, but the consequence of TCP serialization of NFS RPCs didn't occur to me until recently.)

Comments on this page:

This matters if LAG is involved. A single TCP stream will hash onto only one link of a LAG. Using the “nconnect” NFS option will create multiple TCP connections. This makes it likely that some of the TCP connections will hash onto other links, allowing you to more fully utilize the available bandwidth.
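The hashing effect described in this comment can be illustrated with a toy model. This is only a sketch: real switches and bonding drivers use their own hash policies, and extra connections only spread across links if the policy in use includes TCP ports. Here the link is chosen from a SHA-256 hash of the connection's addresses and ports, which is an assumption made purely for illustration, as are the function names and addresses.

```python
import hashlib


def lag_link(src_ip, src_port, dst_ip, dst_port, n_links):
    """Pick a LAG member link from a hash of the flow's 4-tuple.

    Illustrative policy only; real hardware uses its own hash function.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def links_used(src_ports, n_links,
               src_ip="10.0.0.5", dst_ip="10.0.0.21", dst_port=2049):
    """Return the set of links used by connections from these source ports."""
    return {lag_link(src_ip, port, dst_ip, dst_port, n_links)
            for port in src_ports}


if __name__ == "__main__":
    # One TCP connection: every packet has the same 4-tuple, so one link.
    print("single connection:", links_used([877], n_links=2))
    # Several connections (as with nconnect): different source ports
    # may hash onto different links.
    print("eight connections:", links_used(range(700, 708), n_links=2))
```

A single connection always maps to exactly one link in this model because its 4-tuple never changes. Multiple connections differ in their source ports, so they have a chance of landing on different links, which is the point of using nconnect here.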

Written on 08 October 2022.

Last modified: Sat Oct 8 23:17:47 2022