2018-03-26
xprt: data for NFS mounts in /proc/self/mountstats is per-fileserver, not per-mount
A while back I wrote about all of the handy NFS statistics that
appear in mountstats for all of your NFS mounts, including the xprt:
NFS RPC information. For TCP mounts, this includes the local port,
and at the time I said:
port: The local port used for this particular NFS mount. Probably not particularly useful, since on our NFS clients all NFS mounts from the same fileserver use the same port (and thus the same underlying TCP connection).
I then blithely talked about all of the remaining statistics as if they were specific to the particular NFS mount that the line was for. This turns out to be wrong, and the port number is in fact vital. I can demonstrate how vital by a little exercise:
$ fgrep xprt: /proc/self/mountstats | sort | uniq -c | sort -nr
    105 xprt: tcp 903 1 1 0 62 97817460 97785284 11122 2101962256388 0 574 10700678 55890249
     82 xprt: tcp 1005 1 1 0 0 48538448 48536496 1788 48292655827 0 810 26226830 53362451
[...]
It's not a coincidence that we have 105 NFS filesystems mounted
from one fileserver and 82 from another. It turns out that at least
with TCP based NFS mounts, all NFS mounts from the same fileserver
will normally share the same RPC xprt transport, and it is the
xprt transport's statistics that are being reported here. As a
result, all of that xprt: NFS RPC information is for all NFS RPC
traffic to the entire fileserver, not just the NFS RPC traffic for
this specific mount.
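To make the numbers in an xprt: line easier to read, here is a minimal Python sketch that attaches names to the values of a TCP xprt: line. The field names and their order are my assumption, taken from how the Linux kernel's sunrpc code prints this line; only the local port is named in the text above. parse_xprt_tcp is a hypothetical helper, not part of any existing tool.

```python
# Field names for a 'xprt: tcp' line, in the order the values appear.
# Assumption: this order follows the Linux kernel's sunrpc TCP transport
# code; older kernels print fewer trailing fields.
XPRT_TCP_FIELDS = (
    "port", "bind_count", "connect_count", "connect_time", "idle_time",
    "sends", "recvs", "bad_xids", "req_u", "bklog_u",
    "max_slots", "sending_u", "pending_u",
)


def parse_xprt_tcp(line):
    """Turn one 'xprt: tcp ...' line into a dict of named integers."""
    fields = line.split()
    if fields[:2] != ["xprt:", "tcp"]:
        raise ValueError("not an 'xprt: tcp' line")
    values = [int(v) for v in fields[2:]]
    # zip() stops at the shorter sequence, so shorter lines still parse.
    return dict(zip(XPRT_TCP_FIELDS, values))
```

Since the transport is shared, the sends and recvs values this returns count RPCs for every mount from that fileserver, not for the one mount the line appears under.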
(For TCP mounts, the combination of the local port plus the
mountaddr= IP address will identify which xprt transport a given
NFS mount is using. On our systems all NFS mounts from a given
fileserver use the same port and thus the same xprt transport, but
this may not always be the case. Also, each different fileserver is
using a different local port, but again I'm not sure this is
guaranteed.)
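The port plus mountaddr= rule can be sketched in Python as a way to see which mounts share a transport. This is only an illustration under some assumptions: that each mount's section starts with a 'device ... mounted on DIR ...' line, that the opts: line carries a mountaddr= option, and that the port is the first number after 'xprt: tcp'. group_mounts_by_xprt is a hypothetical name.

```python
from collections import defaultdict


def group_mounts_by_xprt(text):
    """Map (mountaddr, local port) to the mount points that report it.

    'text' is the contents of a mountstats file. Mounts that land in the
    same group are presumably sharing one xprt transport.
    """
    groups = defaultdict(list)
    mountpoint = None
    server = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "device" and fields[2:4] == ["mounted", "on"]:
            # Start of a new mount's section: remember where it is mounted.
            mountpoint, server = fields[4], None
        elif len(fields) >= 2 and fields[0] == "opts:":
            # Mount options are a single comma-separated word.
            for opt in fields[1].split(","):
                if opt.startswith("mountaddr="):
                    server = opt.split("=", 1)[1]
        elif len(fields) >= 3 and fields[:2] == ["xprt:", "tcp"]:
            groups[(server, int(fields[2]))].append(mountpoint)
    return dict(groups)
```

Calling it on the text read from /proc/self/mountstats should, on a setup like ours, give one group per fileserver. A fileserver that shows up under more than one port would be a sign that its mounts are not all sharing a single transport.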
If the system is sufficiently busy doing NFS (and has enough NFS
mounts), it's possible to see slightly different xprt: values for
different mounts from a given fileserver that are using the same
xprt transport. This isn't a true difference; it's just an artifact
of the fact that the information for mountstats isn't being gathered
all at once. If things update sufficiently frequently and fast, an
early mount will report slightly older xprt: values than a later
mount.
If you want to get a global view of RPC to a given fileserver, this
is potentially convenient. If you want to get a per-mount view,
it's inconvenient. For instance, to get the total number of NFS
requests sent by this mount or the total bytes sent and received
by it, you can't just look at the xprt: stats; instead you'll need
to add up the counts from the per-operation statistics. Much of the
information you want can be found by summing up per-operation stats
this way, but I haven't checked to see if all of it can be.
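As a rough sketch of that summing, the following Python adds up the per-operation lines for each mount. It assumes the usual layout, where a 'per-op statistics' line is followed by lines of the form 'OPNAME: ops transmissions timeouts bytes_sent bytes_recv ...'. That column order is my assumption about the format and is not spelled out here, and per_mount_totals is a hypothetical name.

```python
def per_mount_totals(text):
    """Sum the per-operation counters for each mount in mountstats text."""
    totals = {}
    current = None        # mount point of the section being read
    in_per_op = False     # True while inside a 'per-op statistics' block
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            in_per_op = False
            continue
        if fields[0] == "device":
            current = fields[4] if len(fields) >= 5 else None
            in_per_op = False
        elif line.strip() == "per-op statistics":
            in_per_op = True
            totals[current] = {"ops": 0, "transmissions": 0,
                               "bytes_sent": 0, "bytes_recv": 0}
        elif in_per_op and fields[0].endswith(":") and len(fields) >= 6:
            # Assumed columns: name, ops, transmissions, timeouts,
            # bytes sent, bytes received, then timing fields.
            t = totals[current]
            t["ops"] += int(fields[1])
            t["transmissions"] += int(fields[2])
            t["bytes_sent"] += int(fields[4])
            t["bytes_recv"] += int(fields[5])
    return totals
```

Only mounts that actually have a per-op section (that is, NFS mounts) appear in the result. Whether 'ops' or 'transmissions' is the better stand-in for requests sent depends on whether you want to count retransmissions.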
There are probably clever things that can be done by combining
and contrasting the xprt: global stats and the per-mount stats
you can calculate. I haven't tried to wrangle those metrics yet, though.
PS: The way that I found this is that the current version of
nfsiostat does its sorting for -s based on the xprt: statistics,
which gave us results that were sufficiently drastically off that
it was obvious something was wrong.
(I suppose I should file a bug report about this with the nfs-utils
people. My last bug report experience there went pretty smoothly
and the current nfsd(7) manpage is now accurate.)