The per-NFS-operation data for NFS mounts in /proc/self/mountstats

October 6, 2013

The last big chunk of statistics in mountstats is a bunch of separate information for each NFS operation. Since the exact operations are version-specific, the list of operations depends on the NFS version in use; however, the statistics are the same for all protocols (and all operations).

A typical example (for NFS v3) looks like this:

per-op statistics
        NULL: 0 0 0 0 0 0 0 0
     GETATTR: 192686 192688 0 22726724 21580832 2800 79242 83598
     SETATTR: 10782 10783 0 1699192 1552608 1139 18919 20152
[....]

Operations are actually numerical codes and here they're printed in that numerical order, although with their standard names instead of their numbers. For each operation the actual numbers are, in order:

  1. operations: How many requests we've done for this operation.
  2. transmissions: How many times we've actually transmitted a RPC request for this operation. As you might have gathered from the last entry, this can exceed the operation count due to timeouts and retries.
  3. major timeouts: How many times a request has had a major timeout. Major timeouts produce 'nfs: server X not responding, still trying' messages. You can have timeouts and retries without having major timeouts (and my example lines do).
  4. bytes sent: This includes not just the RPC payload but also the RPC headers and so on. It closely matches the on-the-wire size.
  5. bytes received: As with bytes sent, this is the full size.
  6. cumulative queue time: How long (in milliseconds) all requests spent queued for transmission before they were sent.
  7. cumulative response time: How long (in milliseconds) it took to get a reply back after the request was transmitted. The kernel comments call this the RPC RTT.
  8. cumulative total request time: How long (in milliseconds) all requests took from when they were initially queued to when they were completely handled. The kernel calls this the RPC execution time.

At a first guess I think you'll want to divide bytes sent by transmissions but bytes received by operations if you're computing the average request size. Doing it differently will only matter in the face of retransmissions due to retries.

As you can see from my example lines, the total request time is a bit larger than the sum of the queue time and the response time. I don't know enough about the Linux NFS RPC code to know where this extra time may come from.

All of this is extremely useful for performance monitoring, especially if you sum it up across all operations to get a global view of how RPC for a given mount is performing. Most NFS RPC operations have relatively obvious names but it's helpful to know that COMMIT is part of NFS v3 writes. The other issue is that in NFS v3 it's possible to set different flags on WRITE operations (some can be made synchronous) and the per-operation statistics don't break these out separately. You may thus see wildly different WRITE performance numbers at different times depending on whether they are synchronous (and generally slow) or asynchronous.

(In fact I don't think anything in the overall mountstats data lets you tell these two sorts of WRITEs apart, even to the extent of finding out how much data is being written with synchronous WRITEs and how much with asynchronous ones. This is an unfortunate omission since asynch writes are likely to be much faster ones.)

PS: there may be other issues waiting with NFS v4 operations. We don't use NFS v4 here so I've never looked at it.

Sidebar: where to find things in the kernel

The code that prints the data is the rpc_print_iostats() routine in net/sunrpc/stats.c. The data is explained in comments in include/linux/sunrpc/metrics.h .


Comments on this page:

By Anonymous at 2014-03-12 09:15:46:

Thanks for the excellent series on NFS mountstats! I haven't find anywhere as complete information as here (actually, as you state, haven't found basically any information at all).

There's only one thing which I think could be clarified a bit:

What is the relationship between /proc/self/mountstats and /proc/net/rpc/nfs? There are some hints like http://www.spinics.net/lists/linux-nfs/msg17186.html that nfsstat (which is using those files) is "legacy" and I've seen cases where "nfsstat -cv" reports wildly differently than mountstats but since you've been studying this a lot, do you have any additional insights on this?

By dsoooner at 2016-05-06 12:43:07:

I suspect that sync writes and async writes are actually handled the same way inside the kernel - i suspect the async/sync aspect is how that write api is exposed to the userspace apps

By cks at 2016-05-06 13:41:32:

The kernel definitely sets or passes through sync versus async write flags to the NFS WRITE operation. The actual code handling of the NFS WRITE may be mostly the same (since both are expected to get a reply from the server), but it cannot be entirely the same; the client has to keep track of outstanding asynchronous NFS WRITEs that have not yet been confirmed by the server. I'm sure it makes the stats code simpler to not track them separately (since they don't have different NFS operation codes, just different flags), but it's clearly less useful.

Written on 06 October 2013.
« The xprt: data for NFS mounts in /proc/self/mountstats
What reloading a Python module really does »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Oct 6 01:18:24 2013
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.