The xprt: data for NFS mounts in /proc/self/mountstats

October 5, 2013

The xprt: line in mountstats reports various bits of overall NFS RPC information for the particular NFS mount in question. The information reported differs somewhat depending on the transport protocol, so I'm going to start by discussing the TCP stats. These look like (for the 1.0 format):

xprt:   tcp 695 1 1 0 16 96099368 96091328 6383 341933213458 1504192

The numeric fields are, in order:

  1. port: The local port used for this particular NFS mount. Probably not particularly useful, since on our NFS clients all NFS mounts from the same fileserver use the same port (and thus the same underlying TCP connection).
  2. bind count: I believe that this is basically how many times this mount has had to start talking to the NFS server from scratch. Normally 1. I don't know if it goes up if the NFS server reboots.
  3. connect count: How many times the client has made a TCP connection to the NFS server.
  4. connect 'idle' time: How long (in jiffies, an internal kernel measure of time, not seconds) this NFS mount has spent waiting for its connection(s) to the server to be established.
  5. idle time: How long (in seconds) since the NFS mount saw any RPC traffic.
  6. sends: How many RPC requests this mount has sent to the server.
  7. receives: How many RPC replies this mount has received from the server. Not every request necessarily gets a reply, due to timeouts and retries.
  8. bad XIDs: Every RPC request and reply has an XID. This counts how many times the NFS server has sent us a reply with an XID that didn't correspond to any outstanding request we knew about.
  9. cumulative 'active' request count: Every time we send a request, we add the current difference between sends and receives to this number. Since we've just sent a request, this is always going to be at least one. This number is not as useful as you think it should be.
  10. cumulative backlog count: Every time we send a request, we add the current backlog queue size to this counter.

    Recent versions of the kernel report a format version of 1.1 instead of 1.0. This adds three more counters:

  11. maximum RPC slots ever used: The maximum number of simultaneously active RPC requests that this mount has ever had.
  12. cumulative sending queue count: Every time we send a request, we add the current size of the sending queue to this counter.
  13. cumulative pending queue count: Every time we send a request, we add the current size of the pending queue to this counter.

(For all of the 'cumulative' counters you must divide by the number of requests to get the actual average, either for all time or over a particular time interval.)
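
To make the field ordering concrete, here is a minimal Python sketch that parses a 1.0 or 1.1 format TCP xprt: line into named fields and then derives an average from one of the cumulative counters. The field names are labels of my own invention, not anything the kernel itself exports.

    # Labels for the numeric fields of an 'xprt: tcp ...' line, in order.
    # The last three are only present in the 1.1 format.
    TCP_FIELDS = [
        "port", "bind_count", "connect_count", "connect_time", "idle_time",
        "sends", "receives", "bad_xids", "cum_active_reqs", "cum_backlog",
        "max_slots", "cum_sending_queue", "cum_pending_queue",
    ]

    def parse_tcp_xprt(line):
        parts = line.split()
        assert parts[0] == "xprt:" and parts[1] == "tcp"
        # zip() stops at the shorter sequence, so a 1.0 format line simply
        # produces a dict without the last three fields.
        return dict(zip(TCP_FIELDS, (int(v) for v in parts[2:])))

    stats = parse_tcp_xprt("xprt:   tcp 695 1 1 0 16 96099368 96091328 6383 341933213458 1504192")
    # Divide a cumulative counter by the number of sends to get an average:
    print("average backlog queue size:", stats["cum_backlog"] / stats["sends"])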

What we would like is for the difference between sends and receives to be the number of currently outstanding requests. Unfortunately it doesn't work that way, because requests that time out and are retried only increment sends, not receives. If you have any NFS server retries at all, the nominal 'currently outstanding requests' figure will drift steadily upwards due to this. In turn this makes the cumulative active request count not useful except when measured over time intervals in which no timeouts and retries happen (which implies that you cannot use it to determine an 'average outstanding requests to date' figure; on some of our client machines, for some filesystems, this can wind up claiming that the average is over a thousand).
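
If you want to get something out of this counter anyway, the practical approach is to sample mountstats periodically and work with deltas over each interval. A small sketch of that, assuming two samples parsed with a helper like the one above:

    def average_active_requests(before, after):
        # Average in-flight requests over the interval between two samples.
        # Per the caveat above, this is only trustworthy if no timeouts and
        # retries happened during the interval.
        dsends = after["sends"] - before["sends"]
        dactive = after["cum_active_reqs"] - before["cum_active_reqs"]
        return dactive / dsends if dsends else 0.0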

(Note that the absence of 'nfs: server X not responding, still trying' messages doesn't mean that no requests were retried over that time. The kernel only starts logging those messages if requests are significantly delayed and retried.)

Now let's talk about RPC slots. Every NFS mount can have only so many active RPC requests at once, regardless of their state; each request is said to take up a slot. If something tries to submit an RPC request and there is no slot, the task goes to sleep on the backlog queue (how often this happens has changed between kernel versions). Once a request has been given a slot, it gets put on the queue to be sent (since many processes can try to send things at once, sending must be serialized). As far as I can tell, RPC requests wind up on the pending queue when they are waiting for one of a number of things, apparently including buffer space and a reply from the server (for requests that expect a reply, although I don't understand the logic in the code). It's possible that exactly what winds up in the pending queue when has changed over different kernel versions.

At least in our kernels, all NFS mounts from a single fileserver use the same local port number. This means that they all use the same underlying TCP connection (since a TCP connection is identified by the quad of local port, remote port, local IP, and remote IP, and the last three of those are necessarily the same for all mounts). In turn this implies that all NFS RPC requests for all mounts from a single fileserver must serialize to be sent to the network, making the per-mount cumulative sending queue sizes not exactly independent of each other.

For the local RPC transport there is no local port (field #1); all other fields are the same. For the udp RPC transport there is no connection count, connect idle time, or idle time (fields #3, #4, and #5); all other fields are the same. This omission of fields is quite annoying since it means you get to shuffle your code around instead of just ignoring certain fields for certain transports (they could perfectly well report the particular statistic as zero).
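
As an illustration of that shuffling, here is a sketch of a transport-aware parser. It reuses the invented field names from the earlier sketch and simply encodes which fields the text above says each transport omits.

    # Which fields each transport reports, in order; 'local' drops the port,
    # 'udp' drops the connection and idle time counters. The last three
    # fields are again only present in the 1.1 format.
    COMMON_TAIL = ["sends", "receives", "bad_xids", "cum_active_reqs",
                   "cum_backlog", "max_slots", "cum_sending_queue",
                   "cum_pending_queue"]
    XPRT_FIELDS = {
        "tcp":   ["port", "bind_count", "connect_count", "connect_time",
                  "idle_time"] + COMMON_TAIL,
        "udp":   ["port", "bind_count"] + COMMON_TAIL,
        "local": ["bind_count", "connect_count", "connect_time",
                  "idle_time"] + COMMON_TAIL,
    }

    def parse_xprt(line):
        parts = line.split()
        transport = parts[1]
        return transport, dict(zip(XPRT_FIELDS[transport],
                                   (int(v) for v in parts[2:])))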

Update (2018): In modern Linux kernels (at least), xprt: data is basically per-fileserver, not per-mount. See this entry for more details.

Sidebar: Where this is in the kernel code

As mentioned in the introduction to this series, the code that prints this information is in net/sunrpc/xprtsock.c and the stat structure it takes information from is in include/linux/sunrpc/xprt.h. Note that the elements from this structure are not printed in order; you need to look at the code to see how they all emerge.


Comments on this page:

By Anonymous at 2014-03-13 06:39:01:

I am seeing 0 for field 10 - if it's cumulative as described I'd expect it to be much higher than that (and field 9 is currently ~14M).

By cks at 2014-03-13 14:27:49:

I don't know why you're seeing what you're seeing in field 10, but the kernel source very definitely uses it as a cumulative field. See the code in xprt_transmit() in net/sunrpc/xprt.c where it increments xprt->stat.bklog_u, specifically:

xprt->stat.bklog_u += xprt->backlog.qlen;