The challenge of analyzing NFS packet traces

June 9, 2010

Suppose that you are having NFS performance problems on a single NFS client machine (for simplicity), where every so often the system's load spikes dramatically, and you want to track down what's going on. In an ideal world the client's operating system would have built-in monitoring that you could use to analyze this. We do not live in that ideal world.

(Maybe someday DTrace and SystemTap and so on will ensure that we do, especially if the kernel NFS server and client code is instrumented properly. I'm not holding my breath.)

This leaves you to do NFS performance monitoring without the cooperation of the operating system, what I'll call passive NFS performance monitoring. In theory there's a straightforward way to do this, since NFS is a network protocol; we can just capture the NFS packets as they go by and then decode them (and match them up with each other) in order to get a full trace of NFS operations and how long each of them took. With this in hand you can proceed to do higher-level analysis and hopefully turn up all sorts of interesting and useful things.

(This is actually better data than you can usually get for local IO, since you get not just a detailed IO trace but also why the IO was done, in the form of the NFS operations themselves.)
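As a rough illustration of the "match them up with each other" step, here is a minimal Python sketch that pairs RPC calls with their replies by transaction ID (XID) and reports how long each NFS operation took. It assumes you have already extracted one complete RPC message per entry (a UDP payload, or a TCP stream with the record marking stripped and reassembled); the function name and the tuple layout of its input are my own inventions for this example, not anything from an existing tool.

```python
import struct

CALL, REPLY = 0, 1
NFS_PROGRAM = 100003  # ONC RPC program number registered for NFS


def match_calls_to_replies(messages):
    """Pair NFS calls with replies and return (procedure, latency) tuples.

    messages: iterable of (timestamp, src, dst, rpc_bytes) in capture order.
    This input format is an assumption made for the sketch.
    """
    pending = {}  # (client, server, xid) -> (call timestamp, procedure number)
    results = []
    for ts, src, dst, data in messages:
        if len(data) < 8:
            continue  # too short to hold an XID and a message type
        xid, mtype = struct.unpack(">II", data[:8])
        if mtype == CALL:
            if len(data) < 24:
                continue
            rpcvers, prog, vers, proc = struct.unpack(">IIII", data[8:24])
            if rpcvers == 2 and prog == NFS_PROGRAM:
                pending[(src, dst, xid)] = (ts, proc)
        elif mtype == REPLY:
            # A reply travels server -> client, so swap the addresses.
            call = pending.pop((dst, src, xid), None)
            if call is not None:
                results.append((call[1], ts - call[0]))
    return results
```

A real tool would also have to cope with retransmitted calls, replies that never show up in the capture, and dropped packets, all of which this sketch ignores.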

Decoding NFS traffic is not trivial, since NFS is a multi-layered protocol with its own peculiar encoding system (NFS is carried over ONC RPC, and everything is encoded in XDR), but it's generally a solved problem; there are lots of programs that already do this, and you can borrow code from any number of them (making it fit into your program is only a small matter of programming). But doing a basic decode of NFS packets is the easy part; the real problem for analysis is our friend NFS filehandles.

If you're running NFS in the usual configuration, the RPC packet itself will tell you what user is doing the operation (in the form of the RPC authenticator, which will have their Unix UID) and the basic NFS packet tells you what the operation is. But you also want to know at least what server filesystem they're trying to do something to, and that information is only available in the NFS filehandle.

(That user X did a directory lookup on server Z and it was slow tells you much less than that X looked up something in their home directory filesystem instead of the mail spool.)
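To make the "RPC authenticator" part concrete, here is a small Python sketch of pulling the Unix UID out of an RPC call that uses the usual AUTH_SYS (also called AUTH_UNIX) credential. The field order follows the ONC RPC call header and the AUTH_SYS credential body; the function name is made up for this example, and it assumes it is handed one complete RPC call message.

```python
import struct

AUTH_SYS = 1  # credential flavor, also known as AUTH_UNIX


def call_uid(data):
    """Return the UID from an RPC call's AUTH_SYS credential, or None."""
    # Fixed part: xid, message type, RPC version, program, version,
    # procedure, then the credential's flavor and length.
    if len(data) < 32:
        return None
    (xid, mtype, rpcvers, prog, vers, proc,
     flavor, cred_len) = struct.unpack(">8I", data[:32])
    if mtype != 0 or flavor != AUTH_SYS:
        return None  # not a call, or not using AUTH_SYS
    body = data[32:32 + cred_len]
    # Credential body: stamp, machine name (XDR string), uid, gid, gids[].
    if len(body) < 8:
        return None
    stamp, name_len = struct.unpack(">II", body[:8])
    offset = 8 + ((name_len + 3) & ~3)  # XDR pads strings to 4 bytes
    if len(body) < offset + 8:
        return None
    uid, gid = struct.unpack(">II", body[offset:offset + 8])
    return uid
```

If the server is using some other authentication flavor (for example a Kerberos-based one), this simply returns None; getting a user out of those is a different and harder problem.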

Of course, the NFS filehandle is opaque and server-specific. Every server puts information about what filesystem a filehandle is for in the filehandle somewhere (they have to), but they don't put it in the same place and they can encode it completely differently. Additionally, filehandles do not come with a handy identifier for what sort of NFS server they come from; your filehandle decoder is either going to have to guess or just be told (by you) that server X is operating system Y (on hardware Z, just for extra fun, since some operating systems encode filehandles in native byte order).

(If you want to see what sort of madness results from this, look at nfswatch's parsenfsfh.c. Note that it is of course incomplete; it is missing at least Solaris 10.)
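One way to structure the "just be told" approach is a table of per-server-type decoders plus a hand-maintained map of which server is which type. The Python sketch below shows only the shape of this; the two filehandle layouts, the type names, and the server names are all invented for illustration and do not describe any real NFS server.

```python
import struct


# HYPOTHETICAL layouts, invented for illustration; real servers differ.
def decode_type_a(fh):
    # Pretend layout: 32-bit big-endian filesystem id at offset 0.
    return struct.unpack_from(">I", fh, 0)[0]


def decode_type_b_le(fh):
    # Pretend layout: 32-bit filesystem id at offset 4, stored in the
    # server's native byte order, which here is little-endian.
    return struct.unpack_from("<I", fh, 4)[0]


DECODERS = {
    "type-a": decode_type_a,
    "type-b-le": decode_type_b_le,
}

# Filehandles do not say what made them, so you supply this by hand.
SERVER_TYPES = {
    "fs1.example.com": "type-a",
    "fs2.example.com": "type-b-le",
}


def filesystem_id(server, fh):
    """Return the filesystem id for a filehandle, or None if unknown."""
    decoder = DECODERS.get(SERVER_TYPES.get(server))
    if decoder is None:
        return None  # we were never told what this server is
    try:
        return decoder(fh)
    except struct.error:
        return None  # filehandle too short for the assumed layout
```

Note that the same operating system on different hardware may need two entries in the decoder table, which is part of why the real thing gets messy.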

Next, I have thoughtfully elided a step here. Decoding the filehandle will give you some sort of number that the NFS server uses as a filesystem identifier. But there is no certainty that the NFS server will give you any convenient way of finding out this identifier for its filesystems; the filehandle identifier may or may not be the same as the fsid in a GETATTR reply, and the server kernel may or may not expose any other way of getting it. I know of at least one NFS server where the only good way of finding out this information for some filesystems is to use the kernel debugger.

I rather suspect that all of this heartburn is why general passive NFS trace analysis and performance monitoring programs are kind of thin on the ground. Nfswatch is the only one that I really know of.

(Note that nfswatch can be run on the NFS client as well as on the server, and there is some use for this. You may have to hack it a bit to decode NFS filehandles for the server type that you care about, mind you.)

(PS: this is the kind of entry where I will be overjoyed if people immediately tell me that there are these five handy tools that I've never heard of before.)

Written on 09 June 2010.