Wandering Thoughts archives

2008-04-24

What Linux's RPC queue dump means, sort of

Since I went digging through the kernel source code yesterday, here is the meaning of the fields in the RPC queue dump that you get any time you write to /proc/sys/sunrpc/rpc_debug.

As far as I've been able to work out, the useful fields are:

-pid- An internal RPC sequence number; it has nothing to do with process PIDs, despite the name.
proc The RPC procedure number being invoked, in decimal; for NFS you can find which number is which operation in places like include/linux/nfs3.h in the Linux kernel source (the numbering differs between NFS versions).
flgs The RPC flags in hex; see include/linux/sunrpc/sched.h for the values.
status Either 0 or a (negated) errno if the task has hit an error.
-client- An opaque identifier for the client (literally, the pointer to the RPC client structure in the kernel).
-prog- What RPC facility is being invoked, in decimal; NFS is RPC 'program' 100003. You can find out the program numbers most easily with rpcinfo -p.
-timeout The timeout, in jiffies.
-rpcwait What the task is waiting for, if anything.
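As a rough illustration of the field list above, here is a minimal Python sketch that decodes one task row of the dump. It assumes the columns appear in the order of the header line that 2.6-era kernels print, as I remember it (-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- --exit--), separated by whitespace; check the header your own kernel prints. The helper name, the sample row, and the handling of a blank -rpcwait column are my own assumptions. The procedure-name table follows NFS version 3 (include/linux/nfs3.h), and since the dump doesn't say which NFS version a task is using, that lookup is only a guess.

```python
import errno

# Column order assumed from the header line of a 2.6-era dump; check the
# header your own kernel prints before trusting this.
COLUMNS = ["pid", "proc", "flags", "status", "client", "prog",
           "rqstp", "timeout", "rpcwait", "action", "exit"]

# Well-known RPC program numbers (the same ones `rpcinfo -p` shows).
RPC_PROGRAMS = {100000: "portmapper", 100003: "nfs",
                100005: "mountd", 100021: "nlockmgr"}

# NFSv3 procedure numbers, in order. Only meaningful for NFSv3 calls.
NFS3_PROCS = ["NULL", "GETATTR", "SETATTR", "LOOKUP", "ACCESS",
              "READLINK", "READ", "WRITE", "CREATE", "MKDIR", "SYMLINK",
              "MKNOD", "REMOVE", "RMDIR", "RENAME", "LINK", "READDIR",
              "READDIRPLUS", "FSSTAT", "FSINFO", "PATHCONF", "COMMIT"]


def parse_rpc_task_line(line):
    """Decode one task row of the RPC queue dump (hypothetical helper)."""
    tokens = line.split()
    if len(tokens) == len(COLUMNS) - 1:
        # A task that isn't waiting on anything may print a blank
        # -rpcwait field, which whitespace splitting loses; restore it.
        tokens.insert(COLUMNS.index("rpcwait"), "")
    if len(tokens) != len(COLUMNS):
        raise ValueError(f"unexpected field count: {line!r}")
    task = dict(zip(COLUMNS, tokens))

    task["pid"] = int(task["pid"])          # RPC sequence number, not a PID
    task["proc"] = int(task["proc"])        # decimal procedure number
    task["flags"] = int(task["flags"], 16)  # hex RPC flags
    task["status"] = int(task["status"])    # 0 or a negated errno
    task["prog"] = int(task["prog"])        # decimal RPC program number
    task["timeout"] = int(task["timeout"])  # jiffies

    status = task["status"]
    # Turn a negated errno into its symbolic name; None otherwise.
    task["status_name"] = (errno.errorcode.get(-status, f"errno {-status}")
                           if status < 0 else None)
    task["prog_name"] = RPC_PROGRAMS.get(task["prog"], "unknown")
    if task["prog"] == 100003 and task["proc"] < len(NFS3_PROCS):
        task["proc_name"] = NFS3_PROCS[task["proc"]]  # assumes NFSv3
    else:
        task["proc_name"] = None
    return task


# A made-up row, for illustration only.
sample = ("  345 0001 0081     -5 c7b2e000 100003 c7a81000 "
          "00000500 xprt_pending c8bd1000 c8bd2000")
task = parse_rpc_task_line(sample)
print(task["prog_name"], task["proc_name"], task["status_name"])
# prints: nfs GETATTR EIO
```

The pointer columns are kept as opaque strings, since about all they're good for is comparing one task's value against another's.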

(The --rqstp-, -action-, and --exit-- fields are better off ignored if you are not debugging the kernel; they are pointers to internal kernel structures. About all you can do with them is see which tasks have the same one.)

Unfortunately, there is no way to map from a particular RPC request to the processes that are waiting on it; the -pid- field is mostly useful for matching things up with any other debugging messages that you have the RPC system produce.

linux/RPCDumpMeaning written at 23:14:06

A brief mention of some tools for debugging Linux NFS client issues

Someone here recently asked for tips on debugging a mysterious Linux NFS client hang. I didn't have any answers, but I did happen to know where to look for some Linux-specific tools. (The person had already exhausted the abilities of things like tcpdump to help.)

The most obvious thing is to use the magic SysRq key to get a dump of the kernel call stacks of all processes (the t command). Once you find the hanging processes in all of the output, you can usually see what operations they're hanging on, at both a high level and a somewhat lower level.
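If you can't get at the console keyboard, the same dump can be requested as root by writing the command letter to /proc/sysrq-trigger; the output lands in the kernel log (read it with dmesg), just as with the keyboard version. A minimal Python sketch; the function name is my own, and the path parameter exists only so the function can be tried against an ordinary file:

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq_dump_tasks(trigger_path=SYSRQ_TRIGGER):
    """Ask the kernel for a SysRq 't' dump of every task's call stack.

    Hypothetical helper: writing to the real trigger file needs root, and
    the dump goes to the kernel log, not back to this program.
    """
    with open(trigger_path, "w") as f:
        f.write("t")
```

Calling sysrq_dump_tasks() as root and then looking at the end of the dmesg output should give you the same call stacks as the keyboard version.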

(Here's where I observe that it's a pity that there's no way to ask for a magic SysRq dump of a specific process. Hopefully someone will now tell me that I'm wrong.)

The Linux NFS client also has its own debugging hooks, accessible through /proc/sys/sunrpc; unfortunately, they're rather underdocumented and magical. What you want are the files rpc_debug and nfs_debug, each of which is a bitmap of flags that control which RPC or NFS events get logged; you write a decimal integer to them to set the bitmap's value, or a 0 to turn off all logging.
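A minimal Python sketch of setting and reading back one of these bitmaps. The helper names are mine; the directory and file names are the ones given above. It writes the value in decimal, as described, and must run as root to touch the real files. When reading, it accepts either a decimal or a 0x-prefixed hex value, since I'm not certain every kernel version prints the current setting the same way.

```python
SUNRPC_DIR = "/proc/sys/sunrpc"


def set_debug_flags(name, value, base=SUNRPC_DIR):
    """Write a decimal bitmap to base/name ('rpc_debug' or 'nfs_debug')."""
    if value < 0:
        raise ValueError("debug bitmap must be non-negative")
    with open(f"{base}/{name}", "w") as f:
        f.write(f"{value}\n")


def get_debug_flags(name, base=SUNRPC_DIR):
    """Read the current bitmap back as an integer (decimal or 0x hex)."""
    with open(f"{base}/{name}") as f:
        return int(f.read().split()[0], 0)
```

For example, set_debug_flags("nfs_debug", 32767) turns on all NFS client logging and set_debug_flags("nfs_debug", 0) turns it off again. Remember that any write to rpc_debug also produces the RPC task dump as a side effect.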

(In addition, writing any number to rpc_debug will give you a cryptic dump of RPC 'task' information. Having just read through a bunch of kernel source code, my opinion is that there is almost nothing useful in it unless you are a kernel hacker. If you really want this dump and nothing else, write a 0 to rpc_debug.)

The values for the various things you can get reports of are found in the kernel source in include/linux/sunrpc/debug.h (the RPCDBG_ #defines) and include/linux/nfs_fs.h (the NFSDBG_ #defines). You can use a suitably large value like 32767 to turn everything on.
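The bitmap is just the bitwise OR of the #defines you want. Here is a Python sketch of building and decoding one. The three RPCDBG_ values below are what I believe include/linux/sunrpc/debug.h uses, but treat them as assumptions and check the header for your kernel version; the helper names are hypothetical. It also shows why 32767 works as a catch-all: it is 0x7fff, every bit from 0x0001 through 0x4000.

```python
# Values as I understand include/linux/sunrpc/debug.h; verify them
# against your own kernel source before relying on them.
RPCDBG_XPRT = 0x0001
RPCDBG_CALL = 0x0002
RPCDBG_SCHED = 0x0040

KNOWN_RPC_FLAGS = {"RPCDBG_XPRT": RPCDBG_XPRT,
                   "RPCDBG_CALL": RPCDBG_CALL,
                   "RPCDBG_SCHED": RPCDBG_SCHED}

EVERYTHING = 32767  # 0x7fff: every bit from 0x0001 through 0x4000


def debug_mask(*flags):
    """Combine individual debug flags into one bitmap value."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def decode_mask(mask, known):
    """Return the names from `known` whose bits are set in `mask`."""
    return [name for name, bit in known.items() if mask & bit]


mask = debug_mask(RPCDBG_CALL, RPCDBG_SCHED)
print(mask)  # prints: 66
```

The resulting 66 is the decimal number you would write to rpc_debug to get just call and scheduler messages.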

Note that this can produce a lot of kernel messages very fast, especially if you turn on lots of things. Also, one of the big reasons this stuff is not documented is that it is primarily intended for kernel hackers, so to understand the results you may need to go dig in the kernel NFS and RPC code (in fs/nfs and net/sunrpc respectively).

(There are similar debug files for the NFS server and for NLM, the Network Lock Manager that handles NFS file locking. Exploring these is left as an exercise for the reader.)

linux/NFSClientDebuggingBits written at 00:33:59
