A brief mention of some tools for debugging Linux NFS client issues
Someone here recently asked for tips on debugging a mysterious Linux
NFS client hang. I didn't have any answers, but I did happen to know
where to look for some Linux-specific tools. (The person had already
exhausted the abilities of things like tcpdump to help.)
The most obvious thing is to use the magic SysRq to get a dump of the
kernel call stacks of all processes (the t command). Once you find the
hanging processes in all of the output, you
can usually see what operations they're hanging on, both high level and
somewhat low level.
(Here's where I observe that it's a pity that there's no way to ask for a magic SysRq dump of a specific process. Hopefully someone will now tell me that I'm wrong.)
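If you don't have console access to press the key combination, the same dump can usually be requested through /proc/sysrq-trigger. Here is a minimal sketch of that in Python; the function name is mine, and it assumes your kernel provides /proc/sysrq-trigger and that you run it as root.

```python
from pathlib import Path


def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel for a SysRq 't' dump (call stacks of all tasks)."""
    # Writing a single command letter to the trigger file acts like
    # pressing the matching magic SysRq key; 't' is the task dump.
    Path(trigger_path).write_text("t")
```

Calling request_task_dump() as root should put the dump in the kernel log, where dmesg or your syslog will show it. The path is a parameter only so that the function can be pointed at a scratch file when trying it out.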
The Linux NFS client also has its own debugging hooks, accessible
through /proc/sys/sunrpc; unfortunately, they're rather underdocumented
and magical. What you want are the files rpc_debug and nfs_debug, each
of which is a bitmap of flags that control which RPC or NFS events get
logged; you write a decimal integer to them to set the bitmap's value,
or a 0 to turn off all logging.
(In addition, writing any number to rpc_debug will give you a cryptic
dump of RPC 'task' information. Having just read through a bunch of
kernel source code, my opinion is that there is almost nothing useful
in it unless you are a kernel hacker. If you really want this dump and
nothing else, write a 0 to rpc_debug.)
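Setting and clearing these bitmaps by hand gets tedious, so here is a small sketch of a helper for it. The helper name and its arguments are my own invention; the only things taken from the kernel are the directory and the two file names, and it has to run as root to write to the real files.

```python
from pathlib import Path

SUNRPC_DIR = "/proc/sys/sunrpc"


def set_debug(name, value, base=SUNRPC_DIR):
    """Write a decimal flag bitmap to rpc_debug or nfs_debug."""
    if name not in ("rpc_debug", "nfs_debug"):
        raise ValueError(f"unexpected debug file: {name}")
    if value < 0:
        raise ValueError("bitmap must be non-negative")
    # The files take a plain decimal integer; 0 turns all logging off.
    (Path(base) / name).write_text(f"{value}\n")
```

With this, set_debug("nfs_debug", 32767) would turn on a lot of NFS logging and set_debug("nfs_debug", 0) would turn it off again. Remember that any write to rpc_debug also triggers the RPC task dump.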
The values for the various things you can get reports of are found in
the kernel source in include/linux/sunrpc/debug.h (the RPCDBG_ #defines)
and include/linux/nfs_fs.h (the NFSDBG_ #defines).
You can use a suitably large value like 32767 to turn everything on.
Note that this can produce a lot of kernel messages very fast,
especially if you turn on lots of things. Also, one of the big reasons
this stuff is not documented is that it is primarily intended for kernel
hackers, so to understand the results you may need to go dig in the
kernel NFS and RPC code (in fs/nfs and net/sunrpc respectively).
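Since the flag values live in the kernel headers and may differ between kernel versions, I would rather read them out of the header than hardcode them. The following is a rough sketch that pulls simple numeric #defines with a given prefix out of header text and ORs selected ones together; the function names are hypothetical, and it only understands plain hex or decimal constants, not expressions.

```python
import re

# Matches lines such as "#define RPCDBG_XPRT   0x0001".
_DEFINE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+(\w+)[ \t]+(0[xX][0-9a-fA-F]+|\d+)\b",
    re.MULTILINE,
)


def parse_flags(header_text, prefix):
    """Return {short_name: value} for numeric #defines with this prefix."""
    flags = {}
    for name, value in _DEFINE.findall(header_text):
        if name.startswith(prefix):
            base = 16 if value.lower().startswith("0x") else 10
            flags[name[len(prefix):]] = int(value, base)
    return flags


def bitmap(flags, *names):
    """OR together the named flags into one integer for the debug file."""
    value = 0
    for name in names:
        value |= flags[name]
    return value
```

You would feed parse_flags() the text of the header from your own kernel source tree along with "RPCDBG_" or "NFSDBG_", pick the events you care about by name, and write the decimal result of bitmap() to the matching debug file. For reference, 32767 is 0x7fff, which is every one of the low 15 bits set.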
(There are similar debug files for the NFS server and for the NLM. Exploring these is left as an exercise for the reader.)