A really annoying gap in system observability

March 31, 2011

The other day I had a problem; new gnome-session processes weren't working right on one of our major login servers. They didn't quite hang outright; instead, strace showed that they seemed to spend all of their time talking very slowly to one file descriptor instead of responding to other Gnome processes that were trying to talk to them.

(When this happens, the other Gnome processes are not very happy and your entire Gnome session basically hangs.)

This left me with a big question: what was on the other end of that file descriptor?

Answering this question turned out to be absurdly difficult, and that is the problem. At least theoretically, 'observability' of systems is one of the next big things; everyone is burning with enthusiasm for tools like Solaris's DTrace and Linux's SystemTap. Yet vendors (and Linux people) have almost completely neglected basic observability tools for tasks like simply seeing what processes are connected to.

On Linux, lsof can be said to be officially supported and it was able to tell me that the particular file descriptor was a Unix domain socket; however it couldn't tell me what the other end was connected to, and I'm not sure that that information is exported by the kernel. On other Unixes like Solaris, lsof isn't even officially supported by the vendor; to the extent that it works (it's often incomplete), it works only because people have put heroic amounts of effort into reverse engineering portions of the Solaris kernel and obtaining information by force and trickery.

Frankly, this is absurd. Tools like lsof and lslk have been a vital part of the sysadmin arsenal for more than fifteen years. Yet it's still the case that no one (or at least almost no one) officially supports them and makes sure that they can get the complete information that sysadmins need, or even makes sure that sysadmins can get the same information through other tools.

In 2011, in the era of observability as a big thing, it should be trivial for sysadmins to find out information like 'what is this process talking to' or 'what is using resource X', or even 'who is using what resources of type Y'. That it is not says sad things about vendors (and open source developers).

Sidebar: how I answered my question

I used brute force. I hacked my own X environment to strace my gnome-session from the moment it was started; this let me see the moment when the relevant file descriptor was created and connect()'d (it turned out to be talking to the system DBus daemon). However, this workaround was only possible because I could start the program on demand and it hung reliably; had I been dealing with a long-running daemon that was malfunctioning like this, I would have been out of luck.

PS: it turns out that you do not want to restart the system DBus daemon out from underneath gnome-session. If you do, all existing gnome-session processes immediately exit, taking every user's session with them.

Comments on this page:

From at 2011-03-31 08:33:08:

netstat almost does it.

Sockets created at the same time appear together, with consecutive values in the I-Node column. (And socket()/connect()/listen() is usually near-instantaneous). So you could probably have guessed it by looking at netstat.

The other approach would be to gank gdb on it and run getpeername() on the socket. Totally possible (no source or debuginfo required), but not exactly user-friendly or unintrusive.

From at 2011-03-31 12:16:39:

All of the information you're looking for is readily available in /proc.

Let's assume, for example, that "atop" is hanging and my strace hackery says it's spinning on fd 3. Track it down thusly:

# pgrep atop

# ls -l /proc/27627/fd/
total 0
lrwx------ 1 root root 64 2011-03-31 11:13 0 -> /dev/null
lrwx------ 1 root root 64 2011-03-31 11:13 1 -> /dev/null
lrwx------ 1 root root 64 2011-03-31 11:13 2 -> /dev/null
lr-x------ 1 root root 64 2011-03-31 11:13 3 -> /var/log/account/pacct
l-wx------ 1 root root 64 2011-03-31 11:13 4 -> /var/log/atop.log

Bingo! fd 3 is /var/log/account/pacct.

/proc is a wonderous treasure trove of information.

From at 2011-03-31 12:24:05:

D'oh. Forgot you were talking about sockets.

By cks at 2011-03-31 13:44:18:

File descriptor information being available in /proc is a relatively Linux specific thing, and even the information Linux makes available is incomplete. Plain files are the relatively simple case.

From at 2011-04-02 07:40:11:

It turns out that someone wrote a systemtap script that's like lsof on fire. It really just digs through kernel structures, and is not systemtap event-handling-like in its behaviour. See:

# stap .../examples/process/pfiles.stp -x 2808
 2808: ssh
 Current rlimit: 64 file descriptors
  0: S_IFCHR mode:0620 dev:0,10 ino:3 uid:500 gid:100 rdev:136,0
  1: S_IFCHR mode:0620 dev:0,10 ino:3 uid:500 gid:100 rdev:136,0
  3: S_IFSOCK mode:0777 dev:0,6 ino:55059 uid:500 gid:100 rdev:0,0
       sockname: AF_INET  port: 54499
       peername: AF_INET  port: 22
  4: S_IFSOCK mode:0777 dev:0,6 ino:55080 uid:500 gid:100 rdev:0,0
       sockname: AF_UNIX
       peername: AF_UNIX /tmp/.X11-unix/X0
       peercred pid: 1974
By trs80 at 2011-04-02 10:12:51:

Solaris has pfiles which is the supported lsof equivalent. Just like prstat is their answer to top.

From at 2011-04-06 08:35:04:

Here's a sad blog post from 2004, in which a Sun engineer asks how to best replicate lsof functionality in Solaris - why not just bundle and support lsof? NIH? (Could be licensing reasons, I guess.)

Written on 31 March 2011.
« A realization about code complexity and clarity
A slightly unobvious trap with 'from module import *' »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Mar 31 00:27:11 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.