Wandering Thoughts archives

2014-09-17

In praise of Solaris's pfiles command

I'm sure that at one point I was introduced to pfiles through a description that called it the Solaris version of lsof for a single process. This is true as far as it goes and I'm certain that I used pfiles as nothing more than this for a long time, but it understates what pfiles can do for you. This is because pfiles will give you a fair amount more information than lsof will, and much of that information is useful stuff to know.

Like lsof, pfiles will generally report what a file descriptor maps to (file, device, network connection, and Solaris IPC 'doors', often with information about what process is on the other end of the door). Unlike on some systems, the pfiles information is good enough to let you track down who is on the other end of Unix domain sockets and pipes. Socket endpoints are usually reported directly; pipe information generally takes cross-correlating with other processes to see who else has an S_IFIFO with the specific ino open.

(You would think that getting information on the destination of Unix domain sockets would be basic information, but on some systems it can take terrible hacks.)
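Doing the pipe cross-correlation by hand is simple enough to brute force over /proc. For example, something like this works (a rough sketch; 7325504 is a hypothetical ino taken from an S_IFIFO line, you need enough privileges to run pfiles against other people's processes, and pfiles briefly stops each process it inspects):

cd /proc
for pid in *; do
    # report every process that has an S_IFIFO with this ino open
    pfiles $pid 2>/dev/null | grep 'ino:7325504' >/dev/null && echo $pid
done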

Pfiles will also report some state information for sockets, like the socket flags and the send and receive buffer sizes. Personally I don't find this deeply useful and I wish that pfiles also showed things like the TCP window and ACK state. Fortunately you can get this protocol information with 'netstat -f inet -P tcp' or 'netstat -v -f inet -P tcp' (if you want lots of details).

Going beyond this lsof-like information, pfiles will also report various fcntl() and open() flags for each file descriptor. This gives you basic information like the FD's read/write status, but also more; for example, you can immediately see whether or not a process has its sockets open in non-blocking mode (which can be important). This is often stuff that other tools don't report, and having it handy can save you from needing deep dives with DTrace, a debugger, or the program source code.
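For instance, here is a quick way to spot non-blocking sockets (a sketch; 12345 is a hypothetical pid, and the open flags show up on the line pfiles prints under each file descriptor's entry):

# show socket entries and any non-blocking open flags for pid 12345
pfiles 12345 | egrep 'S_IFSOCK|O_NONBLOCK'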

(I'm sensitive to several of these issues because my recent Amanda troubleshooting left me needing to chart out the flow of pipes and to know whether some sockets were nonblocking or not. I could also have done with information on TCP window sizes at the time, but I didn't find the netstat stuff until just now. That's how it goes sometimes.)

PfilesPraise written at 02:03:57

2014-09-05

A DTrace script to help figure out what process IO is slow

I recently made public a DTrace script I wrote, which gives you per file descriptor IO breakdowns for a particular process. I think it's both an interesting, useful tool and probably not quite the right approach to diagnose this sort of problem, so I want to talk about both the problem and what it tells you. To start with, the problem.

Suppose, not entirely hypothetically, that you have a relatively complex multi-process setup with data flowing between the various processes and the whole thing is (too) slow. Somewhere in the whole assemblage is a bottleneck. Basic monitoring tools for things like disk IO and network bandwidth will give you aggregate status over the entire assemblage, but they can only point out the obvious bottlenecks (total disk IO, total network bandwidth, etc). What we'd like to do here is peer inside the multi-process assemblage to see which data flows are fast and which are slow. This per-data-flow breakdown is why the script shows IO on a per file descriptor basis.

What the DTrace script's output looks like is this:

s fd   7w:   10 MB/s  waiting ms: 241 / 1000   ( 10 KB avg *   955)
p fd   8r:   10 MB/s  waiting ms:  39 / 1000   ( 10 KB avg *   955)
s fd  11w:    0 MB/s  waiting ms:   0 / 1000   (  5 KB avg *     2)
p fd  17r:    0 MB/s  waiting ms:   0 / 1000   (  5 KB avg *     2)
s fd  19w:   12 MB/s  waiting ms: 354 / 1000   ( 10 KB avg *  1206)
p fd  21r:   12 MB/s  waiting ms:  43 / 1000   ( 10 KB avg *  1206)
  fd 999r:   22 MB/s  waiting ms:  83 / 1000   ( 10 KB avg *  2164)
  fd 999w:   22 MB/s  waiting ms: 595 / 1000   ( 10 KB avg *  2164)
IO waits:  read:  83 ms  write: 595 ms  total: 679 ms 

(This is a per-second figure averaged over ten seconds and file descriptor 999 is for the total read and write activity. pfiles can be used to tell what each file descriptor is connected to if you don't already know.)

Right away we can tell a fair amount about what this process is doing; it's clearly copying two streams of data from inputs to outputs (with a third one not doing much). It's also spending much more of its IO wait time writing the data than waiting for there to be more input, although the picture here is misleading because it's also making pollsys() calls and I wasn't tracking the time spent waiting in those (or the time spent in other syscalls).

(The limited measurement is partly an artifact of what I needed to diagnose our problem.)
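If you're curious about the general shape of such a script, here is a stripped-down sketch of the technique. It is not fdrwmon.d itself, just the syscall provider summing bytes moved and time spent waiting in read() and write() per file descriptor:

#!/usr/sbin/dtrace -s
/* Stripped-down sketch (not fdrwmon.d): per file descriptor, sum the
 * bytes moved and the time spent waiting in read() and write() for one
 * process.  Run as: dtrace -s sketch.d -p PID
 */
#pragma D option quiet

syscall::read:entry, syscall::write:entry
/pid == $target/
{
	self->fd = (int)arg0;
	self->ts = timestamp;
}

syscall::read:return, syscall::write:return
/self->ts/
{
	/* arg0 is the return value; ignore errors (-1) */
	@bytes[probefunc, self->fd] = sum(arg0 > 0 ? arg0 : 0);
	@waitns[probefunc, self->fd] = sum(timestamp - self->ts);
	self->fd = 0;
	self->ts = 0;
}

tick-10sec
{
	printa("%-5s fd %3d: %@d bytes, %@d ns waiting\n", @bytes, @waitns);
	trunc(@bytes);
	trunc(@waitns);
}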

What I'm not sure about with this DTrace script is whether it's the most useful and informative way to peer into this problem. Its output points straight to network writes being the bottleneck (for reasons that I don't know), but that discovery seems indirect and kind of happenstance, visible only because I decided to track how long IO on each file descriptor took. In particular it feels like there are things I ought to be measuring here that would give me more useful and pointed information, but I can't think of what else to measure. It's as if I'm not asking quite the right questions.

(I've looked at Brendan Gregg's Off-CPU Analysis; an off-cpu flamegraph analysis actually kind of pointed in the direction of network writes too, but it was hard to interpret and to get much out of. Wanting some degree of confirmation and visibility into this led me to write fdrwmon.d.)
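For reference, the off-cpu data gathering goes roughly like the following. This is a sketch of Brendan Gregg's recipe rather than exactly what I ran; 12345 is a hypothetical pid, and stackcollapse.pl plus flamegraph.pl come from his FlameGraph tools:

# collect 30 seconds of off-cpu time by kernel stack for one process
dtrace -x stackframes=100 -n '
    sched:::off-cpu /pid == $target/ { self->ts = timestamp; }
    sched:::on-cpu /self->ts/ {
        @[stack()] = sum(timestamp - self->ts); self->ts = 0;
    }
    tick-30s { exit(0); }' -p 12345 -o out.offcpu

# fold the stacks and render an off-cpu flamegraph
stackcollapse.pl out.offcpu > out.folded
flamegraph.pl out.folded > offcpu.svg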

DTraceFDIOVolScript written at 23:34:57

