I find Systemtap oddly frustrating

May 30, 2013

I currently have a ZFS on Linux performance mystery with sequential NFS writes. One of the things that I want to do to diagnose it is to get a trace of NFS client activity so that I can see exactly what is slow and when. In theory I could reconstruct this from sufficient analysis of the TCP stream; in practice I couldn't make Wireshark do this with some brief poking and this seemed like a good time to learn Systemtap (after all, DTrace can definitely do this sort of stuff with effort).

The result has been surprisingly frustrating, especially when I compare it with my DTrace experience. Before DTrace fans start celebrating too much, I think that one reason DTrace was less frustrating for me is that it so obviously threw me to the wolves very rapidly. DTrace had only a massive manual, and within a very short time of poking around it was apparent that it had nothing to help with NFS activity tracing and I was going to have to read Solaris kernel source.

Systemtap has a lot more attempts at helpful documentation than DTrace does, but so far none of them have led me to solve my problem. I still keep reading, because how can I resist a beginner's guide? After all, I am a Systemtap beginner.

This feeds into the additional frustration that is tapsets. Tapsets are the rough equivalent of DTrace providers, except that DTrace providers are limited, hardcoded into the kernel, and documented. Systemtap tapsets can basically be programmed in Systemtap itself, building interesting advanced capabilities on top of basic ones, and you have the source code. The tantalizing source code that may be most of the documentation you have on what an interesting looking tapset might be able to do for you.

(Things provided by standard tapsets are documented here.)

There are other, lesser frustrations. I can boil them all down to Systemtap having a lot of nice features that it doesn't bother to carry all the way through (both in the core Systemtap and especially in tapsets). DTrace is limited in comparison but at least it's pretty honest about its limitations.

(All of this is a very personal reaction to Systemtap, born of the annoyance I'm currently feeling every time I try to spend more time on my NFS monitoring project. I'm sure that there are plenty of people who are very happy with SystemTap.)


Comments on this page:

From 64.235.151.250 at 2013-05-30 09:34:41:

I find myself similarly frustrated by Linux in general. Linux has a lot of neat features, but they tend to stop short of a nice usable system without too much tinkering.

- Josef "Jeff" Sipek

From 86.147.71.103 at 2013-06-04 03:57:46:

I do use DTrace and SystemTap (occasionally). While SystemTap in theory offers more, I find working with DTrace a much better experience. Especially if a system is under high load and I need to quickly query different metrics - it takes forever to wait for SystemTap to compile a probe, load a kernel module, etc. Something which takes a one-liner and a couple of seconds with DTrace suddenly takes 20s or more with SystemTap. So I find it essentially unrealistic to work with a one-liner approach in SystemTap... Again, SystemTap is really good in theory but its usability leaves much to be desired. And unfortunately it can still crash your OS...

btw: Solaris 11 does have NFS-related providers

From 99.247.237.31 at 2013-06-28 06:54:24:

Hi, Chris. You're right, the systemtap tapsets are of an uneven quality and depth - what you see is what you get. It would help if you reported gaps, so we can work on them; plus, it being all script-based, you can write/extend your own.

Can you give an example of features not carried all the way through "in the core"?

By the way, here's a one-liner to measure nfs operation times:

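  global times   # latency statistics in ms, keyed by NFS probe name
  probe nfs.fop.return, nfs.aop.return
  {
    # elapsed ms between entry to and return from this NFS operation
    time = gettimeofday_ms() - @entry(gettimeofday_ms())
    times[name] <<< time
  }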

Or for a different twist, use profiling/fntimes.stp from the distributed samples:

  # stap .../profiling/fntimes.stp 'nfs.fop'

- FChE

Written on 30 May 2013.
Last modified: Thu May 30 00:39:43 2013