Using SystemTap to trace the system calls of setuid programs on Linux

July 31, 2009

Suppose that you have a setuid program that is failing mysteriously and you want to see what it's doing. With normal programs you can use strace, but not even root can strace a setuid program (if you try, the program runs non-setuid).

(Yes, strace has the -u option, but it doesn't help if the setuid program is being run as part of a whole chain of processes in a specific environment and you can't just run it directly. It would be nice if root could use 'strace -f ...' for this, but alas it doesn't work.)

On a Solaris system you could use DTrace for this. SystemTap is the rough Linux equivalent and, although much less polished and not as well documented, it does work. Here is the crude SystemTap script that I used:

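# Entry probe: print the call and its decoded arguments, without a newline.
probe syscall.* {
  en = execname();
  ui = uid();
  eui = euid();
  if (en == "<redacted>") {
    printf("%s(%d): %s(%s)", en, pid(), name, argstr);
    # Show both UIDs when the real and effective UID differ.
    if (ui != eui) {
      printf(" as %d/%d ", ui, eui);
    } else {
      printf(" as %d ", ui);
    }
  }
}

# Return probe: finish the line with the decoded return value.
probe syscall.*.return {
  en = execname();
  if (en == "<redacted>") {
    printf("= %s\n", retstr);
  }
}
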
This produces output with system call arguments and return values helpfully decoded for you; it looks like:

<redacted>(14087): open("/etc/passwd", O_RDONLY) as 2315/0 = 3
<redacted>(14087): close(1) as 2315/0 = -9 (EBADF)

(In some ways this is nicer than DTrace. But the lack of documentation on what sort of information you can get about system calls and so on really hurts; I had to read the source for the syscall tapset in order to find out about name, argstr, retstr, and so on.)

Note that, despite the presence of the PID in the output, this isn't really useful for tracing if more than one instance of the program is running at once: the call and its return value are printed by separate probes, so output from concurrent instances can get interleaved on the same line. Fixing that would take more SystemTap magic than I know so far (or worse output and some postprocessing). Also, since stap is kind of slow, you'll want to run it with the -v flag so that you know when it's actually finished checking, compiling, and enabling your tracing.
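
For what it's worth, the "worse output and some postprocessing" route might look roughly like the Python sketch below. It assumes a hypothetical output format that the script above does not produce: the entry probe would print "> TID call-text" on its own line, and the return probe would print "< TID return-text" on its own line, with TID taken from tid(). The function name pair_calls and the record markers are made up for illustration.

```python
def pair_calls(lines):
    """Merge interleaved entry/return records into one line per system call.

    Assumed (hypothetical) input format, one record per line:
      "> TID call-text"    printed by the syscall entry probe
      "< TID return-text"  printed by the syscall return probe
    """
    pending = {}  # thread ID -> text of the call that has not returned yet
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3 or parts[0] not in (">", "<"):
            continue  # ignore anything that is not a trace record
        kind, tid, text = parts
        if kind == ">":
            # Remember the call until this thread's return record shows up.
            pending[tid] = text
        elif tid in pending:
            # Pair the return value with the call made by the same thread.
            yield "%s: %s = %s" % (tid, pending.pop(tid), text)
```

You would feed it the saved stap output, for example by iterating over pair_calls(sys.stdin) and printing each merged line. Calls that never return (such as exit) simply stay unpaired and are dropped.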

One of the things that the documentation isn't very clear about is that the execname() function returns the bare command name of the current process and not its full path. (There is probably a way to extract the full path if you need it. I didn't, so I didn't go digging.)

All in all, I would have to score my first real exposure to SystemTap as a reasonably pleasant experience. Although there were a bunch of frustrating bits, it did work, it gave me what I wanted to know, and it wasn't particularly difficult to do or to work out how to do it (and it didn't take particularly long).

Comments on this page:

From at 2009-08-05 15:55:39:

Hi, Chris, thanks for trying systemtap.

The "stapprobes" man page includes some documentation on the syscall.* probes, including the argstr and retstr values.

To make multi-process tracing look nice, try:

global calldata
probe syscall.* {
    if (...) {
       /* instead of printf */
       calldata[tid()] = sprintf(...)
    }
}
probe syscall.*.return {
    if (...) {
       printf("%s = %s\n", calldata[tid()], retstr)
       delete calldata[tid()]
    }
}

Written on 31 July 2009.
Last modified: Fri Jul 31 22:26:55 2009