Wandering Thoughts archives

2016-09-20

Today I learned that you want to use strace -fp

There I was, with an Ubuntu 16.04 system where rsyslogd seemed to have stopped writing to an important syslog file for no clear reason. When in doubt (or in a hurry, or both), I reach for a big hammer:

# strace -p $(pidof rsyslogd)
strace: Process 21950 attached
select(1, NULL, NULL, NULL, {342, 910468}

And there rsyslogd sat even while I sent in syslog messages, not coming out of that select(). Since this was a 16.04 machine, clearly this was the perfidious work of systemd and its journal, right? Especially since I could see the syslog messages in 'journalctl -f'. I even wrote some angry tweets to that effect.

(People reading this may spot an obvious clue here that I missed at the time when I was rushing around and feeling under pressure, angry, and a bit panicked.)

Well, I was wrong. I was being fooled by a dubious strace default, namely that if you use strace -p PID on a multi-threaded program, it silently traces only one thread. How do you know whether or not a program is a multi-threaded one? Well, strace certainly won't tell you here. One way is to look in /proc/PID/task to see if there's more than one thing there.

What you really want is 'strace -f -p PID'. As the manual page says:

[...] Note that -p PID -f will attach to all threads of process PID if it is multi-threaded, not only the thread with thread_id = PID.

Of course -f has other effects that you may not want if the process forks and execs other programs (or simply forks independent children), but hopefully you know this in advance. Also, hopefully threaded programs that fork and exec are not all that common. If you absolutely have to avoid using -f due to its other effects, I believe that you can strace -p each thread separately using the PIDs you'll find in /proc/PID/task. This isn't going to work if the process as a whole starts new threads, though, since you'll have to manually start strace'ing those too.

So that's the story of how I learned that I probably now want to use strace's -f argument along with -p, unless I have a strong reason not to. These days you can't necessarily predict what's going to be multi-threaded, and probably more and more things are going to be so.

PS: The clue in strace's output that I missed is that the select() is not actually paying attention to any file descriptors, and so is being done just for the timeout. Unless rsyslogd was terribly badly broken and outright not listening for incoming syslog messages, there had to be something else in a select() or read() or the like on rsyslogd's server socket. Ie, there had to be another thread. But I didn't pay attention to the actual select() arguments because I didn't expect them to be a clue to what was going on; I didn't think rsyslogd would be that broken and the failure I was interested in was 'why aren't messages getting written to this configured file'.

(There's a lesson here about debugging.)

linux/StraceUseFWithP written at 23:56:57; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.