Today I learned that you want to use strace -fp
There I was, with an Ubuntu 16.04 system where rsyslogd seemed to have stopped writing to an important syslog file for no clear reason. When in doubt (or in a hurry, or both), I reach for a big hammer:
# strace -p $(pidof rsyslogd) strace: Process 21950 attached select(1, NULL, NULL, NULL, {342, 910468}
And there rsyslogd sat even while I sent in syslog messages, not
coming out of that select()
. Since this was a 16.04 machine,
clearly this was the perfidious work of systemd and its journal,
right? Especially since I could see the syslog messages in 'journalctl
-f
'. I even wrote some angry tweets to that effect.
(People reading this may spot an obvious clue here that I missed at the time when I was rushing around and feeling under pressure, angry, and a bit panicked.)
Well, I was wrong. I was being
fooled by a dubious strace
default, namely that if you use
strace -p PID
on a multi-threaded program, it silently traces
only one thread. How do you know whether or not a program is a
multi-threaded one? Well, strace
certainly won't tell you here.
One way is to look in /proc/PID/task
to see if there's more than
one thing there.
What you really want is 'strace -f -p PID
'. As the manual page
says:
[...] Note that
-p PID -f
will attach to all threads of process PID if it is multi-threaded, not only the thread with thread_id = PID.
Of course -f
has other effects that you may not want if the process
forks and execs other programs (or simply forks independent children),
but hopefully you know this in advance. Also, hopefully threaded
programs that fork and exec are not all that common. If you absolutely
have to avoid using -f
due to its other effects, I believe that
you can strace -p
each thread separately using the PIDs you'll
find in /proc/PID/task
. This isn't going to work if the process
as a whole starts new threads, though, since you'll have to manually
start strace
'ing those too.
So that's the story of how I learned that I probably now want to
use strace's -f
argument along with -p
, unless I have a strong
reason not to. These days you can't necessarily predict what's going
to be multi-threaded, and probably more and more things are going
to be so.
PS: The clue in strace
's output that I missed is that the select()
is not actually paying attention to any file descriptors, and so
is being done just for the timeout. Unless rsyslogd was terribly
badly broken and outright not listening for incoming syslog messages,
there had to be something else in a select()
or read()
or the
like on rsyslogd's server socket. Ie, there had to be another thread.
But I didn't pay attention to the actual select()
arguments because
I didn't expect them to be a clue to what was going on; I didn't
think rsyslogd would be that broken and the failure I was interested
in was 'why aren't messages getting written to this configured
file'.
(There's a lesson here about debugging.)
|
|