2018-02-21
Sorting out what exec does in Bourne shell pipelines
Today, I was revising a Bourne shell script. The original shell script
ended by running rsync
with an exec
like this:
exec rsync ...
(I don't think the exec
was there for any good reason; it's a
reflex.)
I was adding some filtering of errors from rsync, so I fed its
standard error to egrep and in the process I removed the exec,
so it became:
rsync ... 2>&1 | egrep -v '^(...|...)'
Then I stopped to think about this, and realized that I was working
on superstition. I 'knew' that
combining exec
and anything else didn't work, and in fact I had
a memory that it caused things to malfunction. So I decided to
investigate a bit to find out the truth.
To start with, let's talk about what we could think that exec
did
here (and what I hoped it did when I started digging). Suppose that
you end a shell script like this:
#!/bin/sh
[...]
rsync ... 2>&1 | egrep -v '...'
When you run this shell script, you'll wind up with a hierarchy of
three processes; the shell is the parent process, and then generally
the rsync
and the egrep
are siblings. Linux's pstree will represent this as
'sh───2*[sleep]', and my favorite tool shows it like so:
pts/10 | 17346 /bin/sh thescript
pts/10 |   17347 rsync ...
pts/10 |   17348 egrep ...
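If you don't have a process tree tool handy, one rough way to see the sibling relationship is to have each pipeline step report its own parent PID. This is only a sketch; show_parents is a name made up for illustration, and the two 'sh -c' commands stand in for rsync and egrep.

```shell
#!/bin/sh
# Hypothetical helper: each pipeline step is a separate 'sh -c' process
# that prints the PID of its parent.
show_parents() {
    echo "script: $$"
    # Left step prints its parent; right step copies that through with
    # cat and then prints its own parent.
    sh -c 'echo "left: $PPID"' | sh -c 'cat; echo "right: $PPID"'
}

show_parents
```

When this is run as a script, all three numbers should normally match, since both steps are direct children of the script's shell.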
If exec
worked here the way I was sort of hoping it would, you'd
get two processes instead of three, with whatever you exec'd
(either the rsync or the egrep) taking over from the parent
shell process. Now that I think about it, there are some reasonably
decent reasons to not do this, but let's set that aside for now.
What I had a vague superstition of exec doing in a pipeline was
that it might abruptly truncate the pipeline. When it got to the
exec, the shell just did what you told it to, ie exec the process,
and since it had turned itself into that process it didn't go on to
set up the rest of the pipeline. That would make 'exec rsync ... | egrep'
be the same as just 'exec rsync ...', with the egrep effectively
ignored. Obviously you wouldn't want that, hence me automatically
taking the exec out.
Fortunately this is not what happens. What actually does happen is
not quite that the exec
is ignored, although that's what it looks
like in simple cases. To understand what's going on, I had to start
by paying careful attention to how exec
is described, for example
in Dash's manpage:
Unless command is omitted, the shell process is replaced with the specified program [...]
The important bit is 'the shell process'. The magic trick is what 'the shell process' is in a pipeline. If we write:
exec rsync ... | egrep -v ...
When the shell gets to processing the exec, what it considers
'the shell process' is actually the subshell running one step of
the pipeline, here the subshell that exists to run rsync. This
subshell is normally invisible here because for simple commands
like this, the (sub)shell will immediately exec() rsync anyway;
using exec just instructs this subshell to do what it was already
going to do.
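A small sketch of this, with printf and grep standing in for rsync and egrep (demo_exec_in_pipeline is a made-up name). If exec truncated the pipeline or replaced the whole script, either the 'drop' line would get through or the final echo would never run.

```shell
#!/bin/sh
# The exec only replaces the subshell running the first pipeline step,
# so the filter still runs and the script carries on afterwards.
demo_exec_in_pipeline() {
    exec printf '%s\n' keep drop | grep -v '^drop$'
    echo "still running"
}

demo_exec_in_pipeline
```

In Dash and Bash this should print 'keep' followed by 'still running'.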
We can cause the shell to actually materialize a subshell by putting multiple commands here:
(/bin/echo hi; sleep 120) | cat
If you look at the process tree for this, you'll probably get:
pts/9 | 7481 sh
pts/9 |   7806 sh
pts/9 |     7808 sleep 120
pts/9 |   7807 cat
The subshell making up the first step of the pipeline could end by
just exec()ing sleep, but it doesn't (at least in Dash and
Bash); once the shell has decided to have a real subshell here, it
stays a real subshell.
If you use exec in the context of such an actual subshell, it
will indeed replace 'the shell process' of the subshell with the
command you exec:
$ (exec echo hi; echo ho) | cat
hi
$
The exec replaced the entire subshell with the first echo, and
so it never went on to run the second echo.
(Effectively you've arranged for an early termination of the subshell.
There are probably times when this is useful behavior as part of a
pipeline step, but I think you can generally use exit
and what you're
actually doing will be clearer.)
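As a sketch of that comparison, here are the two forms side by side (with_exec and with_exit are made-up names for illustration). Both stop the subshell after the first echo, but the exit version says so directly.

```shell
#!/bin/sh
# exec replaces the subshell with echo, so 'echo ho' is never reached.
with_exec() {
    (exec echo hi; echo ho) | cat
}

# exit ends the subshell explicitly after the first echo.
with_exit() {
    (echo hi; exit 0; echo ho) | cat
}

with_exec
with_exit
```

Each function should print just 'hi'.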
(I'm sure that I once knew all of this, but it fell out of my mind until I carefully worked it out again just now. Perhaps this time around it will stick.)
Sidebar: some of this behavior can vary by shell
Let's go back to '(/bin/echo hi; sleep 120) | cat'. In Dash
and Bash, the first step's subshell sticks around to be the parent
process of sleep, as mentioned. Somewhat to my surprise, both the
Fedora Linux version of official ksh93 and FreeBSD 10.4's sh do
optimize away the subshell in this situation. They directly exec
the sleep, as if you wrote:
(/bin/echo hi; exec sleep 120) | cat
There's probably a reason that Bash skips this little optimization.
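If you want to check a particular shell without waiting around on a sleep, here is a rough probe. It is only a sketch: probe_subshell is a made-up name, and the approach assumes that a command which is not last in the subshell is always forked. Both inner shells print their parent PID; if the last one was exec'd in place of the subshell, its parent will be different.

```shell
#!/bin/sh
# Hypothetical probe: does this shell keep the subshell around for the
# last command of a multi-command pipeline step?
probe_subshell() {
    # The first 'sh -c' is forked from the subshell. The second is either
    # forked as well (same parent) or replaces the subshell (different parent).
    set -- $( (sh -c 'echo $PPID'; sh -c 'echo $PPID') | cat )
    if [ $# -ne 2 ]; then
        echo "probe failed"
        return 1
    fi
    if [ "$1" = "$2" ]; then
        echo "subshell kept"
    else
        echo "subshell optimized away"
    fi
}

probe_subshell
```

The answer can differ between shells and between versions of the same shell, so treat the result as describing only the shell you ran it under.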