Sorting out what exec does in Bourne shell pipelines

February 21, 2018

Today, I was revising a Bourne shell script. The original shell script ended by running rsync with an exec like this:

exec rsync ...

(I don't think the exec was there for any good reason; it's a reflex.)

I was adding some filtering of errors from rsync, so I fed its standard error to egrep and in the process I removed the exec, so it became:

rsync ... 2>&1 | egrep -v '^(...|...)'

Then I stopped to think about this, and realized that I was working on superstition. I 'knew' that combining exec and anything else didn't work, and in fact I had a memory that it caused things to malfunction. So I decided to investigate a bit to find out the truth.

To start with, let's talk about what we could think that exec did here (and what I hoped it did when I started digging). Suppose that you end a shell script like this:

#!/bin/sh
[...]
rsync ... 2>&1 | egrep -v '...'

When you run this shell script, you'll wind up with a hierarchy of three processes; the shell is the parent process, and then generally the rsync and the egrep are siblings. Linux's pstree will represent this as 'sh───2*[sleep]', and my favorite tool shows it like so:

pts/10   |      17346 /bin/sh thescript
pts/10    |     17347 rsync ...
pts/10    |     17348 egrep ...

If exec worked here the way I was sort of hoping it would, you'd get two processes instead of three, with whatever you exec'd (either the rsync or the egrep) taking over from the parent shell process. Now that I think about it, there are some reasonably decent reasons to not do this, but let's set that aside for now.

What I had a vague superstition of exec doing in a pipeline was that it might abruptly truncate the pipeline. When it go to the exec the shell just did what you told it to, ie exec the process, and since it had turned itself into a process it didn't go on to set up the rest of the pipeline. That would make 'exec rsync ... | egrep' be the same as just 'exec rsync ...', with the egrep effectively ignored. Obviously you wouldn't want that, hence me automatically taking the exec out.

Fortunately this is not what happens. What actually does happen is not quite that the exec is ignored, although that's what it looks like in simple cases. To understand what's going on, I had to start by paying careful attention to how exec is described, for example in Dash's manpage:

Unless command is omitted, the shell process is replaced with the specified program [...]

I have emphasized the important bit. The magic trick is what 'the shell process' is in a pipeline. If we write:

exec rsync ... | egrep -v ...

When the shell gets to processing the exec, what it considers 'the shell process' is actually the subshell running one step of the pipeline, here the subshell that exists to run rsync. This subshell is normally invisible here because for simple commands like this, the (sub)shell will immediately exec() rsync anyway; using exec just instructs this subshell to do what it was already going to do.

We can cause the shell to actually materialize a subshell by putting multiple commands here:

(/bin/echo hi; sleep 120) | cat

If you look at the process tree for this, you'll probably get:

pts/9    |      7481 sh
pts/9     |     7806 sh
pts/9      |    7808 sleep 120
pts/9     |     7807 cat

The subshell making up the first step of the pipeline could end by just exec()ing sleep, but it doesn't (at least in Dash and Bash); once the shell has decided to have a real subshell here, it stays a real subshell.

If you use exec in the context of such an actual subshell, it will indeed replace 'the shell process' of the subshell with the command you exec:

$ (exec echo hi; echo ho) | cat
hi
$

The exec replaced the entire subshell with the first echo, and so it never went on to run the second echo.

(Effectively you've arranged for an early termination of the subshell. There are probably times when this is useful behavior as part of a pipeline step, but I think you can generally use exit and what you're actually doing will be clearer.)

(I'm sure that I once knew all of this, but it fell out of my mind until I carefully worked it out again just now. Perhaps this time around it will stick.)

Sidebar: some of this behavior can vary by shell

Let's go back to '(/bin/echo hi; sleep 120) | cat'. In Dash and Bash, the first step's subshell sticks around to be the parent process of sleep, as mentioned. Somewhat to my surprise, both the Fedora Linux version of official ksh93 and FreeBSD 10.4's sh do optimize away the subshell in this situation. They directly exec the sleep, as if you wrote:

(/bin/echo hi; exec sleep 120) | cat

There's probably a reason that Bash skips this little optimization.

Written on 21 February 2018.
« How switching to uMatrix for JavaScript blocking has improved my web experience
Github and publishing Git repositories »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Feb 21 22:30:27 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.