Shooting myself in the foot by using exec in a shell script

November 24, 2017

Once upon a time, we needed a simple shell script that did some setup stuff and then ran a program that did all of the work; this shell script was going to run from crontab once a day. Because I'm sometimes very me, I bummed the shell script like so:


exec realprog --option some-args

Why not use exec? After all, it's one less process, even if it doesn't really matter.

Later, we needed to run the program a second time with some different arguments, because we'd added some more things for it to process. So we updated the script:


exec realprog --option some-args
exec realprog --switch other-args

Yeah, there's a little issue here. It probably looks obvious written up here, but I was staring at this script just last week, knowing that for some reason it hadn't generated the relatively infrequent bit of email we'd expected, and I didn't see it then; my eyes and my mind skipped right over the implications of the repeated exec. I only realized what I was seeing today, when I took another, more determined look and this time actually spotted the problem.

(The implication is that anything after the first exec will never get run, because the first exec replaces the shell that's running the shell script with realprog.)
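As a minimal sketch of this behavior, here is a stand-in for the broken script. It uses echo in place of realprog (an assumption purely for illustration) and runs the two exec lines in a child shell so the result can be captured:

```shell
#!/bin/sh
# Stand-in for the broken script: echo plays the role of realprog.
# The child shell is replaced by the first echo, so the second
# exec line is never reached.
out=$(sh -c 'exec echo "first run"
exec echo "second run"')

# Show what the child shell actually produced.
printf '%s\n' "$out"
```

Only the output of the first command should appear; the second exec line has nothing left to run it.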

This was a self-inflicted injury. The script in no way needed the original exec; putting it in there anyway in the name of a trivial optimization led directly to the error in the updated script. If I hadn't tried to be clever, we'd have been better off.
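A sketch of the version without the exec might look like the following. The realprog function here is a hypothetical stand-in that just prints its arguments, so that the example runs anywhere; the real script would simply call the real program twice.

```shell
#!/bin/sh
# Hypothetical stand-in for realprog so the sketch runs anywhere.
realprog() {
    printf 'realprog %s\n' "$*"
}

# No exec: the shell stays around and runs each command in turn.
out=$(
    realprog --option some-args
    realprog --switch other-args
)
printf '%s\n' "$out"
```

The cost is one extra shell process for the brief time the script runs, which doesn't matter for a once-a-day cron job.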

(I'm not going to call this premature optimization, because it wasn't. I wasn't trying to optimize the script, not particularly; I was just being (too) clever and bumming my shell script for fun because I could.)

PS: In the usual way of things, I felt silly for not having spotted the repeated exec issue last week. But we see what we're looking for, and I think that last week my big question before I checked the script was 'have we updated the script to work on the other-args'. We had a line to run the program on them, so the answer was 'yes' and I stopped there. Today I knew things had once again not happened, so I asked myself 'what on earth could be stopping the second command from running?' and the light suddenly dawned.

Comments on this page:

Are you advocating leaving out such an exec as a general rule? I’d still want it in there if the process is expected to hang around for a long time, e.g. at the end of a runit run script. Making the shell go away a few milliseconds sooner is pointless under most circumstances, but a few weeks is different, no?

By cks at 2017-11-24 08:38:48:

Whoops, I should have been clearer. I think that there are good uses of exec and they certainly include long running processes. My reason for this isn't necessarily getting rid of the extra process; instead it's that once the shell has exec'd the program, it's no longer reading the script file, which makes it much easier to safely edit the shell script while the program is still running.

(Freedom to edit the script isn't a good reason in this case, because the script runs briefly once a day at a time when we aren't going to be editing it anyway.)

Ah! That’s not an issue I would ever have thought of – but now that you mention it, it makes complete sense. I forget shell isn’t like other scripting languages in that regard. I briefly considered not asking my question, since the answer seemed too obvious to bother – now I’m glad I did regardless. Thanks.

By cks at 2017-11-24 10:55:23:

Another time when exec is necessary is when the long running process has to take over the process ID of the shell script. One common case is when it's running under some sort of process supervision system that will send signals to the 'main process', so you need these signals to go to the program, not the shell script.
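A small sketch of the process ID takeover, using a nested sh as a stand-in for the long running program (an assumption for illustration only): the outer shell prints its own PID and then execs an inner shell, which prints its PID in turn.

```shell
#!/bin/sh
# The child shell prints its PID, then execs a second shell that
# prints its own PID. Because exec replaces the process in place,
# both lines should show the same number.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

first=$(printf '%s\n' "$pids" | sed -n 1p)
second=$(printf '%s\n' "$pids" | sed -n 2p)

if [ "$first" = "$second" ]; then
    echo "exec kept the same PID"
else
    echo "PIDs differ"
fi
```

A supervisor that recorded the PID of the script it started would thus be signalling the program itself after the exec.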

By S.A. at 2017-12-02 16:42:09:

So a bunch of posts ago you talked about using the Bash linter as a useful thing to do; out of curiosity why wasn't that a tool you habitually reached for after not being able to see the error quickly?

P.S. I'm not knocking you here, I habitually forget to use it, even after spending too much time debugging.

By cks at 2017-12-02 17:14:27:

The simple answer is that I didn't think the error was in the shell script at all; I thought it had to be somewhere in the situations where the shell script was run, or some sort of temporary glitch or the like. Once I was willing to think that the shell script had some sort of error, I saw the double exec problem right away. Since my first look at the script stopped at 'yeah, the script runs this, clearly it's not the problem', I didn't think to run shellcheck against it; in my mind it was obviously correct.

This is of course a great example of how we see what we expect to see when we're looking for bugs (or anything), not necessarily what is really there.

(So, you might ask, why not just run shellcheck against every shell script I look at, just in case? Unfortunately, I assume that shellcheck is going to complain about things in almost all of our pre-shellcheck shell scripts, because they're just full of things like unquoted variable use. Updating them to pass shellcheck is not a priority and not something I want to get diverted into when I'm looking for why something isn't working.)

Written on 24 November 2017.


Last modified: Fri Nov 24 00:24:13 2017