2012-04-22
I may be wrong about my simple answer being the right one
In a recent entry I wrote about how I had misinterpreted an error message from bash about a script failing, and I also mentioned in passing that if I had paid attention to the structure of the error message I would have known that I was wrong. I take that back. Detailed investigation has now left me more confused than I was before and less confident of what exactly my co-worker's problem was (and absolutely sure that paying attention to the structure of the error message does not really help). The problem is that bash is too smart for its own good in its error messages; it is smart enough to add detail but not smart enough to verify that detail, so we cannot tell what the actual error is.
As a reminder, here's bash's error message:
bash: /a/local/script: /bin/sh: bad interpreter: No such file or directory
You would think that this means that /bin/sh is not present; after
all, it is the straightforward interpretation of the error, plus bash
has actually gone out of its way to give you a more detailed error
message. Unfortunately, that is the wrong interpretation of the error
message. What bash is really reporting is two separate facts:
- /bin/sh is the listed interpreter for /a/local/script
- when bash attempted to exec() the script, the kernel told it ENOENT, 'No such file or directory'.
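
The two facts above can be modelled with a short Python sketch. This is a rough illustration of how a message like this can be assembled, not bash's actual source; the function names shebang_interpreter and bad_interpreter_message are made up for the example. The thing to notice is that nothing in it checks whether the interpreter exists: the interpreter name comes from reading the script's first line, and the trailing text is simply the string for whatever errno exec() reported.

```python
import errno
import os
import tempfile


def shebang_interpreter(path):
    """Return the interpreter named on the script's #! line, or None."""
    with open(path, "rb") as f:
        first = f.readline(256)
    if not first.startswith(b"#!"):
        return None
    fields = first[2:].split()
    return fields[0].decode() if fields else None


def bad_interpreter_message(script, err):
    """Compose a bash-style message from two independent facts.

    Fact 1: the interpreter listed in the script (read from the file).
    Fact 2: the errno that exec() reported (passed in by the caller).
    Hypothetical helper: it never looks at the interpreter itself.
    """
    interp = shebang_interpreter(script) or ""
    return f"bash: {script}: {interp}: bad interpreter: {os.strerror(err)}"


if __name__ == "__main__":
    # Build a throwaway script; /bin/sh almost certainly exists here,
    # yet the message still reads as if it did not.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write("#!/bin/sh\necho hello\n")
    try:
        print(bad_interpreter_message(f.name, errno.ENOENT))
    finally:
        os.unlink(f.name)
```

Run on a machine with a perfectly good /bin/sh, this still prints a 'bad interpreter' line that names /bin/sh, which is the trap: the message pairs a name with an errno without establishing that the errno is about that name.
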
Bash does not mean that /bin/sh is missing; it never bothers to
check that (and arguably it can't do so reliably). This matters because
as we saw in my previous entry, the kernel will also report ENOENT if
the ELF interpreter for a binary is missing. Now, you guessed it, if
your script has a #! line that points to a binary which has a missing
ELF interpreter, you get exactly the same form of error:
bash: /tmp/exmpl: /tmp/a.out: bad interpreter: No such file or directory
(/tmp/a.out exists and is nominally executable, but I binary-edited it
to have a nonexistent ELF interpreter.)
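
One way to check for this second case is to read the ELF interpreter path out of the binary yourself and see whether it exists. The following Python sketch does that by walking the ELF program headers looking for a PT_INTERP entry. The function names elf_interpreter and missing_elf_interpreter are invented for this example, it reads the whole file into memory for simplicity, and it treats anything it cannot parse as "no interpreter found" rather than trying to diagnose malformed files.

```python
import os
import struct

PT_INTERP = 3  # program header type that names the ELF interpreter


def elf_interpreter(data):
    """Return the PT_INTERP path from ELF image bytes, or None if absent."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        return None
    is64 = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    try:
        if is64:
            phoff, = struct.unpack_from(end + "Q", data, 32)
            phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
        else:
            phoff, = struct.unpack_from(end + "I", data, 28)
            phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
        for i in range(phnum):
            base = phoff + i * phentsize
            if is64:
                p_type, = struct.unpack_from(end + "I", data, base)
                offset, = struct.unpack_from(end + "Q", data, base + 8)
                filesz, = struct.unpack_from(end + "Q", data, base + 32)
            else:
                p_type, offset = struct.unpack_from(end + "II", data, base)
                filesz, = struct.unpack_from(end + "I", data, base + 16)
            if p_type == PT_INTERP:
                raw = data[offset:offset + filesz]
                return raw.rstrip(b"\0").decode(errors="replace")
    except struct.error:
        return None  # truncated headers: give up rather than guess
    return None


def missing_elf_interpreter(path):
    """Return the ELF interpreter path if the binary names one that is absent."""
    with open(path, "rb") as f:
        data = f.read()
    interp = elf_interpreter(data)
    if interp is not None and not os.path.exists(interp):
        return interp
    return None
```

Calling missing_elf_interpreter() on the target of a #! line returns None for a script, a static binary, or a binary whose ELF interpreter is present, and returns the offending path otherwise. On a typical x86-64 Linux machine a dynamically linked binary names something like /lib64/ld-linux-x86-64.so.2. This only checks one more candidate cause of ENOENT; it does not prove that this is what the kernel objected to.
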
So in my co-worker's case, we can't definitively conclude that /bin/sh
was temporarily missing. All we know is that for some reason the exec()
returned ENOENT, and that there are at least two potential reasons for
it. A /bin/sh symlink being missing is still probably the most likely
explanation, but on a system that's under unusual stresses things start
getting rather uncertain here.
(I am far from certain that I could predict all of the reasons that the
Linux kernel would return ENOENT on exec() without actually tracing the
kernel code. And even then I'm not sure, since there's a lot of deep
bits involved and thus a lot of code to really understand.)