I may be wrong about my simple answer being the right one

April 22, 2012

In a recent entry I wrote about how I had misinterpreted an error message from bash about a script failing, and I also mentioned in passing that if I had paid attention to the structure of the error message I would have known that I was wrong. I take that back. Detailed investigation has now left me more confused than I was before and less confident of what exactly my co-worker's problem was (and absolutely sure that paying attention to the structure of the error message does not really help). The problem is that bash is too smart for its own good in its error messages; it is smart enough to add detail to the message but not smart enough to work out what actually went wrong, so we cannot tell what the actual error is.

As a reminder, here's bash's error message:

bash: /a/local/script: /bin/sh: bad interpreter: No such file or directory

You would think that this means that /bin/sh is not present; after all, it is the straightforward interpretation of the error, plus bash has actually gone out of its way to give you a more detailed error message. Unfortunately, that is the wrong interpretation of the error message. What bash is really reporting is two separate facts:

  • /bin/sh is the listed interpreter for /a/local/script
  • when bash attempted to exec() the script, the kernel told it ENOENT, 'No such file or directory'.
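To make those two facts concrete, here is a minimal Python sketch (it is not what bash itself does) that writes a script with a chosen #! line, tries to run it, and reports both whether the script file exists and which errno the exec failed with. The helper name try_exec_with_interpreter and the path /nonexistent/interp are made up for illustration, and the sketch assumes a Linux-like system whose temporary directory allows executing files.

```python
import errno
import os
import subprocess
import tempfile


def try_exec_with_interpreter(interpreter):
    """Run a throwaway script whose #! line names `interpreter`.

    Returns (script_exists, errno_name), where errno_name is None if the
    exec succeeded. The helper name and layout are illustrative only.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        script = os.path.join(tmpdir, "exmpl")
        with open(script, "w") as f:
            f.write("#!" + interpreter + "\n")
        os.chmod(script, 0o755)
        try:
            # subprocess re-raises the child's exec() failure as an OSError
            # carrying the errno that the kernel returned.
            subprocess.run([script], check=False)
        except OSError as e:
            return os.path.exists(script), errno.errorcode.get(e.errno)
        return os.path.exists(script), None


if __name__ == "__main__":
    print(try_exec_with_interpreter("/nonexistent/interp"))
```

On a typical Linux system this should report that the script exists and that the exec failed with ENOENT. The errno alone says nothing about which file was not found; the caller has to supply that interpretation itself, which is what bash's message is doing.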

Bash does not mean that /bin/sh is missing; it never bothers to check that (and arguably it can't do so reliably). This matters because, as we saw in my previous entry, the kernel will also report ENOENT if the ELF interpreter for a binary is missing. So, as you may have guessed, if your script has a #! line that points to a binary with a missing ELF interpreter, you get exactly the same sort of error:

bash: /tmp/exmpl: /tmp/a.out: bad interpreter: No such file or directory

(/tmp/a.out exists and is nominally executable, but I binary edited it to have a nonexistent ELF interpreter.)
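If you want to see which ELF interpreter a binary asks for, one option is to read its PT_INTERP program header directly. The following is a rough sketch under stated assumptions: the function name elf_interpreter is hypothetical, it takes the file's contents as bytes, handles only the standard 32-bit and 64-bit ELF layouts, and does no bounds checking on malformed files.

```python
import struct

PT_INTERP = 3  # program header type that names the ELF interpreter


def elf_interpreter(data):
    """Return the PT_INTERP path from an ELF image (bytes), or None.

    Sketch only: truncated or malformed input may raise struct.error.
    """
    if len(data) < 16 or data[:4] != b"\x7fELF":
        return None  # not an ELF file (for example, a #! script)
    is64 = data[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if is64:
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, base + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, base + 16)
        raw = data[p_offset:p_offset + p_filesz]
        # The path is stored as a NUL-terminated string.
        return raw.split(b"\0", 1)[0].decode("utf-8", "replace")
    return None  # statically linked, or otherwise has no PT_INTERP
```

Reading /tmp/a.out in binary mode, passing the bytes to this function, and then checking the returned path with os.path.exists() would show whether the ELF interpreter is the missing piece.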

So in my co-worker's case, we can't definitively conclude that /bin/sh was temporarily missing. All we know is that for some reason the exec() returned ENOENT, and that there are at least two potential reasons for it. A /bin/sh symlink being missing is still probably the most likely explanation, but on a system that's under unusual stresses things start getting rather uncertain here.
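For what it's worth, the check that bash skips is easy to sketch, which also shows why it would not settle the question. The helper names below are hypothetical, and the #! parsing is my simplified approximation of what the Linux kernel does (skip spaces and tabs, then take everything up to the next space, tab, or newline).

```python
import os
import re


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's #! line, or None."""
    with open(script_path, "rb") as f:
        first = f.readline(256)
    if not first.startswith(b"#!"):
        return None
    # Drop the "#!", cut at the newline, and trim spaces and tabs.
    rest = first[2:].split(b"\n", 1)[0].strip(b" \t")
    # The interpreter path ends at the first space or tab.
    interp = re.split(rb"[ \t]", rest, maxsplit=1)[0]
    return os.fsdecode(interp) if interp else None


def interpreter_present(script_path):
    """Advisory check only: does the #! interpreter exist right now?"""
    interp = shebang_interpreter(script_path)
    return interp is not None and os.path.exists(interp)
```

A True result here does not rule out ENOENT. The interpreter can vanish between the check and the exec(), and an interpreter that exists can still fail with ENOENT if its own ELF interpreter is missing.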

(I am far from certain that I could predict all of the reasons that the Linux kernel would return ENOENT on exec() without actually tracing the kernel code. And even then I'm not sure, since there's a lot of deep bits involved and thus a lot of code to really understand.)


Comments on this page:

From 84.190.60.20 at 2012-04-22 09:33:06:

One common problem I found at my previous job where we were doing web hosting - people would occasionally edit or download scripts from Windows machines which would leave the Windows line-endings in place, even on the shebang line. This has the effect of having an additional carriage return on the end of the interpreter description (occasionally represented as #!/bin/sh^M or #!/usr/bin/perl^M) but normally (at least if memory serves) you wouldn't see it in the output - the execution would just fail.

So that's at least one other plausible cause - although if it was only for a brief period it's not as likely.

- Oliver

Written on 22 April 2012.


Last modified: Sun Apr 22 03:12:22 2012