2012-04-21
Bash's superintelligent errors about exec failures
Let's take a closer look at bash's error message from yesterday, because if you pay close attention something really interesting is going on. Here's the error message again:
bash: /tmp/exmpl: /bin/shx: bad interpreter: No such file or directory
At first blush, if you don't think about it too much, this looks
perfectly sensible: bash is reporting that when it tried to exec()
/tmp/exmpl, the kernel told it that there was a problem with the
script's interpreter.
But, wait. When the exec() fails, all that the kernel can tell bash is the
errno number. In this case the kernel returns ENOENT, which creates
the 'No such file or directory' portion of the error message. So how
does bash know that the reason that trying to run /tmp/exmpl failed is
because /bin/shx doesn't exist?
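To make the puzzle concrete, here is a minimal Python sketch (not anything bash itself does) that tries to run two things: a script whose '#!' line names the nonexistent /bin/shx, and a path that doesn't exist at all. The helper names exec_errno and make_script are made up for illustration. On Linux I would expect both attempts to fail with the same ENOENT, with nothing in the error to say which file was actually missing.

```python
import errno
import os
import subprocess
import tempfile


def exec_errno(path):
    """Try to run path; return the errno of a failed exec, or None if it ran."""
    try:
        subprocess.run([path], check=False)
    except OSError as exc:
        return exc.errno
    return None


def make_script(directory, interpreter):
    """Write an executable script whose '#!' line names the given interpreter."""
    path = os.path.join(directory, "exmpl")
    with open(path, "w") as f:
        f.write("#!" + interpreter + "\necho hello\n")
    os.chmod(path, 0o755)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = make_script(tmp, "/bin/shx")  # assumes /bin/shx does not exist
        missing = os.path.join(tmp, "no-such-file")
        for path in (script, missing):
            err = exec_errno(path)
            what = os.strerror(err) if err is not None else "ran"
            print(path, errno.errorcode.get(err), what)
```

The errno is the only thing that comes back from the failed exec, so a program that wants to say more has to go and look at the file itself.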
Here, have another example error message from bash:
bash: /tmp/a.out: /lib64/ld-ZZZZZ-x86-64.so.2: bad ELF interpreter: No such file or directory
(I did some binary editing to create that failure.)
That's right. When an exec() fails, bash opens the executable and
parses it to try to identify what went wrong. It recognizes shell
scripts, which is easy, but it also parses ELF binaries to find things
like the name of the ELF interpreter, so it can check that. Let me say
that again: bash knows how to parse ELF binaries so that it can give
you good error messages. I must applaud bash's attempt to be almost
as user friendly as possible, but at the same time I think it went more
than a little bit overboard.
(If you trace the system calls it's using, you can clearly see it selectively reading several bits of the ELF binary.)
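As a rough illustration of the kind of post-mortem involved, here is a sketch in Python. It is emphatically not bash's actual code (which is C); the function names are hypothetical, and the ELF handling is the bare minimum needed to walk the program headers and pull out the PT_INTERP string. It assumes it is only called after an exec has already failed with ENOENT.

```python
import os
import struct
import tempfile

PT_INTERP = 3  # program header type that names the ELF interpreter


def script_interpreter(head):
    """Return the interpreter named on a '#!' line, or None."""
    if not head.startswith(b"#!"):
        return None
    words = head[2:].split(b"\n", 1)[0].split()
    return words[0].decode("utf-8", "replace") if words else None


def elf_interpreter(f):
    """Return the PT_INTERP path of an ELF file object, or None."""
    f.seek(0)
    ident = f.read(16)
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        return None
    is64 = ident[4] == 2                    # EI_CLASS: 2 means 64-bit
    endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    # Rest of the ELF header; only e_phoff, e_phentsize and e_phnum are used.
    fmt = endian + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    raw = f.read(struct.calcsize(fmt))
    if len(raw) < struct.calcsize(fmt):
        return None
    fields = struct.unpack(fmt, raw)
    phoff, phentsize, phnum = fields[4], fields[8], fields[9]
    # Leading part of each program header, up to and including p_filesz.
    phfmt = endian + ("IIQQQQ" if is64 else "IIIII")
    for i in range(phnum):
        f.seek(phoff + i * phentsize)
        raw = f.read(struct.calcsize(phfmt))
        if len(raw) < struct.calcsize(phfmt):
            return None
        ph = struct.unpack(phfmt, raw)
        p_offset = ph[2] if is64 else ph[1]
        p_filesz = ph[5] if is64 else ph[4]
        if ph[0] == PT_INTERP:
            f.seek(p_offset)
            name = f.read(p_filesz).split(b"\0", 1)[0]
            return name.decode("utf-8", "replace")
    return None


def explain_enoent(path):
    """Build a more specific message for an exec that failed with ENOENT."""
    generic = f"{path}: No such file or directory"
    if not os.path.exists(path):
        return generic
    with open(path, "rb") as f:
        interp, kind = script_interpreter(f.read(256)), "bad interpreter"
        if interp is None:
            interp, kind = elf_interpreter(f), "bad ELF interpreter"
    if interp and not os.path.exists(interp):
        return f"{path}: {interp}: {kind}: No such file or directory"
    return generic


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "exmpl")
        with open(script, "wb") as f:
            f.write(b"#!/bin/shx\necho hello\n")
        print(explain_enoent(script))
```

The script half is a few lines. The ELF half is most of the code, which gives some feel for how much work is going into that one extra error message.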
If you try these same things in other, simpler shells, they will simply
report something like '/tmp/exmpl: No such file or directory', ie
they are simply doing a straightforward translation of what the kernel
told them (even if it is a rather puzzling message).
(zsh will report the interpreter problem with a shell script but not with a binary, which strikes me as a reasonable amount for a complex shell to do. Checking for this error with shell scripts is not that difficult and it does happen periodically.)
2012-04-18
Why you should never use file (or libmagic) to identify files
Every so often, someone needs their program to figure out what sort of
thing a file is; is it text, or HTML, or a JPEG image, or Postscript, or
whatever? When this happens it must be very tempting to use the file
program to classify things, especially since some versions of file will
give you a MIME type for the file (instead of just a text label).
Here, presented in the traditional illustrated form, is why you do not want to do this:
; file example
example: Netpbm PGM image text
; cat example
P238: An introduction
Lorem ipsum dolor sit amet.
File is exceedingly generous with classifications. It does not
verify that your target file contains anything like a valid instance of
the file type; instead, it checks for signatures. Over time, lots of
people have added lots of signatures for lots of file formats. A certain
number of these signatures are very minimal and so will match lots of
things. This creates misclassifications where unknown file formats and
plain data can match a minimal signature if things are just right (or
just wrong, from some perspectives).
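The gap between 'matches a signature' and 'is a valid instance' can be sketched in a few lines of Python. The two-byte check below is a hypothetical minimal signature that I made up for illustration, not the actual rule that file uses, and the validator only handles the plain (ASCII, 'P2') flavour of PGM.

```python
def looks_like_pgm(data):
    """Signature-only check: does the data start with the 'P2' magic?"""
    return data[:2] == b"P2"


def is_valid_plain_pgm(data):
    """Parse the whole thing: magic, width, height, maxval, then the pixels."""
    # The magic number must be followed by whitespace.
    if data[:2] != b"P2" or not data[2:3].isspace():
        return False
    # Drop '#' comments, then split everything that is left into tokens.
    lines = [line.split(b"#", 1)[0] for line in data[2:].splitlines()]
    tokens = b" ".join(lines).split()
    if len(tokens) < 3 or not all(tok.isdigit() for tok in tokens):
        return False
    numbers = [int(tok) for tok in tokens]
    width, height, maxval = numbers[:3]
    pixels = numbers[3:]
    return (width > 0 and height > 0 and 0 < maxval < 65536
            and len(pixels) == width * height
            and all(p <= maxval for p in pixels))


if __name__ == "__main__":
    text = b"P238: An introduction\nLorem ipsum dolor sit amet.\n"
    image = b"P2\n# tiny image\n2 2\n255\n0 64\n128 255\n"
    for name, data in (("text", text), ("image", image)):
        print(name, looks_like_pgm(data), is_valid_plain_pgm(data))
```

The signature check is happy with both inputs; only the full parse notices that the text file has no sensible dimensions or pixel data.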
People and programs who use file to identify and classify files are
operating under a mistaken impression of what it really says. File does
not say 'this is definitely a <whatever>'; instead it merely says 'this
kind of looks like a <whatever> to me'. The difference is important.
Some of you might think that this is theoretical and will never come up in real life. I regret to inform you that our CUPS print system just did this to someone, causing their plain text files to get fed to an image converter (which choked, meaning no printouts for this person).
(CUPS is probably not literally running file, but these days file is
just a wrapper around the libmagic shared library. Which exists so that
people can use it for exactly this purpose, sadly.)
Note that this is not merely a Linux issue. The version of file on,
eg, a not all that current FreeBSD machine will also misidentify this
plaintext file as a Netpbm PGM image.