My twitch about adding a shim in front of a (shell script) interpreter
In a comment on my entry on finding people's use of /usr/bin/python, including in '#!' lines in scripts, Alex Shpilkin asked a good question:
As far as I can see, your solution cannot look backwards in time, but in that case, is there any reason not to replace the /usr/bin/python symlink by a small program that logs whatever details you want then execs /usr/bin/python2? [...]
One answer is that while I'm generally reasonably comfortable putting such a shim in front of ordinary programs, I'm relatively twitchy about doing that to something that's the target of '#!' lines in scripts. I don't know if things would go wrong in practice, but I can imagine a number of things that could go badly in theory.
To start with, you'd want to be very careful that the shim program didn't change the environment the script and its interpreter will be running in. You don't want to add, remove, or change environment variables, or accidentally leak file descriptors from your logging to the interpreter, or really anything else that might be observable. Hopefully you can declare it out of scope to run your shim with environment variables that change the behavior of the early runtime environment (eg various 'LD_' environment variables), because making that not affect your shim but pass through to the real interpreter is going to be fun. And obviously you'd want to make it so that no matter what went wrong in your attempt to do logging, your shim went on to exec the real interpreter.
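As a rough illustration of those constraints, here is a minimal sketch of the logic such a shim would need. It's written in Python only to show the shape; a real shim would most likely be a small compiled program, since the shim itself has to be something the kernel can run as a '#!' interpreter. The log location and the function names are hypothetical, and I'm assuming the shim receives its argument list and environment exactly as the kernel handed them over.

```python
import os
import time

REAL_INTERPRETER = "/usr/bin/python2"   # the real interpreter from the quoted suggestion
LOG_PATH = "/var/log/python-shim.log"   # hypothetical log location


def log_invocation(argv, log_path=LOG_PATH):
    """Best-effort logging: return True if a line was written, never raise."""
    try:
        # O_CLOEXEC so the log file descriptor cannot leak across the exec.
        flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | os.O_CLOEXEC
        fd = os.open(log_path, flags, 0o644)
        try:
            line = "%d uid=%d ppid=%d argv=%r\n" % (
                time.time(), os.getuid(), os.getppid(), list(argv))
            os.write(fd, line.encode("utf-8", "backslashreplace"))
        finally:
            os.close(fd)
        return True
    except Exception:
        # Whatever went wrong, logging must not stop the exec that follows.
        return False


def shim(argv, environ, real=REAL_INTERPRETER, log_path=LOG_PATH):
    """Log the invocation, then exec the real interpreter.

    argv and environ are passed through unmodified, so the interpreter
    sees the same arguments and environment variables the shim received.
    """
    log_invocation(argv, log_path)
    os.execve(real, list(argv), dict(environ))
```

Even this sketch only covers the easy parts (file descriptors, arguments, and environment variables). It does nothing about the 'LD_' style variables, which will have already affected the shim by the time any of its code runs.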
A subtle issue is that from the perspective of the program running the script, its exec succeeded the moment your shim got loaded, even if your shim then can't exec the real interpreter (for example, because trying to load it exceeds some resource limit). If the program would have done something different when the exec of the script failed, well, it's too late. This is probably not too likely, though; an exec usually doesn't fail if the program is there.
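To make the difference concrete, here is a small sketch of what the calling program sees in each case. The missing path is hypothetical, and /bin/sh is only a stand-in for a shim whose own exec of the real interpreter fails; I'm assuming a Unix system with a POSIX /bin/sh.

```python
import subprocess

MISSING = "/nonexistent/interpreter"   # hypothetical path that does not exist


def direct_exec_fails_visibly():
    """Exec the missing program directly; the caller sees the exec error."""
    try:
        subprocess.run([MISSING])
    except OSError:
        return True    # the caller can react to the failed exec itself
    return False


def stand_in_shim_exit_status():
    """Exec /bin/sh (standing in for a shim), which then fails its own exec.

    From the caller's point of view the exec worked; all it gets back is
    a process that exited with a failure status.
    """
    proc = subprocess.run(
        ["/bin/sh", "-c", 'exec "$0"', MISSING],
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode
```

In the first case the caller gets an error from the exec itself, so it could fall back to something else. In the second case the exec has already succeeded, and the only sign of trouble is a non-zero exit status.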
More generally, I don't believe that Unixes (ie, kernels) guarantee that the kernel's internal exec of a '#!' interpreter is exactly the same as a user level exec() with the same command line and environment. I can imagine a security sensitive Unix applying special marking to such (direct) interpreter processes, and I can also imagine a kernel passing additional information to a (direct) interpreter process through mechanisms such as Linux's auxiliary vector.
(In fact in a quick check, on Linux such a shim program causes AT_EXECFN to change. In the normal case it's the filename of the '#!' script, but in the shim case it's the filename of the real interpreter, such as /usr/bin/python2. Whether this matters will depend on whether your interpreter looks at it.)
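If you want to look at this yourself, one way is to ask the C library for the value from inside the running process. This is a sketch that assumes Linux and a C library that provides getauxval(); the helper name is my own, and the AT_EXECFN constant is the value from the Linux headers.

```python
import ctypes

AT_EXECFN = 31   # from <linux/auxvec.h>


def execfn():
    """Return this process's AT_EXECFN as bytes, or None if unavailable."""
    try:
        libc = ctypes.CDLL(None)
        getauxval = libc.getauxval
    except (OSError, AttributeError):
        return None   # not Linux, or a C library without getauxval()
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    addr = getauxval(AT_EXECFN)
    if addr == 0:
        return None
    # AT_EXECFN is a pointer to a NUL-terminated pathname in our own memory.
    return ctypes.string_at(addr)


if __name__ == "__main__":
    print(execfn())
```

Put this in a file with a '#!' line, make it executable, and compare what it prints when run directly against what it prints when the '#!' line points at a shim that execs the real interpreter.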
Given all of this, tracing execs from outside is safer and in theory easier (assuming that whatever method of tracing you use can be made to give you this information). It's exactly and precisely a real exec of a '#!' script's interpreter; you're merely arranging to log additional information extracted from the system through a side channel.