My twitch about adding a shim in front of a (shell script) interpreter

January 19, 2023

In a comment on my entry on finding people's use of /usr/bin/python, including in '#!' lines in scripts, Alex Shpilkin asked a good question:

As far as I can see, your solution cannot look backwards in time, but in that case, is there any reason not to replace the /usr/bin/python symlink by a small program that logs whatever details you want then execs /usr/bin/python2? [...]

One answer is that while I'm generally reasonably comfortable putting such a shim in front of ordinary programs, I'm relatively twitchy about doing that to something that's the target of '#!' lines in scripts. I don't know if things would go wrong in practice, but I can imagine a number of things that could go badly in theory.

To start with, you'd want to be very careful that the shim program didn't change the environment the script and its interpreter will be running in. You don't want to add, remove, or change environment variables, or accidentally leak file descriptors from your logging to the interpreter, or really anything else that might be observable. Hopefully you can declare it out of scope to run your shim with environment variables that change the behavior of the early runtime environment (eg various 'LD_' environment variables), because making that not affect your shim but pass through to the real interpreter is going to be fun. And obviously you'd want to make it so that no matter what went wrong in your attempt to do logging, your shim went on to exec the real interpreter.
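To make those requirements concrete, here is a minimal sketch of the logging half of such a shim. It is written in Python only to illustrate the logic; a real shim would more plausibly be a small compiled program, since a Python shim drags in its own startup behavior. The log path and the function names are hypothetical, and /usr/bin/python2 is simply the target from the question.

```python
import os
import time

REAL_INTERPRETER = "/usr/bin/python2"   # the real interpreter named in the question
LOG_PATH = "/var/log/python-shim.log"   # hypothetical log location


def log_invocation(argv, log_path=LOG_PATH):
    """Best-effort logging: any failure is swallowed so the exec still happens."""
    try:
        # O_CLOEXEC so the descriptor can never leak into the real interpreter,
        # even if we somehow reach the exec without closing it.
        fd = os.open(
            log_path,
            os.O_WRONLY | os.O_APPEND | os.O_CREAT | os.O_CLOEXEC,
            0o644,
        )
        try:
            line = "%d uid=%d ppid=%d argv=%r\n" % (
                time.time(), os.getuid(), os.getppid(), list(argv))
            os.write(fd, line.encode("utf-8", "replace"))
        finally:
            os.close(fd)
    except Exception:
        # Logging must never stop the script from running.
        pass


def run_shim(argv, real=REAL_INTERPRETER, log_path=LOG_PATH):
    """Log, then replace this process with the real interpreter.

    argv is passed through untouched and os.execv() keeps the current
    environment, so nothing is added, removed, or changed on the way.
    """
    log_invocation(argv, log_path)
    os.execv(real, list(argv))
```

Note that this sketch does nothing about 'LD_' style variables, which affect the shim itself before any of its own code runs.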

A subtle issue is that from the perspective of the program running the script, its exec succeeded the moment your shim got loaded, even if your shim then can't exec the real interpreter (for example, because trying to load it exceeds some resource limit). If the program would have done something different when the exec of the script failed, well, it's too late. This is probably not too likely, though; an exec usually doesn't fail if the program is there.
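The difference is visible from the calling program's side. The following sketch assumes a POSIX system and uses Python's subprocess module as the "program running the script"; the shim stand-in, the helper names, and the exit status 127 are all illustrative choices, not anything from a real shim.

```python
import subprocess
import sys

# A stand-in "shim" that tries to exec its argument and exits 127 if it can't.
SHIM_SOURCE = """
import os, sys
try:
    os.execv(sys.argv[1], [sys.argv[1]])
except OSError:
    sys.exit(127)
"""


def shim_stand_in(real_interpreter):
    """Command line that runs the stand-in shim in front of real_interpreter."""
    return [sys.executable, "-c", SHIM_SOURCE, real_interpreter]


def run_and_classify(argv):
    """Report what the caller observes: a failed exec, or an exit status."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except OSError as e:
        # The exec itself failed; the caller learns this immediately
        # and could do something different in response.
        return ("exec failed", e.errno)
    # The exec succeeded; all the caller can see is an exit status.
    return ("exec succeeded", proc.returncode)
```

Calling run_and_classify() directly on a missing interpreter path reports a failed exec, while going through shim_stand_in() with the same path reports a successful exec followed by a non-zero exit status.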

More generally, I don't believe that Unixes (ie, kernels) guarantee that the kernel's internal exec of a '#!' interpreter is exactly the same as a user level exec() with the same command line and environment. I can imagine a security sensitive Unix applying special marking to such (direct) interpreter processes, and I can also imagine a kernel passing additional information to a (direct) interpreter process through mechanisms such as Linux's auxiliary vector.

(In fact in a quick check, on Linux such a shim program causes AT_EXECFN to change. In the normal case it's the filename of the '#!' script, but in the shim case it's the filename of the real interpreter, such as /usr/bin/python2. Whether your interpreter cares about this will vary.)
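One way to run this sort of check is to have the script itself report its AT_EXECFN. This is a rough sketch that assumes Linux with a C library providing getauxval(3); the helper name is made up for illustration, and it returns None anywhere the call isn't available.

```python
import ctypes
import os

AT_EXECFN = 31  # value from <elf.h>; see getauxval(3)


def execfn():
    """Return this process's AT_EXECFN string, or None if it can't be read."""
    try:
        libc = ctypes.CDLL(None)
        getauxval = libc.getauxval
    except Exception:
        # Not Linux, or a C library without getauxval().
        return None
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    addr = getauxval(AT_EXECFN)
    if addr == 0:
        return None
    # The value is a pointer to a NUL-terminated pathname in our own memory.
    return os.fsdecode(ctypes.string_at(addr))


print("AT_EXECFN:", execfn())
```

Put into a '#!' script and run once directly and once with a shim in front of the interpreter, this should show the two different values described above.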

Given all of this, tracing execs from outside is safer and in theory easier (assuming that whatever method of tracing you use can be made to give you this information). It's exactly and precisely a real exec of a '#!' script's interpreter; you're merely arranging to log additional information extracted from the system through a side channel.

Comments on this page:

By John Wiersba at 2023-01-23 11:00:33:

Reminds me of leaky abstractions. I think you've covered a lot of the gotchas, but it's exactly this kind of thing (replacing a shebang interpreter with a shim) that makes (or at least would make) unix/linux an attractive environment to play in.

By Sam James at 2023-01-29 23:08:12:

This came up recently on the bug-bash mailing list wrt nixpkgs' use of wrappers to hijack e.g. DejaGnu for test suites.


Written on 19 January 2023.

Last modified: Thu Jan 19 23:30:16 2023