2007-12-19
Why setuid scripts are fundamentally a bad idea
The real problem with setuid scripts on Unix is not that writing secure shell scripts is challenging and obscure; it is that they are fundamentally insecure because of how the kernel runs them. While the kernel runs programs by directly loading them into memory, it runs scripts by starting the script's interpreter with the script's filename as an argument, leaving it up to the interpreter to open, read, and execute the script itself. As is normal on Unix, nothing guarantees that the filename still refers to the same file between these two steps.
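The kernel's half of this can be modelled in a few lines. The sketch below is a Python model, not kernel code, of how Linux rewrites the argument list when execve() is pointed at a '#!' script: the interpreter from the first line, at most one optional argument (Linux passes the rest of the line as a single argument; other Unixes may split it differently), then the script's name, then the caller's remaining arguments. The helper name `shebang_argv` is hypothetical. The point to notice is that the interpreter is told only the script's name, not given the file.

```python
def shebang_argv(script_path: str, script_head: bytes, argv: list[str]) -> list[str]:
    """Model of how Linux rewrites argv for a '#!' script (hypothetical helper).

    script_path is the name that was passed to execve(); argv is the original argv.
    """
    if not script_head.startswith(b"#!"):
        raise ValueError("not a #! script")
    line = script_head[2:].split(b"\n", 1)[0].strip()
    if not line:
        raise ValueError("empty #! line")
    parts = line.split(None, 1)
    new_argv = [parts[0].decode()]          # the interpreter to run
    if len(parts) == 2:
        new_argv.append(parts[1].decode())  # Linux: the rest of the line is ONE argument
    new_argv.append(script_path)            # the script, passed by name only
    new_argv.extend(argv[1:])               # the caller's remaining arguments
    return new_argv
```

For example, `shebang_argv("./s", b"#!/usr/bin/env python3 -u\n", ["s"])` gives `["/usr/bin/env", "python3 -u", "./s"]`.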
In other words, there is no way to guarantee that what the interpreter reads is the same script that the kernel gave setuid permissions to; it might be some other script that an attacker put in place between the kernel starting the (setuid) interpreter and the interpreter opening and reading the file (for example, by running the setuid script through a symbolic link and then repointing the link at a script of their own).
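The race can be reproduced without any setuid bits at all. This single-process Python sketch stands in for both parties, under the stated assumption that the steps can be serialized for illustration: an os.stat() of the name plays the kernel's check, a symlink swap plays the attacker, and reopening the name plays the interpreter. The file names and the function `simulate_name_swap` are illustrative only.

```python
import os
import tempfile


def simulate_name_swap() -> tuple[bool, str]:
    """Simulate the window between the kernel's check and the interpreter's open().

    Returns (same_file, text): whether the file the 'interpreter' opened is the
    one the 'kernel' examined, and the text the 'interpreter' actually read.
    """
    with tempfile.TemporaryDirectory() as d:
        trusted = os.path.join(d, "trusted.sh")  # stands in for the setuid script
        evil = os.path.join(d, "evil.sh")        # attacker-controlled script
        link = os.path.join(d, "run-me")         # the name the attacker invokes
        with open(trusted, "w") as f:
            f.write("echo trusted\n")
        with open(evil, "w") as f:
            f.write("echo attacker\n")
        os.symlink(trusted, link)

        # Step 1 ("kernel"): look up the name and examine the file it refers to.
        checked = os.stat(link)

        # The race window: the attacker repoints the name at a different file.
        os.unlink(link)
        os.symlink(evil, link)

        # Step 2 ("interpreter"): open the script by name, as it was told to.
        with open(link) as f:
            opened = os.fstat(f.fileno())
            text = f.read()

    same_file = (checked.st_dev, checked.st_ino) == (opened.st_dev, opened.st_ino)
    return same_file, text
```

Calling `simulate_name_swap()` reports that the opened file is not the one that was checked, and the text read is the attacker's `echo attacker`.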
Since this is a direct consequence of sensible and long-standing decisions about how to run scripts, Unix can't work around the problem in general without creating incompatibilities. Nor can the problem be fixed in the interpreters alone by having them fstat() the opened script's file descriptor and refusing to work unless the script has appropriate setuid permissions, because this breaks exec()'ing scripts from a setuid program: the interpreter cannot tell whether its privileges came from the script's setuid bit or from the setuid program that ran it.
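To see why the fstat() check can't tell the cases apart, here is a hypothetical interpreter policy written as a pure Python function (no real interpreter is claimed to do exactly this): if the effective uid differs from the real uid, only run a script that is itself setuid to that effective uid. `check_open_script` shows where the values would come from for a real, already-opened descriptor; both names are made up for illustration.

```python
import os
import stat


def interpreter_accepts(script_mode: int, script_uid: int,
                        real_uid: int, effective_uid: int) -> bool:
    """Hypothetical policy: when running privileged (euid != ruid), only run
    scripts that are themselves setuid to the effective uid."""
    if effective_uid == real_uid:
        return True  # no extra privileges, so nothing to verify
    return bool(script_mode & stat.S_ISUID) and script_uid == effective_uid


def check_open_script(fd: int) -> bool:
    """Apply the policy to an opened script descriptor via fstat()."""
    st = os.fstat(fd)
    return interpreter_accepts(st.st_mode, st.st_uid, os.getuid(), os.geteuid())
```

With a real uid of 1000 and an effective uid of 0, a setuid-root script is accepted and an attacker's swapped-in script is refused, but so is an ordinary root-owned script that a setuid-root program legitimately exec()'d, because the interpreter sees exactly the same privileges in both situations.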
The best solution would be for the kernel to pass the interpreter the file descriptor of the script that it already has open. The command line filename would remain, but fd-aware interpreters would use it only for $0 or the equivalent. However, this would require new fd-aware interpreters, which would be specific to the Unix variant that did this, and the demand for general setuid script support is low (to put it one way).
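A rough simulation of this proposal, again in Python and with every name hypothetical: the 'kernel' opens the script once and hands over the descriptor, the attacker then repoints the name, and an fd-aware interpreter reads from the descriptor while keeping the name only for $0. Because the descriptor refers to the file that was actually opened, the swap has no effect.

```python
import os
import tempfile


def fd_aware_interpreter(script_fd: int, script_name: str) -> tuple[str, str]:
    """Hypothetical fd-aware interpreter: read the script from the descriptor
    it was handed; the name is kept only for $0."""
    os.lseek(script_fd, 0, os.SEEK_SET)
    chunks = []
    while chunk := os.read(script_fd, 4096):
        chunks.append(chunk)
    return script_name, b"".join(chunks).decode()


def demo() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as d:
        trusted = os.path.join(d, "trusted.sh")
        evil = os.path.join(d, "evil.sh")
        link = os.path.join(d, "run-me")
        for path, body in ((trusted, "echo trusted\n"), (evil, "echo attacker\n")):
            with open(path, "w") as f:
                f.write(body)
        os.symlink(trusted, link)

        # "Kernel": open the script once and keep the descriptor.
        fd = os.open(link, os.O_RDONLY)
        try:
            # The attacker repoints the name; with fd passing this no longer matters.
            os.unlink(link)
            os.symlink(evil, link)
            return fd_aware_interpreter(fd, link)
        finally:
            os.close(fd)
```

`demo()` returns the link's path (for $0) together with `echo trusted`, the contents of the file the kernel actually opened.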
How x86 Linux executes ELF programs
Yesterday I said that the kernel directly executes programs in place. Because I feel like walking through the details, here is what the kernel does to start ELF programs on x86 Linux; for simplicity, I'm going to talk about 32-bit programs.
- First, the kernel maps the program's text, data, and BSS into memory. Almost all programs require these to be mapped at fixed addresses starting from 0x08048000 (just over 128 MB) and going on up.
- If the program is dynamically linked, the kernel also maps the dynamic linker's text, data, and BSS into memory. Dynamic linkers are generally willing to be loaded anywhere in memory, so they get wedged into the first spot the kernel considers available.
(ELF executables specify the full path of their dynamic linker, which is confusingly called the 'ELF interpreter' in various places.)
- The kernel sticks an 'auxiliary table' of various information on the top of the stack.
- The environment and the arguments are copied onto the stack.
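To see the pieces the kernel looks at, here is a minimal Python sketch (the function name `parse_elf` is made up) that reads an ELF file's header and program headers: the entry point (e_entry), the virtual addresses of the PT_LOAD segments, and the PT_INTERP path naming the dynamic linker. It handles both 32-bit and 64-bit files but does no validation beyond the magic number. On 64-bit or position-independent executables the load addresses will differ from 0x08048000.

```python
import struct
import sys

PT_LOAD = 1    # loadable segment (text, data, BSS)
PT_INTERP = 3  # path of the dynamic linker (the 'ELF interpreter')


def parse_elf(data: bytes) -> dict:
    """Pull out the fields the kernel uses to start an ELF program."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little-endian
    hdr_fmt = end + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    (e_type, _machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(hdr_fmt, data, 16)

    load_vaddrs = []
    interp = None
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if is64:
            (p_type, _pflags, p_offset, p_vaddr,
             _paddr, p_filesz, _memsz, _align) = struct.unpack_from(end + "IIQQQQQQ", data, off)
        else:
            (p_type, p_offset, p_vaddr, _paddr,
             p_filesz, _memsz, _pflags, _align) = struct.unpack_from(end + "IIIIIIII", data, off)
        if p_type == PT_LOAD:
            load_vaddrs.append(p_vaddr)
        elif p_type == PT_INTERP:
            interp = data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return {"type": e_type, "entry": e_entry, "interp": interp, "load_vaddrs": load_vaddrs}


if __name__ == "__main__":
    # Inspect the running Python binary, if it is an ELF file.
    with open(sys.executable, "rb") as f:
        blob = f.read()
    if blob[:4] == b"\x7fELF":
        info = parse_elf(blob)
        print("entry point: ", hex(info["entry"]))
        print("PT_LOAD at:  ", [hex(v) for v in info["load_vaddrs"]])
        print("interpreter: ", info["interp"] or "(none: statically linked)")
```

A statically linked program has no PT_INTERP entry, which is how the kernel knows to jump straight to e_entry.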
If the program is statically linked, the kernel sets the user-level program counter for the process to the start address in the program's ELF header, which is somewhere after 0x08048000. When the kernel returns back to user space, the program will wind up running directly.
(What the start address is depends on how much stuff has to go at the start of the program's text area, so it varies from program to program.)
If the program is dynamically linked, the kernel instead sets the program counter to the start address of the dynamic linker, and the process will start running the dynamic linker's code directly. The dynamic linker uses information in the auxiliary table to find the real program's code and data, and eventually starts it.
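On Linux a process can read its own auxiliary table from /proc/self/auxv as (type, value) pairs of native unsigned longs, terminated by AT_NULL. The sketch below (with a made-up function name, `parse_auxv`) decodes it and picks out a few entries of the kind the dynamic linker relies on: AT_PHDR and AT_PHNUM (where the program's program headers are in memory), AT_ENTRY (the program's own entry point) and AT_BASE (where the dynamic linker itself was loaded). It is Linux-specific and assumes the file uses the running process's native word size.

```python
import os
import struct

# Auxiliary vector entry types (values from the ELF/Linux ABI headers).
AT_NULL = 0
AT_PHDR = 3
AT_PHNUM = 5
AT_BASE = 7
AT_ENTRY = 9

PAIR = "@LL"  # one (type, value) entry: two native unsigned longs


def parse_auxv(raw: bytes) -> dict[int, int]:
    """Decode raw auxv bytes into {type: value}, stopping at AT_NULL."""
    size = struct.calcsize(PAIR)
    entries: dict[int, int] = {}
    for off in range(0, len(raw) - size + 1, size):
        a_type, a_val = struct.unpack_from(PAIR, raw, off)
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


if __name__ == "__main__" and os.path.exists("/proc/self/auxv"):
    with open("/proc/self/auxv", "rb") as f:
        aux = parse_auxv(f.read())
    for name, key in (("AT_PHDR", AT_PHDR), ("AT_PHNUM", AT_PHNUM),
                      ("AT_ENTRY", AT_ENTRY), ("AT_BASE", AT_BASE)):
        print(f"{name:9} {aux.get(key, 0):#x}")
```

For a dynamically linked process, AT_BASE is nonzero and AT_ENTRY points into the program rather than into the dynamic linker, which is what lets the dynamic linker find and finally jump to the real program.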
(From this we can see that calling the dynamic linker an 'interpreter' is a misnomer; it works nothing like an interpreter for a script, even though it is itself a regular ELF executable.)
Technically, you could make dynamically linked ELF executables that contained no actual machine code but instead had a 'dynamic linker' that actually was an interpreter. However, this would be tricky to pull off, because dynamic linkers cannot themselves be dynamically linked, so your interpreter would need either to not use any shared libraries (including the normal C and Unix runtime) or to bootstrap the regular dynamic linker somehow.