2011-06-30
Some ways to test if a program securely runs other programs
Suppose that you have a program that can run other programs, and you want to find out if it securely runs the other programs. In an ideal world, the program's documentation would tell you, and you could trust it. Sadly we do not live in an ideal world.
First you need a test environment where you can control what external program your program runs and force it to actually run the program. In many cases (such as testing daemons, web servers, MTAs, and so on), the easiest test environment is a virtual machine. Next you need a program that logs its arguments in some appropriate place.
Let's say that we're testing a daemon with a configuration option
called 'av_scanner' and that our program to report arguments
is called argreporter.
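Here's a minimal sketch of such an argreporter, in Python. The log location /tmp/argreporter.log is an assumption; anywhere the daemon's user can write will do. What matters is that it records the argument count plus the repr() of each argument, so that embedded quotes, spaces, and empty arguments can't be confused with each other.

```python
#!/usr/bin/env python3
# argreporter: a minimal sketch that records exactly what argv it was given.
import os
import sys

# Assumed log location; pick anywhere the daemon's user can write to.
LOG_PATH = "/tmp/argreporter.log"

def format_args(args):
    """Render the argument count plus repr() of each argument, so embedded
    quotes, spaces, and empty arguments are all visible in the log."""
    return f"argc={len(args)} args={args!r}"

def main():
    # Append rather than overwrite, so repeated runs can be compared.
    with open(LOG_PATH, "a") as log:
        log.write(f"pid={os.getpid()} {format_args(sys.argv[1:])}\n")

if __name__ == "__main__":
    main()
```

Install it as /opt/argreporter and make it executable; each run appends one line with its pid, argument count, and arguments.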
Our first test is whether straight shell metacharacters have any effect:
av_scanner = /opt/argreporter 'test' >/tmp/canary
After you get the daemon to run your configured 'AV scanner', check
argreporter's logs. If it was run securely, it should have seen exactly
two arguments: 'test' with its quotes still attached, and >/tmp/canary.
If it was run through the shell, it will have seen a single argument
without quotes (and /tmp/canary will exist).
Variants of this failure are possible; for example, some programs will
helpfully run the av_scanner command line through the shell if they see
shell metacharacters in it.
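Here's a self-contained sketch of what the two outcomes look like. It's an illustration, not a test of any real daemon: a throwaway Python child process that prints sys.argv[1:] stands in for argreporter, and the canary lives in a temporary directory instead of at /tmp/canary.

```python
import os
import shlex
import subprocess
import sys
import tempfile

# Stand-in for argreporter: a child process that prints the arguments it got.
REPORTER = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

def run_without_shell(args):
    """What a secure program does: pass an explicit argument list to exec."""
    result = subprocess.run(REPORTER + args, capture_output=True, text=True, check=True)
    return result.stdout.strip()

def run_with_shell(command_line):
    """What system()/popen() do: hand the whole line to /bin/sh -c."""
    result = subprocess.run(command_line, shell=True, capture_output=True, text=True, check=True)
    return result.stdout.strip()

with tempfile.TemporaryDirectory() as tmp:
    canary = os.path.join(tmp, "canary")
    # Secure: two literal arguments, "'test'" and ">.../canary"; no file appears.
    secure = run_without_shell(["'test'", ">" + canary])
    secure_made_canary = os.path.exists(canary)
    # Insecure: the shell strips the quotes and performs the redirection.
    line = shlex.join(REPORTER) + " 'test' >" + shlex.quote(canary)
    insecure = run_with_shell(line)
    insecure_made_canary = os.path.exists(canary)
    with open(canary) as f:
        canary_contents = f.read().strip()

print("secure:  ", secure, secure_made_canary)
print("insecure:", repr(insecure), insecure_made_canary, canary_contents)
```

The direct run reports two arguments, "'test'" and one starting with '>', and leaves no canary behind. The shell run prints nothing to its own stdout, because the shell redirected that output into the canary file, which ends up holding ['test'].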
If your program fails this check, you can stop now. Otherwise, though,
we still need to test the substitutions that your program does. For
this, we need to find some substitution that introduces a space; the
best one is a variable substitution, because those are usually the
simplest. Suppose that we have a $recipients variable that has a
space-separated list of destinations; then:
av_scanner = /opt/argreporter $recipients
The logs should show that argreporter was called with one argument and that the argument had spaces in it. If it was called with several arguments and each argument was a single destination, your program makes substitutions before breaking the command line up into arguments and you've just seen why this is somewhere between annoying and dangerous.
(Another important test is what happens with empty or blank
substitutions. You want these to result in an argument of ''
instead of the argument just disappearing.)
2011-06-28
How to securely run programs from inside your program on Unix
Every so often, people feel that their program needs to be able to run
another program under some circumstances; for instance, you're writing
a mailer and you want to be able to run an external virus scanner to
see if a newly received message has a virus. A certain number of these
people decide to use system() or popen() for this; as the saying
goes, now they have two problems.
(The only time it's ever okay to use these routines is if the entire command line to run is set statically in your configuration file, with no runtime expansion, substitution, or insertion of anything at all.)
Hopefully everyone understands why: both of these routines pass your
command line to the shell to have it interpret the line (that's why they
can take a 'command line' as an argument, instead of an array of command
arguments). This means that any shell metacharacters that appear in
substitutions or expansions will be interpreted by the shell, generally
with disastrous results. So the first rule of securely running other
programs is that you must execute programs yourself, using an interface
(such as the exec*() family) that lets you directly control what the
command line arguments are.
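As a minimal sketch of what executing programs yourself means, here is the classic fork()-then-exec pattern using Python's thin wrappers around the POSIX calls (os.fork(), os.execv(), os.waitpid()). It is Unix-only, run_program() is a hypothetical name, and real code would also want to think about file descriptors, the environment, and signals.

```python
import os

def run_program(argv):
    """Run argv[0] with exactly the arguments in argv, bypassing the shell.

    Returns the exit status, or the negated signal number if the child was
    killed by a signal (the same convention subprocess uses).
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the program. No shell parses
        # anything, so metacharacters in argv are just ordinary characters.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if execv() failed; never fall back into the
            # parent's code. 127 mirrors the shell's "command not found".
            os._exit(127)
    # Parent: wait for this specific child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

In everyday Python, subprocess.run(argv) with a list and the default shell=False does this for you; it's shell=True (like system() and popen()) that hands your string to /bin/sh.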
This means that you're responsible for breaking up your 'command line' into command line arguments, and as it happens there's a right and a wrong way to do this. The right way is to tokenize your command line before you perform substitutions.
Suppose that you have a hypothetical command line:
/opt/scanner $spoolfile $sender $recipient
If you're directly running /opt/scanner, this looks safe; it will get
called with three command line arguments and will be happy. But suppose
that someone manages to sneak a space into $sender. If you expand
the variables in this command line before breaking it up into arguments,
the scanner is suddenly getting called with four arguments, which is not
good. (And even without that it could be called with two arguments if
$sender is sometimes allowed to be empty.)
The proper way to do this is to break this template command line up into
arguments, then expand each argument separately. No extra arguments will
be created no matter what spaces or other characters are introduced by
the expansion, and if an expansion is empty you won't delete arguments
(the scanner will instead see an empty $sender argument). This is
exactly what you want to happen.
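Here's a minimal Python sketch of both approaches applied to the example command line, so you can see the difference directly. The $name substitution syntax and whitespace-only tokenization are simplifying assumptions (a real configuration language might use shlex.split() so that templates can quote arguments), and expand_vars(), expand_then_split(), and split_then_expand() are hypothetical names.

```python
import re

# Minimal, hypothetical $name syntax: a dollar sign followed by word characters.
VAR = re.compile(r"\$(\w+)")

def expand_vars(text, variables):
    """Replace each $name in text with its value (KeyError if undefined)."""
    # A function replacement is inserted literally; backslashes are not special.
    return VAR.sub(lambda m: variables[m.group(1)], text)

def expand_then_split(template, variables):
    """WRONG: expand the whole line, then break it into arguments."""
    return expand_vars(template, variables).split()

def split_then_expand(template, variables):
    """RIGHT: break the static template into arguments, then expand each one.
    Every template word becomes exactly one argument, whatever it expands to."""
    return [expand_vars(word, variables) for word in template.split()]

TEMPLATE = "/opt/scanner $spoolfile $sender $recipient"
spaced = {"spoolfile": "/var/spool/q1", "sender": "a b", "recipient": "c@example.org"}
empty = dict(spaced, sender="")

print(expand_then_split(TEMPLATE, spaced))   # scanner gets four arguments
print(split_then_expand(TEMPLATE, spaced))   # scanner gets three; one contains a space
print(expand_then_split(TEMPLATE, empty))    # $sender vanished; only two arguments
print(split_then_expand(TEMPLATE, empty))    # $sender is present as ''
```

The list from split_then_expand() is what you'd hand to an exec*()-style call such as os.execv(), or to subprocess.run() without shell=True.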
(My off-the-cuff instinct is that tokenization before expansion is in fact what you want for general string substitutions, but I haven't thought deeply about the issue. It certainly is what you want in this specific situation.)
Some people will say that they really do need the shell because they need to do things like redirection and substituting shell variables. The right answer is for these people to write a little shell script around their real program, not for your program to use the shell to run everything. A similar answer applies to people who need string substitution to retokenize. In both cases the principle is that if you have security you can always give it away, but if you don't have security to start with you cannot get it back.
(If people often need redirection or shell variables or retokenization after substitution, you have a deeper design issue. Find out why they keep needing to do these things in your system and fix it so that they don't.)
2011-06-19
The Unix shell initialization problem and how shells should work
In general, there are at least three sorts of things that you may want to do when a shell is started: one time environment initializations that can be inherited in subshells, per shell things that cannot be inherited, and anything that you want to do when logging in (you might split this into interactive and non-interactive things in some situations). Depending on what a shell is being started for, you want it to do all, some, or none of these things.
A regular login shell does all of them and then prompts you for
commands; an interactive subshell started as part of your login session
should just do the per shell setup and then prompt for commands. A shell
started when you do 'ssh host command' should do the environment
initialization and then run the command you gave it. An xdm login
shell should do the environment initialization, definitely
skip the interactive login bits, and then either run a command or
execute a shell script (depending on how xdm is set up). And a shell
started via a shell script with '#!/path/to/shell' should normally do
none of these.
(If your shell runs per user dotfiles when executing scripts, it is really hard to create reliable scripts; any user can innocently screw up your script's execution environment in many ways and then your script breaks for them and only them, often oddly. One consequence of this is that if your shell can import things like functions from the environment, you need a switch to turn this behavior off (or just default the import to off when running commands from a file).)
The way Unix shells should work is that there should be an agreed-on set of command line switches (respected by all shells) to turn on each of the three sorts of things. Programs that need shells to do certain sorts of setup work could then supply the appropriate switches (and there would be default behaviors, such as doing per shell initialization when started with stdin as a tty). Everyone would be able to be explicit about what was going on and what was being done when; as a bonus we could easily solve the xdm problem, among other issues.
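To make the proposal concrete, here's a small sketch of how a shell might decide what to do under such a scheme. The switch names --init-env, --init-shell, and --init-login are hypothetical, and the default rule (per-shell setup only when stdin is a tty and the shell isn't running a script) is an assumption about a sensible default, chosen so that scripts don't pick up per user dotfiles.

```python
# Hypothetical switches, one per sort of initialization.
SWITCHES = {
    "--init-env": "env",      # one-time environment setup, inherited by subshells
    "--init-shell": "shell",  # per-shell setup that cannot be inherited
    "--init-login": "login",  # things to do only when actually logging in
}

def init_phases(argv, stdin_is_tty, running_script):
    """Return the set of initialization phases a shell should perform.

    Explicit switches are authoritative. With none, fall back to a default:
    per-shell setup only for an interactive shell (stdin is a tty and no
    script file is being run), and nothing at all for scripts.
    """
    requested = {SWITCHES[arg] for arg in argv if arg in SWITCHES}
    if requested:
        return requested
    if stdin_is_tty and not running_script:
        return {"shell"}
    return set()

# The cases described above, expressed with the hypothetical switches.
cases = {
    "login shell": init_phases(["--init-env", "--init-shell", "--init-login"], True, False),
    "interactive subshell": init_phases([], True, False),
    "ssh host command": init_phases(["--init-env", "-c", "make"], False, False),
    "xdm session": init_phases(["--init-env"], False, True),
    "#! script": init_phases([], True, True),
}
for name, phases in cases.items():
    print(f"{name}: {sorted(phases)}")
```

Whether defaults should also apply alongside explicit switches is a design choice; this sketch treats any explicit switch as the complete answer, which keeps callers like sshd and xdm fully in control.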
In the real world of Unix as it is today, shells resort to heuristics to tell these situations apart, which occasionally fail. And then we have xdm and its hacks. The situation is extremely unlikely to change now, although in theory a Linux distribution (or several of them) could sit down and bang some heads together over this.
By the way, I maintain that a direct corollary of this whole issue is that your shell should give startup files direct information about what environment the shell is running in (or thinks it's running in). This is especially the case if the shell always or nearly always runs some file. If you do not make this information available you force the authors of startup files to examine the environment and try to reverse engineer your heuristics, which is a losing proposition for any number of reasons.
(The minimum information you provide should be what startup files the shell is planning to run, but more information is better here.)
2011-06-18
My xdm heresy
My xdm heresy is that I don't like it and don't use it and never have
(at least not on my main machines, the ones
that run my full environment). This has nothing to do with xdm itself
(I don't like any of the xdm alternatives either) and everything to do
with how much of a hack the xdm model of 'logging in' is.
Unix has long had a well defined model of how you log in and establish
your environment, where the shell that was directly started by login
(and later sshd and so on) was a special 'login shell' that did
additional initialization. Unfortunately, Unix tied together two
separate ideas in the process of doing this, partly out of necessity in
the early days: non-interactive things that you wanted to do once to set
up your environment, and interactive things that you wanted to do when
you logged in (in the old days, this could include environment setup
things such as asking you what sort of serial terminal you were using
and initializing it).
This is a bad fit for what xdm wants to do, to put it one way; xdm is
effectively a non-interactive login. Xdm isn't the first program to
run into this issue; the wheels started coming off the classic Unix
login model the moment rsh introduced the ability to run commands on a
machine without actually logging in.
What xdm should have done is try to solve the problem once and for all, ideally by introducing the idea of a non-interactive 'login' shell that merely initialized the environment and then ran more commands for you. Failing that, it should have introduced a standard for 'this is how xdm will invoke something to set up your login environment'. Instead it simply ignored the problem. (Given that rsh had also ducked addressing the problem, xdm was following a well-blazed path.)
Ignoring the problem was and is a hack. It works (and worked) sometimes but not always, and of course it created a whole series of workarounds with their own new set of issues. The centrality of this hack and the resulting uncertainty in how xdm works (or doesn't work, depending on local factors) irritates me. As a result I prefer to get around the whole issue by logging in on the text console and then starting X by hand, and I've done it this way for a very long time.
(Starting X by hand may be difficult, but at least I don't feel like my entire environment is built on changeable quicksand. Also, it makes me grumpy to hack my own environment up to cope with the fallout from xdm's bad hack.)
(For an example of the sort of problems caused by this xdm hack and by attempts to work around it, see this.)