Wandering Thoughts archives

2011-07-23

What good secure string expansion on Unix should look like

In yesterday's entry, I covered some options for how to make string expansion and tokenization of command lines aware of each other. Before I pick what I think is the best approach, let's take a step back and talk about what results we want.

Consider the following hypothetical example:

av_scanner = cmdline:/opt/avscanner ${if isset{$heloname} {-h $heloname}} $recipients %s

Assuming that %s expands to a single argument, the straightforward reading of what we want to happen is for /opt/avscanner to be invoked with four arguments if $heloname is set and with only two if it is unset. The alternate interpretations all produce results that are absurd in one way or another.

I think the simple way to achieve this is to perform string expansion before tokenization, but to mark the entire result of each variable expansion as part of a single token. You don't quite want variable expansion to force token boundaries (otherwise '-h$somevar' would wind up actually meaning '-h $somevar', which is absurd in its own way), but you don't want the tokenizer to split things inside variable expansions. Fortunately, getting this right is only a small matter of programming.
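This expand-first approach can be sketched in a few lines of Python (a toy model with made-up names, not Exim's actual implementation): spaces in the literal template text break tokens, while text that came from a variable expansion never does.

```python
# Toy sketch of expand-before-tokenize: literal spaces split tokens,
# expansion output never does.
import re

def expand_and_tokenize(template, variables):
    pieces = []  # (text, came_from_expansion) chunks, in order
    pos = 0
    for m in re.finditer(r'\$(\w+)', template):
        pieces.append((template[pos:m.start()], False))
        pieces.append((variables.get(m.group(1), ''), True))
        pos = m.end()
    pieces.append((template[pos:], False))

    args, current = [], []
    for text, from_expansion in pieces:
        if from_expansion:
            current.append(text)      # expansion text can't split a token
            continue
        parts = text.split(' ')
        current.append(parts[0])
        for part in parts[1:]:        # literal spaces do split tokens
            args.append(''.join(current))
            current = [part]
    args.append(''.join(current))
    return [a for a in args if a]

expand_and_tokenize('/opt/avscanner -h$heloname $recipients',
                    {'heloname': 'x', 'recipients': 'a b; rm -rf /'})
# → ['/opt/avscanner', '-hx', 'a b; rm -rf /']
```

Note how the hostile spaces and metacharacters inside $recipients stay confined to a single argument, and '-h$heloname' fuses into the single token '-hx', matching the behavior argued for above.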

(Possibly you want to expose an explicit operator to group several expansions together as a single non-breakable entity. You could call it '${arg ...}'.)

If you want to tokenize before expansion, clearly the tokenizer needs to be language aware. Roughly speaking, I think what you wind up wanting to do is parse the string into an AST that is composed partly of tokenized literal text, partly of language operators, and partly of variable expansions. Then you evaluate the AST to generate a stream of tokenized text, where a straightforward variable expansion like $heloname or $recipients always gives you a single token regardless of what the contents are.
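The tokenize-first AST shape can be modeled minimally like this (all names are hypothetical; this is a sketch of the idea, not any real implementation): the AST mixes pre-tokenized literal text with variable references, and evaluation always yields exactly one token per variable reference.

```python
# Toy AST: Literal nodes carry already-tokenized text, Var nodes are
# variable references that always evaluate to a single token.
from dataclasses import dataclass

@dataclass
class Literal:
    tokens: list        # literal text, already split into tokens

@dataclass
class Var:
    name: str           # $name reference

def evaluate(ast, variables):
    tokens = []
    for node in ast:
        if isinstance(node, Literal):
            tokens.extend(node.tokens)
        else:
            tokens.append(variables[node.name])  # one token, always
    return tokens

ast = [Literal(['/opt/avscanner']), Var('recipients')]
evaluate(ast, {'recipients': 'a@b c@d; rm -rf /'})
# → ['/opt/avscanner', 'a@b c@d; rm -rf /']
```

A real version would also have operator nodes for things like ${if ...}, which evaluate their sub-trees and splice the resulting tokens into the stream.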

(I have ripped this idea off from my understanding of the general approach that web frameworks usually take to parsing and evaluating their page templates.)

Sidebar: an alternate tokenization approach

An alternate tokenization approach is to say that the AST should include explicit token boundary markers instead of pre-tokenized text (and whitespace normally turns into such a boundary marker). Then the AST evaluation produces a stream that is a mixture of token boundary markers and text chunks; you take the stream and fuse all text between two boundary markers together into a single argument. This naturally handles cases like '-h$somevar' and '$var1$var2'; in both cases there is no token boundary marker in the middle, so although the AST has two separate nodes the end result fuses the text from both nodes together into a single argument.
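The fusing step of this boundary-marker scheme is small enough to sketch directly (hypothetical names again): AST evaluation emits a flat stream of text chunks and boundary markers, and everything between two markers fuses into one argument.

```python
# Sketch of boundary-marker fusing: text chunks between two BOUNDARY
# markers are joined into a single argument.
BOUNDARY = object()     # unique sentinel; can never collide with text

def fuse(stream):
    args, current = [], []
    for item in stream:
        if item is BOUNDARY:
            if current:
                args.append(''.join(current))
                current = []
        else:
            current.append(item)
    if current:
        args.append(''.join(current))
    return args

# '-h$somevar' evaluates to two adjacent text nodes with no marker in
# between, so their text fuses into one argument:
fuse(['-h', 'some.host', BOUNDARY, 'a b; rm -rf /'])
# → ['-hsome.host', 'a b; rm -rf /']
```

Whitespace in literal text turns into BOUNDARY markers during evaluation, while a variable's contents are emitted as a single marker-free text chunk, which is exactly what makes '$var1$var2' come out as one argument.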

SecureStringExpansionII written at 01:49:08

2011-07-22

String expansion and securely running programs on Unix

One of the corollaries of how to securely run programs on Unix is that a general purpose, generic string expansion system is a bad fit with securely running programs. The problem is that there is a fundamental clash of goals between the two systems: a generic string expansion system wants to treat everything as a generic string to be expanded (regardless of what it actually is), and a secure system for running programs wants to tokenize everything using simple rules.

At this point I am going to pick on Exim for illustrative examples. Unfortunately, Exim tries to have it both ways at once and thus is a great source for showing the problems that this causes, no matter how much I like it otherwise. Please note that the problems here are generic; any program that takes either approach (or both at once as Exim does) will have the same issues.

First up is Exim's av_scanner setting. This is not expanded at all unless it starts with a '$', at which point the entire string must be expanded before Exim knows how to tokenize it:

av_scanner = ${if bool{true} {cmdline:/opt/avscanner $recipients %s}}

If you are concerned about arbitrary characters appearing in $recipients, there is no way to make this secure (as discussed before).

Second up is the command setting for running things in pipes. This tokenizes before string expansion, but it does the tokenization purely textually, with no awareness of the expansion language. As the documentation notes, this causes serious problems:

command = /some/path ${if eq{$local_part}{postmaster} {xx} {yy}}

Since tokenization is expansion-blind, this fails: the tokenizer splits on the internal spaces, so all the string expansion evaluator winds up seeing is '${if' (which is a clear syntax error). To get this to work you have to force the tokenizer to treat the entire string expansion as a single token by 'quoting' it, roughly:

command = /some/path "${if eq{$local_part}{postmaster} {xx} {yy}}"

(The documentation does not quite put it the way that I have here.)

A side effect of tokenization before expansion is that a single string expansion can only ever expand to a single argument. (You may or may not be able to expand to nothing instead of a '' empty argument, depending on the implementation.)

What this points out is that command line tokenization and string expansion need to be aware of each other. Once the dust settles, either string expansion needs to be able to mark hard token boundaries (so that $recipients can be marked as a single token regardless of contents) or tokenization needs to know about the string expansion language (so that ${if ...} can be parsed into a single token despite the presence of internal spaces or other special characters).

(I have opinions on the answer here, but this entry is already long enough as it is.)

PS: if you want to be secure with minimal effort, it's clear that you need to do tokenization before expansion and provide some sort of 'quoting' mechanism to glue a string expansion expression into a single token. This is secure while being merely inconvenient and annoying to people writing configuration files. Simple expansion before tokenization cannot be made secure at all, as previously discussed.

SecureStringExpansion written at 01:07:08
