What good secure string expansion on Unix should look like
July 23, 2011
In yesterday's entry, I covered some options for how to make string expansion and tokenization of command lines aware of each other. Before I pick what I think is the best approach, let's take a step back and talk about what results we want.
Consider a hypothetical example: a command line where a variable's value contains whitespace, and where we want that value to remain a single argument instead of being re-split into several.
I think that the simple way to achieve this is to perform string
expansion before tokenization, but to mark the result of each variable
expansion as being all in a single token. You don't quite want variable
expansion to force token boundaries (otherwise something like 'a$b'
could never fuse with adjacent literal text into a single argument).
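This expand-first approach can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation; the `$name` syntax and the `variables` dict are assumptions made for the example:

```python
import re

def expand_and_tokenize(line, variables):
    """Expand $name variables before tokenizing, but keep each
    expansion's result as one indivisible token even if its value
    contains whitespace. Adjacent literal text fuses with it."""
    tokens = []
    current = []   # pieces of the token currently being built
    # Split the line into literal runs, $name expansions, and whitespace.
    for piece in re.split(r'(\$\w+|\s+)', line):
        if not piece:
            continue
        if piece.isspace():
            # Whitespace in *literal* text ends the current token.
            if current:
                tokens.append(''.join(current))
                current = []
        elif piece.startswith('$'):
            # An expansion joins the current token and is never split,
            # no matter what its value contains.
            current.append(variables.get(piece[1:], ''))
        else:
            current.append(piece)
    if current:
        tokens.append(''.join(current))
    return tokens
```

With this, `expand_and_tokenize('cp $src /tmp', {'src': 'my file.txt'})` keeps `my file.txt` as a single argument, and `a$b` fuses into one token rather than forcing a boundary.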
(Possibly you want to expose an explicit operator that groups several
expansions together into a single non-breakable entity; what you call
it matters less than the grouping semantics.)
If you want to tokenize before expansion, the tokenizer clearly needs
to be language-aware. Roughly speaking, I think what you wind up
wanting to do is parse the string into an AST that is composed partly
of tokenized literal text, partly of language operators, and partly of
variable expansions. Then you evaluate the AST to generate a stream of
tokenized text, where a straightforward variable expansion like '$var'
evaluates to exactly one token.
(I have ripped this idea off from my understanding of the general approach that web frameworks usually take to parsing and evaluating their page templates.)
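A hedged sketch of this tokenize-first approach: parse the line into a small AST of pre-tokenized literal nodes and variable-expansion nodes, then evaluate that AST into a token stream. The node representation and `$name` syntax are illustrative assumptions (real language operators are omitted for brevity):

```python
import re

def parse(line):
    """Build a flat AST of ('lit', word) and ('var', name) nodes;
    literal text is tokenized on whitespace at parse time."""
    ast = []
    for piece in re.split(r'(\$\w+)', line):
        if piece.startswith('$'):
            ast.append(('var', piece[1:]))
        else:
            ast.extend(('lit', word) for word in piece.split())
    return ast

def evaluate(ast, variables):
    """Walk the AST to produce a token stream: a straightforward
    variable expansion yields exactly one token, regardless of
    whitespace in its value."""
    tokens = []
    for kind, value in ast:
        if kind == 'lit':
            tokens.append(value)
        else:
            tokens.append(variables.get(value, ''))
    return tokens
```

Because tokenization happened at parse time, the expansion's value is never re-inspected for word splitting when the AST is evaluated.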
Sidebar: an alternate tokenization approach
An alternate tokenization approach is to say that the AST should include
explicit token boundary markers instead of pre-tokenized text (with
whitespace normally turning into such a boundary marker). Then the AST
evaluation produces a stream that is a mixture of token boundary markers
and text chunks; you take the stream and fuse all text between two
boundary markers together into a single argument. This naturally handles
cases like 'a$b', where literal text sits directly adjacent to an
expansion: the two fuse into one argument because no boundary marker
separates them.
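The boundary-marker variant can be sketched as below. The `BOUNDARY` sentinel and `$name` syntax are assumptions made for illustration:

```python
import re

BOUNDARY = object()   # explicit token-boundary marker

def emit_stream(line, variables):
    """Yield a mixture of text chunks and boundary markers: whitespace
    in literal text becomes a boundary, and each expansion's value is
    emitted as a single chunk with no boundaries inside it."""
    for piece in re.split(r'(\$\w+|\s+)', line):
        if not piece:
            continue
        if piece.isspace():
            yield BOUNDARY
        elif piece.startswith('$'):
            yield variables.get(piece[1:], '')
        else:
            yield piece

def fuse(stream):
    """Join all text between two boundary markers into one argument."""
    args, current = [], []
    for item in stream:
        if item is BOUNDARY:
            if current:
                args.append(''.join(current))
                current = []
        else:
            current.append(item)
    if current:
        args.append(''.join(current))
    return args
```

Here `fuse(emit_stream('a$b c', variables))` fuses `a` with the expansion of `$b` because no boundary marker was emitted between them, while the literal space before `c` does emit one.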