Why I don't like implicit string interpolation
All (computer) languages of any practical use wind up needing to format
strings somehow. There are many mechanisms for this and you can find
languages that use pretty much any of them, so every so often someone
creating a language will settle on implicit string interpolation as
theirs (this is where you write strings as something like "a: $a b: $b"
and the actual string result has the variables substituted in). In
my opinion this is invariably a mistake; to crib from the famous general
slam, now the language has two problems.
What's wrong with this? In no particular order:
- it requires escaping some (literal) string content all of the time.
In our example, you must escape all $s that you want to be literal in your strings.
(Many languages attempt to get around this by introducing a way of writing string literals that means 'this should not be subject to implicit expansion; everything in it should be taken literally'. The need for such a thing should act as a big warning sign.)
- it's potentially quite explosive if you interpolate string values that
are stored in variables. If you're lucky your programs just break
because of bad interpolations caused by a user, say, including a $ in
something that you didn't escape properly when you put it into a string
variable. If you're not that lucky, you get security vulnerabilities.
- if you only interpolate string literals (in order to get around the
previous problem), you cut off a number of things that people
really do want to do in practice. Believe it or not, people really
do dynamically create (and use) formatting strings in real code.
- fundamentally it's almost certainly optimizing the wrong thing. In most programs there are many more literal strings than there are strings that are formatting instructions, yet you've made it more work to write and use literal strings in order to make string formatting easier.
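To make the first two problems concrete, here is a rough sketch in Python. Python itself doesn't do implicit interpolation, so the `implicit()` helper is a hypothetical stand-in for what such a language would do to every string behind your back; it is built on the standard library's `string.Template`, which happens to use the same `$name` syntax as the example above.

```python
from string import Template

def implicit(text, **variables):
    # Hypothetical stand-in for a language that interpolates every string.
    return Template(text).substitute(variables)

# The intended use works as expected.
print(implicit("a: $a b: $b", a=1, b=2))

# A literal dollar sign has to be escaped everywhere (here by doubling it).
print(implicit("price: $$5"))

# User-supplied text that reaches the interpolator unescaped breaks things...
user_text = "that will be $5"
try:
    implicit("note: " + user_text)
except ValueError as exc:
    print("broken:", exc)

# ...or, worse, it names a variable that it should never have seen.
print(implicit("note: " + "my $secret", secret="hunter2"))
```

The last call is the unlucky case: the text came from outside, but the result contains the value of `secret`.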
(I have nothing against string interpolation if it must be invoked
explicitly. In fact I quite like things like Python's use of '%' as an
operator for this purpose, and yes, I know that Python 3 discourages it in
favour of an explicit method call.)
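As an illustration of what explicit invocation buys you, here is a small sketch using Python's '%' operator. The `format_row()` helper is a made-up example of the sort of dynamically created formatting string that turns up in real code; it is not from any library.

```python
# A plain literal is never touched, so '%' needs no escaping here.
label = "100% literal"

# Formatting happens only where the operator is actually written.
line = "a: %s b: %s" % (1, 2)

def format_row(values, width):
    # Build the format string at run time, one left-justified
    # fixed-width column per value, then apply it explicitly.
    fmt = " ".join(["%%-%ds" % width] * len(values))
    return fmt % tuple(values)

print(label)
print(line)
print(format_row(["a", "b"], 3))
```

You still have to escape a literal '%' inside a format string (as the `%%` in `format_row()` shows), but only there, and not in every other string in the program.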
There are a very small number of programming languages where the last issue is in fact not the case, where most of the strings are basically formatted output. In these reasonably unusual languages, implicit string interpolation can make sense (if done carefully, because you still have the other issues). But if you are putting together any sort of general purpose language, this almost certainly does not apply.
(I'm on the fence about whether the Bourne shell qualifies, but then the Bourne shell has a number of problems as a language.)
If you want to have one syntax (perhaps one set of quote characters) for non-interpolated strings and another syntax (another set of quote characters) for interpolated strings, I have two suggestions.
First, you should pick the less common and more awkward quote
character(s) to be what's used for interpolated strings. Strings that
aren't getting interpolated are the common case, so they should get the
best way of being written. As a practical issue you should avoid using
" for interpolated strings, because it's become the generic syntax for
strings in many languages.
(Yes, this means that I think the Bourne shell made a mistake here.)
Second is the issue of internal implementation. You have at least two choices: you can quote all special characters in strings that are supposed to be uninterpolated, or you can specially mark strings that are supposed to be interpolated. You should do the latter, because then the default state of a string entity is uninterpolated. This is much safer in the presence of things like external modules and libraries that wind up passing or returning strings into your language; they have to go out of their way to make a string interpolated, as opposed to having to go out of their way to make it safe.
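A minimal sketch of the second approach, again in Python: the `Interpolated` marker class and the `render()` function are hypothetical names for what a language runtime might do internally, and `string.Template` once more stands in for the actual expansion.

```python
from string import Template

class Interpolated:
    # Hypothetical marker that the language would attach only to strings
    # written with the interpolating syntax.
    def __init__(self, template):
        self.template = template

def render(value, env):
    # Only explicitly marked strings are ever expanded.
    if isinstance(value, Interpolated):
        return Template(value.template).substitute(env)
    # Everything else, including strings from libraries, is left alone.
    return value

env = {"a": 1, "secret": "hunter2"}
print(render(Interpolated("a: $a"), env))
print(render("my $secret costs $5", env))
```

A string that arrives from an external module is a plain string here, so it comes out exactly as it went in unless someone deliberately wraps it.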
Written on 28 June 2012.