Why I don't like implicit string interpolation

June 28, 2012

All (computer) languages of any practical use wind up needing to format strings somehow. There are many mechanisms for this and you can find languages that use pretty much any of them, so every so often someone creating a language will settle on implicit string interpolation as theirs (this is where you write strings as something like "a: $a b: $b" and the actual string result has the variables substituted in). In my opinion this is invariably a mistake; to crib from the famous general slam, now the language has two problems.

What's wrong with this? In no particular order:

  • it requires escaping some (literal) string content all of the time. In our example, you must escape all $s that you want to be literal in your strings.

    (Many languages attempt to get around this by introducing a way of writing string literals that means 'this should not be subject to implicit expansion; everything in it should be taken literally'. The need for such a thing should act as a big warning sign.)

  • it's potentially quite explosive if you interpolate string values that are stored in variables. If you're lucky, your programs just break on bad interpolations, for example when a user includes a $ in some input that you didn't escape properly before you put it into a string variable. If you're not that lucky, you get security vulnerabilities.

  • if you only interpolate string literals (in order to get around the previous problem), you cut off a number of things that people really do want to do in practice. Believe it or not, people really do dynamically create (and use) formatting strings in real code.

  • fundamentally it's almost certainly optimizing the wrong thing. In most programs there are many more literal strings than there are strings that are formatting instructions, yet you've made it more work to write and use literal strings in order to make string formatting easier.
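The 'explosive' case can be simulated in Python, which has no implicit interpolation of its own. This is only a rough sketch: implicit_expand and try_expand are hypothetical helpers that stand in for a language that expands every string it is handed, and the standard string.Template class supplies the $name syntax.

```python
from string import Template

def implicit_expand(text, variables):
    """Hypothetical stand-in for a language that interpolates every string."""
    return Template(text).substitute(variables)

def try_expand(text, variables):
    """Expand text, reporting the exception type instead of crashing."""
    try:
        return implicit_expand(text, variables)
    except (KeyError, ValueError) as exc:
        return "error: " + type(exc).__name__

variables = {"user": "cks", "secret": "hunter2"}

# Intended use: the programmer wrote this string as a template.
greeting = try_expand("hello $user", variables)

# The lucky case: user-supplied text with a stray '$' just breaks things.
broken = try_expand("price: $5", variables)          # '$5' is not a valid placeholder
undefined = try_expand("send $amount now", variables)  # no such variable

# The unlucky case: user-supplied text names a real variable and leaks it.
leaked = try_expand("my name is $secret", variables)
```

Here greeting is what the programmer wanted, broken and undefined are the 'your program just breaks' outcome, and leaked is the security problem: the user's text pulled out a value it should never have seen.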

(I have nothing against string interpolation if it must be invoked explicitly. In fact I quite like things like Python's use of '%' as an operator for this purpose, and yes, I know that Python 3 prefers an explicit str.format() method call instead, although '%' formatting still works there.)
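As a small illustration of explicit invocation, here is a sketch using Python's '%' operator. The variable names are made up for the example; the point is that ordinary literals stay literal, formatting happens only where the operator is applied, and format strings can still be built at runtime.

```python
# Plain literals are never touched, so '%' and '$' need no escaping.
literal = "100% of $HOME"

# Formatting happens only where the '%' operator is applied.
a, b = 1, 2
formatted = "a: %d b: %d" % (a, b)

# A format string can be created dynamically and used later.
width = 6
fmt = "%%-%ds|" % width        # builds the format string "%-6s|"
row = fmt % "ab"               # left-justify "ab" in a 6-character field

# User data passed as an argument is inserted verbatim, not re-expanded.
user_input = "%s and $b"
safe = "got: %s" % user_input
```

The only place that '%' needs escaping (as '%%') is inside a string that is actually used as a format, which is the uncommon case.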

There are a very small number of programming languages where the last issue is in fact not the case, where most of the strings are basically formatted output. In these reasonably unusual languages, implicit string interpolation can make sense (if done carefully, because you still have the other issues). But if you are putting together any sort of general purpose language, this almost certainly does not apply.

(I'm on the fence about whether the Bourne shell qualifies, but then the Bourne shell has a number of problems as a language.)

Sidebar: The ' versus " issue

If you want to have one syntax (perhaps one set of quote characters) for non-interpolated strings and another syntax (another set of quote characters) for interpolated strings, I have two suggestions.

First, you should pick the less common and more awkward quote character(s) to be what's used for interpolated strings. Strings that aren't getting interpolated are the common case, so they should get the best way of being written. As a practical issue you should avoid using " for interpolated strings, because it's become the generic syntax for strings.

(Yes, this means that I think the Bourne shell made a mistake here.)

Second is the issue of internal implementation. You have at least two choices: you can quote all special characters in strings that are supposed to be uninterpolated, or you can specially mark strings that are supposed to be interpolated. You should do the latter, because then the default state of a string entity is uninterpolated. This is much safer in the presence of things like external modules and libraries that wind up passing or returning strings into your language; they have to go out of their way to make a string interpolated, as opposed to having to go out of their way to make it safe.
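The 'specially mark interpolated strings' choice might look something like the following sketch. Everything here is hypothetical: Interpolated is an invented marker type, evaluate is an invented stand-in for the language runtime, and string.Template is borrowed just to do the $name substitution.

```python
from string import Template

class Interpolated:
    """Hypothetical marker: only strings wrapped in this get expanded."""
    def __init__(self, text):
        self.text = text

def evaluate(value, variables):
    """Expand marked strings; return every other value untouched."""
    if isinstance(value, Interpolated):
        return Template(value.text).substitute(variables)
    return value

variables = {"a": "1", "b": "2"}

# A string written with the language's interpolated-string syntax is marked.
marked = evaluate(Interpolated("a: $a b: $b"), variables)

# A string handed back by an external library is a plain str, so it is
# uninterpolated by default and needs no quoting to be safe.
from_library = evaluate("a: $a b: $b", variables)
```

With this arrangement a library has to construct an Interpolated value on purpose to get expansion; forgetting to do anything at all leaves its strings literal.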

Written on 28 June 2012.

Last modified: Thu Jun 28 00:53:18 2012