Wandering Thoughts archives

2012-06-28

Why I don't like implicit string interpolation

All (computer) languages of any practical use wind up needing to format strings somehow. There are many mechanisms for this and you can find languages that use pretty much any of them, so every so often someone creating a language will settle on implicit string interpolation as theirs. This is where you write strings as something like "a: $a b: $b" and the resulting string has the values of the variables substituted in. In my opinion this is invariably a mistake; to crib from the famous general slam, now the language has two problems.

What's wrong with this? In no particular order:

  • it requires escaping some (literal) string content all of the time. In our example, you must escape all $s that you want to be literal in your strings.

    (Many languages attempt to get around this by introducing a way of writing string literals that means 'this should not be subject to implicit expansion; everything in it should be taken literally'. The need for such a thing should act as a big warning sign.)

  • it's potentially quite explosive if you interpolate string values that are stored in variables. If you're lucky, your programs just break on bad interpolations, for example when a user includes a $ in input that you didn't escape properly before you put it into a string variable. If you're not that lucky, you get security vulnerabilities.

  • if you only interpolate string literals (in order to get around the previous problem), you cut off a number of things that people really do want to do in practice. Believe it or not, people really do dynamically create (and use) formatting strings in real code.

  • fundamentally it's almost certainly optimizing the wrong thing. In most programs there are many more literal strings than there are strings that are formatting instructions, yet you've made it more work to write and use literal strings in order to make string formatting easier.
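The second point can be sketched in Python, using the standard library's string.Template as a stand-in for a hypothetical language that implicitly interpolates every string it evaluates, including strings built from user input. The implicit_eval helper and the variable names are assumptions made up for illustration; they don't describe any real language's runtime.

```python
from string import Template

# Hypothetical stand-in for a runtime that implicitly expands $name
# in every string it evaluates (not any real language's behaviour).
def implicit_eval(s, variables):
    return Template(s).substitute(variables)

variables = {"user": "alice", "db_password": "hunter2"}

# The intended use works fine.
greeting = implicit_eval("hello $user", variables)

# User input that merely contains a '$' breaks the program ...
try:
    implicit_eval("comment: " + "this costs $5", variables)
    broke = False
except ValueError:
    # string.Template rejects '$5' as an invalid placeholder
    broke = True

# ... and hostile input can pull out a variable it was never meant to see.
leaked = implicit_eval("comment: " + "$db_password", variables)
```

In this sketch the only defence is to escape every $ in every value before it goes anywhere near a string, which is exactly the burden the first point complains about.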

(I have nothing against string interpolation if it must be invoked explicitly. In fact I quite like things like Python's use of '%' as an operator for this purpose, and yes, I know that Python 3 steers people away from it in favour of an explicit method call, str.format().)
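As a rough illustration of explicit interpolation, here is a small Python sketch. Literal strings need no escaping, formatting happens only where it is requested, and format strings are ordinary values that can be built at run time. The variable names are invented for the example.

```python
# Plain string literals need no escaping: '$' and '{' are just characters.
literal = "cost: $5 {not a field}"

# Interpolation happens only where it is asked for, with the % operator ...
line = "%s owes %d" % ("alice", 5)

# ... or with the explicit method call.
line2 = "{} owes {}".format("alice", 5)

# Format strings are ordinary values, so they can be created dynamically.
width = 8
fmt = "%%-%ds|" % width   # builds the format string "%-8s|"
cell = fmt % "ab"         # left-justifies "ab" in an 8 character field
```

Note that '%' only has to be escaped (as '%%') in strings that are actually used as format strings, not in every string in the program.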

There are a very small number of programming languages where the last issue is in fact not the case, where most of the strings are basically formatted output. In these reasonably unusual languages, implicit string interpolation can make sense (if done carefully, because you still have the other issues). But if you are putting together any sort of general purpose language, this almost certainly does not apply.

(I'm on the fence about whether the Bourne shell qualifies, but then the Bourne shell has a number of problems as a language.)

Sidebar: The ' versus " issue

If you want to have one syntax (perhaps one set of quote characters) for non-interpolated strings and another syntax (another set of quote characters) for interpolated strings, I have two suggestions.

First, you should pick the less common and more awkward quote character(s) to be what's used for interpolated strings. Strings that aren't getting interpolated are the common case, so they should get the best way of being written. As a practical issue you should avoid using " for interpolated strings, because it's become the generic syntax for strings.

(Yes, this means that I think the Bourne shell made a mistake here.)

Second is the issue of internal implementation. You have at least two choices: you can quote all special characters in strings that are supposed to be uninterpolated, or you can specially mark strings that are supposed to be interpolated. You should do the latter, because then the default state of a string entity is uninterpolated. This is much safer in the presence of things like external modules and libraries that wind up passing or returning strings into your language; they have to go out of their way to make a string interpolated, as opposed to having to go out of their way to make it safe.
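One way to picture the 'specially mark interpolated strings' choice is the following minimal Python sketch. The Interpolated marker class and the evaluate function are hypothetical names invented here, and string.Template merely stands in for whatever expansion the language would really do.

```python
from string import Template

class Interpolated(str):
    """Hypothetical marker type: only strings of this type get expanded."""

def evaluate(s, variables):
    # The default state of a string is uninterpolated, so a plain str
    # passes through untouched no matter what characters it contains.
    if isinstance(s, Interpolated):
        return Template(s).substitute(variables)
    return str(s)

variables = {"name": "alice"}

# A plain string handed over by some external module or library.
from_library = "price is $5, see $name"
# What the parser might produce for an interpolated string literal.
from_source = Interpolated("hello $name")

plain_result = evaluate(from_library, variables)
marked_result = evaluate(from_source, variables)
```

Under this design, external code has to construct an Interpolated value on purpose to get expansion; a string that merely happens to contain a $ stays exactly as it was.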

programming/AgainstStringInterpolation written at 00:53:18

