Wandering Thoughts archives

2012-06-28

Why I don't like implicit string interpolation

All (computer) languages of any practical use wind up needing to format strings somehow. There are many mechanisms for this and you can find languages that use pretty much any of them, so every so often someone creating a language will settle on implicit string interpolation as theirs (this is where you write strings as something like "a: $a b: $b" and the actual string result has the variables substituted in). In my opinion this is invariably a mistake; to crib from the famous general slam, now the language has two problems.
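Python has no implicit interpolation. However, its standard `string.Template` class uses the same `$name` syntax, so it can stand in for what an implicitly interpolating language does to a string. This is a minimal sketch of the substitution step only. The variable values are made up for illustration, and in a language with implicit interpolation this happens to every such literal without any call at all.

```python
from string import Template

a = 1
b = "two"

# An implicitly interpolating language would do this to the literal
# "a: $a b: $b" automatically; in Python we have to ask for it explicitly.
result = Template("a: $a b: $b").substitute(a=a, b=b)
print(result)  # a: 1 b: two
```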

What's wrong with this? In no particular order:

  • it requires escaping some (literal) string content all of the time. In our example, you must escape all $s that you want to be literal in your strings.

    (Many languages attempt to get around this by introducing a way of writing string literals that means 'this should not be subject to implicit expansion; everything in it should be taken literally'. The need for such a thing should act as a big warning sign.)

  • it's potentially quite explosive if interpolation also applies to string values that are stored in variables, not just to string literals. If you're lucky, your programs just break because of bad interpolations caused by, say, a user including a $ in something that you didn't escape properly when you put it into a string variable. If you're not that lucky, you get security vulnerabilities.

  • if you only interpolate string literals (in order to get around the previous problem), you cut off a number of things that people really do want to do in practice. Believe it or not, people really do dynamically create (and use) formatting strings in real code.

  • fundamentally it's almost certainly optimizing the wrong thing. In most programs there are many more literal strings than there are strings that are formatting instructions, yet you've made it more work to write and use literal strings in order to make string formatting easier.
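The first two problems above can be illustrated with `string.Template` again. Treat this as a stand-in for a language's implicit interpolation, not as how any particular language behaves. The `scope` dictionary plays the part of the variables the language can see, and `secret` and `user_note` are hypothetical names.

```python
from string import Template

# Problem 1: a literal '$' has to be escaped (as '$$' for Template) every
# time, even in strings that were never meant to be formatting instructions.
price = Template("price: $$5").substitute()
print(price)  # price: $5

# Problem 2: interpolating the contents of a variable, such as user input.
scope = {"secret": "hunter2"}             # hypothetical sensitive variable in scope
user_note = "my note mentions $secret"    # text a user typed in

# If the variable's contents get interpolated, the user's text pulls in
# data it was never meant to see.
leaked = Template(user_note).substitute(scope)
print(leaked)  # my note mentions hunter2

# A stray '$' that isn't a valid placeholder just breaks things instead.
broke = False
try:
    Template("costs $5").substitute(scope)
except ValueError as exc:
    broke = True
    print("broken:", exc)
```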

(I have nothing against string interpolation if it must be invoked explicitly. In fact I quite like things like Python's use of '%' as an operator for this purpose, and yes, I know that Python 3 intends to deprecate it in favour of an explicit method call, str.format().)
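For contrast, here is a small sketch of explicit formatting with Python's `%` operator and `str.format()`. The example strings are invented. Because formatting only happens when it is explicitly invoked, a `$` or `%` inside an ordinary string or a value is inert. Format strings can also be built at runtime, which is the dynamic use that restricting interpolation to literals would cut off.

```python
# An ordinary literal: no escaping needed, nothing happens to it implicitly.
plain = "costs $5 and 10% off"

# The format string can itself be a variable; the '%' inside 'plain'
# is just data and is not reinterpreted.
fmt = "%s: %d items"
line = fmt % (plain, 3)
print(line)  # costs $5 and 10% off: 3 items

# The same result with the explicit str.format() method call.
line2 = "{}: {} items".format(plain, 3)
assert line == line2

# Dynamically creating a formatting string: '%%' yields a literal '%'.
width = 8
row_fmt = "%%-%ds|" % width    # builds "%-8s|" at runtime
print(row_fmt % "ab")          # left-justifies "ab" in 8 columns
```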

There is a very small number of programming languages where the last issue does not apply, because most of their strings are basically formatted output. In these reasonably unusual languages, implicit string interpolation can make sense (if done carefully, because you still have the other issues). But if you are putting together any sort of general purpose language, this almost certainly does not apply.

(I'm on the fence about whether the Bourne shell qualifies, but then the Bourne shell has a number of problems as a language.)

Sidebar: The ' versus " issue

If you want to have one syntax (perhaps one set of quote characters) for non-interpolated strings and another syntax (another set of quote characters) for interpolated strings, I have two suggestions.

First, you should pick the less common and more awkward quote character(s) to be what's used for interpolated strings. Strings that aren't getting interpolated are the common case, so they should get the best way of being written. As a practical matter, you should avoid using " for interpolated strings, because it has become the generic syntax for strings.

(Yes, this means that I think the Bourne shell made a mistake here.)

Second is the issue of internal implementation. You have at least two choices: you can quote all special characters in strings that are supposed to be uninterpolated, or you can specially mark strings that are supposed to be interpolated. You should do the latter, because then the default state of a string entity is uninterpolated. This is much safer in the presence of things like external modules and libraries that wind up passing or returning strings into your language; they have to go out of their way to make a string interpolated, as opposed to having to go out of their way to make it safe.
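A minimal sketch of the "mark the interpolated strings" approach, assuming a hypothetical marker type `InterpStr` and a hypothetical `render()` helper; neither is part of any real language runtime, and `string.Template` again stands in for the interpolation itself. Plain strings, including anything a library hands back, are never interpolated, so an outside string has to be deliberately wrapped to become dangerous.

```python
from string import Template


class InterpStr(str):
    """Hypothetical marker type: only strings of this type get interpolated."""


def render(value: str, scope: dict) -> str:
    # Plain strings pass through untouched; interpolation must be opted
    # into by using the marker type.
    if isinstance(value, InterpStr):
        return Template(value).substitute(scope)
    return str(value)


scope = {"user": "alice"}

from_library = "price is $5 for $user"   # plain str handed in from outside
literal = InterpStr("hello $user")        # what an interpolated literal might compile to

print(render(from_library, scope))  # price is $5 for $user
print(render(literal, scope))       # hello alice
```

One side effect of subclassing `str` this way is that operations such as concatenation return a plain `str`, so the marker does not spread to new strings by accident; a real language would have to decide deliberately how the mark propagates.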

AgainstStringInterpolation written at 00:53:18

2012-06-27

One root of my problem with GNU Emacs

I have a GNU Emacs problem. It is at least sort of one of the reasons that I don't use GNU Emacs as much as I could.

The origin of my problem is that roughly two decades ago (in the days of GNU Emacs 18) I was very into GNU Emacs; in fact, you could say that I was immersed in it. Then, slowly and for various reasons, I drifted partly away from GNU Emacs and especially had little contact (at least that I remember) with versions after version 18.

(I have a vague and possibly incorrect memory that many of the systems I used stayed on GNU Emacs 18 for close to a decade for various reasons.)

Those of you with GNU Emacs experience probably understand the problem now. You see, when you are deeply immersed in GNU Emacs, one of the things that happens is that you build up a bunch of personal ELisp code and .emacs customizations; you almost can't not do this. I was no exception and since I was young and enthusiastic, I wound up with quite a pile. All of which was written for GNU Emacs 18.

For those of you who have never used it (which would be almost everyone), let me assure you that GNU Emacs has changed a lot since version 18, including the ELisp environment and the effective GNU Emacs 'API' that your ELisp uses. If I had stayed immersed in the progression of GNU Emacs versions over the two decades since version 18, this would be no problem; as an active user, I would have been on top of the waves of changes and progressively forward-ported or modified all of my pile of ELisp to accommodate them.

Instead, I am more like Rip Van Winkle. The past two decades of GNU Emacs evolution plus my neglect have left me nursing a (reduced) tottering pile of ELisp code and .emacs hackery, much of which I barely remember any more. My code and customizations are probably doing things the wrong way now and sometimes probably the wrong thing to boot, but I'm too out of touch to know what the right thing is any more. In turn this mess feeds my disengagement from GNU Emacs (and discourages me from trying to fix any of the bits that irritate me about editing various things in GNU Emacs).

I suspect that the right solution to this mess is a rewrite from scratch; throw out my existing, hacked up .emacs and all of my remaining ELisp code, then redo everything that I still need or want. The good part of forgetting much of what I once knew about GNU Emacs programming is that I will have no choice but to look everything up in the current documentation (and thus I will get current information about how to do things right).

The downside of this, and why I shy away from the very idea of it, is that it will make using GNU Emacs painful for a while and require me to spend a bunch of time immersing myself in it again, all in order to return me to more or less where I am today.

(This would be a much easier sale if I was convinced that GNU Emacs was the editor I wanted to use all the time, but I'm not. And just pragmatically I'm always going to use vi a bunch no matter what editor I actually like best.)

PS: this problem would not be improved by using an IDE instead of GNU Emacs. If anything, an IDE would make it worse, since GNU Emacs has actually been remarkably stable over those two decades. I'm pretty sure that there's no active, currently maintained IDE where you could basically ignore the past decade of new versions; instead, I'd expect that you'd pretty much have to start all over from scratch if you tried to jump from a decade-old version to the current one.

MyEmacsProblem written at 03:01:06

2012-06-24

My take on fancy editors for programming

Recently on Twitter there was a little conversation (1, 2, 3, 4) about smart editors for programming, by which people meant editors that support things like auto-indentation and syntax highlighting. I have to confess that I have a very ambivalent attitude towards such fancy (or smart) editors and that I often wind up not using them.

In theory an editor with auto-indentation, syntax highlighting, and so on is a great thing. In practice, the implementations of all of these are often not what I consider well done; syntax highlighting often looks bad or unreadable and auto-indentation can behave in ways that irritate me (for instance, using the tab key in a comment had better insert a real tab). When things are well done it's great and I love it, but there's a very fine line between something this nice and something that is sufficiently non-nice that I don't want to touch it and will switch to a stupid editor just to get away from it. Stupid editors are not as nice as a fancy smart editor, but they have the great advantage that they don't get in my way and don't make me want to claw my eyes out.

To be concrete, my default editor for programming and development work is GNU Emacs, in which I am a lapsed expert (I once was very into it but have let that slip over the past long while). However, the only syntax highlighting it does that I've been able to stand is what it does for Python; for everything else, including C, it has a set of colours and a division of what it highlights (and how) that range from simply ugly to completely and unpleasantly bizarre. Its autoindentation smarts work just about right for Python and decently right for C (after I spent some time years ago customizing it), but tend to get in my way for HTML and shell scripts. Thus I have a split in my editing usage; I edit Python in full fancy GNU Emacs, C in GNU Emacs with syntax highlighting turned off, and generally everything else (HTML, shell scripts, Makefiles, etc) in a stupid editor (often vi).

Some of this I could change with customization; for example, I could modify syntax highlighting colours for C in an attempt to find a good colour set that doesn't look like a fruit salad explosion. But this sort of customization can only take me so far because it's limited to what the GNU Emacs mode in question was designed to let me easily alter. Completely changing how the C mode or the shell script mode highlights things is not something I can do short of surgery to the ELisp code. Similar things apply to the fundamental auto-indentation models used for the various languages; I'm probably not going to be able to persuade the shell script mode that hitting tab inside a comment should insert a real tab without some significant ELisp.

This isn't just a limitation of GNU Emacs, it's a limitation of how all fancy editors work (if anything GNU Emacs is better at this than other editors). All of them are going to have limits on what you can easily customize about their behavior, and as a result any of them may not work right for you for some or all of the languages they support. Sometimes the right answer is a plain editor.

(I suppose I could work out how to turn off all of GNU Emacs's smarts instead so I can use it as a plain text editor, but honestly there doesn't seem to be much point to that. I have a windowing system and multiple windows, so it's not like I have to do everything inside a single application.)

FancyProgrammingEditors written at 01:46:27


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.