My favorite way of marking continued lines

January 16, 2013

One of the things you often want when designing configuration files and little domain specific languages is some way of splitting a single long logical line into several physical ones. In other words, you want some way of marking line continuations. Over the years people have come up with a huge assortment of ways to do this: you can have a language with explicit terminators and just ignore newlines, you can put backslashes at the end of incomplete lines, and so on.

(Some languages have several different ways of continuing lines, depending on the specific context. Try not to do this in yours if you have a choice.)

As it happens I have a favorite way of doing this and I think it's the best way. It is the 'RFC 822' method (so named because it's how mail headers are handled), where a logical line is continued by indented physical lines. Here is an example:

This is a single
     logical line
     once everything is
This is a new logical line
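
To make the rule concrete, here is a minimal Python sketch of the folding step. It is not the author's code. It assumes that any physical line beginning with a space or tab continues the previous logical line, and that the pieces are joined with a single space; `fold_lines` is a hypothetical name.

```python
def fold_lines(text):
    """Fold RFC 822-style continuations (a sketch, not a spec).

    Assumption: a physical line starting with a space or tab continues
    the previous logical line; pieces are joined with one space.
    """
    logical = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and logical:
            # Continuation: drop the indentation and append to the last line.
            logical[-1] += " " + line.strip()
        else:
            # A new logical line. An indented first line has nothing to
            # continue, so it is (arbitrarily) treated as a new line too.
            logical.append(line.strip())
    return logical
```

Applied to the example above, it returns `['This is a single logical line once everything is', 'This is a new logical line']`.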

The drawback of this approach is that it becomes harder to make indentation significant in your language. I'd argue that this is not an important drawback for configuration files or small DSLs, since you should generally avoid significant indentation there anyway; it makes your language's parser (much) harder to write.

The advantage of this approach, to me, is that continued lines look right, or at least look obvious. It's a very common formatting convention to indent continued lines anyway (even, or especially, when the language doesn't require it), and making that indentation significant means that you can't wind up with indented lines that aren't actually continued. With backslash continuations this happens when, for example, you accidentally leave out the \ at the end of the previous line; I've done this more than once in things like Makefiles.
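
For contrast, here is a simplified Python sketch of trailing-backslash continuation. `fold_backslash` is a hypothetical helper and deliberately does not reproduce make's exact rules; it only shows the failure mode described above, where a forgotten backslash silently turns an indented line into a logical line of its own.

```python
def fold_backslash(text):
    """Fold lines where a trailing backslash marks a continuation.

    Simplified sketch: this is not make's exact behaviour.
    """
    logical, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):
            # Continued: remember this piece (minus the backslash).
            pending += line[:-1].strip() + " "
        else:
            # Not continued: this line ends the current logical line,
            # no matter how it is indented.
            logical.append(pending + line.strip())
            pending = ""
    if pending:
        # Input ended on a continued line.
        logical.append(pending.rstrip())
    return logical
```

Here `fold_backslash("CFLAGS = -O2\n    -Wall")` returns `['CFLAGS = -O2', '-Wall']`, even though the indentation says `-Wall` belongs to the first line; under the indentation rule it would have been joined onto it.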

Sidebar: parsing lines in this approach

I believe that the simplest way to parse the resulting language is a two-level process. At the first level you read physical lines, strip blank lines and comments, fold multiple physical lines into a single logical line, and deliver that line to the second level. The second level then parses your actual language. This requires a little bit of care in your first level, and you'll need a small pushback stack for lines: you only know that a logical line has ended when you read the next physical line, so you always over-read by one, and that extra physical line can't always just be discarded.
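
Here is a minimal Python sketch of that first level, under assumptions the entry doesn't specify: a line whose first non-blank character is `#` is a comment, continuation lines begin with a space or tab, and continued pieces are joined with a single space. `LogicalLineReader` is a hypothetical name. The pushback list holds the physical line that was read past the end of the current logical line.

```python
from typing import Iterable, Iterator, List, Optional


class LogicalLineReader:
    """First level: turn physical lines into logical lines (a sketch).

    Hypothetical assumptions: '#' starts a comment line, continuation
    lines start with a space or tab, and pieces are joined with one space.
    """

    def __init__(self, physical: Iterable[str]) -> None:
        self._lines = iter(physical)
        self._pushback: List[str] = []  # lines we read one too far

    def _next_physical(self) -> Optional[str]:
        """Return the next non-blank, non-comment physical line, or None."""
        while True:
            if self._pushback:
                line = self._pushback.pop()
            else:
                line = next(self._lines, None)
                if line is None:
                    return None
            line = line.rstrip("\n")
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return line

    def __iter__(self) -> Iterator[str]:
        while True:
            first = self._next_physical()
            if first is None:
                return
            parts = [first.strip()]
            while True:
                line = self._next_physical()
                if line is None:
                    break
                if line[:1] in (" ", "\t"):
                    parts.append(line.strip())
                else:
                    # Over-read: this line starts the next logical line,
                    # so push it back rather than discarding it.
                    self._pushback.append(line)
                    break
            yield " ".join(parts)
```

For example, feeding it the lines of `"server example.org\n    port 25\n\n    timeout 30\nalias root\n"` yields `'server example.org port 25 timeout 30'` and then `'alias root'`. Because blank lines are dropped before folding, a blank line does not end a logical line in this sketch; that is a design choice, not a requirement.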

This is not quite a traditional lexer/parser split, because your first level doesn't attempt to break up the logical lines into their components; I try to avoid writing any sort of actual lexer for configuration files and small DSLs. If your situation is complex enough to need a real lexer, you probably want to handle the entire process in the lexer.
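
A sketch of a second level that gets by without a lexer, assuming a hypothetical syntax in which each logical line is a keyword followed by whitespace-separated arguments. Since the first level has already folded continuations away, `str.split()` is all the tokenizing it needs. `parse_directives` and the duplicate-keyword rule are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def parse_directives(logical_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Second level (hypothetical syntax): 'keyword arg arg ...' per line.

    Assumes the first level never delivers blank lines, so split()
    always yields at least the keyword.
    """
    config: Dict[str, List[str]] = {}
    for line in logical_lines:
        keyword, *args = line.split()  # whitespace split is the whole "lexer"
        if keyword in config:
            raise ValueError(f"duplicate directive: {keyword!r}")
        config[keyword] = args
    return config
```

Given `['server example.org port 25', 'alias root']`, it returns `{'server': ['example.org', 'port', '25'], 'alias': ['root']}`.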

Written on 16 January 2013.


Last modified: Wed Jan 16 22:53:28 2013