2013-01-16
My favorite way of marking continued lines
One of the things you often want when designing configuration files and little domain specific languages is some way of splitting a single long logical line into several physical ones. In other words you want some way of marking line continuations. Over the years people have come up with a huge assortment of ways to do this; you can have a language with explicit terminators and just ignore newlines, you can put backslashes at the end of incomplete lines, and so on.
(Some languages have several different ways of continuing lines, depending on the specific context. Try not to do this in yours if you have a choice.)
As it happens I have a favorite way of doing this and I think it's the best way. It is the 'RFC 822' method (so named because it's how mail headers are handled), where a logical line is continued by indented physical lines. Here is an example:
This is a single
   logical line once
   everything is reassembled
This is a new logical line
The drawback of this approach is that it becomes harder to make indentation significant in your language. I'd argue that this is not an important drawback for configuration files or small DSLs, since you should generally avoid significant indentation in them anyway; it makes your language's parser (much) harder to write.
The advantage of this approach to me is that it results in continued lines looking right or at least looking obvious. It's a very common formatting convention to indent continued lines anyways (even or especially when not required by the language) and making the indentation significant for this means that you can't wind up with indented lines that aren't actually continued (because, for example, you accidentally left out a \ at the end of the previous line; I've done this more than once in things like Makefiles).
Sidebar: parsing lines in this approach
I believe that the simplest way to parse the resulting language is in a two level process. At the first level you read physical lines, strip blank lines and comments, fold multiple physical lines into a single logical line, and deliver that line to the second level. The second level then parses your actual language. This requires a little bit of care in your first level and you'll need a little pushback stack for lines (since you're going to over-read by one physical line when reading a logical line and the physical line won't always be something you can just discard).
This is not quite a traditional lexer/parser split because your first level doesn't attempt to break up the logical lines into their components, but I try to avoid writing any sort of actual lexer for configuration files and small DSLs. If your situation is complex enough for a real lexer you probably want to handle the entire process in the lexer.
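To make the two level process from the sidebar a bit more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not code from any real parser: the name logical_lines, the '#' comment character, and the choice to silently drop blank and comment lines even in the middle of a continued line are all made up for the example. The pushback list plays the role of the little pushback stack, holding the one physical line that gets over-read.

```python
from typing import Iterable, Iterator, List, Optional


def logical_lines(physical: Iterable[str], comment: str = "#") -> Iterator[str]:
    """First level: fold indented continuation lines into logical lines.

    Assumptions for this sketch: comments start with `comment`, and blank
    or comment lines are dropped wherever they appear.
    """
    pushback: List[str] = []  # holds the physical line we over-read
    source = iter(physical)

    def next_line() -> Optional[str]:
        if pushback:
            return pushback.pop()
        return next(source, None)

    while True:
        line = next_line()
        if line is None:
            return
        if not line.strip() or line.lstrip().startswith(comment):
            continue  # skip blank lines and comments
        # An indented line with nothing before it simply starts a logical line.
        parts = [line.strip()]
        while True:
            nxt = next_line()
            if nxt is None:
                break
            stripped = nxt.strip()
            if not stripped or stripped.startswith(comment):
                continue
            if nxt[0] in " \t":
                parts.append(stripped)  # indented: continues the logical line
            else:
                pushback.append(nxt)  # not ours; hand it back for the next round
                break
        yield " ".join(parts)


if __name__ == "__main__":
    sample = [
        "# a comment",
        "This is a single",
        "   logical line once",
        "   everything is reassembled",
        "",
        "This is a new logical line",
    ]
    # The second level would parse each logical line; here we just print it.
    for logical in logical_lines(sample):
        print(logical)
```

The second level only ever sees whole logical lines, so it can stay a simple per-line parser (splitting on whitespace, say) without knowing anything about continuations.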
How I drafted (okay, wrote) an entry in public by accident
Since I tweeted about this recently, I might as well explain myself.
Sometimes there are small drawbacks to the perpetually popular file based approach to blog engines. One of them is the question of how you handle draft entries, ie entries that you're in the process of writing and that you aren't yet ready to publish. The hairshirt approach is to not do anything about them at all; your blog is only for published entries and you have to write drafts entirely outside of it. This is simple but has one to three drawbacks, depending on whether or not you write in HTML.
If you write in HTML, it only has the problem that links to other entries you've written probably have to use a totally different style in your draft than what would be ideal in a published entry. How much different depends on where you draft and preview your entry. If you write in a simple markup language you also have the problem of how you render your draft into HTML form (a job that the blog engine usually does for you one way or another). If you write in a wiki-language that has short names for links within your site you may have a third problem of getting those links to resolve properly in the rendering process.
Thus it's very attractive to have a private area of your file based blog where you can write drafts 'inside' the blog. This handles the rendering (if necessary) and displaying for you, allows you to use the exact same content and markup that you will use for the published entry, and with some moderate magic can resolve all links correctly. Wandering Thoughts is no exception; I long ago created a sort of access-restricted drafts area within CSpace (the (slightly) larger wiki environment that contains the blog).
But this means that there's an important thing you need to do before you type something like 'vi BlogspotWebFail', that being make sure that you're in your drafts directory.
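One way to stop relying on memory for this check is to have a small wrapper do it before starting the editor. What follows is only a sketch of the idea, not something that exists in my setup; the function name is_inside and the /srv/blog paths are hypothetical and you would substitute your own drafts location.

```python
from pathlib import Path


def is_inside(path: Path, root: Path) -> bool:
    """Return True if path is root itself or somewhere beneath it."""
    try:
        # relative_to() compares whole path components, so a sibling
        # such as 'drafts-old' does not count as being inside 'drafts'.
        path.resolve().relative_to(root.resolve())
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    drafts = Path("/srv/blog/drafts")  # hypothetical drafts directory
    if is_inside(Path.cwd(), drafts):
        print("in the drafts area, safe to start writing")
    else:
        print("not in the drafts area, refusing to start the editor")
```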
If you skip this step and you're outside the blog's directory hierarchy entirely, there's no particular harm done; you just won't get previews. But if you happen to be inside your blog's directory hierarchy, well, a straightforward file based blog engine will happily go 'oh, a file, this is an entry' and make it publicly visible. Bonus points are awarded if you happen to be drafting the entry within your blog's directory hierarchy but at a different place than the final entry will go.
This is what I managed to do by accident. As a result, I wrote this entry in public, in the wrong place, and of course a version (or perhaps several versions) of the entry propagated into my syndication feed and on to at least one planet site (and a phantom version of it is likely still there in some people's feed readers, partly since syndication feeds don't have any way of retracting entries).
(I might have noticed earlier than I did if I'd tried to preview the entry while I was writing it, but I just wrote it in a big burst. Instead I only noticed at the end when I went to spellcheck it in another window; since the other window was in the drafts directory, I got a 'no such file' error and then a sudden sinking feeling.)
So. Yeah. Sorry about that, for anyone who saw oddities in the syndication feed for a while.