Parsing versus rewriting: how to tell them apart

March 16, 2012

In light of my entry on wikitext transitions, you might ask how you tell when you have a parser instead of a rewriting engine. What is the line? It's not as simple as using or not using regular expressions; there are several wikitext parsers that are strongly regular expression based, some of them quite cleverly.

As it happens, I think there is a simple, easy to make distinction: a parser has separate input and output text spaces, at least on a conceptual basis, while a rewriter only has a single space that is both input and output. A rewriter transforms in place; a parser moves things from input to output (or more exactly consumes input and creates output, even if the output is a copy of the input).

(Note that a parser may have multiple levels of examination and the input and output spaces do not necessarily have to be in separate buffers. You could imagine a parsing engine that rewrote things 'in place', with a current position pointer that it always moved forward.)

This leads to a feverish thought experiment: can you somehow transform a regular expression based rewriter into a parser by making it output all of its transformed text into a new area, instead of replacing the text in place? After all, regexp replacement generally does in-place rewriting because that's the default and easiest way to operate.

I don't think such a rewrite would be easy; you would probably have to have multiple levels of regular expression rewrites, for example, and there are a bunch of tricky issues that in-place rewrites let you ignore (for example, in-place rewrites automatically handle all of the boring text that doesn't need to have anything done to it). But this idea might give some sort of structure for an attack on a particular rewrite engine.

(It also makes me wonder what a regular expression substitution engine that was not built around in-place rewrites would look like. Rob Pike's sam text editor had some support for something that was kind of like this, for doing multiple and potentially conflicting substitutions to the same block of text at once, but I never got it to work particularly well.)

Written on 16 March 2012.
« Part of the cleverness of Unix permissions (a little thought)
The standard trajectory of a field »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Mar 16 23:10:40 2012
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.