Why I write recursive descent parsers (despite their issues)
Today I read Laurence Tratt's Which Parsing Approach? (via), which gives a decent overview of how parsing computer languages (including little domain specific languages) is not quite the well-solved problem we'd like it to be. As part of the article, Tratt discusses a number of practical issues with recursive descent parsers and recommends other approaches instead, such as an LR parser generator.
I have a long-standing interest in parsing, I'm reasonably well aware of the annoyances of recursive descent parsers (although some of the issues Tratt raised hadn't occurred to me before now), and I've been exposed to parser generators like Yacc. Despite that, my normal approach to parsing any new little language for real is to write a recursive descent parser in whatever language I'm using, and Tratt's article is not going to change that. My choice is entirely pragmatic: to me, recursive descent parsers generally have two significant advantages over all other real parsers.
The first advantage is that a recursive descent parser is almost always the only, or at least the easiest, form of parser you can readily create using only the language's standard library and tooling. In particular, parsing LR, LALR, and similar formal grammars generally requires you to find, select, and install a parser generator tool (or, more rarely, an additional package). Very few languages ship a parser generator as part of their standard environment (or a lexer, which the parser often needs in some form).
(The closest I know of is C on Unix, where you will almost always find some version of lex and yacc. Not entirely coincidentally, I've used lex and yacc to write a parser in C, although that was a long time ago.)
By contrast, a recursive descent parser is just code in the language. You can obviously write one in any language, and you can build a little lexer to go along with it that's custom fitted to your particular parser and your language's needs. This also leads to the second significant advantage: if you write a recursive descent parser, you don't need to learn a new language (the language of the parser generator), learn how to hook that language up to the language of your program, and then debug the result. Your entire recursive descent parser (and your entire lexer) is written in one language, the language you're already working in.
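To make the "just code in the language" point concrete, here is a minimal sketch in Python that uses only the standard library's re module: a small hand-rolled lexer plus one method per grammar rule. The arithmetic language, its grammar, and the names tokenize and Parser are hypothetical illustrations, not the author's own process (which is left for another entry). To keep it short, the sketch evaluates as it parses instead of building a syntax tree.

```python
import re

# Hypothetical little language: integer arithmetic with + - * / and parentheses.
# Each grammar rule becomes one method on Parser:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """A tiny hand-written lexer that returns a list of token strings."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            rest = text[pos:].lstrip()
            if not rest:  # only trailing whitespace is left
                break
            raise SyntaxError(f"unexpected character {rest[0]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        # None signals end of input.
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.i += 1
        return tok

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            # Floor division keeps this toy language integer-only.
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"expected a number or '(', got {tok!r}")
```

For example, Parser(tokenize("2 * (3 + 4) - 5")).parse() returns 9. Operator precedence falls out of which function calls which, with no separate grammar file or generated code involved.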
If I were routinely working in a language that had a well respected de facto standard parser generator and lexer, and regularly building parsers for little languages for my programs, it would probably be worth mastering these tools. The time and effort required to do so would be more than paid back in the end, and I would probably have a higher quality grammar too (Tratt points out how recursive descent parsers hide ambiguity, for example). But in practice I bounce back and forth between two languages right now (Go and Python, neither of which has such a standard parser ecosystem), and I don't need to write even a half-baked parser all that often. So writing another recursive descent parser with my standard process has been the easiest option every time I've needed one.
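Tratt's point about hidden ambiguity shows up in the classic dangling else. The sketch below, with a hypothetical grammar and function name, parses an optional else the obvious recursive descent way. The grammar is ambiguous, but the code silently picks one reading by attaching the else to the innermost if. A yacc-style LR generator would instead report this grammar as a shift/reduce conflict (and by default resolve it in favor of shift, which happens to give the same binding), so at least you are told about it.

```python
def parse_stmt(tokens, i=0):
    """Parse one statement starting at tokens[i]; return (tree, next_index).

    Hypothetical grammar, ambiguous on purpose:
        stmt := 'if' ID 'then' stmt [ 'else' stmt ]
              | ID
    """
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i] != "if":
        return tokens[i], i + 1  # a bare identifier statement
    if i + 2 >= len(tokens) or tokens[i + 2] != "then":
        raise SyntaxError(f"expected 'then' at token {i + 2}")
    cond = tokens[i + 1]
    then_branch, i = parse_stmt(tokens, i + 3)
    # The optional 'else' is consumed greedily whenever it comes next, so it
    # always binds to the innermost unfinished 'if'. Nothing here notices that
    # the grammar also allows the other reading.
    else_branch = None
    if i < len(tokens) and tokens[i] == "else":
        else_branch, i = parse_stmt(tokens, i + 1)
    return ("if", cond, then_branch, else_branch), i
```

Here parse_stmt("if a then if b then x else y".split()) returns (('if', 'a', ('if', 'b', 'x', 'y'), None), 9); the reading with the else on the outer if is never considered or reported.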
(I've developed a standard process for writing recursive descent parsers that makes the whole thing pretty mechanical, but that's a discussion for another entry or really a series of them.)
PS: I can't comment on how easy it is to generate good error messages with modern parser generators, because I haven't used any of them. My experience with my own recursive descent parsers is that it's generally straightforward to get decent error messages for the style of languages that I create, and usually simple to tweak the result to give clearer errors in specific situations (eg, also).
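As an illustration of that kind of error reporting, here is a minimal Python sketch, assuming a hypothetical list syntax like [1, 2, 3]. Tokens carry their source offset so every error can name a line and column, and one special case turns a forgotten comma into a specific hint rather than a generic "expected" message. The names ParseError, Token, tokenize, and parse_list are made up for this example.

```python
import re
from dataclasses import dataclass


class ParseError(Exception):
    """A syntax error whose message includes a 1-based line and column."""

    def __init__(self, src, pos, msg):
        line = src.count("\n", 0, pos) + 1
        col = pos - (src.rfind("\n", 0, pos) + 1) + 1
        super().__init__(f"line {line}, column {col}: {msg}")


@dataclass
class Token:
    kind: str  # "num", "punct", or "eof"
    text: str
    pos: int   # offset of the token's first character in the source


TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<punct>[\[\],]))")


def tokenize(src):
    tokens, pos = [], 0
    while True:
        m = TOKEN_RE.match(src, pos)
        if m is None:
            rest = src[pos:].lstrip()
            if not rest:
                tokens.append(Token("eof", "", len(src)))
                return tokens
            bad = len(src) - len(rest)
            raise ParseError(src, bad, f"unexpected character {src[bad]!r}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append(Token(kind, m.group(kind), m.start(kind)))
        pos = m.end()


def parse_list(src):
    """Parse the hypothetical syntax: '[' [NUM (',' NUM)*] ']'."""
    tokens = tokenize(src)
    i = 0

    def describe(tok):
        return "end of input" if tok.kind == "eof" else repr(tok.text)

    def expect(text):
        nonlocal i
        tok = tokens[i]
        if tok.text != text:
            raise ParseError(src, tok.pos, f"expected {text!r}, found {describe(tok)}")
        i += 1

    expect("[")
    items = []
    if tokens[i].text != "]":
        while True:
            tok = tokens[i]
            if tok.kind != "num":
                raise ParseError(src, tok.pos, f"expected a number, found {describe(tok)}")
            items.append(int(tok.text))
            i += 1
            if tokens[i].kind == "num":
                # Tailored message for one common mistake: a forgotten comma.
                raise ParseError(src, tokens[i].pos, "missing ',' between list items?")
            if tokens[i].text != ",":
                break
            i += 1
    expect("]")
    if tokens[i].kind != "eof":
        raise ParseError(src, tokens[i].pos, f"unexpected {describe(tokens[i])} after list")
    return items
```

With this, parse_list("[1, 2\n 3]") raises ParseError with the message "line 2, column 2: missing ',' between list items?". Adding another tailored message is just another if at the point in the code where the mistake becomes visible.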