A module that I want in the standard library: Parsing

October 18, 2010

Given what I've written before, it will probably not surprise you that I have a little list of modules that I think should be in the standard library. The first one on that list is a decent parsing or parser-generator module.

There are two reasons for this, and they work in combination. First off, various sorts of parsing are an important part of the work that a decent number of programs do. Full-bore language parsers are somewhat uncommon, but (at least in my work) it's reasonably common to be parsing (or at least looking at) less complex things, including the output of other programs. The Python standard library currently has a number of specialized parsers (for XML, for certain standard sorts of configuration files, and so on), but it has no general parser framework.

But wait, you might be saying, Python has a general parser framework in the form of regular expressions. That is the second reason that a real parser module is needed; to paraphrase Jamie Zawinski, using regular expressions for parsing means that you now have two problems. Trying to parse with regular expressions is the classic example of a bad solution to the problem. Regular expressions do not help you much if you want to do a real job of parsing something, but they make it very easy to assemble an incomplete, difficult-to-maintain, and fragile 'parser' that is likely to mishandle any input that it doesn't expect (and there have been lots of demonstrations of this). But as long as the standard library has a regular expression module and doesn't have a parser module, people will continue building parsers out of regular expressions and then blowing their feet off.
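
As a small illustration of how a regular expression 'parser' mishandles input its author didn't expect, here is a sketch that pulls out the text inside the parentheses of something like a function call. The regular expression version is the sort of thing that gets written in a hurry; it silently returns the wrong answer as soon as the input contains nested parentheses, because a regular expression cannot count nesting. The function names are hypothetical and exist only for this comparison.

```python
import re

# A typical quick regex "parser": grab whatever sits between parentheses.
ARGS_RE = re.compile(r"\((.*?)\)")


def args_by_regex(text):
    """Return the first parenthesized text, as the regex sees it."""
    m = ARGS_RE.search(text)
    return m.group(1) if m else None


def args_by_scanning(text):
    """Return the text inside the first balanced pair of parentheses."""
    start = text.find("(")
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
    # Unlike the regex, the scanner notices malformed input.
    raise ValueError("unbalanced parentheses in %r" % text)


print(args_by_regex("f(a, g(b, c))"))     # a, g(b, c
print(args_by_scanning("f(a, g(b, c))"))  # a, g(b, c)
```

The regular expression stops at the first closing parenthesis and loses the final one; the scanner tracks nesting depth and also complains about unbalanced input instead of quietly returning something plausible.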

(Regular expressions are useful for lexing, but lexing is only one part of parsing things. My ideal parsing module would also have good support for building lexers, possibly integrated into the grammar specification.)
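
To show where regular expressions do fit, here is a sketch of a regex-driven lexer. It follows the named-group approach described in the "Writing a Tokenizer" example in the Python `re` documentation: each token type is a named group, all of them are joined into one alternation, and `Match.lastgroup` says which one matched. The token set itself (`NUMBER`, `NAME`, `OP`) is a hypothetical example, not anything from an existing parsing module.

```python
import re

# Hypothetical token set for a tiny assignment language. Each token type
# is a named group; alternatives are tried left to right, so the
# catch-all ERROR pattern must come last.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/()=]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value, position) tuples; reject unknown characters."""
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError("unexpected character %r at position %d"
                             % (m.group(), m.start()))
        yield kind, m.group(), m.start()


print(list(tokenize("x = 3 + 42")))
# [('NAME', 'x', 0), ('OP', '=', 2), ('NUMBER', '3', 4),
#  ('OP', '+', 6), ('NUMBER', '42', 8)]
```

The lexer only turns text into a stream of tokens; deciding whether those tokens form a valid statement is the parser's job, which is exactly the part that regular expressions alone handle badly.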

A good parsing module for the standard library would make it easier to build a real parser than to parse things with regular expressions, thereby encouraging people to solve their problems in the good, right way. (I would suggest that things like good error checking and error recovery would also be attractive, but people who are happy with regular-expression-based parsers are unlikely to be missing them now.)
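
For comparison, here is roughly what 'building a real parser' by hand looks like today without library support: a minimal recursive descent parser for a hypothetical toy arithmetic grammar (shown in the docstring). It is a sketch, not any particular third-party module, but it shows the kind of error checking that comes almost for free once you have a grammar: malformed input is rejected with a message saying what was expected and where, instead of being half-matched.

```python
import re


class ParseError(Exception):
    pass


class Parser:
    """Recursive descent parser for a hypothetical toy grammar:
         expr   := term (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        # Tokens are (kind, text, position); kind is "NUMBER" or the
        # operator character itself. Whitespace is skipped by the regex.
        self.tokens = []
        for m in re.finditer(r"([0-9]+)|(\S)", text):
            kind = "NUMBER" if m.group(1) else m.group(2)
            self.tokens.append((kind, m.group(), m.start()))
        self.pos = 0
        self.end = len(text)

    def peek(self):
        """Return the kind of the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def error(self, expected):
        if self.pos < len(self.tokens):
            _, found, where = self.tokens[self.pos]
            found = repr(found)
        else:
            found, where = "end of input", self.end
        raise ParseError("expected %s at position %d, found %s"
                         % (expected, where, found))

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            self.error("an operator or end of input")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()[0]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()[0]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        if self.peek() == "NUMBER":
            return int(self.advance()[1])
        if self.peek() == "(":
            self.advance()
            value = self.expr()
            if self.peek() != ")":
                self.error("')'")
            self.advance()
            return value
        self.error("a number or '('")


print(Parser("2 * (3 + 4)").parse())  # 14
```

Feeding it `"1 + * 2"` raises `ParseError("expected a number or '(' at position 4, found '*'")`. Even this toy needs a fair amount of boilerplate (token bookkeeping, lookahead, error reporting), which is the sort of thing a standard parsing module could take off people's hands.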

Python has a number of (third-party) parsing modules, none of which I've tried out. Ned Batchelder's comprehensive index and summary of them is the best resource on this that I know of.

(Since I have tried none of the existing modules, I have no opinion on which one should be in the standard library, if any. I just want some decent parsing module to be in there.)

Written on 18 October 2010.
Last modified: Mon Oct 18 22:00:42 2010