On Python's grammar

February 13, 2007

Python's grammar looks imposingly complex from the outside, but the more I've thought about it the more I've realized that it's simple but clever. In particular, the way the work is split between the tokenizer and the actual grammar simplifies both.

(Technically you could claim that Python's grammar is straightforward because the tokenizer takes care of all the hard bits, but I consider this ducking the issue.)

The actual grammar is typical for an Algol-style language. You have the big set of rules for expressions, and the big shunting yards of the basic and compound statements, and that's mostly it. Python gains some simplicity because you can pretty much define anything anywhere (you can embed a class definition in the middle of a function defined inside a function inside a class definition, if you really want to), so it doesn't have to put ordering and placement constraints into the grammar.
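As a small illustration of the "define anything anywhere" point, here is a sketch in current Python 3 syntax. The names Outer, method, helper and Inner are made up for the example.

```python
class Outer:
    def method(self):
        def helper():
            # A class statement nested inside a function, which is itself
            # nested inside a method of another class. The grammar allows
            # this because any statement can appear in any block.
            class Inner:
                label = "defined three levels down"
            return Inner
        return helper()

inner_cls = Outer().method()
print(inner_cls.__name__, "-", inner_cls.label)
```

Nothing here needs special grammar support; the body of each def and class is just another block of ordinary statements.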

(Python also cleverly sets up the grammar so that indentation only shows up in one rule, 'suite', the generic definition of a block of statements such as the body of a while or a class definition.)
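One way to see why a single rule can be enough is to look at what the tokenizer hands to the grammar. The sketch below uses the standard library's tokenize module from current Python 3 (not the C tokenizer itself, and not the 2007-era interpreter), and the sample source is made up. The body of the while shows up bracketed by one INDENT and one DEDENT token, which is all a block rule has to match.

```python
import io
import tokenize

# A made-up sample: a while loop with a two-statement body.
source = "while x:\n    x = x - 1\n    y = 2\nz = 3\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
names = [tokenize.tok_name[tok.type] for tok in tokens]

for tok in tokens:
    # Print the token kind and its text, one token per line.
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

In the printed stream the INDENT comes right after the NEWLINE that ends the "while x:" line, and the DEDENT comes just before the first token of "z = 3".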

The tokenizer too is pretty normal, although this is harder to see since it's hand-coded and thus there is no simple-to-read set of production rules, just the rather dry lexical structure description. (Very few people read language descriptions for fun, whereas you can actually skim a flex input file and get a sense of anything unusual going on.) It does have to track the indentation level to generate synthetic INDENT and DEDENT tokens when it changes, and do implicit line joining, but neither is too complicated.
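To give a feel for how little bookkeeping this takes, here is a toy sketch of the two jobs in Python. It is not CPython's tokenizer: the function name indent_tokens and the LINE pseudo-token are invented for the example, and it assumes space-only indentation, ignores strings and comments when counting brackets, and does not report inconsistent dedents.

```python
def indent_tokens(lines):
    """Yield (kind, text) pairs with synthetic INDENT/DEDENT/NEWLINE tokens."""
    stack = [0]   # indentation widths of the currently open blocks
    depth = 0     # how many brackets are currently open
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines never change the indentation level
        if depth == 0:
            # Only a line that starts a new logical line is measured.
            width = len(line) - len(line.lstrip(" "))
            if width > stack[-1]:
                stack.append(width)
                yield ("INDENT", "")
            else:
                while width < stack[-1]:
                    stack.pop()
                    yield ("DEDENT", "")
        yield ("LINE", stripped)
        # Naive bracket counting (assumption: no brackets inside strings).
        depth += sum(stripped.count(c) for c in "([{")
        depth -= sum(stripped.count(c) for c in ")]}")
        if depth == 0:
            # Implicit line joining: no NEWLINE while a bracket is open.
            yield ("NEWLINE", "")
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")

sample = ["if a:", "    b = f(1,", "  2)", "    if c:", "        d", "e"]
kinds = [kind for kind, _ in indent_tokens(sample)]
print(kinds)
```

The oddly indented "2)" line produces no INDENT or DEDENT because it sits inside an open parenthesis, and the final "e" closes both open blocks with two DEDENT tokens in a row.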

I am not sure that I would have come up with this split to start with if I were designing a similar language where indentation was significant, and pretty much all of the other designs I can think of would be more complicated to implement.

