How Python parses indentation

One of the interesting things about Python for compiler geeks is that it is a partially implicit context language, in that line indentation is significant. Implicit context languages have been out of favour for a very long time, and most parsing techniques these days are geared towards stream oriented grammars.

(By 'implicit context languages' I mean ones where state changes like entering and exiting blocks are implicit in the difference between lines, such as different indentation levels. By contrast, stream oriented languages use explicit markers for such state changes.)

Python deals with this in the tokenizer, which transforms changes in indentation level into synthetic INDENT and DEDENT tokens. One consequence of this is that the tokenizer is what enforces the rule that when you dedent you have to return to an existing previous indentation level, not something between one and another.

When I looked at Python's actual grammar (in human-readable form in
Note particularly that this applies to all occurrences of '
(I can't say it's a good idea, though.) This is not documented in the relevant bit of the Python language reference, so counting on it is unwise.
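
To make the tokenizer's job concrete, here is a minimal sketch of the indentation-stack idea: keep a stack of open indentation levels, emit INDENT when a line is deeper than the top of the stack, and emit DEDENT for each level popped on the way back out. This is an illustration, not CPython's actual tokenizer. The names `indent_tokens`, `real_indent_tokens`, `GOOD` and `BAD` are made up for this example, and the toy version assumes space-only indentation and ignores comments, continuation lines and brackets. The standard library's `tokenize` module is used alongside it to show the real INDENT and DEDENT tokens.

```python
import io
import tokenize

# Hypothetical sample inputs for this sketch.
GOOD = "if x:\n    y = 1\n    if z:\n        w = 2\nv = 3\n"
BAD = "if x:\n        y = 1\n    z = 2\n"  # dedents to a level that was never opened


def indent_tokens(source):
    """Toy indentation stack: yield (kind, line_number) pairs.

    Assumes indentation uses spaces only; blank lines are skipped.
    """
    stack = [0]  # indentation widths of the currently open blocks
    lineno = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines do not change the indentation level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", lineno)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT", lineno)
        if width != stack[-1]:
            # We popped past this width without finding it, so it falls
            # between two previously opened levels.
            raise IndentationError(
                f"line {lineno}: unindent does not match any outer level"
            )
    # Close whatever blocks are still open at end of input.
    for _ in stack[1:]:
        yield ("DEDENT", lineno + 1)


def real_indent_tokens(source):
    """Return the INDENT/DEDENT token names produced by the tokenize module."""
    wanted = {tokenize.INDENT: "INDENT", tokenize.DEDENT: "DEDENT"}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [wanted[tok.type] for tok in tokens if tok.type in wanted]
```

For `GOOD`, both functions report two INDENTs followed by two DEDENTs. For `BAD`, the toy version raises `IndentationError` on the third line, because a width of 4 sits between the open levels 0 and 8; that is the "return to an existing previous indentation level" rule being enforced before any grammar is involved.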
This is part of CSpace, and is written by ChrisSiebenmann.