== You should convert wikitext to HTML through an AST
Suppose that you are turning wikitext or some other form of structured
markup into HTML. The straightforward and often easiest way to do this
is to directly generate the HTML as you process the wikitext; when you
encounter and parse a particular bit of markup, you immediately output
the relevant HTML.
Having done this and stubbed my toes very vigorously, I
have a bit of advice: ~~you should parse into an [[AST
http://en.wikipedia.org/wiki/Abstract_syntax_tree]] and then generate
HTML from that AST~~. Yes, it's more code and it seems more indirect,
but it has some significant advantages.
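
To make the shape of this concrete, here is a minimal sketch in Python of the two-stage approach. Everything in it is a hypothetical illustration rather than real wiki code: the node classes `Heading` and `Paragraph`, the functions `parse` and `to_html`, and the toy grammar (a line starting with `== ` is a heading, blank lines separate paragraphs) are all assumptions made for the example.

```python
from dataclasses import dataclass
from html import escape


# Hypothetical AST node types for a toy wikitext dialect.
@dataclass
class Heading:
    text: str


@dataclass
class Paragraph:
    text: str


def parse(wikitext):
    """Stage one: turn wikitext into a flat list of AST nodes."""
    nodes = []
    para = []  # lines of the paragraph currently being accumulated

    def flush():
        # Close off the pending paragraph, if there is one.
        if para:
            nodes.append(Paragraph(" ".join(para)))
            para.clear()

    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("== "):
            flush()
            nodes.append(Heading(stripped[3:]))
        elif not stripped:
            flush()
        else:
            para.append(stripped)
    flush()
    return nodes


def to_html(nodes):
    """Stage two: walk the AST and emit HTML; no parsing happens here."""
    out = []
    for node in nodes:
        if isinstance(node, Heading):
            out.append(f"<h2>{escape(node.text)}</h2>")
        elif isinstance(node, Paragraph):
            out.append(f"<p>{escape(node.text)}</p>")
        else:
            raise TypeError(f"unknown node: {node!r}")
    return "\n".join(out)


tree = parse("== Title\nfirst line\nsecond line\n\nA & B")
print(to_html(tree))
```

The point is only that `parse` knows nothing about HTML and `to_html` knows nothing about wikitext syntax; a real AST would be a proper tree with nested inline markup, not a flat list.
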
The first general advantage is that it decouples the process of parsing
your wikitext from the process of generating HTML. Rather than being two
sides of a single chunk of code, they now communicate through an API: the
AST. The AST then gives you a vantage point to examine and verify each
side of the process independently (and to evolve them separately). For
example, if you're working on the parsing code you can verify that the
results are the same by checking the AST instead of having to compare
the output HTML.
(If you use automated tests I expect that having an AST in the middle
will make both parsing and HTML generation much easier to test. It
should also make it much less annoying to evolve either side, because
far fewer tests are likely to need changes if you change parsing or
HTML generation.)
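
As a rough sketch of what testing each side separately might look like, the Python below handles a single piece of inline markup, `~~...~~`, and renders it as `<strong>`. The names `Text`, `Bold`, `parse_inline` and `render_inline` are invented for this example, and the choice of `<strong>` is an assumption. The parser is deliberately naive and assumes the markers are balanced.

```python
from dataclasses import dataclass
from html import escape


# Hypothetical inline AST nodes.
@dataclass
class Text:
    value: str


@dataclass
class Bold:
    value: str


def parse_inline(source):
    """Split on '~~'; odd-numbered pieces sit between markers, so they are bold.

    Assumes balanced markers; a real parser would have to cope with stray ones.
    """
    nodes = []
    for i, piece in enumerate(source.split("~~")):
        if piece:
            nodes.append(Bold(piece) if i % 2 else Text(piece))
    return nodes


def render_inline(nodes):
    """Generate HTML from inline nodes, escaping the text of each one."""
    parts = []
    for node in nodes:
        body = escape(node.value)
        parts.append(f"<strong>{body}</strong>" if isinstance(node, Bold) else body)
    return "".join(parts)


def test_parser():
    # Only the AST is checked, so changing the HTML cannot break this test.
    assert parse_inline("use ~~an AST~~ here") == [
        Text("use "), Bold("an AST"), Text(" here")]


def test_renderer():
    # The AST is built by hand, so parser bugs cannot break this test.
    assert render_inline([Text("a < b "), Bold("ok")]) == "a &lt; b <strong>ok</strong>"


test_parser()
test_renderer()
```

If you later decide bold should come out as `<b>` instead, only `test_renderer` needs to change; the parser test is untouched.
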
The second general advantage is that once you have an AST you don't
have to output just HTML. For instance ([[as I mentioned once before
../web/WikiTextDeploymentProblem]]) you can output a different
wikitext dialect, giving you a fully reliable way of doing wikitext
format conversions. Decide that some part of your markup should
be different? Now you can fix that. Or you could transition to a
significantly different format (e.g., to Markdown or MediaWiki from your
own custom format) without giving your users and yourself heartburn.
All of these options are simply an AST walker away.
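
To illustrate 'an AST walker away', here is a small Python sketch with two walkers over the same hand-built AST, one emitting Markdown and one emitting MediaWiki markup. The node classes and function names are hypothetical, and the sketch ignores escaping of characters that are special in the target format, which a real converter would have to handle.

```python
from dataclasses import dataclass


# Hypothetical block-level AST nodes.
@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


def to_markdown(nodes):
    """Walk the AST and emit Markdown (ATX-style '#' headings)."""
    blocks = []
    for node in nodes:
        if isinstance(node, Heading):
            blocks.append("#" * node.level + " " + node.text)
        else:
            blocks.append(node.text)
    return "\n\n".join(blocks)


def to_mediawiki(nodes):
    """Walk the same AST and emit MediaWiki markup ('== Title ==' headings)."""
    blocks = []
    for node in nodes:
        if isinstance(node, Heading):
            marks = "=" * node.level
            blocks.append(f"{marks} {node.text} {marks}")
        else:
            blocks.append(node.text)
    return "\n\n".join(blocks)


doc = [Heading(2, "Why an AST"), Paragraph("It decouples parsing from output.")]
print(to_markdown(doc))
print(to_mediawiki(doc))
```

Neither walker needs to know how the original wikitext spelled a heading; all of that was settled once, in the parser.
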
([[Go http://golang.org]] shows the power of being able to do this
sort of change automatically and reliably with their '[[_go fix_
http://golang.org/cmd/fix/]]' tool, which they've used to do any number
of language and library transitions. My impression is that the existence
of _go fix_ makes the Go people more willing to make such changes.)
A smaller advantage of an AST is that it gives you
structured information. [[As I've found out the hard way
../python/DWikiCoreDesignMistake]], a large monolithic blob of HTML is
not necessarily what you want. Even when you want HTML (as opposed to
metadata) it can be very useful to get things like 'the first paragraph'
or 'the text of every top-level section header' and so on. Generating HTML
from an AST also lets you defer certain rendering decisions until very
late in the process; this can let you cache more (or cache things more
easily).
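
Queries like these become short functions once the structure is available. The following Python sketch assumes a flat list of hypothetical `Heading` and `Paragraph` nodes, and assumes that 'top-level' means the smallest heading level that actually appears in the document.

```python
from dataclasses import dataclass


# Hypothetical block-level AST nodes.
@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


def first_paragraph(nodes):
    """Return the text of the first paragraph, or None if there is none."""
    return next((n.text for n in nodes if isinstance(n, Paragraph)), None)


def top_level_headings(nodes):
    """Return the text of every heading at the shallowest level present."""
    headings = [n for n in nodes if isinstance(n, Heading)]
    if not headings:
        return []
    top = min(h.level for h in headings)
    return [h.text for h in headings if h.level == top]


doc = [
    Heading(2, "Intro"),
    Paragraph("Opening text."),
    Heading(3, "Detail"),
    Paragraph("More text."),
    Heading(2, "Wrap-up"),
]
print(first_paragraph(doc))
print(top_level_headings(doc))
```

Getting the same answers out of a blob of already-generated HTML would mean parsing that HTML all over again.
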
Another AST advantage is simply that it will almost certainly push
you to write a relatively systematic parser for your wikitext. [[Real
parsers are important because they are easier to understand
WhyParsersMatter]].
(This was inspired by the comment left on [[my earlier entry about my
mistake ../python/DWikiCoreDesignMistake]]. My newly revised code still
falls well short of producing an AST, but if I were writing a new parser
from scratch, I would definitely use an AST as the
intermediate form.)