Making a Python mountain out of a molehill

June 12, 2005

DWiki is the software that runs CSpace, including this blog. It's wound up much bigger than I expected and wanted it to be. This is sort of the story of how (or why) that happened.

DWiki started out with a simple goal: serve up (revision controlled) wiki-text documentation as HTML pages. It didn't have to edit the pages; we'd do that in a real editor in a real environment. The core program model was simple: pages would be generated through a basic templating system (no 'programming': loops, ifs, embedded code), and content would be produced by 'renderers' written in Python, each of which would produce one independent piece of text. The most important renderer would be the one that got the page's content and converted it from wikitext into HTML.
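To make that model concrete, here is a minimal sketch of a template-plus-renderers design. It is not DWiki's actual code: the `${name}` placeholder syntax, the `RENDERERS` registry, the `page` dictionary, and the toy paragraph converter are all assumptions made up for illustration (and it is written for current Python 3, not the Python of 2005).

```python
import html
import re

# Hypothetical registry mapping a renderer name to a function that
# produces one independent piece of text for a page.
RENDERERS = {}


def renderer(name):
    """Register a function as the renderer called `name`."""
    def register(func):
        RENDERERS[name] = func
        return func
    return register


@renderer("title")
def render_title(page):
    return html.escape(page["title"])


@renderer("content")
def render_content(page):
    # Stand-in for a real wikitext converter: each blank-line separated
    # chunk of text becomes one HTML paragraph.
    chunks = [c.strip() for c in page["text"].split("\n\n") if c.strip()]
    return "\n".join("<p>%s</p>" % html.escape(c) for c in chunks)


# Made-up placeholder syntax: ${name} is replaced by that renderer's output.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand(template, page):
    """Expand a template: pure substitution, no loops, ifs, or embedded code."""
    def substitute(match):
        return RENDERERS[match.group(1)](page)
    return PLACEHOLDER.sub(substitute, template)
```

With this sketch, `expand("<h1>${title}</h1>\n${content}", page)` calls the two renderers and splices their output into the template; everything interesting lives in the renderers, which is what keeps the template side free of programming.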

I looked at a number of existing Python modules for 'wikitext' to HTML conversion but passed on all of them. I wasn't willing to let their markup get in the way of writing about Unix system administration, and I thought a number of them had overly verbose syntax for things like links. (Also, a lot of the converters are written in Perl, with only incomplete reimplementations available in Python.)

So I had to write my own. I figured this would be a relatively simple job; after all, it was just text parsing. (Cue the hollow laughter.)

Speaking from painful experience, I can now say that wiki-text to HTML conversion is something that only seems simple. This is especially so if one wants to accept plain text that is visually appealing and that GNU Emacs is happy to generate. Parsing and formatting DWikiText rapidly grew into many hundreds of lines; the current module is over a thousand lines (not counting DWikiText macros, which add another five hundred lines).

Once DWiki was up enough to display pages, I rapidly discovered that the original simple templating system was a bit too simple. The current TemplateSyntax is still not a programming language, but it is no longer simple and elegant and the code has expanded accordingly.

But everything past that is really my fault, because I gave in to feature creep (wikitext rendering plus templates are only about a quarter of the code). From the initial simple feature list, DWiki acquired (in no particular order):

  • magic HTTP redirections
  • support for '304 Not Modified' HTTP responses
  • authentication and access restrictions
  • searching
  • comments
  • 'blog' and 'blogdir' (changelog) views of directories
  • virtual directories
  • Atom syndication feeds
  • serving static content itself
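As an illustration of what the '304 Not Modified' item involves, here is a small sketch of the decision a server has to make for a conditional GET. It is an assumption-laden example, not DWiki's code: the function name `conditional_status` is hypothetical, and it only looks at the `If-Modified-Since` request header compared against a page's modification time.

```python
from email.utils import formatdate, parsedate_to_datetime


def conditional_status(mtime, if_modified_since):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    `mtime` is the page's modification time as a Unix timestamp;
    `if_modified_since` is the raw request header value, or None.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        # An unparseable header is ignored and the full page is served.
        return 200
    # HTTP dates have one-second resolution, so drop fractional seconds.
    return 304 if int(mtime) <= since else 200


def last_modified_header(mtime):
    """Format a timestamp the way a Last-Modified header expects it."""
    return formatdate(mtime, usegmt=True)
```

The server sends `last_modified_header(mtime)` with each page; when the client later echoes that value back in `If-Modified-Since`, `conditional_status` says whether an empty 304 response is enough.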

Each time, my inability to say no to some neat feature I thought of resulted in another little growth, until, as I write this, DWiki weighs in at:

   lines    code     doc comment   blank  file
    7514    4673     179    1985     677  total
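The four categories add up to the total: 4673 + 179 + 1985 + 677 = 7514. For readers curious how such a breakdown can be produced, here is a rough sketch of a line classifier. It is not the tool that generated the numbers above; the function name `count_lines` and its classification rules (docstring lines count as 'doc', lines starting with `#` count as 'comment') are assumptions, and it requires Python 3.8 or later.

```python
import ast


def count_lines(source):
    """Classify each line of Python source as code, doc, comment, or blank."""
    lines = source.splitlines()

    # Collect the line numbers covered by module, class, and function docstrings.
    doc_lines = set()
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            doc_lines.update(range(first.lineno, first.end_lineno + 1))

    counts = {"lines": len(lines), "code": 0, "doc": 0, "comment": 0, "blank": 0}
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if number in doc_lines:
            counts["doc"] += 1        # includes blank lines inside a docstring
        elif not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1    # comment-only lines
        else:
            counts["code"] += 1       # code, including code with a trailing comment
    return counts
```

One known simplification: a line that starts with `#` inside a multi-line string that is not a docstring is miscounted as a comment.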

Personally, 7500 lines of Python for what DWiki does strikes me as somewhat over the top (maybe even absurd). Python is a powerful and therefore compact language; even if I just count the code lines alone, I feel that in 4600 lines I should have something much more impressive. Instead I don't even have unit tests.

Why does it matter?

For a start, I want DWiki to be viable as a CGI-BIN program on ordinary hardware. 7500 lines of Python is, how shall I say this, somewhat heavyweight. (It also uses a not insignificant chunk of virtual memory when started, although a lot of that seems to be due to system modules that DWiki uses.)

Plus, writing bulky overweight programs offends my sense of aesthetics (which has already taken a pummeling from the code alone).

Written on 12 June 2005.

Last modified: Sun Jun 12 03:22:32 2005