The programmer's problem with WikiText systems

May 31, 2011

Here is a confession: I have been sitting on an update to DWiki that adds some things to its dialect of wikitext for going on four years now. The reason I haven't released it boils down to two issues specific to wikis.

The first is that in a wiki, additions to your markup language are more or less forever. Once people actually start writing pages that use them, you have three choices: you can support the addition forever, you can drop the addition and live with the resulting broken pages, or you can go through your entire page database and try to rewrite all of the pages to use some new equivalent. It is possible to do the last well, but people usually don't, and then sysadmins hate you.

(If you want to do a good job, reuse your real wikitext parser and just have it emit the correct new version of the wikitext instead of HTML. This way you guarantee that your conversion process parses the old wikitext in exactly the same way that page rendering does.)
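To make the parser-reuse idea concrete, here is a minimal sketch. Everything in it is invented for illustration and none of it is DWiki's real code or syntax: it assumes a toy old dialect where *text* means bold and a toy new dialect where bold is written ''text''. The point is only that both emitters consume the output of the same parse() function.

```python
import html
import re

# Hypothetical old markup: *text* means bold. Not DWiki's real syntax.
OLD_BOLD = re.compile(r"\*([^*\n]+)\*")


def parse(source):
    """Parse old-dialect wikitext into a list of (kind, text) nodes."""
    nodes = []
    pos = 0
    for m in OLD_BOLD.finditer(source):
        if m.start() > pos:
            nodes.append(("text", source[pos:m.start()]))
        nodes.append(("bold", m.group(1)))
        pos = m.end()
    if pos < len(source):
        nodes.append(("text", source[pos:]))
    return nodes


def emit_html(nodes):
    """The normal rendering path: nodes to HTML."""
    out = []
    for kind, text in nodes:
        if kind == "bold":
            out.append("<b>" + html.escape(text) + "</b>")
        else:
            out.append(html.escape(text))
    return "".join(out)


def emit_new_wikitext(nodes):
    """The conversion path: the same nodes, written out in the
    hypothetical new dialect where bold is ''text''."""
    out = []
    for kind, text in nodes:
        if kind == "bold":
            out.append("''" + text + "''")
        else:
            out.append(text)
    return "".join(out)


page = "2 * 3 * 4 is *not* 20"
nodes = parse(page)
rendered = emit_html(nodes)
converted = emit_new_wikitext(nodes)
```

Note what happens to the "2 * 3 * 4" part: the toy parser treats " 3 " as bold, which is probably not what the page's author meant. Because the converter shares that parse, it carries the same decision into the new wikitext, so the page renders the same way after conversion as it did before. A separately written converter could easily disagree with the renderer on input like this. A real converter would also have to escape any plain text that happens to look like markup in the new dialect, which this sketch skips.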

The second is the eternal wikitext lament: all of the good formatting characters (and character sequences) are taken. You never have enough formatting markup to go around, and every bit of markup that you use is a bit of plain text that people can't use (this is especially so for the good markup). When combined with the first problem, this means that you want to be really sure that some particular bit of markup is the right use for its particular character sequence before committing to it, because you're mostly stuck if it later turns out that you made a bad choice. A bad choice can be really painfully bad, forever locking off an otherwise attractive bit of markup.
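As a small illustration of markup taking plain text away from people, here is a sketch that assumes a made-up rule where _text_ marks emphasis (this is not DWiki's actual syntax, and render_emphasis is a hypothetical helper). The rule does what you want on ordinary prose and then mangles an identifier that merely contains underscores.

```python
import re

# Hypothetical rule: _text_ marks emphasis. Not DWiki's actual syntax.
EMPHASIS = re.compile(r"_([^_\n]+)_")


def render_emphasis(text):
    """Replace every _text_ span with an <em> element."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


prose = render_emphasis("this is _really_ important")
code_ish = render_emphasis("call my_var_name here")
```

Here prose comes out as intended, while code_ish has the middle of the identifier turned into emphasis. Once pages depend on the rule, tightening it (say, to require whitespace around the underscores) risks changing how existing pages render, which is the first problem all over again.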

It's interesting (well, to me) to think about why HTML doesn't suffer from this issue. Part of it is that HTML adds features slowly, but I think that a large part of it is that HTML's markup is not in short supply the way that wikitext markup is. The goal of good wikitext markup is to be unintrusive and small. HTML doesn't have this concern since it's already made the decision that its markup will be clearly intrusive, and thus it has a much wider range of decent markup to choose from.

(The CS geek way to put this is that HTML has made the decision to put all of its markup in its own namespace, separate from your document's actual text, whereas wikitext markup is trying to live in the same namespace as your document. One of the times that HTML becomes unusually irritating is exactly when the two namespaces can't be kept separate because your document text is riddled with <'s and &'s.)
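The namespace collision on the HTML side is easy to see with Python's standard html module. In this sketch (the sample text is made up), every < and & in the document text has to be escaped before it can sit safely inside HTML, and the escaped form is noticeably noisier than the original.

```python
import html

# Document text that is full of characters HTML reserves for markup.
text = "if a < b && b > c: x = a & b"

# quote=False leaves quote characters alone; &, < and > are still escaped.
escaped = html.escape(text, quote=False)

# Unescaping recovers the original text.
round_trip = html.unescape(escaped)
```

The escaped string is what you would have to write by hand if you were authoring raw HTML, which is exactly the case where HTML stops feeling like it has a separate namespace.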

Written on 31 May 2011.
