The right way to do wikitext transitions

March 14, 2012

Suppose that you have a wiki and for some reason you really need to change the wikitext dialect that it accepts (sadly this is not always hypothetical). As I once alluded to in a parenthetical aside, there is a right way to do this, one that will not make people swear off your software. As part of this you can make a small change to your wiki engine that will make all sorts of transitions much easier and thus make people happier with your markup language.

To put it simply, the wrong way to do wikitext transitions is anything that does not use your normal wikitext rendering engine. The right way is to use your regular wikitext rendering engine but instead of having it output HTML, have it output your new wikitext markup. Using your regular engine means that the conversion process interprets the wikitext exactly as it usually gets displayed; you never have a case where the conversion thinks the wikitext markup means one thing but it actually means another.
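To make the idea concrete, here is a minimal sketch in Python. It assumes a hypothetical old dialect with MoinMoin-style `'''bold'''` and `''italic''` and a hypothetical Markdown-style new dialect with `**bold**` and `*italic*`. The names `parse`, `to_html`, and `to_new_wikitext` are illustrative, not any real wiki's API. The point is that a single parser builds a tree, and the converter is just a second back end over the same tree that the normal HTML renderer uses.

```python
import html
import re
from dataclasses import dataclass
from typing import List, Union

# Hypothetical old dialect: '''bold''' and ''italic'' (MoinMoin-style).
# Hypothetical new dialect: **bold** and *italic* (Markdown-style).

@dataclass
class Text:
    value: str

@dataclass
class Styled:
    kind: str                   # "bold" or "italic"
    children: List["Node"]

Node = Union[Text, Styled]

# Delimiters first (longest first), then runs of plain text.
TOKEN_RE = re.compile(r"'''|''|[^']+|'")
DELIMS = {"'''": "bold", "''": "italic"}

def parse(src: str) -> List[Node]:
    """The one shared parser: both back ends below consume its tree."""
    root: List[Node] = []
    stack = [("root", root)]    # (open kind, list receiving children)
    for tok in TOKEN_RE.findall(src):
        kind = DELIMS.get(tok)
        if kind is None:        # plain text, including a lone apostrophe
            cur = stack[-1][1]
            if cur and isinstance(cur[-1], Text):
                cur[-1].value += tok
            else:
                cur.append(Text(tok))
        elif stack[-1][0] == kind:
            stack.pop()         # closes the innermost open span
        else:
            node = Styled(kind, [])
            stack[-1][1].append(node)
            stack.append((kind, node.children))
    return root                 # unclosed spans just run to the end

def to_html(nodes: List[Node]) -> str:
    """Normal display path."""
    parts = []
    for n in nodes:
        if isinstance(n, Text):
            parts.append(html.escape(n.value))
        else:
            tag = {"bold": "b", "italic": "i"}[n.kind]
            parts.append(f"<{tag}>{to_html(n.children)}</{tag}>")
    return "".join(parts)

def to_new_wikitext(nodes: List[Node]) -> str:
    """Conversion path: same tree, different output syntax."""
    parts = []
    for n in nodes:
        if isinstance(n, Text):
            parts.append(n.value)
        else:
            mark = {"bold": "**", "italic": "*"}[n.kind]
            parts.append(f"{mark}{to_new_wikitext(n.children)}{mark}")
    return "".join(parts)
```

For example, `parse("plain '''bold''' and ''italic''")` renders to `plain <b>bold</b> and <i>italic</i>` through `to_html` and to `plain **bold** and *italic*` through `to_new_wikitext`. Because both outputs come from the same tree, they cannot disagree about what the source meant. A real converter would also need to escape literal characters that are markup in the new dialect (such as a bare `*`), which this sketch skips.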

(You are also quite likely to have a complete conversion, since the rendering engine is itself a natural checklist of all of your markup. And if you miss some markup that's actually used, you can spot it from unexpected HTML in the output.)
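One way to get the "spot it from unexpected HTML" check for free is to have the converter fall back to the HTML rendering for any node type it has no new-dialect rule for, and then scan the converted pages for leftover tags. The sketch below assumes pages have already been parsed into a simple hypothetical `Node` tree; the rule tables and the `leaked_tags` helper are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    kind: str                               # "doc", "text", "bold", ...
    text: str = ""
    children: List["Node"] = field(default_factory=list)

# Hypothetical rule tables. "underline" has an HTML rendering but was
# (deliberately, for the example) forgotten in the new-dialect table.
NEW_MARKUP = {"bold": "**", "italic": "*"}
HTML_TAGS = {"bold": "b", "italic": "i", "underline": "u"}

def convert(node: Node) -> str:
    if node.kind == "text":
        return node.text
    inner = "".join(convert(c) for c in node.children)
    if node.kind == "doc":
        return inner
    if node.kind in NEW_MARKUP:
        mark = NEW_MARKUP[node.kind]
        return f"{mark}{inner}{mark}"
    # No new-dialect rule: fall back to what the HTML renderer emits.
    tag = HTML_TAGS.get(node.kind, "span")
    return f"<{tag}>{inner}</{tag}>"

TAG_RE = re.compile(r"</?([a-z]+)>")

def leaked_tags(pages: Dict[str, Node]) -> Dict[str, List[str]]:
    """Map each page name to the HTML tags found in its converted text."""
    report = {}
    for name, tree in pages.items():
        tags = sorted(set(TAG_RE.findall(convert(tree))))
        if tags:
            report[name] = tags
    return report
```

Running `leaked_tags` over every page gives a list of exactly the markup that is actually used but not yet handled, which is usually much shorter than the list of everything the old dialect supports.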

So why don't wiki authors routinely do this? My guess is that many wikis don't actually have rendering engines with real parsers, but instead mostly use regular-expression-based progressive rewrites of the input text. Such progressive rewrites are relatively easy for wikitext to HTML because your output format is generally hard to confuse with your input format (which means that you don't run the risk of accidentally reprocessing already fully processed output). They are not nearly as easy for wikitext to wikitext, because there your output format is easily confused with input that has not yet been processed.

(This is the old general regular expression problem of wanting to rename A to B at the same time that you rename B to A.)
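Here is a small illustration of the problem, assuming a deliberately contrived pair of dialects in which the old `*x*` (italic) and `_x_` (bold) become the new `_x_` (italic) and `*x*` (bold). Two sequential `re.sub` passes go wrong because the second pass re-reads the first pass's output. A single pass over one alternation pattern, with a callback deciding what each match becomes, rewrites every piece of input exactly once.

```python
import re

# Hypothetical dialect swap: old *x* = italic, _x_ = bold;
# new _x_ = italic, *x* = bold.
OLD_ITALIC = re.compile(r"\*([^*]+)\*")
OLD_BOLD = re.compile(r"_([^_]+)_")

def convert_sequential(text: str) -> str:
    """Buggy: the second pass re-reads the first pass's output."""
    text = OLD_ITALIC.sub(r"_\1_", text)   # old italic -> new italic
    text = OLD_BOLD.sub(r"*\1*", text)     # old bold -> new bold (and new italic!)
    return text

BOTH = re.compile(r"\*([^*]+)\*|_([^_]+)_")

def convert_single_pass(text: str) -> str:
    """One scan over the input; each match is rewritten exactly once."""
    def swap(m: re.Match) -> str:
        if m.group(1) is not None:         # old italic *x*
            return f"_{m.group(1)}_"
        return f"*{m.group(2)}*"           # old bold _x_
    return BOTH.sub(swap, text)
```

On `*soft* and _loud_`, the sequential version produces `*soft* and *loud*` (everything ends up bold), while the single-pass version produces the intended `_soft_ and *loud*`. The single-pass trick only goes so far, though; it does nothing for nesting or context, which is where a real parser earns its keep.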

A closely related way to make people happy with you is to have some way to dump out raw (untemplated) HTML for wikitext pages. People like this because it makes migrating away from your wikitext engine much simpler. Content in plain HTML is extremely portable and relatively easy to put into something else; the HTML that your wiki outputs for actual pages is much less so, because it is ornamented with navigation, sidebars, and so on. Also, when you have a specific 'output plain HTML' mode you can easily make it walk all wikitext pages for people instead of forcing them to crawl their site.
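A minimal sketch of such a dump mode, assuming a hypothetical storage layout of one `*.txt` file per page and using a trivial stand-in for the wiki's real renderer. The essential parts are that the renderer returns only the page body (no site template) and that the tool walks every page itself, so nobody has to crawl the site.

```python
import html
from pathlib import Path
from typing import List

def render_body(wikitext: str) -> str:
    """Hypothetical stand-in for the wiki's real renderer: it returns only
    the page body, with no navigation, sidebars, or site template."""
    paras = [p.strip() for p in wikitext.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paras)

def dump_plain_html(src_dir: Path, out_dir: Path) -> List[str]:
    """Render every page under src_dir (assumed: one *.txt file per page)
    to a bare HTML fragment in out_dir; return the page names written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.txt")):
        body = render_body(src.read_text(encoding="utf-8"))
        (out_dir / f"{src.stem}.html").write_text(body + "\n", encoding="utf-8")
        written.append(src.stem)
    return written
```

In a real wiki, `render_body` would be the same rendering code used for normal page views, just called without the surrounding page template.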

(This is on my mind lately because we are staring at this issue; we have a MoinMoin wiki that we need to turn into something else, and extracting the content in some usable form is clearly going to be a pain.)

I understand that some wikitext engines can import sufficiently plain and straightforward HTML and turn it into wiki markup (e.g., I believe there is software to do this for Markdown). I consider this going above and beyond the call of duty for a wiki, but if you want to do it and can do it well it'll certainly be appreciated. If you support both simple HTML output and simple HTML input, try to make sure that doing a round trip doesn't change the markup (because sooner or later some joker will try it, just to see what happens).
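A round-trip check is cheap to automate once both directions exist. This sketch assumes a hypothetical Markdown-style dialect with only `**bold**` and `*italic*`; the exporter is a small regex, the importer uses Python's standard `html.parser.HTMLParser` (whose default `convert_charrefs=True` unescapes entities like `&amp;` in text), and `round_trips` is an illustrative helper name.

```python
import html
import re
from html.parser import HTMLParser
from typing import List

# Hypothetical new dialect: **bold** and *italic* only.
INLINE_RE = re.compile(r"\*\*(.+?)\*\*|\*(.+?)\*")

def wikitext_to_html(src: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return f"<b>{m.group(1)}</b>"
        return f"<i>{m.group(2)}</i>"
    return INLINE_RE.sub(repl, html.escape(src, quote=False))

class _Importer(HTMLParser):
    # Tags we know how to map back; anything else is dropped.
    MARKS = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self) -> None:
        super().__init__()          # convert_charrefs=True unescapes text
        self.out: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.MARKS.get(tag, ""))

    def handle_endtag(self, tag):
        self.out.append(self.MARKS.get(tag, ""))

    def handle_data(self, data):
        self.out.append(data)

def html_to_wikitext(markup: str) -> str:
    p = _Importer()
    p.feed(markup)
    p.close()
    return "".join(p.out)

def round_trips(src: str) -> bool:
    """True if exporting to HTML and importing again leaves src unchanged."""
    return html_to_wikitext(wikitext_to_html(src)) == src
```

Run over every page, any page where `round_trips` returns `False` points at markup that the exporter and importer disagree about, which is exactly what the joker would otherwise find for you.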


Comments on this page:

From 198.182.56.5 at 2012-03-15 02:59:14:

Yup. My own very stupid wiki-like system uses regular expression matching and I couldn't get it to do wikitext-to-wikitext transformations without a lot of work.

-- Smarry

From 132.183.156.105 at 2012-03-15 10:19:26:

What about adding a new wikitext engine for new content, and keeping the old one for historical content?

By cks at 2012-03-15 19:51:46:

The kind of people who want to do wikitext transitions generally don't want to keep around the old wikitext engine; they've wound up thinking that it's a mistake (or a maintenance nightmare, or both). There are also pragmatic issues with a dual-engine setup:

  • you need to mark content to say which engine it's rendered in.
  • you have to keep documentation and code for both engines up to date.
  • people have to know both engines, because someday they may be revising older content.
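The first point is the mechanical one. A minimal sketch of what marking content implies, assuming a hypothetical `#engine NAME` first line on each page and stand-in renderer functions: every page read has to go through a dispatch step, and unmarked pages have to default to something.

```python
from typing import Callable, Dict, Tuple

def render_old(src: str) -> str:
    # Hypothetical stand-in for the legacy engine.
    return f"<div class='old'>{src}</div>"

def render_new(src: str) -> str:
    # Hypothetical stand-in for the new engine.
    return f"<div class='new'>{src}</div>"

ENGINES: Dict[str, Callable[[str], str]] = {"old": render_old, "new": render_new}
DEFAULT_ENGINE = "old"   # unmarked pages predate the transition

def split_marker(page: str) -> Tuple[str, str]:
    """Return (engine name, body). A first line of the (hypothetical)
    form '#engine NAME' selects the engine; otherwise use the default."""
    first, _, rest = page.partition("\n")
    if first.startswith("#engine "):
        return first[len("#engine "):].strip(), rest
    return DEFAULT_ENGINE, page

def render(page: str) -> str:
    engine, body = split_marker(page)
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}")
    return ENGINES[engine](body)
```

The code is small; the ongoing cost is everything else in the list, since both entries in `ENGINES` have to stay documented, maintained, and familiar to editors indefinitely.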

(If you say that the moment someone starts to revise old content they must convert it into the new wikitext format you will be what they call 'very unpopular'.)

Written on 14 March 2012.

Last modified: Wed Mar 14 21:34:45 2012