Tweaking code when I'm faced with the urge to replace it entirely

September 14, 2015

The one of the core parts of DWiki (the program behind all of what you're reading) is the code that turns DWikiText into HTML (and in many ways it is the most important component, since it's what ultimately renders all my content and all the comments). I spent a significant chunk of today tweaking a test version in an attempt to improve the conversion process for a number of corner cases that I care about and would like to make better.

There are two problems with this. The first is that one of the consequences of having what is now a long-running block is that I have a lot of content written in my wikitext dialect and pretty much all of it had better keep coming out just the same. I have no interest in trying to go through all of my entries to revise them for some bright new wikitext idea; backwards compatibility is quite important, warts and all. This is unfortunate in practice because I made some mistakes in my wikitext dialect way back when. It's sometimes possible to do little things around the corner of these mistakes, but that creates hacks and special rules and special magic code.

The other problem is that DWiki the program is also old and by now rather tangled, especially in the DWikiText renderer. Part of this tangle is just history, part of it is that it has been heavily optimized for speed, and part of it is that I made some fundamental structural mistakes in the beginning that have been carried through ever since then (one of them is not parsing to an AST, but it's not the only one). Faced with a complex set of code that I don't work with regularly, I descend to tweaking as carefully as I can rather doing anything deeper, which of course builds up the accumulated layers of hacks in the code and makes it harder to do anything except more tweaks and hacks around the edges.

(Python makes it surprisingly easy to do this sort of tweaking for various reasons, including its support for optional function arguments.)

At the same time, the more I worked on this code today the more clearly I saw how I wanted a modern version of the code to work. The more I have to stick in hacks and make tweaks, the more I also want to raze the whole complicated mess to the ground and redo it from scratch with a much better restructured version (or at least what I think would be such a thing, in my current state of not having written it and thus not having been faced by any messy gaps in my current grand vision).

(Any grand rewrite immediately starts to run into Python 3 thoughts, which lead to other thoughts I'm not going to try to cover here.)

I don't have any answers and I'm not even sure I'm going to deploy the tweaked version (I have a history of this sort of indecision), although I probably will now that I've written this up. But at least I had a reasonably enjoyable time fiddling around in the depths of DWiki once again and perhaps a bit more impetus towards doing some significant cleanups someday.

(While in the past I've lamented that I don't have a test suite for DWiki, I do actually have one for this sort of change; I can render all of this thing in both the old and new versions of the code and see what's different. This can be an interesting (re)learning experience, but that's another entry.)

Comments on this page:

By Moritz at 2015-09-14 02:09:45:

Why don't you introduce the ability to switch between two different renderer versions through some kind of meta tag on your entries?

So you could just use the new engine for selected and new posts and don't worry about the old ones.

By Alan at 2015-09-14 08:32:49:

The nuclear alternative would be to round-trip through HTML. Saves dealing with finicky custom parsing.

It'd start getting lossy and/or hard to test if you use any CSS classes. At least your blog formatting seems reasonably simple.

Maybe worth considering, if you want to switch to a markdown derivative :).



By cks at 2015-09-14 13:02:12:

I actually like almost all of my current wikitext dialect so I'm not looking to make any sort of major switch. If I did, I would make it primarily time-based (ie all content after time X is rendered with the new engine by default) because I'm not interested in littering my content with explicit format annotations.

For various reasons I continue to think that Markdown is not ideal for my usage cases, although my resistance is fading a bit. Any HTML to Markdown (or whatever) I do would probably be hand-tweaked for my specific HTML output, since I know a lot about it and what it's supposed to render into. With that said, a fully custom 'from my HTML to whatever' converter is an interesting general solution to the whole conversion problem if I ever want to do a full conversion, especially since there are a whole lot of HTML parsers out there that will turn well-structured HTML into some sort of AST or token stream; all I'd need is a backend that renders the result out in my own new markup.

Written on 14 September 2015.
« I should have started blocking web page elements well before now
A caution about cgo's error returns for errno »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Sep 14 01:37:09 2015
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.