The sinking feeling of discovering a design mistake in your code

February 11, 2013

It is not a pleasant feeling to slowly realize that you've made a design mistake down in the depths of your program's core data structures and code flow. I am having that experience now, so this is a war story about how what looks like the obvious right choice can be anything but.

One of the major things that DWiki does is convert DWikiText into HTML. Back when I started coding DWiki, I decided that the fundamental result of this rendering process would be a blob of HTML. After all, what could be more logical? The job of the renderer was to turn DWikiText into HTML, so clearly its output was a blob of HTML.

In retrospect the wheels started coming off this particular wagon almost immediately; I just didn't pay attention to the warning signs. You see, the output of the DWikiText rendering process is not just a blob of HTML. It's also things like the title of the page (in several different formats), whether or not the page is cacheable, what directory modification times are important for caching the page, and whether the page specifies permissions settings. And the giant blob of HTML has important structure itself; for a start, part of it is the title and part of it is the rest of the text, and it would be nice to be able to get one without the other.

(Simplifying, if a page starts with any level of header the header is taken to be the title. But the header is also rendered as part of the HTML and can't be separated from it later. This is why the Atom feed here has always repeated the entry titles in the actual text of the entry; in retrospect that should have been a warning sign.)

All of these extra things that DWikiText rendering produces are glued onto the side of the current code in various somewhat hacky ways (which should have been a warning sign to me). What the core rendering process should produce is an actual data structure that represents all of these bits explicitly. Code that just wants the HTML blob would then generate it from the appropriate bits of the data structure.
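As a rough sketch of what such a data structure might look like (this is not DWiki's actual code; the class name RenderResult and every field name here are hypothetical, chosen only to mirror the list of outputs above):

```python
from dataclasses import dataclass, field


@dataclass
class RenderResult:
    """Hypothetical structured output of rendering one DWikiText page."""
    title_html: str = ""        # rendered title, without its header wrapper
    title_text: str = ""        # plain-text title, e.g. for <title> or Atom
    title_level: int = 1        # header level the title was written with
    body_html: str = ""         # everything after the title
    cacheable: bool = True      # may this page be cached at all?
    dir_mtimes: list = field(default_factory=list)  # dirs whose mtime matters
    has_permissions: bool = False  # does the page set permissions?

    def full_html(self) -> str:
        """Rebuild the old single blob for code that only wants HTML."""
        if not self.title_html:
            return self.body_html
        tag = f"h{self.title_level}"
        return f"<{tag}>{self.title_html}</{tag}>\n{self.body_html}"


result = RenderResult(
    title_html="A <em>title</em>",
    title_text="A title",
    title_level=2,
    body_html="<p>Text.</p>",
)
print(result.full_html())
```

With something like this, a template that wants the title and the body separately just reads the two fields, and the HTML blob becomes one derived view among several instead of the only thing that exists.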

Why didn't I see this when I wrote the code? Because I started from the basic operation of 'turn DWikiText into HTML' and never lifted my head up to see the growing big picture. Every time I needed something else from the rendering process I basically got out a hammer and took another shot at the code, because that was the easy way (and towards the end, the thought of making major changes to the rendering process scared me).

Now I've reached the point where that doesn't work any more. To make the long-delayed changes to DWiki that I want, I'm going to have to totally rip apart and redo core rendering, complete with the instability that that implies. I'm also going to have to understand the code again; it's been years (and yes, that's a bad sign too).

(One benefit of the change will be better Atom feeds. Another will be better caching. Right now I'm sort of caching the generated HTML when what I should be caching is the underlying data structure that results from rendering.)

Sidebar: the insoluble problem that pushed me over the edge

When people look at a single entry, I want there to be a small, discreet entry date below the entry title (I've tried a version with the date above the title and I don't like it). However, the entry title and the entry text are all part of the same HTML blob and are currently inseparable; I can't crack them apart to insert template logic for this the way I want to.

There is a pile of ugly workarounds, none of which I like. For example, I could make something that chopped the title out of the HTML blob with a regular expression, but ew. I could also hack up the rendering process to directly insert HTML for the date (in various ways), but that's equally unclean and also has unpleasant interactions with the disk caches.
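For a sense of why the regular expression route earns an 'ew', here is a minimal sketch of it. The pattern and the split_title() helper are hypothetical and assume the title is rendered as a bare header element at the very start of the blob, which is an assumption about the output and not something DWiki guarantees.

```python
import re

# Matches a leading <h1>..<h6> element with no attributes, plus any
# whitespace around it. Anything else falls through unmatched.
TITLE_RE = re.compile(r"\A\s*(<h([1-6])>.*?</h\2>)\s*", re.DOTALL)


def split_title(blob: str):
    """Return (title_html, rest_html); title_html is '' if nothing matched."""
    m = TITLE_RE.match(blob)
    if not m:
        return "", blob
    return m.group(1), blob[m.end():]


title, rest = split_title("<h2>Title</h2>\n<p>Body</p>")
date_html = '<div class="date">2013-02-11</div>'  # hypothetical markup
print(title + "\n" + date_html + "\n" + rest)

# The fragility: a header that carries an attribute is silently missed.
print(split_title('<h2 id="t">Title</h2><p>Body</p>'))
```

It works on the easy case and quietly does nothing the moment the rendered header changes shape, which is the kind of coupling that a real data structure avoids.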

All of this would be simple if CSS allowed you to relocate <div>s in the page layout, but as far as I know this is impossible (short of manipulating the DOM with JavaScript). You can fix the position of a <div> in various ways but not say 'slice it out of here and put it right after that thing, then lay everything out normally'. All things considered I can't really blame CSS for that omission.


Comments on this page:

From 173.164.235.197 at 2013-02-11 23:58:28:

All of this would be simple if CSS allowed you to relocate <div>s in the page layout, but as far as I know this is impossible (short of manipulating the DOM with JavaScript).

I'm not 100% current on this sort of thing, but I think the new Shadow DOM stuff might permit this sort of templatized rearrangement without JavaScript (at the cost of requiring readers to use a very recent browser). Not seriously suggesting you use it; just throwing it out there.

-- Donald King

From 89.70.184.230 at 2013-08-03 11:57:25:

Chris, I want to say a big thank you for this entry.

I'm writing a wiki engine (yet another) and I have chosen to cache the syntax tree exactly because of your article. I'm lucky enough to have a parser generated from a grammar, so I do have an AST available to work on. (Frankly, having an AST was the main reason to write my own markup language parser.)

I've already had an example of why caching the AST helps, when I redefined the syntax of one of the markup elements.

-- dozzie

Written on 11 February 2013.

Last modified: Mon Feb 11 22:15:54 2013
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.