The anatomy of a DWiki bug
August 13, 2005
This starts with DWiki's processing model, which displays
pages in a number of different formats called
'views'. (More about this is in ProcessingModel.) Not all
views are valid for all pages (for example, the '
If a URL doesn't explicitly mention a view, DWiki uses a
default. Files always default to the '
DWiki puts a toolbar at the bottom to give you access to the alternate views of the page that you're viewing. This raises a little issue for links that the page shows: what view of the target do they take you to? (For technical reasons this is mostly relevant for directory pages.)
I decided that the best answer was that non-default views should be modes, so links would show the target in the current view if the target could be displayed in it. This meant that if you visited a non-default view of a directory and then went into a subdirectory, you saw the subdirectory in the same way.
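The 'views are modes' rule can be sketched roughly as follows. This is a minimal illustration and not DWiki's actual code: the `Page` class, the `modal_link` function, the view names (`listing`, `summary`, `page`) and the `?view` URL suffix are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for a DWiki page; not DWiki's real data model.
    path: str
    default_view: str
    valid_views: frozenset


def modal_link(current: Page, current_view: str, target: Page) -> str:
    """Return the URL for a link from `current` (shown in `current_view`) to `target`."""
    # Only a non-default view of the current page acts as a mode.
    is_mode = current_view != current.default_view
    if is_mode and current_view in target.valid_views:
        # Carry the mode along by naming the view explicitly.
        return f"{target.path}?{current_view}"
    # Otherwise link plainly and let the target use its own default view.
    return target.path


top = Page("/top", "listing", frozenset({"listing", "summary"}))
sub = Page("/top/sub", "listing", frozenset({"listing", "summary"}))
note = Page("/top/note", "page", frozenset({"page"}))

print(modal_link(top, "summary", sub))   # mode carried into the subdirectory
print(modal_link(top, "summary", note))  # view not valid for the file: plain link
print(modal_link(top, "listing", sub))   # default view is not a mode: plain link
```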
The actual problem
This logic turns out to have a little problem, made visible through the following sequence:
This is bad for two reasons: it is redundant and, worse, it makes these links look like they are new pages when in fact they may be pages you've already visited.
The latter is especially important for search engines crawling a DWiki site, since I want them to index the canonical URL for each page and not wind up thinking that I have a lot of URLs with duplicate content. (I suspect that this causes search engines to dislike one's site, since one winds up looking like a search engine spammer. And even if it doesn't, it increases the total number of URLs in a DWiki that they have to crawl, slowing down the overall process.)
The fix is pretty simple: if we're generating a link to a page in an explicitly specified view but that view is already the page's default view, just leave out explicitly setting the view.
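As a hedged sketch of that rule (the function name `link_url`, its parameters, the view names and the `?view` URL form are assumptions for illustration, not DWiki's real interface):

```python
from typing import Optional


def link_url(path: str, default_view: str, view: Optional[str] = None) -> str:
    """Build a link to `path`, naming the view only when it differs from the default."""
    if view is None or view == default_view:
        # The target would be shown in this view anyway, so emit the
        # canonical URL instead of a redundant explicit view.
        return path
    return f"{path}?{view}"


# Hypothetical views: 'summary' is the default in the first case only.
print(link_url("/top/sub", "summary", "summary"))
print(link_url("/top/sub", "listing", "summary"))
```

With this check in place, every page has exactly one URL for its default view, which is the one that visitors and crawlers will see in links.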
This has a little downstream problem: if you go into a non-default view of a directory, go down into a subdirectory for which this is the default view, and then go down into a sub-sub-directory for which it is not, you no longer stay in the same view. Instead you wind up in the sub-sub-directory's default view.
Fortunately this is down in the area of taste decisions, so I'm comfortable with it.
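The loss of the mode can be traced with a small simulation. This is only a sketch under stated assumptions: the `follow` helper, the directory paths and the view names are invented, and every view is assumed to be valid for every directory in the chain.

```python
def follow(chain, start_view):
    """Walk down `chain`, a list of (path, default_view) pairs, following links.

    Returns a list of (path, view) pairs giving the view each directory
    ends up displayed in, using the 'omit the view if it is the target's
    default' rule.
    """
    _, default = chain[0]
    view = start_view
    shown = [(chain[0][0], view)]
    for next_path, next_default in chain[1:]:
        # A view is only a mode if it is not the current page's default.
        is_mode = view != default
        # The link names the view only if it is a mode and not the
        # target's default; otherwise the target picks its own default.
        explicit = view if is_mode and view != next_default else None
        view = explicit or next_default
        shown.append((next_path, view))
        default = next_default
    return shown


# The middle directory defaults to 'summary', so the mode is dropped there.
mixed = [("/top", "listing"), ("/top/sub", "summary"), ("/top/sub/subsub", "listing")]
print(follow(mixed, "summary"))

# With the same default everywhere, the mode survives all the way down.
uniform = [("/top", "listing"), ("/top/sub", "listing"), ("/top/sub/subsub", "listing")]
print(follow(uniform, "summary"))
```

In the first chain the sub-sub-directory is shown in `listing`, because by the time you reach the middle directory the view you are in is simply its default and no longer counts as a mode.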
How I found the issue
I found this problem by looking at my server logs and noticing that
the MSN spider was crawling file pages in Software with URLs
that included an explicit '?
Software is a directory that uses a non-
An ironic side note
In the process of writing this blog entry, I just found another bug
(again one of logic). This one was an incorrectly parenthesized
Again I'm not at all sure if unit tests would have found this issue, because I'm not sure if I'd ever have thought to write a test for this specific case.