The anatomy of a DWiki bug
This starts with DWiki's processing model, which involves displaying
pages in a number of different formats called 'views'. (More about
this is in ProcessingModel.) Not all views are valid for all pages
(for example, one view isn't valid for a file without a version
history).
If a URL doesn't explicitly mention a view, DWiki uses a
default. Files always default to the 'normal' view (which shows
their DWikiText as HTML), but directories can specify that they want
to default to something else, like the 'blog' directory view.
DWiki puts a toolbar at the bottom to give you access to the alternate views of the page that you're viewing. This raises a little issue for links that the page shows: what view of the target do they take you to? (For technical reasons this is mostly relevant for directory pages.)
I decided that the best answer was that non-default views should be modes, so links would show the target in the current view if the target could be displayed in it. This meant that if you visited a non-default view of a directory and then went into a subdirectory, you saw the subdirectory in the same way.
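As a rough illustration of this 'views as modes' rule, here is a minimal sketch in Python. It is not DWiki's actual code: the `Page` class, the `link_url` function, the example paths, and the assumption that a file cannot be shown in the 'blog' view are all hypothetical, chosen only to make the rule concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for a DWiki page; not DWiki's real data model.
    path: str
    default_view: str
    valid_views: frozenset


def link_url(explicit_view: Optional[str], target: Page) -> str:
    """Build the URL for a link to target.

    explicit_view is the view named in the URL of the page being shown,
    or None if that page is being shown in its default view.
    """
    # The view is a mode: carry it along whenever the target can show it.
    if explicit_view is not None and explicit_view in target.valid_views:
        return f"{target.path}?{explicit_view}"
    return target.path


# Assumed example pages: a subdirectory that can be shown either way,
# and a file that only has the 'normal' view.
subdir = Page("/top/sub/", "normal", frozenset({"normal", "blog"}))
note = Page("/top/note", "normal", frozenset({"normal"}))

print(link_url("blog", subdir))  # the mode sticks for the subdirectory
print(link_url("blog", note))    # the file can't be shown in this view
print(link_url(None, subdir))    # no explicit view, so a plain URL
```

Under these assumptions, a reader who picks a non-default view of `/top/` keeps it when descending into `/top/sub/`, while links to pages that cannot be shown that way stay plain.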
The actual problem:
This logic turns out to have a little problem, made visible through the following sequence:
- visit a directory with a non-normal default view.
- switch to the 'normal' view of this directory.
- because this is not the default view, links to files are now made
with an explicit view-setting '?normal' on the end of the URL.
This is bad for two reasons: it is redundant, and worse, it makes these links look like new pages when in fact they may be pages you've already visited.
The latter is especially important for search engines crawling a DWiki site, since I want them to index the canonical URL for each page and not wind up thinking that I have a lot of URLs with duplicate content. (I suspect that this causes search engines to dislike one's site, since one winds up looking like a search engine spammer. And even if it doesn't, it increases the total number of URLs in a DWiki that they have to crawl, slowing down the overall process.)
The fix is pretty simple: if we're generating a link to a page in an explicitly specified view but that view is already the page's default view, just leave out explicitly setting the view.
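A minimal sketch of what such a check might look like, in Python. This is an illustration under assumptions rather than DWiki's real code: `Page`, `link_url_fixed`, and the example paths are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for a DWiki page; not DWiki's real data model.
    path: str
    default_view: str
    valid_views: frozenset


def link_url_fixed(explicit_view: Optional[str], target: Page) -> str:
    """Build a link to target, given the view explicitly set in the current URL."""
    # No explicit view, or the target can't be shown in it: plain URL.
    if explicit_view is None or explicit_view not in target.valid_views:
        return target.path
    # The fix: the view is already the target's default, so naming it
    # would only create a redundant second URL for the same page.
    if explicit_view == target.default_view:
        return target.path
    return f"{target.path}?{explicit_view}"


# Assumed example pages.
page = Page("/dir/page", "normal", frozenset({"normal"}))
plain_sub = Page("/dir/sub/", "normal", frozenset({"normal", "blog"}))
blog_sub = Page("/dir/blogsub/", "blog", frozenset({"normal", "blog"}))

print(link_url_fixed("normal", page))     # no '?normal' on file links
print(link_url_fixed("blog", plain_sub))  # still a mode where it matters
print(link_url_fixed("blog", blog_sub))   # already the default: plain URL
```

The extra comparison against `target.default_view` is the whole change; every other case behaves as before.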
This has a little downstream problem: now if you go into a non-default view in a directory, go down into a subdirectory for which this is the default view, and then go down into a sub-sub-directory for which it is not, you no longer stay in the same view. Instead you wind up in the sub-sub-directory's default view.
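To make that sequence concrete, here is a small hypothetical trace in Python. It assumes that the only thing carrying a mode from page to page is the explicit view in the URL, and that every directory involved can be shown in both views; `Dir`, `carried_view`, and `walk` are illustrative names, not DWiki functions.

```python
from typing import List, NamedTuple, Optional


class Dir(NamedTuple):
    # Hypothetical directory record; assumed displayable in every view used here.
    name: str
    default_view: str


def carried_view(explicit_view: Optional[str], target: Dir) -> Optional[str]:
    """View that a link to target names explicitly (None means a plain URL)."""
    if explicit_view is None or explicit_view == target.default_view:
        return None
    return explicit_view


def walk(start_view: Optional[str], chain: List[Dir]) -> List[str]:
    """Follow links down chain and report the view each directory is shown in."""
    shown = []
    explicit = start_view
    for d in chain:
        explicit = carried_view(explicit, d)
        shown.append(explicit or d.default_view)
    return shown


# Start in a directory viewed with an explicit 'blog' view, then descend.
sub = Dir("sub", "blog")         # 'blog' is already its default
subsub = Dir("subsub", "normal")  # 'blog' is not its default

print(walk("blog", [sub, subsub]))  # ['blog', 'normal']
```

Once the link into `sub` drops the explicit view, nothing is left in the URL to say that 'blog' was chosen as a mode, so `subsub` falls back to its own default.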
Fortunately this is down into the area of taste decisions, so I'm comfortable with this.
How I found the issue
I found this problem by looking at my server logs and noticing that
the MSN spider was crawling file pages in Software with URLs
that included an explicit '?normal' specifier. This made me scratch
my head; while they were valid URLs, I was puzzled where the MSN
spider was getting them from.
Software is a directory that uses a non-normal default view.
Suddenly a little light went on in my head as I thought about the path
from visiting Software, following its 'View Normal' link, and
perhaps seeing links to regular pages with an explicit '?normal' in the
URLs. Following the path by hand showed me that DWiki was generating
such URLs; mystery solved.
An ironic side note
In the process of writing this blog entry, I just found another bug
(again one of logic). This one was an incorrectly parenthesized 'if'
statement that caused DWiki to explode when I put
two or more '[[/Software/]]' links into this article. (It had to have
the trailing slash, but two or more links to a nonexistent page would
have done it too.)
Again, I'm not at all sure that unit tests would have found this issue, because I'm not sure I'd ever have thought to write a test for this specific case.