Wandering Thoughts archives

2005-08-13

The anatomy of a DWiki bug

Yesterday's entry mentioned in passing fixing a bug in DWiki. Today's entry is about the anatomy of that bug (and how I found it).

This starts with DWiki's processing model, which displays pages in a number of different formats called 'views'. (More about this is in ProcessingModel.) Not all views are valid for all pages (for example, the 'history' view isn't valid for a file without a version history).

If a URL doesn't explicitly mention a view, DWiki uses a default. Files always default to the 'normal' view (which shows their DWikiText as HTML), but directories can specify that they want to default to something else, like the 'blog' directory view that creates WanderingThoughts.

DWiki puts a toolbar at the bottom to give you access to the alternate views of the page that you're viewing. This raises a little issue for links that the page shows: what view of the target do they take you to? (For technical reasons this is mostly relevant for directory pages.)

I decided that the best answer was that non-default views should be modes, so links would show the target in the current view if the target could be displayed in it. This meant that if you visited a non-default view of a directory and then went into a subdirectory, you saw the subdirectory in the same way.

The actual problem

This logic turns out to have a little problem, made visible through the following sequence:

  • visit a directory with a non-normal default view.
  • switch to the normal view of this directory.
  • because this is not the default view, links to files are now made with an explicit view-setting '?normal' on the end of the URL.

This is bad for two reasons: it is redundant and, worse, it makes these links look like they go to new pages when in fact they may go to pages you've already visited.

The latter is especially important for search engines crawling a DWiki site, since I want them to index the canonical URL for the page and not wind up thinking that I have a lot of URLs with duplicate content. (I suspect that this causes search engines to dislike one's site, since one winds up looking like a search engine spammer. And even if it doesn't, it increases the total number of URLs in a DWiki that they have to crawl, slowing down the overall process.)

The fix

The fix is pretty simple: if we're generating a link to a page in an explicitly specified view but that view is already the page's default view, just leave out explicitly setting the view.
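As a rough sketch of that rule (this is not DWiki's actual code; link_url, default_views, valid_views, and the sample paths are all names made up for illustration), the link generation might look something like this:

```python
# Hypothetical sketch of the link rule described above; none of these
# names or data structures are taken from DWiki itself.

def link_url(target, explicit_view, default_views, valid_views):
    """Return the URL for a link to `target`.

    `explicit_view` is the view named in the current page's URL, or
    None if the current page is being shown in its default view.
    """
    # No explicit view on the current page: nothing to carry over.
    if explicit_view is None:
        return target
    # Views are modes, but only if the target can be shown that way.
    if explicit_view not in valid_views.get(target, set()):
        return target
    # The fix: don't name a view that is already the target's default.
    if explicit_view == default_views.get(target, "normal"):
        return target
    return "%s?%s" % (target, explicit_view)


# Made-up sample data: one directory that defaults to 'blog', one file.
default_views = {"/Software/": "blog"}
valid_views = {
    "/Software/": {"blog", "normal"},
    "/Software/readme": {"normal", "history"},
}

file_link = link_url("/Software/readme", "normal", default_views, valid_views)
dir_link = link_url("/Software/", "normal", default_views, valid_views)
```

Here file_link comes out without a '?normal' on the end, because 'normal' is already the file's default view, while dir_link keeps it, because the directory defaults to something else.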

This has a little downstream problem; now if you go into a non-default view in a directory, go down into a subdirectory for which this is the default view, and then go down into a sub-sub-directory for which it is not, you will not still wind up in the same view. Instead you wind up in the sub-sub-directory's default view.

Fortunately this is in the area of taste decisions, so I'm comfortable with it.

How I found the issue

I found this problem by looking at my server logs and noticing that the MSN spider was crawling file pages in Software with URLs that included an explicit '?normal' specifier. This made me scratch my head; while they were valid URLs, I was puzzled about where the MSN spider was getting them from.

Software is a directory that uses a non-normal default view. Suddenly a little light went on in my head as I thought about the path from visiting Software, following its 'View Normal' link, and perhaps seeing links to regular pages with an explicit ?normal in the URLs. Following the path by hand showed me that DWiki was generating such URLs; mystery solved.

An ironic side note

In the process of writing this blog entry, I just found another bug (again one of logic). This one was an incorrectly parenthesized multi-clause if statement that caused DWiki to explode when I put two or more '[[/Software/]]' links into this article. (It had to have the trailing slash, but two or more links to a nonexistent page would have done it too.)

Again I'm not at all sure if unit tests would have found this issue, because I'm not sure if I'd ever have thought to write a test for this specific case.

DWikiBugAnatomy written at 01:58:05

2005-08-12

Chiming in on static versus dynamic typing

There's a large debate about static versus dynamic typing in the programming and language communities. (There is a classic Bruce Eckel article on this here, for example.)

I spent part of today fixing some 'issues' in DWiki, which is written in Python. This gave me the opportunity to reflect about the sort of bugs I've been finding in it. Python is dynamically typed, and I don't have any sort of formal test system for DWiki. In theory, this should lead to disaster.

In practice, type bugs in DWiki have been rare and found rapidly. Most bugs are like the ones I spent yesterday fixing: things where what was wrong was not my syntax or my programming, but my logic. I hadn't done something wrong, like passing a variable of the wrong type to a function; I had failed to think everything through correctly, and the code was quite faithfully carrying out my flawed logic, with equally flawed results.

(Indeed, just now I discovered that my revised logic for yesterday's fix was still a little bit wrong.)

This isn't too surprising. In a strongly typed language like Python, a type mistake is likely to cause an immediate error of some sort: either an exception or a clearly and majorly wrong result. If you test a code path at all, even informally by seeing if a given feature works, you're probably going to see those problems immediately.
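A small made-up illustration of what I mean (total_length is a hypothetical function, not anything from DWiki): the type mistake below blows up the first time the code path is run at all, rather than lurking.

```python
# Hypothetical example of a type mistake that surfaces immediately.

def total_length(words):
    total = 0
    for word in words:
        total += word  # mistake: this should be len(word)
    return total

try:
    total_length(["spam", "eggs"])
    raised = False
except TypeError:
    # Adding a string to an integer is refused outright; Python does
    # not quietly convert one to the other.
    raised = True

print(raised)
```

Any informal test that calls total_length() even once runs into the exception, which is why this sort of bug tends not to survive for long.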

As for their rarity, I think type errors are rare even in statically typed languages. Most of the time we get the obvious programming stuff right regardless of the language we're using; this includes types just as much as it does language syntax and the number of arguments a function takes. (If we routinely got static types wrong no one would put up with manual type declarations; it would be too much work. Even as it is, automating them is one popular IDE function.)

(I'm not even sure unit tests would have caught this particular issue, because I'm not sure it would have occurred to me to unit test the particular set of circumstances that showed the erroneous logic. I spotted the issue only by being puzzled by something in the request logs for CSpace and then working backwards to how MSN's spider must have obtained the particular URLs it was crawling.)

Sidebar: A brief summary of the positions

To brutally summarize the arguments between the two camps: static typing people believe it finds bugs and are horrified by the thought of latent type incompatibility bugs lurking unfound in programs in dynamically typed languages; how are you sure that all of your routines get the right type of argument every time? Dynamic typing people retort that in practice this doesn't happen, that you develop much faster and refactor much more easily without static typing getting in the way, and recite 'if you didn't test it, it doesn't work' a lot.

The Bruce Eckel article has a longer discussion of this.

StaticVsDynamicTyping written at 03:15:52

2005-08-11

Some interesting software tools (part 1)

Here are some interesting bits and pieces for software developers:

  • bstring, a C library for safe and easy string manipulation in the style of Python. Every so often I need to mangle strings faster than Python or Perl can manage it; at those times, I turn to the bstring library. (I have a longer writeup on bstring here.)

  • quilt, a package that allows one to manage a series of patches by keeping track of the changes each patch makes. Patches can be applied, un-applied, refreshed, etc.

quilt is especially handy for adding modifications to existing RPM packages (which are already a base program plus a set of patches). It will even take the RPM specfile and build a quilt working environment for you, which is really handy for adding quick modifications to an RPM.

SoftwarePointersI written at 01:57:38

2005-08-06

The importance of 'transparency' in data structures

The other day's entry on why Perl is not my favorite language produced an interesting comment exchange between me and a friend that touched on how transparently Perl can embed data structures in other data structures. (Specifically, that one can't transparently put aggregates, meaning lists and hashes, straight into and out of other aggregates.)

Why does this matter? Because every time you have a lack of transparency with data structures, you have to be aware of the actual types that you are working with. Being aware of the actual types complicates changing the program later, and always complicates some design patterns.

For example, the Python design pattern to introduce a lookup cache is pretty simple and quite general:

fooCache = {}

def cachedFoo(key):
    # Call getFoo() only on a cache miss, then remember the result.
    if key not in fooCache:
        fooCache[key] = getFoo(key)
    return fooCache[key]

This works for almost anything. (Honesty compels me to admit that it doesn't work for iterators; for why, see GeneratorGotchas. There are also some key types that won't be accepted, because if you work at it you can create something that isn't acceptable as a hash key.)
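Both exceptions are easy to demonstrate. In this sketch, getFoo() is a made-up stand-in that returns a generator, purely so that the failure shows up:

```python
# Sketch of the two exceptions to the cache pattern. getFoo() here is
# a hypothetical stand-in that returns a generator (a kind of iterator).

fooCache = {}

def getFoo(key):
    return (n * n for n in range(key))

def cachedFoo(key):
    if key not in fooCache:
        fooCache[key] = getFoo(key)
    return fooCache[key]

# The cache hands back the very same generator object both times, so
# the second caller finds it already used up.
first = list(cachedFoo(3))
second = list(cachedFoo(3))

# A list can't be a dictionary key, so the lookup itself fails.
try:
    cachedFoo([1, 2])
    unhashable_key = False
except TypeError:
    unhashable_key = True
```

The first call yields [0, 1, 4]; the second yields an empty list, because the cached generator was exhausted by the first caller.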

The equivalent pattern in Perl has to either use explicit references or know that getFoo() returns a scalar.

This also complicates changing what getFoo() returns, even if it's supposed to be used as a relatively opaque token. If the new type isn't storage-compatible with the old one, you wind up having problems. (Python is not immune to this havoc, since getFoo() could change to returning a non-hashable or non-reusable object.)

(Part of this is my computer science background being neurotic, because it is dissatisfied by irregular things and exceptions.)

TransparencyImportance written at 01:48:56

2005-08-03

Why Perl is not my favorite language

I could talk about aesthetics; I could talk about line noise and readability; but for me it ultimately comes down to one thing: data structures.

Namely, I find it very hard to write Perl code of any depth without running headlong into the fact that Perl's 'collection' types (arrays and hashes) can't directly contain other arrays and hashes. All they can contain is scalars (strings, numbers, and references).

For example, recently I was writing a program to mass-query the SBL DNS blocklist and wanted to stick a caching layer in so that I'd only look up a given IP address once. The natural way to do that is with a hash, indexed by the IP address. Except that an IP address can be in more than one SBL record, so the natural representation for the result is an array, which can't be put in the hash.
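For contrast, here is roughly what that natural representation looks like in Python, where a list goes straight into a dictionary. This is only a sketch: lookup_sbl() is a made-up stand-in that returns canned data instead of doing a real DNS query, and the record names are placeholders, not real SBL records.

```python
# Hypothetical sketch of a per-IP cache whose values are lists.
# lookup_sbl() fakes the query; it does no DNS work at all.

sbl_cache = {}
lookups = []  # records which IPs were actually 'queried'

def lookup_sbl(ip):
    lookups.append(ip)
    canned = {"192.0.2.1": ["record-1", "record-2"]}
    return canned.get(ip, [])

def cached_sbl(ip):
    # The list of records is stored in the dictionary as-is.
    if ip not in sbl_cache:
        sbl_cache[ip] = lookup_sbl(ip)
    return sbl_cache[ip]

records = cached_sbl("192.0.2.1")
again = cached_sbl("192.0.2.1")  # served from the cache
```

The second call never reaches lookup_sbl(), and nothing in the caching code has to know that the cached value is a list.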

Perl fans will retort that I can just use references. I could write a long answer, but I'll just go with the short one: 'if I wanted to use pointers, I'd write C'.

That's what references are: sticky pointers. And like all forms of pointers, they're an implementation detail. Low level languages like C are all about the implementation details, but I dislike high-level languages that make me think about them very often. (Implementation details are a distraction; every bit of effort you have to spend on them is effort you are not spending on the real problem.)

I could write more than small things in Perl. It's not that the language is incapable. It's just that it's annoyingly distracting, and I'd generally rather not bother if I have the choice.

PerlNonFavorite written at 01:39:22



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.