Wandering Thoughts archives


File based engines and the awkward problem of special URLs

I was recently asked a good question on Twitter:

@thatcks Do you publish feed URLs on your blog besides '/blog/?atom'? My reader of choice sadly has issues re: dropping the GET param.

Unfortunately, the answer is no. So, you might reasonably wonder, why do syndication feeds here use a query parameter? The answer is that DWiki (the engine behind Wandering Thoughts) is a file and directory based engine, and when you build such an engine you wind up with a URL namespace problem.

Put simply, when you're just presenting a view of a directory hierarchy, the user de facto owns the URL namespace. They create valid URLs by creating files and directories, and it's within their power and their right to create even ones with awkward names. If you add your own names to this namespace (for example a 'blog/atom' URL for the blog's Atom syndication feed) you're at risk of colliding with a name the user creates in the directory hierarchy. Collisions are generally bad, especially when you have not told the user what happens next.

I think that there are three main things you can do here. First, you can simply reserve some names in the namespace, ie you tell the user "you can't create a file or directory called 'atom', that's a reserved name". There are several versions of name reservation but I think that they're all unappetising for various reasons. Reserved names also give you problems if you want to add features that require new ones, since the user may already be using the name you want to take over.

(This is a familiar issue for programming languages; adding new reserved keywords for things like new syntax is fraught with peril and the possibility that old programs suddenly won't work with your new version because they're using what is now a reserved keyword as a variable name.)
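To make the reserved-name problem concrete, here is a minimal sketch in Python. It is not DWiki's code; `RESERVED_NAMES`, `resolve` and `shadowed_by_new_reservation` are hypothetical names invented for illustration, and 'atom' is just the example name used above.

```python
# Hypothetical set of reserved names; 'atom' is only the example from the text.
RESERVED_NAMES = {"atom"}


def resolve(path_segments, user_files):
    """Decide what a URL path refers to when some names are reserved.

    user_files is a set of path tuples that the user has created.
    Returns ("synthetic", name), ("file", path) or ("missing", path).
    """
    path = tuple(path_segments)
    last = path[-1]
    if last in RESERVED_NAMES:
        # The engine's name always wins, so a user file with the
        # same name is silently shadowed.
        return ("synthetic", last)
    if path in user_files:
        return ("file", path)
    return ("missing", path)


def shadowed_by_new_reservation(user_files, new_name):
    """List existing user paths that would break if new_name became reserved."""
    return sorted(p for p in user_files if new_name in p)
```

The second function is the painful part: before reserving a new name for a new feature, you would have to check every existing hierarchy for uses of it, and any hit is a page that stops working.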

The second and related approach is to fence off certain classes of names as invalid for the user and thus available for your program's synthetic URLs. This can work reasonably well if you create rules that match user expectations and have solid, appealing reasons. For example, DWiki won't serve files with names that start with '.' or end with '~', and so both categories of names are available for synthetic URLs. The drawback is that the resulting synthetic URLs generally look ugly; you would have 'blog/.atom' or 'blog/atom~'.

(DWiki uses a few such synthetic URLs but all for transient things, not for any URL that users will ever want to bookmark or pass around.)
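The fencing-off rule is easy to express in code. This is a small sketch of the two rules described above (no leading '.', no trailing '~'), not DWiki's actual implementation; the function names are hypothetical.

```python
def is_servable_name(name):
    """True if a file or directory name may be served as user content.

    Names starting with '.' or ending with '~' are fenced off, which
    leaves them free for the engine's own synthetic URLs.
    """
    return bool(name) and not (name.startswith(".") or name.endswith("~"))


def classify(segment):
    """Label one URL path segment as user content or engine territory."""
    return "user" if is_servable_name(segment) else "synthetic"
```

With these rules, `classify("atom")` is user content while `classify(".atom")` and `classify("atom~")` belong to the engine, which is exactly why the synthetic URLs end up looking ugly.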

The third approach is to take your program's synthetic URLs completely out of the user's namespace, such as by making them query parameters. Even if a user creates a file with a '?' in its name, it won't be represented as a URL with a query parameter; done properly, the '?' has to be %-encoded in the URL. This approach has two virtues. First, it's simple. Second, it can be applied to any regular URL whether it's a directory or a file, and it doesn't require turning a file into a pseudo-directory (eg going from 'blog/anentry' to 'blog/anentry/commentfeed', which raises the question of what 'blog/anentry/' should mean). DWiki takes this approach, and so syndication feeds and in fact all alternate views of directories or files are implemented as query parameters.

(From the right perspective, a syndication feed is just an alternate view of a directory hierarchy. Or at least that's my story and I'm sticking to it.)
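The reason query parameters can't collide with file names can be shown with Python's standard `urllib.parse`. This is a sketch under the assumption that the view name is simply the whole query string, as in '/blog/?atom'; `split_request` and `url_for_file` are hypothetical helpers, not DWiki functions.

```python
from urllib.parse import quote, unquote, urlsplit


def split_request(url):
    """Split a request URL into (file path, view name or None).

    The path names something in the user's directory hierarchy; the
    query string, if any, selects the engine's alternate view of it.
    """
    parts = urlsplit(url)
    return unquote(parts.path), (parts.query or None)


def url_for_file(name_segments):
    """Build the URL path for a user file, %-encoding each segment.

    A literal '?' in a file name becomes '%3F', so it can never be
    mistaken for the start of a query string.
    """
    return "/" + "/".join(quote(seg, safe="") for seg in name_segments)
```

So `split_request("/blog/?atom")` yields the directory '/blog/' plus the view 'atom', while a file actually named 'what?' gets the URL '/blog/what%3F' and comes back as a plain path with no view at all.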

web/FileBasedUrlConstraints written at 00:12:28
