File based engines and the awkward problem of special URLs

February 17, 2014

I was recently asked a good question on Twitter:

@thatcks Do you publish feed URLs on your blog besides '/blog/?atom'? My reader of choice sadly has issues re: dropping the GET param.

Unfortunately, the answer is no. So, you might reasonably wonder, why do syndication feeds here use a query parameter? The answer is that DWiki (the engine behind Wandering Thoughts) is a file- and directory-based engine, and when you build such an engine you wind up with a URL namespace problem.

Put simply, when you're just presenting a view of a directory hierarchy, the user de facto owns the URL namespace. They create valid URLs by creating files and directories, and it's within their power and their right to create even ones with awkward names. If you add your own names to this namespace (for example a 'blog/atom' URL for the blog's Atom syndication feed) you're at risk of colliding with a name the user is creating in the directory hierarchy. Collisions are generally bad, especially when you have not told the user what happens next.

I think that there are three main things you can do here. First, you can simply reserve some names in the namespace, i.e. you tell the user "you can't create a file or directory called 'atom', that's a reserved name". There are several versions of name reservation but I think that they're all unappetising for various reasons. Reserved names also give you problems if you want to add features that require new ones, since the user may already be using the name you want to take over.

(This is a familiar issue for programming languages; adding new reserved keywords for things like new syntax is fraught with peril and the possibility that old programs suddenly won't work with your new version because they're using what is now a reserved keyword as a variable name.)
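As a rough illustration of the reserved-name problem (this is not how DWiki works; the reserved names and the function name here are invented for the example), a small Python sketch can check a list of user-created paths against a reserved set and show how reserving one more name later can break a path that used to be fine:

```python
from pathlib import PurePosixPath

# Hypothetical reserved names, purely for illustration.
RESERVED_NAMES = frozenset({"atom", "rss"})

def find_collisions(user_paths, reserved=RESERVED_NAMES):
    """Return the user paths that have any component in the reserved set."""
    return [
        path
        for path in user_paths
        if any(part in reserved for part in PurePosixPath(path).parts)
    ]

# Paths the user has created in their directory hierarchy (made up).
existing = ["blog/anentry", "blog/atom", "blog/feed"]

# Today only 'blog/atom' collides with a reserved name.
today = find_collisions(existing)

# A later version that also wants 'feed' now breaks 'blog/feed' too.
later = find_collisions(existing, RESERVED_NAMES | {"feed"})
```

The second call is the uncomfortable case: nothing about the user's files changed, but a path that was valid yesterday is a collision today.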

The second and related approach is to fence off certain classes of names as invalid for the user and thus available for your program's synthetic URLs. This can work reasonably well if you create rules that match user expectations and have solid, appealing reasons. For example, DWiki won't serve files with names that start with '.' or end with '~', and so both categories of names are available for synthetic URLs. The drawback of this is that the resulting synthetic URLs generally look ugly; you would have 'blog/.atom' or 'blog/atom~'.

(DWiki uses a few such synthetic URLs but all for transient things, not for any URL that users will ever want to bookmark or pass around.)
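A minimal sketch of this sort of fencing, assuming only the two rules mentioned above (names starting with '.' or ending with '~' are never served). The function names are made up for the example and are not DWiki's actual code:

```python
def is_user_servable(name):
    """True if a single path component may be served as user content.

    Assumed rule: names starting with '.' or ending with '~' are never
    served, which keeps dotfiles and editor backup files out of view.
    """
    return not (name.startswith(".") or name.endswith("~"))

def is_free_for_synthetic(name):
    """True if the engine could use this name without a user collision.

    Anything the engine refuses to serve from the filesystem can never
    be a user URL, so it is available for synthetic URLs.
    """
    return bool(name) and not is_user_servable(name)
```

With these rules 'atom' stays in the user's hands, while '.atom' and 'atom~' are both free for the engine to use, which is exactly where the ugly URLs come from.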

The third approach is to take your program's synthetic URLs completely out of the user's namespace, such as by making them query parameters. Even if a user creates a file with a '?' in its name, it won't be represented as a URL with a query parameter; done correctly, the '?' has to be %-encoded in the URL. This approach has two virtues. First, it's simple. Second, it can be applied to any regular URL whether it's a directory or a file, and it doesn't require turning a file into a pseudo-directory (e.g. going from 'blog/anentry' to 'blog/anentry/commentfeed', which raises the question of what 'blog/anentry/' should mean). DWiki takes this approach, and so syndication feeds and in fact all alternate views of directories or files are implemented as query parameters.

(From the right perspective, a syndication feed is just an alternate view of a directory hierarchy. Or at least that's my story and I'm sticking to it.)
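To see why query parameters stay out of the user's namespace, here is a small sketch using Python's standard urllib.parse. The two helper functions and the example file names are assumptions made for illustration, not DWiki's real code:

```python
from urllib.parse import quote, unquote, urlsplit

def url_for_file(relpath):
    """Build the URL path for a user file, %-encoding characters like '?'."""
    return "/" + quote(relpath)

def parse_request(url):
    """Split a request URL into (user path, view); view is None if absent."""
    parts = urlsplit(url)
    return unquote(parts.path).lstrip("/"), parts.query or None

# A synthetic view: the path names a real directory, the query names the view.
feed_request = parse_request("/blog/?atom")

# A user file that really has a '?' in its name gets an encoded URL...
odd_url = url_for_file("blog/what?atom")

# ...which parses back to the file itself, with no view attached.
odd_request = parse_request(odd_url)
```

Here feed_request is ('blog/', 'atom'), odd_url is '/blog/what%3Fatom', and odd_request is ('blog/what?atom', None). A literal '?' in a file name never reaches the query part of the URL, so the two namespaces cannot collide.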


Comments on this page:

By opk at 2014-02-17 12:21:05:

For what it's worth, my feed reader also has problems with the ? thing on your blog feed. My bug report to them has gone unanswered. I've ended up having to use a Firefox live bookmark.

By cks at 2014-02-18 00:05:06:

It strikes me that one possible option is to use Feedburner to create an alternate URL for the feed you're interested in. I think you can set up a Feedburner (re)feed for any random feed URL, not just your own blog.

(And if you have to be the blog owner and Feedburner checks, well, I can set up one.)

By bitprophet at 2014-02-21 14:42:15:

Ironically, I didn't even see this post until today, because I was the tweet author & thus have to come check the site manually :)

Thanks for the write-up, cks - thorough & well spoken as usual.

Totally forgot about FeedBurner; it looks like anybody can add any site, as I just made http://feeds.feedburner.com/thatcksblog and it appears to work, for now. ("cksblog" was sadly taken.)

The presumption in this article is that you have to treat Atom feeds specially in some way. They could just be statically generated and placed on the filesystem as a regular file. That's what ikiwiki does for example.

By cks at 2016-01-27 10:25:48:

In my view, the 'generated' bit is the problem. Whether the Atom feed is generated dynamically when requested or generated once then written to the filesystem, it's not being authored by the user and thus it's claiming a name that the user might want to use. That claim is somewhat more visible if the generation process results in a visible filename, that's all.

Written on 17 February 2014.

Last modified: Mon Feb 17 00:12:28 2014