Why directory URLs have to have trailing slashes

May 20, 2009

Web servers and many web applications are quite insistent that URLs that represent 'directories' have to end in a slash; if you request the URL without the trailing slash, they just give you a redirection to the same URL with a slash on the end instead of the actual content. One might well wonder why they have this neurotic insistence, especially when it complicates URL rewriting for applications that can do whatever they want with incoming URLs anyways.

(The other side of this question is whether web applications have to care about this, and if they do, why.)

The answer is that it's required in order to correctly resolve any relative URLs in the document. Consider a 'directory' page /a/b that contains the relative link <a href="c.html">. If the URL of the page the browser is dealing with doesn't have the slash (if it is just /a/b), the browser will make the relative link point to /a/c.html, but if the URL has a slash on the end (if it is /a/b/), the same link will point to /a/b/c.html. Presumably only one of these is correct and intended.
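(As a quick way to see this for yourself, here is a minimal sketch using Python's standard urllib.parse.urljoin, which follows the same relative-resolution rules that browsers do. The example.com host is just a placeholder I've made up; only the /a/b paths come from the example above.)

```python
from urllib.parse import urljoin

# The same relative link, resolved against the two forms of the page URL.
# 'example.com' is a placeholder host used purely for illustration.
without_slash = urljoin("http://example.com/a/b", "c.html")
with_slash = urljoin("http://example.com/a/b/", "c.html")

print(without_slash)  # the last path segment 'b' is replaced
print(with_slash)     # the link is resolved inside 'b/'
```

Without the trailing slash, 'b' is treated as a file name in the 'directory' /a/ and is dropped before the relative link is appended; with the slash, 'b/' is itself the directory.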

(The official source for this whole process is now RFC 3986, which obsoletes RFC 2396, which in turn updated the original RFC 1808.)

Whether this matters to your web application depends on what sort of links you generate. If you always use absolute paths, then you don't need to care; you can ignore the situation and give people the same contents regardless of the presence or absence of the trailing slash. If you do use relative links, then you need to notice the situation and either force the redirection or generate slightly different content.

(I would suggest forcing the redirection on the grounds that it is less confusing to both Google and users; otherwise you have two URLs that are the same thing.)
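(Forcing the redirection might look something like the following sketch of a tiny WSGI application. This is an illustration under assumptions, not how any particular server does it: the DIRECTORY_PATHS set and the page content are hypothetical, and a real application would decide what counts as a 'directory' in its own way. It also sends a relative Location: value for brevity; you may prefer to build an absolute URL.)

```python
# Hypothetical set of paths that this application treats as 'directories'.
DIRECTORY_PATHS = {"/a/b"}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path in DIRECTORY_PATHS:
        # Slashless request for a directory: redirect to the canonical
        # form with the trailing slash, preserving any query string.
        location = path + "/"
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    if path.endswith("/") and path[:-1] in DIRECTORY_PATHS:
        # Canonical directory URL: relative links in the page are now safe.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b'<a href="c.html">c</a>']
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

With this, a request for /a/b gets a 301 to /a/b/, and only the slash-terminated URL ever serves the page with its relative links.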

(This is one of those entries that I write to tack things down firmly in my mind, after a co-worker had to remind me of all of this.)


Comments on this page:

By Dan.Astoorian at 2009-05-20 10:00:18:

Alternatively, one may provide a <BASE> tag specifying the base URI for resolving relative URIs.

The downside of this is that it means the server needs to be able to correctly infer the absolute URL from the request and compute the correct base (or, at a minimum, provide a canonical absolute URL instead).

--Dan

By cks at 2009-05-20 10:28:58:

I've got the impression that enough browsers and web servers have enough bugs to make all of the various methods of supplying a base URL for a page not really good ideas in practice.

(For example, browsers apparently have to ignore the Content-Location: HTTP header because some popular versions of IIS put bad things there in various circumstances.)



Last modified: Wed May 20 01:55:07 2009