Wandering Thoughts archives

2009-05-20

Why directory URLs have to have trailing slashes

Web servers and many web applications are quite insistent that URLs that represent 'directories' have to end in a slash; if you request the URL without the trailing slash, they just give you a redirection to the same URL with a slash on the end instead of the actual content. One might well wonder why they have this neurotic insistence, especially when it complicates URL rewriting for applications that can do whatever they want with incoming URLs anyways.

(The other side of this question is whether web applications have to care about this, and if they do, why.)

The answer is that it's required in order to correctly resolve any relative URLs in the document. Consider a 'directory' page /a/b that contains the relative link <a href="c.html">. If the URL of the page the browser is dealing with doesn't have the slash (if it is just /a/b), the browser will make the relative link point to /a/c.html, but if the URL has a slash on the end (if it is /a/b/), the same link will point to /a/b/c.html. Presumably only one of these is correct and intended.
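The difference is easy to check with Python's standard `urllib.parse.urljoin`, which implements the same relative-reference resolution that browsers use. This is only an illustrative sketch; the host name `example.org` is an assumption and plays no part in the result.

```python
from urllib.parse import urljoin

# Hypothetical host; only the path matters for the comparison.
without_slash = urljoin("http://example.org/a/b", "c.html")
with_slash = urljoin("http://example.org/a/b/", "c.html")

print(without_slash)  # http://example.org/a/c.html
print(with_slash)     # http://example.org/a/b/c.html
```

Without the trailing slash, `b` is treated as the last path segment and is replaced by `c.html`; with the slash, the last segment is empty and `c.html` is appended under `/a/b/`.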

(The official source for this whole process is currently RFC 3986, section 5; it obsoletes RFC 2396, which in turn updated the original RFC 1808.)

Whether this matters to your web application depends on what sort of links you generate. If you always use absolute paths, then you don't need to care; you can ignore the situation and give people the same contents regardless of the presence or absence of the trailing slash. If you do use relative links, then you need to notice the situation and either force the redirection or generate slightly different content.

(I would suggest forcing the redirection on the grounds that it is less confusing to both Google and users; otherwise you have two URLs that are the same thing.)
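A minimal sketch of the 'force the redirection' choice, assuming the application has some way to tell whether a path names a 'directory' page. The function name `directory_redirect` and the `is_directory` callable are hypothetical, and query strings are deliberately left out.

```python
def directory_redirect(path, is_directory):
    """Return the path to redirect to, or None to serve the page as-is.

    `is_directory` is a hypothetical callable supplied by the application
    that reports whether a slash-less path names a 'directory' page.
    """
    if path.endswith("/"):
        return None          # already in canonical form
    if is_directory(path):
        return path + "/"    # caller would send this as the Location of a redirect
    return None              # ordinary page; nothing to do


# Example with a made-up set of directory pages.
directories = {"/a/b"}
print(directory_redirect("/a/b", directories.__contains__))       # /a/b/
print(directory_redirect("/a/b/", directories.__contains__))      # None
print(directory_redirect("/a/c.html", directories.__contains__))  # None
```

Doing the check before generating any content means relative links in the page only ever have to be correct for the slash-terminated URL.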

(This is one of those entries that I write to tack things down firmly in my mind, after a co-worker had to remind me of all of this.)

DirectoryTrailingSlashes written at 01:55:07

2009-05-19

Some notes on rewrites in Apache .htaccess files

Since I keep rediscovering this every so often, here's what I know about rewrite rules in .htaccess files so that I can just read it here the next time around.

Some basics:

  • you need a 'RewriteEngine on' statement, even if the rewrite engine is already on in the main configuration.

  • the 'URLs' that you match against in RewriteRule are relative to the directory the .htaccess is in. However, the Apache variables that you use in RewriteCond are not relative to the directory; %{REQUEST_URI} is the full URL path and %{REQUEST_FILENAME} is the full filesystem path. This makes sense, but does mean one has to keep track of it all.

Suppose that you want to have a 'directory' that is actually a CGI-BIN. There are two ways to do this:

  • make an actual directory, and put a .htaccess in it that has:
    RewriteRule ^(.*)$ /cgis/my-cgi/$1 [PT]

    Apache itself will then handle generating a redirect for people who ask for the directory without the trailing slash; your CGI-BIN does not have to worry about it.

  • put a .htaccess in the directory that is one level up. This should have something like:
    RewriteRule ^foo$ /cgis/my-cgi [PT]
    RewriteRule ^foo/(.*)$ /cgis/my-cgi/$1 [PT]

    Your CGI will have to generate the redirect when people ask for the directory without the trailing slash (or, well, do whatever you want with their requests); Apache won't do anything special for you.

It is common to implement the latter approach with a single rewrite rule:

RewriteRule ^foo(.*)$ /cgis/my-cgi/$1 [PT]

However, this is incorrect because it matches too much; it will send any URL in that directory that starts with foo off to your CGI-BIN, including things like a request for 'foobar'.

(You may not care about this. I do, partly because I don't like handing my CGIs URLs that they're not actually supposed to be handling.)
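The over-matching can be seen by trying the patterns against a few sample paths. This sketch uses Python's `re` module as a stand-in for Apache's regular expression matching, which is an assumption that holds for patterns this simple; the sample paths are made up.

```python
import re

# The single-rule pattern and the two-rule version from above.
single_rule = re.compile(r"^foo(.*)$")
two_rules = [re.compile(r"^foo$"), re.compile(r"^foo/(.*)$")]


def matches_two_rules(path):
    """True if either of the two separate rules would match this path."""
    return any(p.search(path) for p in two_rules)


for path in ["foo", "foo/", "foo/x/y", "foobar", "foo.old"]:
    print(path, bool(single_rule.search(path)), matches_two_rules(path))
```

The first three paths match under both approaches, while 'foobar' and 'foo.old' match only the single-rule pattern.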

PS: the very similar looking destination '/cgis/my-cgi$1' is very much not what you want; in fact, I believe that it's a security risk, as I think it means that Apache can be tricked into running things like '/cgis/my-cgi.old' with a suitable request.
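To see why the missing slash matters, here is a rough model of the substitution step only. The `rewrite` helper is hypothetical and treats a RewriteRule as a plain regex substitution; whether Apache would then actually run the resulting file depends on the rest of the server configuration.

```python
import re


def rewrite(pattern, target, path):
    """Model a RewriteRule as a regex match plus $1 substitution.

    Returns the rewritten path, or None if the pattern does not match.
    """
    m = re.search(pattern, path)
    if m is None:
        return None
    return target.replace("$1", m.group(1))


# Made-up request for 'foo/.old', relative to the .htaccess directory.
print(rewrite(r"^foo/(.*)$", "/cgis/my-cgi$1", "foo/.old"))   # /cgis/my-cgi.old
print(rewrite(r"^foo/(.*)$", "/cgis/my-cgi/$1", "foo/.old"))  # /cgis/my-cgi/.old
```

With the slash in the destination, whatever the client supplies always ends up after '/cgis/my-cgi/' instead of being glued onto the CGI's own file name.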

HtaccessRewrites written at 00:37:56
