Redirecting paths that start with two slashes in Apache

July 8, 2021

Suppose that you have a dynamic web application of some sort that sits behind Apache, and a URL for one of your application's pages starts going around with a path that begins with two slashes instead of one. In other words, people are sharing 'https://example.org//app/page' instead of the version with a single slash at the start of the path. You can support this in your web app (with some potential cautions), but you would prefer to have Apache redirect the non-canonical URL to the canonical one.
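The post doesn't spell out its cautions. One easy pitfall, offered here as our own illustration, is that a path beginning with // is syntactically a network-path reference, so URL-handling code may read the first path component as a host name. A minimal Python sketch using the standard urllib.parse module (raw_path is just the example path from above):

```python
from urllib.parse import urlsplit, urljoin

# A path that starts with two slashes, as the web app might receive it
# (for example via a CGI's REQUEST_URI), with no scheme or host attached.
raw_path = "//app/page"

# urlsplit() treats a leading '//' as the start of a network location,
# so "app" is parsed as a host name and only "/page" is left as the path.
parts = urlsplit(raw_path)
print(parts.netloc, parts.path)

# Resolving the path against the site's base URL has the same effect:
# the result points at a different host entirely.
resolved = urljoin("https://example.org/", raw_path)
print(resolved)
```

If the application builds links or redirects from the incoming path this way, the "page" it thinks it is on may not be the one the user asked for.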

Under normal circumstances, this sort of selective redirection should be straightforward using mod_rewrite. If you wanted to rewrite only a single specific bad path instead of all potential ones, I'd expect something like the following to work:

RewriteCond %{REQUEST_URI} "=//app/page"
RewriteRule ^.* https://example.org/app/page [R,L]

(This is one of the cases where exact string matching in RewriteCond is a useful thing.)

However, this appears not to work, at least in a .htaccess file, and I suspect it won't work in the main Apache configuration file either. Although Apache's %{REQUEST_URI} gives you the full URL path even in a .htaccess, it appears that Apache canonicalizes it for the purposes of things like RewriteCond, turning the two leading slashes into one. This canonicalization isn't passed through to CGIs, though; they will see the original "//app/page" version.

(This canonicalization appears to apply to every / in the URL path, not just the leading ones. If you write a condition for "/app/dir/page", it will match URLs with any number of extra slashes; for example, "//app////dir///page" will match.)
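To make the observed canonicalization concrete, here is a small Python model. merge_slashes() and cond_exact() are hypothetical helpers that approximate the behavior described above (collapse runs of slashes, then do RewriteCond's exact "=" comparison); they are not Apache's implementation:

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse every run of consecutive '/' characters into one '/'.

    Only a model of the canonicalization observed in the post.
    """
    return re.sub(r"/{2,}", "/", path)


def cond_exact(value: str, wanted: str) -> bool:
    # Models RewriteCond's "=string" form: plain string equality, no regex.
    return value == wanted


# The exact-match condition from the first rule never fires, because the
# value it is compared against has already lost its extra slash.
print(cond_exact(merge_slashes("//app/page"), "//app/page"))

# Conversely, a condition written for the single-slash path matches
# requests with any number of extra slashes.
print(cond_exact(merge_slashes("//app////dir///page"), "/app/dir/page"))
```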

Instead, the only way I found to do this was with Apache's special %{THE_REQUEST} variable, which holds the full HTTP request line. As the mod_rewrite documentation covers, this value has not been unescaped (URL-decoded), unlike most other variables, which may cause you heartburn if people get clever or you want to do general matching. So the rule I wound up with looks like:

RewriteCond %{THE_REQUEST} "^GET //app/page "
RewriteRule ^.* https://example.org/app/page [R,L]
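Because %{THE_REQUEST} is the raw request line, the condition is a plain regex match against text like "GET //app/page HTTP/1.1". This Python sketch models which request lines the pattern accepts; it assumes Python's re behaves like Apache's PCRE for a pattern this simple, and the sample request lines are made up:

```python
import re

# The condition pattern from the rule above, as a Python regex.
COND = re.compile(r"^GET //app/page ")

request_lines = [
    "GET //app/page HTTP/1.1",      # the bad URL: matches
    "GET /app/page HTTP/1.1",       # canonical URL: no match, so no redirect loop
    "GET //app/page?x=1 HTTP/1.1",  # query string: no match
    "HEAD //app/page HTTP/1.1",     # other methods: no match
    "GET //app/%70age HTTP/1.1",    # '%70' decodes to 'p', but the raw line doesn't match
]

results = [bool(COND.search(line)) for line in request_lines]
for matched, line in zip(results, request_lines):
    print(matched, line)
```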

This match is deliberately very specific, since we're doing a very specific HTTP redirection. You'd need something more complicated if you have to handle query parameters, for example, or HEAD requests as well as GET. But it works, unlike the other option.
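As a hedged sketch of what "more complicated" might look like, the pattern below (our own, modeled in Python rather than tested in Apache) also accepts HEAD and a trailing query string. Ending the match at a space or ? keeps it from catching longer paths such as //app/pages. Since mod_rewrite passes the original query string through by default when the substitution doesn't supply one, the redirect target itself could presumably stay the same:

```python
import re

# Hypothetical loosened condition (not from the post): GET or HEAD, and the
# path must be followed by a space (end of the request target) or by '?'.
LOOSE = re.compile(r"^(GET|HEAD) //app/page[? ]")


def should_redirect(request_line: str) -> bool:
    return LOOSE.match(request_line) is not None


for line in [
    "GET //app/page HTTP/1.1",
    "HEAD //app/page HTTP/1.1",
    "GET //app/page?x=1 HTTP/1.1",
    "GET //app/pages HTTP/1.1",    # a longer path: not matched
    "GET /app/page?x=1 HTTP/1.1",  # already canonical: not matched
]:
    print(should_redirect(line), line)
```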

Possibly I'm missing a clever trick that enables a better version of this. I don't really like matching things so specifically, but it seems to be what you have to reach for in this unusual situation.


Comments on this page:

You can limit the amount of escaping you have to account for by not trying to do all of the matching on the RewriteCond line.

RewriteCond %{THE_REQUEST} "^[^?#]*(/|%2[Ff]){2,}"
RewriteRule ^/(app)/+(page)$ /$1/$2 [R=301,L]

This RewriteCond narrowly checks for two or more slashes in a row, whether literal or %2F-encoded, anywhere on the request line, as long as they come before the query string or fragment part. The actual path matching is left to the RewriteRule, where Apache’s normalization works to your advantage. This version of the rule also takes advantage of the fact that mod_rewrite’s R flag will convert a relative URL into an absolute one for you, based on the request URL.
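The comment's split of responsibilities can be modeled in Python too: the condition looks at the raw request line, while the rule sees the already-normalized path. redirect_target() is a hypothetical helper, and the model assumes the rule lives in server or virtual-host context; in a .htaccess file, mod_rewrite strips the directory prefix (leading slash included) before matching, so a pattern anchored with ^/ would not match there.

```python
import re
from typing import Optional

# The comment's RewriteCond pattern: a run of two or more slashes, literal
# or %2F/%2f-encoded, somewhere before the first '?' or '#'.
COND = re.compile(r"^[^?#]*(/|%2[Ff]){2,}")

# The comment's RewriteRule pattern, matched against the normalized path
# (server or virtual-host context, so the path keeps its leading slash).
RULE = re.compile(r"^/(app)/+(page)$")


def redirect_target(request_line: str, normalized_path: str) -> Optional[str]:
    """Return the relative redirect target, or None if the rule wouldn't fire."""
    if COND.match(request_line) is None:
        return None
    m = RULE.match(normalized_path)
    if m is None:
        return None
    # Mirrors the substitution "/$1/$2"; the R flag then makes it absolute.
    return f"/{m.group(1)}/{m.group(2)}"


print(redirect_target("GET //app/page HTTP/1.1", "/app/page"))
print(redirect_target("GET /app/page HTTP/1.1", "/app/page"))
print(redirect_target("GET /app/page?next=//x HTTP/1.1", "/app/page"))
```

The third call returns None because [^?#]* cannot step past the '?', so slashes inside the query string are ignored.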

Written on 08 July 2021.


Last modified: Thu Jul 8 23:57:15 2021