Tangled issues with what status we should use for our HTTP redirects

September 19, 2022

We have a general purpose web server, which includes user home pages. Historically, every so often people moved on but wanted their home pages to redirect elsewhere, and we generally obliged, using various Apache mechanisms to set up HTTP redirections (most recently with Apache's RewriteMap). However, we haven't had any new requests like this for years and years, which means that by now all of our existing redirections are very old (and, naturally, not all of them still go to working destinations).

When we set up HTTP redirections, we have historically tended to make them 'temporary' redirections at first (ie, HTTP status 302). Partly this is because it's usually the Apache default, and partly this is because we're concerned that we may have made a mistake (either in configuration or intentions) and historically permanent redirects could be cached in browsers, although I'm not sure how much that happens today. Our most recent version of the redirections for people's old home pages was set up this way, and that's how they've stayed for four years.

Recently we had cause to look at how frequently these old redirections were still being used. To my surprise, a fair number of them were being used fairly often, and not just by search engines crawling them. Some of these uses may be from old URLs embedded in various places, but some of them seem to come from people following search engine links. I don't know for sure that search engines would have stopped providing these links if we'd been using permanent HTTP redirections, but switching probably wouldn't hurt. So, more than four years after we set things up as temporary redirections just in case, we got around to making them permanent redirections. Quite possibly we should have left ourselves a note to do it sooner than that, once things were all proven and working.

Except, of course, there is a catch. Every so often we want to remove such a redirection (for example, because it's broken, or no longer desired), and then perhaps later the login name and thus the home page URL will be reused for another person. When that happens, we definitely don't want search engines (or browsers) to be convinced that '<us>/~user/' is permanently redirected elsewhere, and to refuse to index or use the new, real, non-redirected version. If permanent HTTP redirections make it less likely that the new page gets picked up, we should probably keep our redirections as temporary ones, even if this has other effects.

In part this is a conflict between the needs of the old and the new users of these URLs (or of any URLs). Permanent redirects may help the old users but hurt the new users, while temporary redirects may be the reverse. In theory this means that we should prioritize the needs of new users (who will be our current users) and use temporary redirects, but on the other hand the new users are generally only a theoretical future thing while the redirections for the old users exist now. I don't think I have any simple answers here.

(Let's take it as a given that the redirections will eventually go away and the URLs will eventually be reused. In some ideal worlds, URLs would be permanently claimed by and for their first use, but in practice this is not the world we exist in.)


Comments on this page:

In situations like this it would be easier to decide if there were a hit count. How many times has every redirect rule that you've put in the map been used over this period of time?

Same thing applies for rules in an access list of a firewall, you put something in there temporarily to fix something, and three years later you wonder if it's still being used.

By cks at 2022-09-20 22:41:03:

One of the complications with hit counts for the redirection rules is that you probably want to count search engines differently than actual people, but determining which is which is not straightforward in an automated analysis (in an ad-hoc on the spot check analysis you can eyeball the user agent strings and the IP address ranges).

In our environment there are also broader policy issues around when user home page redirections will and won't be preserved that probably trump usage counts in many situations. Hit counts will only influence things at the margins.
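As a rough illustration of the hit counting discussed in these comments, here is a minimal Python sketch that tallies redirect hits per home page and keeps probable crawlers in a separate count. It assumes Apache's "combined" log format, and the `CRAWLER_MARKERS` list and the function names are hypothetical; matching on user agent substrings is exactly the kind of crude automated guess described above, so treat the split as approximate.

```python
import re
from collections import Counter

# Apache "combined" log format; only the fields needed here are captured.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical and deliberately incomplete list of crawler markers.
CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_crawler(agent):
    """Guess from the user agent string alone; this will misclassify some hits."""
    agent = agent.lower()
    return any(marker in agent for marker in CRAWLER_MARKERS)


def count_redirect_hits(lines):
    """Return (human_hits, crawler_hits) Counters keyed by '~user'."""
    humans, crawlers = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Only count requests that were answered with a redirect.
        if not m or m.group("status") not in ("301", "302"):
            continue
        path = m.group("path")
        if not path.startswith("/~"):
            continue
        user = path.split("/")[1]  # '/~alice/foo' -> '~alice'
        target = crawlers if is_probable_crawler(m.group("agent")) else humans
        target[user] += 1
    return humans, crawlers
```

Feeding it the lines of an access log (for example `count_redirect_hits(open(logfile))`, with `logfile` being wherever your logs live) gives two per-user tallies that can then be eyeballed alongside the user agents and IP ranges.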

I’d suggest a five-phase approach:

1) $USER1 exists -> 200
2) $USER1 moved -> 302 for one year
3) $USER1 moved more than a year ago -> 301 for one year
4) $USER1 moved more than two years ago -> 410 gone

5) $USER2 exists -> 200 as necessary

Where $USER1 and $USER2 are different people with the same username at different times.
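The phases above could be sketched as a small Python function. This is only an illustration of the proposal, not anything the site actually runs: the function name and parameters are hypothetical, and a "year" is approximated as 365 days.

```python
from datetime import date


def status_for_home_page(moved_on, today, reused=False):
    """HTTP status for '/~user/' under the phased proposal.

    moved_on: date the original user moved away, or None if still present.
    reused:   True once the login name belongs to a new person.
    """
    if moved_on is None or reused:
        return 200  # phases 1 and 5: a live home page
    days = (today - moved_on).days
    if days <= 365:
        return 302  # phase 2: temporary redirect for the first year
    if days <= 2 * 365:
        return 301  # phase 3: permanent redirect for the second year
    return 410  # phase 4: gone


# Example: a user who moved at the start of 2021, checked in September 2022.
print(status_for_home_page(date(2021, 1, 1), date(2022, 9, 19)))
```

One design note: the `reused` flag takes priority over the age of the old redirection, so a new person with the same login gets a normal page no matter which phase the old redirection had reached.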

Written on 19 September 2022.
Last modified: Mon Sep 19 22:04:46 2022