Web page generation systems should support remapping external URLs

October 4, 2020

Some web pages and web sites are hand authored, but many more are generated (dynamically or statically) through web page generation systems and content management systems of various sorts. Our writing in these systems often has links to external pages: to other people's writing, to reference documentation, to Wikipedia, to whatever. This presents us (the people running web sites and writing on them) with a long-term problem, because in practice some or many of those external URLs will eventually change.

Today, we don't have good support in our page generation systems for this unfortunate reality of web life. If you find out that an external URL you reference has moved, you generally have to hunt around through all of your content and update it, either completely manually or at best semi-automatically. The unsurprising result of this is that people often don't, even when they know old links have changed; it's simply too much work to go back through everything and fix it all up.

So here's an idea: all of our web page generation systems should support a remapping file (or data source) for external URLs, which would list the old URL and its new replacement. A fancier version could also have site matching, prefix matching or general pattern matching. When you're generating a page and the page has a link pointing to such an old URL, it would automatically get replaced with the new URL. The obvious advantage of this remapping system is that it's less work; the subtle one is that it's automatically universal, with you not having to hunt down every last obscure corner of the site where the URL is mentioned.
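As a rough illustration of the idea, here is a minimal sketch of a remapping lookup with exact matching plus the 'fancier' prefix matching. Everything in it is an assumption made for the example: the names `EXACT`, `PREFIXES` and `remap_url` are hypothetical, the URLs are placeholders, and a real system would presumably load its rules from a file or other data source instead of hard-coding them.

```python
# Hypothetical remapping data; a real system would load this from a
# remapping file or other data source.
EXACT = {
    "http://old.example.org/article": "https://new.example.org/article",
}

# (old prefix, new prefix) pairs, checked in order.
PREFIXES = [
    ("http://docs.example.com/v1/", "https://docs.example.com/archive/v1/"),
]


def remap_url(url, exact=EXACT, prefixes=PREFIXES):
    """Return the replacement for url, or url unchanged if no rule matches."""
    # Exact matches win over prefix matches.
    if url in exact:
        return exact[url]
    for old, new in prefixes:
        if url.startswith(old):
            # Keep whatever followed the old prefix.
            return new + url[len(old):]
    return url
```

The page generator would call something like `remap_url()` on each external link target as it renders a page, so a single new rule takes effect everywhere the old URL appears.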

(In some systems it would make sense to automatically edit this change into the source data; generally I think those are systems where the source data is already held in a database by the web generation system and is not edited by people by hand.)

One additional advantage of doing this in the web page generation system instead of in external tools is that the web page generator generally has the best idea of whether what it's dealing with is really a link target, instead of some other text that happens to mention or include the URL. You probably don't want to rewrite mentions of old URLs in plain text, for example, especially not automatically.
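To make the distinction concrete, here is a small sketch of a renderer that remaps only link targets and leaves plain-text mentions of the same URL alone. The markup is an assumption for the example (links written as `[text](url)`), as are the names `REMAP`, `LINK_RE` and `render`; a real page generator would have its own parser and its own notion of what a link is.

```python
import html
import re

# Hypothetical one-entry remapping table.
REMAP = {"http://old.example.org/post": "https://new.example.org/post"}

# Assumed markup: links are written as [text](url); anything else is plain text.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def render(source, remap=REMAP):
    """Render source to HTML, remapping URLs only where they are link targets."""
    out = []
    pos = 0
    for m in LINK_RE.finditer(source):
        # Plain text before the link is escaped but never remapped.
        out.append(html.escape(source[pos:m.start()]))
        text, target = m.group(1), m.group(2)
        target = remap.get(target, target)
        out.append('<a href="%s">%s</a>'
                   % (html.escape(target, quote=True), html.escape(text)))
        pos = m.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)
```

Given a source line that both links to the old URL and mentions it in running text, only the `href` changes; the plain-text mention is passed through as written.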

PS: This remapping should be applied repeatedly, because replacement URLs can themselves get replaced. Yes, sure, theoretically people could go through and update the original mappings again, but let's make it easy and as foolproof as possible. Since link rot is going to happen, we should make it easy to deal with.
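Repeated application could look something like the following sketch, which follows a chain of replacements until no further rule applies. The function name `remap_repeatedly`, the `max_hops` limit and the loop check are assumptions of mine, added because a hand-maintained mapping could accidentally contain a cycle.

```python
def remap_repeatedly(url, mapping, max_hops=10):
    """Follow old -> new entries in mapping until no further rule applies.

    Stops early if a URL repeats (a cycle in the mapping) or after
    max_hops replacements, returning the last URL reached.
    """
    seen = {url}
    for _ in range(max_hops):
        new = mapping.get(url)
        if new is None or new in seen:
            return url
        seen.add(new)
        url = new
    return url
```

With a mapping of A to B and B to C, a link to A comes out pointing at C without anyone having to go back and edit the original A entry.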

(This idea was sparked by Aristotle Pagaltzis linking to a web.archive.org copy of a diveintomark.org entry in a comment on this entry, causing me to realize that I had entries with direct links to diveintomark that needed to be updated to web.archive.org. This shows both how long it can take me to write some Wandering Thoughts entries and how I still haven't gotten around to finding and editing all of those entries (or implementing a remapping file here).)

Written on 04 October 2020.
Last modified: Sun Oct 4 23:35:20 2020