Cool URLs keep their contents (an obvious point that still matters)

July 1, 2023

It has been an article of faith for some time that "cool URLs don't change", as seen in (for example) this 1998 W3C article written by Tim Berners-Lee (you may have heard of him). Sadly, time has shown that cool URLs do often change, and sometimes they become inaccessible even when they don't change. But there is another part of this that I feel the need to say out loud: cool URLs keep their contents.

It's not enough for a 'cool URL' to still resolve and not give you a bad response. What we actually care about is not the URL, it's what we find at the URL, and it's that content that we want to still be there, to 'not change'. An old URL that now returns a HTTP 2xx response with 'there is nothing here' is 'unchanged' in some technical sense, but it has changed in a practical one. Similarly, generally we think of a cool URL as relatively unchanged even if it gives us a redirection to the new home of contents (especially if the redirection is merely from HTTP to HTTPS). What we care about is that we can get the same content by going to the same URL.

(You can argue about this for URLs that are only fetched by programs, but my view is that 'cool URLs don't change' is mostly about how it's seen by people. If you have an API or a fixed resource, you may have all sorts of requirements around redirections (to the same site or others), TLS versions for HTTPS, and so on.)

What is considered 'the content of a URL' is a situational thing, and not something that can be evaluated with hard and fast rules. The URL of a blog home page or an Atom syndication feed generally changes the textual contents on a regular basis, but pretty much everyone would say that the URL hasn't changed in the sense of the W3C article; you still get the same sort of thing by going to the same URL. A wiki page may be substantially overhauled and rewritten, but generally people will still consider it to have not changed in the cool URL sense.

(Although syndication feed URLs are often fetched by programs, my view is that 'change' in the URLs has to be viewed through the lens of people and browsers. A syndication feed reader needs to be able to cope with things like HTTP redirections, especially from HTTP to HTTPs.)

Unfortunately this means that some websites we'd like to have cool URLs don't always keep them that way. One unfortunate example is Wikipedia. For fairly rational reasons, Wikipedia keeps tinkering with the names (and thus the URLs) of entries, as well as their internal HTML anchors (which have URLs associated with them). Over the long term, links to Wikipedia entries aren't necessarily stable.

(A lot of Wikipedia entries do have stable URLs in practice, but not all of them. Links to anchors in Wikipedia entries seem to be less stable, at least for the sort of them that I use here on Wandering Thoughts.)

(This elaborates a bit on a Fediverse post.)


Comments on this page:

By sam at 2023-07-02 05:12:26:

FWIW, probably the most principled way to get a stable URL for a Wikipedia article would be to go through Wikidata, which does provide a stable URL for the article's concept. Renamings are handled transparently by this model, but it gets more complicated if articles are split or merged (the 'Bonnie and Clyde problem' in the lingo, as different language editions of Wikipedia disagree about whether there should be a single article for the pair of them, as the English one does, or two articles for each of them, as the Dutch one does), which might require traversing the linked data graph to find the 'best' encyclopaedia article for the topic. Subtle differences in concept definitions or scope are even more awkward, but thankfully they're often subtle enough to just be glossed over.

This is assuming that the point is to link to Wikipedia's current understanding of the topic, with all the attendant mutability. If the point really is to link to a specific version of an article for whatever reason (e.g., because it was amusingly wrong, or to examine how it changed over time), linking to a specific revision (from the 'View history' tab at the top of the article) is the way to go.

Written on 01 July 2023.
« In practice, cool URLs can become inaccessible even if they don't change
The evolving Unix attitudes on handling signals in your code »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Jul 1 21:58:26 2023
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.