URLs are terrible permanent identifiers for things
I was recently reading the JSON Feed version 1 specification (via Trivium, among other places). I have a number of opinions on it as a syndication feed format, but that's not the subject of today's entry, because in the middle of the specification I ran into the following bit (which is specifically talking about the elements of feed entries, ie posts):
id (required, string) is unique for that item for that feed over time. [...] Ideally, the id is the full URL of the resource described by the item, since URLs make great unique identifiers.
When I read this bit, I had an immediate pained reaction. As someone who has been running a blog for more than ten years and has made this exact mistake, let me assure you that URLs make terrible permanent unique identifiers for things. Yes, yes, cool URLs don't change, as the famous writeup says. Unfortunately in the real world, URLs change all of the time. One reason for this that is especially relevant right now is that URLs include the protocol, and right now the web is in the process of a major shift from HTTP to HTTPS. That shift just changed all your URLs.
(I think that over the next ten years the web will wind up being almost entirely HTTPS, even though much of it is not HTTPS today, so quite a lot of people will be going through this URL transition in the future.)
This is not the only case that may force your hand. And beyond more or less forced changes, you may someday move your blog from one domain to another, or change the real URLs of all of your entries because you changed the blog system that you use (both of which have happened to me). In theory you can create a system to generate syndication feeds that deals with all of that, by having a 'URL for id' field of some sort (perhaps automatically derived from your configuration of URL redirections), but if you're going to wind up detaching what you put in the id field from the actual canonical URL of the entry, why not make it arbitrary in the first place? It will save you a bunch of pain to do this from the start.
(Please trust me on this one, seeing as this general issue has caused me pain. As my example illustrates, using any part of the URL as part of your 'permanent identifier' is going to cause you heartburn sooner or later.)
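As a rough illustration of what 'arbitrary from the start' could look like, here is a minimal sketch in Python. The Entry class, the new_entry_id() helper, and the blog.example.org URLs are all hypothetical and not taken from any particular blog system; the only thing borrowed from the JSON Feed specification is the id, url, and title item fields. The idea is that the identifier is minted once when the entry is created and stored alongside it, never recomputed from the URL.

```python
import uuid
from dataclasses import dataclass, field


def new_entry_id() -> str:
    """Mint an opaque identifier once, at entry creation time (hypothetical helper)."""
    return str(uuid.uuid4())


@dataclass
class Entry:
    title: str
    url: str  # current canonical URL; expected to change over the years
    # Stored with the entry and never derived from the URL.
    entry_id: str = field(default_factory=new_entry_id)


def feed_item(entry: Entry) -> dict:
    """Build a JSON Feed style item whose id is independent of its url."""
    return {"id": entry.entry_id, "url": entry.url, "title": entry.title}


entry = Entry("URLs are terrible permanent identifiers",
              "http://blog.example.org/urls-as-ids")
before = feed_item(entry)

# The site moves to HTTPS: the URL changes, the identifier does not.
entry.url = "https://blog.example.org/urls-as-ids"
after = feed_item(entry)
```

With this arrangement a feed reader that keys on id should see the same entry before and after the move, instead of a pile of 'new' posts.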
There are excellent reasons why the Atom syndication format both explicitly allows for and more importantly encourages various forms of permanent identifiers for feed entries that are not URLs. For example, you can use UUIDs (as 'urn:uuid:<uuid>') or your own arbitrary but unique identifier in your own namespace (as tag: URIs). The Atom format does this because the people who created it had already run into various problems with the widespread use of URLs as theoretically permanent entry identifiers in RSS feeds.
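To make the two forms concrete, here is a small Python sketch of how such identifiers might be built and placed in an Atom entry. The helper names (uuid_id, tag_id, atom_entry), the example.org authority, and the 'entry-0042' label are assumptions for illustration only. The tag: layout follows the general 'tag:authority,date:specific' shape; the authority and date say who owned a name at some point in time, and the result is never meant to be fetched.

```python
import uuid
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements in the default namespace instead of as ns0:...
ET.register_namespace("", ATOM_NS)


def uuid_id() -> str:
    """Return a fresh random identifier in 'urn:uuid:<uuid>' form."""
    return f"urn:uuid:{uuid.uuid4()}"


def tag_id(authority: str, date: str, specific: str) -> str:
    """Return a tag: identifier of the form tag:<authority>,<date>:<specific>."""
    return f"tag:{authority},{date}:{specific}"


def atom_entry(entry_id: str, title: str, link: str, updated: str) -> str:
    """Build a minimal Atom <entry> whose <id> is separate from its <link>."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link)
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    return ET.tostring(entry, encoding="unicode")


permanent_id = tag_id("example.org", "2017", "entry-0042")
xml_text = atom_entry(
    permanent_id,
    "URLs are terrible permanent identifiers",
    "https://blog.example.org/urls-as-ids",
    "2017-05-25T00:00:00Z",
)
```

Either form only stays permanent if it is generated once and stored with the entry; regenerating a UUID on every feed build would defeat the purpose.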
Written on 25 May 2017.