2017-05-26
Why globally unique IDs are useful for syndication feed entries
Pretty much every syndication feed format ever has some sort of 'id' field for syndication feed entries (ie, the posts and so on) that is supposed to be unique for every entry in the feed and never repeat. Since basically everything else about a feed entry can change (the title, the text, and yes even the URL), this unique ID is used by feed readers and other consumers of the syndication feed in order to tell the difference between new entries and ones that have merely been updated. Often it is also used to find unchanged entries, rather than forcing a feed consumer to carefully compare all of the other fields against all of the current entries it knows about (either directly or by using them to derive a hash identity for every entry).
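As a rough illustration of how a feed consumer might use the ID, here is a minimal Python sketch. The entry layout (dicts with 'id', 'title', 'url' and 'text' keys) and the function names are assumptions made up for this example, not any particular feed reader's design. The ID decides whether an entry is known at all; the content hash is only compared against the one stored entry with that ID.

```python
import hashlib


def content_hash(entry):
    """Hash the fields that are allowed to change between fetches."""
    parts = [entry.get("title", ""), entry.get("url", ""), entry.get("text", "")]
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def classify(seen, entries):
    """Sort freshly fetched entries into new, updated and unchanged.

    'seen' maps entry ID -> content hash from earlier fetches and is
    updated in place, so it can be reused for the next fetch.
    """
    result = {"new": [], "updated": [], "unchanged": []}
    for entry in entries:
        eid = entry["id"]
        digest = content_hash(entry)
        if eid not in seen:
            result["new"].append(eid)        # never seen this ID before
        elif seen[eid] != digest:
            result["updated"].append(eid)    # same ID, different content
        else:
            result["unchanged"].append(eid)  # same ID, same content
        seen[eid] = digest
    return result
```

Without the ID, the same job would mean comparing every fetched entry against everything already stored, and a changed title or URL would make an old entry look like a new one.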
When used for this purpose alone it's sufficient for the ID field to be merely unique within this particular syndication feed, but potentially duplicated in other syndication feeds; in other words, you don't strictly speaking need an ID that is globally unique. However, this is thinking too small. In practice it's very useful to be able to recognize the same entry appearing across multiple feeds and to do things specially as a result of it. One obvious action is for a feed reader to mark such a cross-posted feed entry as 'read' in all feeds it appears in when you read it in one, so that you don't have to read and re-read and re-re-read it as it reappears repeatedly, but there are other tricks that can be done here as well.
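The 'mark as read everywhere' trick falls out almost for free if read state is keyed on the entry ID alone instead of on the (feed, ID) pair. The following Python sketch assumes exactly that; the ReadState class and the dict-based entries are hypothetical, and it only gives sensible results if the IDs really are globally unique.

```python
class ReadState:
    """Tracks read entries by ID alone, across every subscribed feed."""

    def __init__(self):
        self._read_ids = set()

    def mark_read(self, entry_id):
        # Deliberately not keyed by feed: reading an entry in one feed
        # marks it read in every feed that carries the same ID.
        self._read_ids.add(entry_id)

    def unread(self, feed_entries):
        """Return the entries of one feed that have not been read yet."""
        return [e for e in feed_entries if e["id"] not in self._read_ids]
```

If two unrelated feeds happened to reuse the same merely locally unique ID (say, a database primary key like '42'), this approach would wrongly hide an entry you have never seen, which is the practical cost of IDs that are not globally unique.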
You might wonder how feed entries can ever be cross-posted. There are two common cases: aggregation sites and feeds such as Planet Debian, and subset feeds from a site where a single entry appears in multiple ones (for example, if there are category feeds and a single entry is in several categories). The aggregation sites case definitely happens in the field and is even not uncommon if you follow several Planets in the same general field (Planet Debian and Kernel Planet, for example; there are a number of people included in both).
The usefulness of truly globally unique feed identifiers is why the Atom syndication format goes out of its way to specify the atom:id element such that properly constructed Atom IDs will be globally unique, not merely locally unique within a single feed. That the standard explicitly says that they must be IRIs is neither an accident nor Atom being overly picky; properly formed IRIs are globally unique.
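One well-known way to build such an identifier without depending on the entry's URL is the 'tag:' URI scheme from RFC 4151, which combines a domain name you control, a date on which you controlled it, and a string of your own choosing. The Python sketch below is only an illustration of that general shape; the function name and the example values are made up, and it does not validate its inputs against the RFC's grammar.

```python
import datetime


def tag_uri(authority, minted_on, specific):
    """Build an RFC 4151 style tag URI: tag:<authority>,<date>:<specific>.

    'authority' is a domain name (or email address) the publisher
    controlled on 'minted_on'; 'specific' is any string that is unique
    under that authority and date.
    """
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"


# Hypothetical example values.
entry_id = tag_uri("example.org", datetime.date(2017, 5, 26), "blog/unique-ids")
```

Whatever scheme is used, the ID has to be generated once when the entry is created and then stored with it; recomputing it from fields that can change (such as the current URL or title) would defeat the point of having a permanent ID.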
This is also why it makes me sad that the JSON Feed version 1 specification does not talk at all about making your items.id values globally unique. By failing to even request that people generate globally unique IDs, the JSON Feed people are ignoring a significant body of practical syndication feed experience.
(Sure, their suggestion of URLs will result in globally unique IDs, but URLs have problems as permanent IDs and there is no guidance to people who immediately see the problems with using URLs here and want to do something better. With no guidance, people will be tempted to do things like use the database primary key as the ID, or maybe generate a random and must-be-unique GUID for each entry. Things like global uniqueness of IDs are too important to be left as implicit side effects of one suggested implementation strategy.)