Please have stable ids for your feed entries

January 21, 2006

In both RSS and Atom syndication feeds, feed entries can have an identifier element (optional in RSS, mandatory in Atom). The entry ID is supposed to be permanent and stable, no matter what; feed readers and aggregators use it to tell which entries they've already seen and which they haven't.

This might seem like an unimportant picky thing, except that LiveJournal just inadvertently gave everyone reading Planet Debian a glaring example of why it's so important. (And Planet Debian is a default feed in Liferea, so that may be a decent number of people.)

It goes like this:

  1. A number of the people aggregated at Planet Debian use LiveJournal.
  2. LiveJournal makes the RSS <guid> element the URL of the post, which includes the journal's URL. (Possibly they have to, if too many RSS readers assume that the <guid> is a URL.)
  3. Due to a security issue, LiveJournal recently changed the URL of everyone's journal.
  4. All the <guid> elements in people's entries promptly changed.

The result of all of this has been a flood of old posts washing over Planet Debian, bit by bit (LJ feeds only refresh when the user posts a new entry).
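
To make the mechanism concrete, here is a minimal sketch of how an aggregator might decide what is new. It is not Planet Debian's actual code; the `new_entries` function, the dictionary layout, and the example.com URLs are all made up for illustration.

```python
def new_entries(seen_ids, feed_entries):
    """Return the entries whose guid we have not seen before, and remember them.

    Hypothetical helper: real aggregators are more involved, but the core
    test is the same "have I seen this identifier?" check.
    """
    fresh = []
    for entry in feed_entries:
        if entry["guid"] not in seen_ids:
            seen_ids.add(entry["guid"])
            fresh.append(entry)
    return fresh


seen = set()

# The feed as it looked before the URL change (made-up URLs).
old_feed = [
    {"guid": "http://www.example.com/users/alice/1001.html", "title": "First post"},
    {"guid": "http://www.example.com/users/alice/1002.html", "title": "Second post"},
]
first_pass = new_entries(seen, old_feed)    # both entries are new
second_pass = new_entries(seen, old_feed)   # nothing new on a re-fetch

# The same two posts after the journal's URL, and so every guid, changed.
moved_feed = [
    {"guid": "http://alice.example.com/1001.html", "title": "First post"},
    {"guid": "http://alice.example.com/1002.html", "title": "Second post"},
]
third_pass = new_entries(seen, moved_feed)  # both old posts come back as "new"
```

Nothing about the posts themselves changed between the second and third fetch; only the identifiers did, and that is all the aggregator looks at.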

I'm sure this isn't deliberate; no one wanted this to happen. But it does make a handy demonstration of why changeable entry identifiers are a bad idea.

Unfortunately DWiki has this problem too: its only concept of an object's identity is its path, and thus its URL. This has caused occasional heartburn when I've been forced to rename entries. However, DWiki operates under stricter constraints than most web sites; if you're storing any sort of metadata about pages or entries, you can also store some sort of permanent unique identifier.

(Heck, if you store entries in a database, you need a primary key anyways. Even if this is not easily representable in ASCII, it can be hashed down and ASCII-fied.)
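
As a rough sketch of the "hash it down" idea, assuming a database-backed site: the `entry_id` function, the `example.org` authority, and the `tag:` URI layout below are illustrative choices, not something any particular blog engine does. The point is only that the ID is derived from the primary key and never from the entry's URL.

```python
import hashlib


def entry_id(primary_key, authority="example.org", year="2006"):
    """Build a stable, ASCII-only feed entry ID from a database primary key.

    Assumptions for illustration: the authority and year are fixed once and
    never changed, and the primary key never changes for a given entry.
    """
    if isinstance(primary_key, bytes):
        key_bytes = primary_key
    else:
        key_bytes = str(primary_key).encode("utf-8")
    # Hash the key so that even a binary or composite key becomes plain hex.
    digest = hashlib.sha256(key_bytes).hexdigest()
    return "tag:{},{}:entry-{}".format(authority, year, digest)


# The ID depends only on the key, so renaming or moving the entry's URL
# leaves it untouched.
id_before_rename = entry_id(42)
id_after_rename = entry_id(42)
other_entry = entry_id(43)
```

An ID like this can go straight into an Atom `<id>` element. In RSS 2.0 it would need `isPermaLink="false"` on the `<guid>`, since it is not a fetchable URL.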

Written on 21 January 2006.