How well do some Atom feed fetchers do conditional GETs?

November 2, 2005

'Conditional GET' is the HTTP technique used to save bandwidth by not re-fetching unchanged pages. Using conditional GET is especially important for things that fetch syndication feeds (RSS or Atom), because people usually check feeds much more often than they revisit web pages. (This is another good reference for syndication feed reader authors.)
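A minimal sketch of the client side of conditional GET, using only Python's standard library. The function names `conditional_headers()` and `fetch_feed()` and the `validators` dict are hypothetical. The key idea is that the client saves the ETag and Last-Modified values from the last full response and echoes them back verbatim. A 304 answer means "nothing changed", and no feed body is transferred.

```python
import urllib.request
import urllib.error


def conditional_headers(validators):
    """Build conditional GET headers from validators saved on the last fetch.

    `validators` is a dict that may hold the exact ETag and Last-Modified
    strings the server sent; both are echoed back unchanged.
    """
    headers = {}
    if validators.get("etag"):
        headers["If-None-Match"] = validators["etag"]
    if validators.get("last_modified"):
        headers["If-Modified-Since"] = validators["last_modified"]
    return headers


def fetch_feed(url, validators):
    """Fetch `url` with a conditional GET.

    Returns (body, new_validators). body is None when the server answers
    304 Not Modified, in which case the old validators remain valid.
    """
    req = urllib.request.Request(url, headers=conditional_headers(validators))
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            new_validators = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
            return body, new_validators
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; it is the "unchanged" case.
        if err.code == 304:
            return None, validators
        raise
```

On the first fetch `validators` is empty, so a plain GET goes out. Every later fetch sends back whatever the server last handed over.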

WanderingThoughts has a lot of syndication feeds, and the main ones are quite big. Recently, partly prompted by issues with MSNbot, I decided to look at what was fetching my syndication feeds and how well they did conditional GET. So I looked at roughly the past week of data. I chose that window partly because I had recently added detailed logging of the conditional-GET-related headers sent by things fetching my Atom feeds.

(First, I have to say that I like having readers and we have a lot of spare bandwidth. If your syndication feed reader does badly here, it is absolutely not a request for you to unsubscribe.)

Conditional GET can be done with ETag / If-None-Match, or with If-Modified-Since; ETag is better. Perfect scores go to the feed fetchers that always use ETag: SharpReader, Bloglines, LiveJournal, Feedster Crawler, and NetNewsWire.
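On the server side, the two headers have to be checked together. The sketch below is a hypothetical `not_modified()` helper, not DWiki's actual code. It follows RFC 2616's rule (section 13.3.4) that a 304 is allowed only when it is consistent with every conditional header in the request. It also assumes the server compares If-Modified-Since by exact string equality with the Last-Modified it sent. That is a conservative choice, and under it a correct ETag paired with an invented timestamp still gets a full response.

```python
def not_modified(request_headers, etag, last_modified):
    """Decide whether a conditional GET can be answered with 304 Not Modified.

    `etag` is the current quoted ETag (e.g. '"abc123"') and `last_modified`
    is the exact Last-Modified string this server sends. A 304 is returned
    only if every conditional header present agrees.
    """
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is None and ims is None:
        return False  # not a conditional request at all
    if inm is not None:
        # Simplification: split on commas; quoted tags containing commas
        # are not handled here. W/"..." tags never equal a strong tag.
        tags = [t.strip() for t in inm.split(",")]
        if "*" not in tags and etag not in tags:
            return False
    if ims is not None and ims != last_modified:
        # Assumption: exact string match against our own Last-Modified.
        return False
    return True
```

The headers here are a plain dict with exact-case keys; a real framework will expose them differently (WSGI, for example, uses `HTTP_IF_NONE_MATCH`).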

A few feed fetchers lose some points from the East German judge:

  • liferea lost out on a perfect score because while it always uses If-Modified-Since, it only sometimes uses If-None-Match (only if it's fetched a changed feed since the program was started; it doesn't store the ETag value in its persistent database).
  • Yahoo Slurp and PubSub-RSS-Reader only use If-Modified-Since, which works but is not ideal.
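liferea's gap is that its ETag values live only in memory. A minimal sketch of the fix, assuming a JSON file is an acceptable store: the hypothetical `ValidatorStore` class below keeps each feed's ETag and Last-Modified on disk, so the first fetch after a restart can already send If-None-Match.

```python
import json
import os
import tempfile


class ValidatorStore:
    """Persist per-feed ETag / Last-Modified values in a JSON file (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)
        except FileNotFoundError:
            self.data = {}  # first run: nothing saved yet

    def get(self, url):
        """Return the saved validators for `url`, or {} if none."""
        return self.data.get(url, {})

    def update(self, url, etag, last_modified):
        """Record validators from a 200 response and save them immediately."""
        self.data[url] = {"etag": etag, "last_modified": last_modified}
        self._save()

    def _save(self):
        # Write to a temp file in the same directory, then rename over the
        # old file, so a crash mid-write cannot leave a truncated store.
        dirname = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)
```

`store.get(url)` supplies the values to send back as If-None-Match and If-Modified-Since. After a 200 response, call `store.update()` with the new ETag and Last-Modified headers; a 304 needs no update.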

The 'nice try, but...' award goes to:

  • Rojo 1.0, which supports ETag but unfortunately makes up its own timestamps for If-Modified-Since and sends both headers. This doesn't work, for reasons explained here and here.
  • BlogSearch, which sends If-None-Match but stripped of the quotes that DWiki's ETag value has. (This may be RFC-compliant, in which case I need to fix DWiki.)

A number of syndication feed fetchers don't support conditional GET; they don't even bother to send If-Modified-Since headers, and always wind up re-fetching my syndication feeds (when they fetch the main one, this is 300K or so a shot). They are:

  • everyone's friend MSNbot, who is by far the most active fetcher of my Atom feeds.
  • 'madicon RSS Reader', which appears to be a syndication feed reader addon for Lotus Notes. Working in the Notes environment may make it difficult to store the per-feed information necessary to support conditional GET.
  • 'Waggr_Fetcher)', http://www.waggr.com/; this appears to be a web-based feed reader.
  • kinjabot, another web-based aggregator thing.
  • FeedFetcher-Google and 'Googlebot/2.1' (fetching as a browser); these surprised me, because I expected Google to do better.
  • BlogPulse, although to be fair it only visited three times in the last week. (It's an interesting blog search engine; I wish it indexed WanderingThoughts more. Unfortunately they want an email address to submit blog URLs, which is an immediate turnoff these days.)

Comments on this page:

From 70.231.194.106 at 2005-11-02 11:34:27:

Perhaps Google chooses to err on the side of always-getting-everything in the interests of guaranteeing they're as up-to-date as possible even in the face of broken sources, or something. Stupid, but possibly sensible for "business reasons"?

--nothings

From 192.88.60.254 at 2005-11-02 14:49:18:

You say:

BlogSearch, which sends If-None-Match but stripped of the quotes that DWiki's ETag value has. (This may be RFC-compliant, in which case I need to fix DWiki.)

It's not RFC compliant. In fact, it's pretty explicitly not. Shame on them. Quoting from the HTTP/1.1 RFC (RFC 2616), section 14.26:

   If-None-Match = "If-None-Match" ":" ( "*" | 1#entity-tag )

But section 3.11 of the same RFC says:

   entity-tag = [ weak ] opaque-tag
   weak       = "W/"
   opaque-tag = quoted-string

And quoted-string means what you think it does. Section 14.26 concludes with some sample valid If-None-Match headers:

   If-None-Match: "xyzzy"
   If-None-Match: W/"xyzzy"
   If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
   If-None-Match: W/"xyzzy", W/"r2d2xxxx", W/"c3piozzzz"
   If-None-Match: *

Though the semantics of that last one make it a silly thing to send, unless accompanied by an If-Modified-Since header.

Note also that there is no relationship between a tag without W/ at the front and the same string with W/ at the front. The two types of tags are completely separate namespaces.
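To make the grammar above concrete, here is a hypothetical Python parser for If-None-Match values, written against the RFC 2616 rules quoted in this comment. It keeps each tag's quotes, so the result can be compared directly with the ETag the server sent. It also keeps the W/ flag separate from the tag. It returns None for values that don't follow the grammar, such as the unquoted tags BlogSearch sends.

```python
import re

# One entity-tag: optional W/ prefix, then a quoted-string (which may
# contain backslash-escaped characters), then a comma or end of input.
_ENTITY_TAG = re.compile(r'\s*(W/)?("(?:[^"\\]|\\.)*")\s*(?:,|$)')


def parse_if_none_match(value):
    """Parse an If-None-Match value into "*" or a list of (weak, tag) pairs.

    `tag` keeps its surrounding quotes. Returns None if the value does not
    match the grammar (for example, an unquoted tag).
    """
    if value.strip() == "*":
        return "*"
    tags = []
    pos = 0
    while pos < len(value):
        m = _ENTITY_TAG.match(value, pos)
        if m is None:
            return None
        tags.append((m.group(1) is not None, m.group(2)))
        pos = m.end()
    return tags or None
```

What a server does with None is a policy choice. Treating it as "no match" and sending the full feed is the strict reading; quietly re-quoting the value would be a lenient workaround.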

By cks at 2005-11-02 16:14:29:

Google's FeedFetcher is explicitly not for Google's web searching; it's for Google Homepages and Google Reader. Given that it's a (web-based) feed reader, I think that this means that it should behave like a good feed reader and therefore use conditional GET.

Written on 02 November 2005.


Last modified: Wed Nov 2 02:16:15 2005