2005-11-02
How well do some Atom feed fetchers do conditional GETs?
'Conditional GET' is the HTTP technique used to save bandwidth by not re-fetching unchanged pages. Using conditional GET is especially important for things that fetch syndication feeds (RSS or Atom), because people usually check feeds much more often than they revisit web pages. (This is another good reference for syndication feed reader authors.)
WanderingThoughts has a lot of syndication feeds and the main ones are quite big. Recently, partly prompted by issues with MSNbot, I decided to take a look at what was fetching my syndication feeds and how well they did conditional GET. So I looked at my data for about the past week, chosen in part because I recently added detailed logging about what conditional GET related headers get sent by things fetching my Atom feeds.
(First, I have to say that I like having readers and we have a lot of spare bandwidth. If your syndication feed reader does badly here, it is absolutely not a request for you to unsubscribe.)
Conditional GET can be done with ETag / If-None-Match, or with If-Modified-Since; ETag is better. Perfect scores go to the feed fetchers that always use it: SharpReader, Bloglines, LiveJournal, Feedster Crawler, and NetNewsWire.
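For feed reader authors, the client side of this can be sketched roughly as follows. This is a minimal Python sketch, not what any of the fetchers above actually run; the function names `conditional_headers` and `fetch_feed` are made up for illustration. The idea is to send back, verbatim, whatever ETag and Last-Modified values the server supplied on the previous fetch.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch.

    Both values are echoed back exactly as the server sent them
    (quotes around the ETag included); nothing is made up client-side.
    """
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if unchanged."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return (
                resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

A caller would keep the returned validators and pass them in on the next poll; a `None` body means the cached copy is still good.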
A few feed fetchers lose some points from the East German judge:
- liferea lost out on a perfect score because while it always uses If-Modified-Since, it only sometimes uses If-None-Match (only if it's fetched a changed feed since the program was started; it doesn't store the ETag value in its persistent database).
- Yahoo Slurp and PubSub-RSS-Reader only use If-Modified-Since, which works but is not ideal.
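The liferea case comes down to keeping the ETag alongside the Last-Modified value somewhere that survives a restart. A minimal sketch of that, assuming a small JSON file is an acceptable store (the `ValidatorStore` class and its file layout are hypothetical, not liferea's actual database):

```python
import json
import os


class ValidatorStore:
    """Hypothetical on-disk map of feed URL -> saved validators."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def get(self, url):
        """Return (etag, last_modified); either may be None."""
        entry = self.data.get(url, {})
        return entry.get("etag"), entry.get("last_modified")

    def put(self, url, etag, last_modified):
        """Record both validators and write them out immediately."""
        self.data[url] = {"etag": etag, "last_modified": last_modified}
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        # Rename over the old file so a crash never leaves a half-written store.
        os.replace(tmp, self.path)
```

With both values stored per feed, the first fetch after a restart can already send If-None-Match instead of waiting for the feed to change once.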
The 'nice try, but...' award goes to:
- Rojo 1.0, who support ETag but unfortunately make up their own timestamps for If-Modified-Since, and send both headers. This doesn't work, for reasons explained here and here.
- BlogSearch, which sends If-None-Match but stripped of the quotes that DWiki's ETag value has. (This may be RFC-compliant, in which case I need to fix DWiki.)
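Both failures are easy to see against a strict server-side check. The sketch below is an assumption for illustration, not DWiki's actual code: it supposes a server that answers 304 only when every validator the client sent matches the current value exactly, as a plain string. The name `not_modified` is hypothetical.

```python
def not_modified(req_headers, current_etag, current_last_modified):
    """Return True if a strict server may answer 304 Not Modified.

    Assumed policy: every validator the client sends must match the
    server's current value byte for byte; no validators means no 304.
    """
    inm = req_headers.get("If-None-Match")
    ims = req_headers.get("If-Modified-Since")
    if inm is None and ims is None:
        return False
    if inm is not None and inm != current_etag:
        return False
    if ims is not None and ims != current_last_modified:
        return False
    return True


etag = '"abc123"'
stamp = "Wed, 02 Nov 2005 00:00:00 GMT"

# A well-behaved fetcher echoing both values back verbatim.
good = {"If-None-Match": etag, "If-Modified-Since": stamp}
# Right ETag, but a timestamp the client invented itself.
invented = {"If-None-Match": etag,
            "If-Modified-Since": "Wed, 02 Nov 2005 03:14:15 GMT"}
# ETag sent without its surrounding quotes.
unquoted = {"If-None-Match": "abc123"}
```

Under this policy only `good` gets a 304; `invented` fails on the timestamp and `unquoted` fails on the ETag comparison, so both get the full feed every time.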
A number of syndication feed fetchers don't support conditional GET; they don't even bother to send If-Modified-Since headers, and always wind up re-fetching my syndication feeds (when they fetch the main one, this is 300K or so a shot). They are:
- everyone's friend MSNbot, who is by far the most active fetcher of my Atom feeds.
- 'madicon RSS Reader', which appears to be a syndication feed reader addon for Lotus Notes. Working in the Notes environment may make it difficult to store the per-feed information necessary to support conditional GET.
- 'Waggr_Fetcher)', http://www.waggr.com/; this appears to be a web-based feed reader.
- kinjabot, another web-based aggregator thing.
- FeedFetcher-Google and 'Googlebot/2.1' (fetching as a browser); these surprised me, because I expected Google to do better.
- BlogPulse, although to be fair it only visited three times in the last week. (It's an interesting blog search engine; I wish it indexed WanderingThoughts more. Unfortunately they want an email address to submit blog URLs, which is an immediate turnoff these days.)