How well do some Atom feed fetchers do conditional GETs?
'Conditional GET' is the HTTP technique used to save bandwidth by not
re-fetching unchanged pages. Using conditional GET is especially
important for things that fetch syndication feeds (RSS or Atom),
because people usually check feeds much more often than they revisit
web pages. (This is another good reference for syndication feed reader
authors.)
WanderingThoughts has a lot of syndication feeds,
and the main ones are quite big. Recently, partly prompted by
issues with MSNbot, I decided to take a
look at what was fetching my syndication feeds and how well they did
conditional GET. So I looked at my data for about the past week, a
period chosen in part because I recently added detailed logging of the
conditional GET related headers sent by things fetching my Atom feeds.
(First, I have to say that I like having readers and we have a lot of
spare bandwidth. If your syndication feed reader does badly here, this
is absolutely not a request for you to unsubscribe.)
Conditional GET can be done with ETag / If-None-Match, or with
If-Modified-Since; ETag is better. Perfect scores go to the feed
fetchers that always use it:
SharpReader, Bloglines, LiveJournal, Feedster Crawler, and NetNewsWire.
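
For feed reader authors, here is a minimal sketch of what the client side of conditional GET can look like. It uses only Python's standard `urllib`; the function names `conditional_headers` and `fetch_feed` are made up for this illustration and are not taken from any of the programs mentioned here. The idea is simply to send back whatever `ETag` and `Last-Modified` values the server gave you last time, unmodified.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the conditional GET headers from previously saved validators.

    Both values should be echoed back exactly as the server sent them,
    including the quotes around the ETag.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Fetch a feed, returning (body, etag, last_modified).

    body is None when the server answers 304 Not Modified, in which case
    the old validators are returned unchanged.
    """
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return (
                resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; here it just means
        # "nothing changed", so keep what we already have.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

A fetcher that does this on every poll would land in the perfect score group above, since it always sends `If-None-Match` once it has seen an `ETag`.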
A few feed fetchers lose some points from the East German judge:
- liferea lost out on a perfect score because while it always uses
If-Modified-Since, it only sometimes uses If-None-Match (only if it
has fetched a changed feed since the program was started; it doesn't
store the ETag value in its persistent database).
- Yahoo Slurp and PubSub-RSS-Reader only use If-Modified-Since,
which works but is not ideal.
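
The liferea case comes down to keeping the ETag across program restarts. As a rough sketch of one way to do that (this is not liferea's actual storage, and the JSON file layout and the function names `load_validators`, `save_validators`, and `remember` are assumptions made up for this example), a reader could store both validators per feed URL in a small file:

```python
import json
import os


def load_validators(path):
    """Load {feed_url: {"etag": ..., "last_modified": ...}} from disk.

    Returns an empty dict if nothing has been saved yet.
    """
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_validators(path, validators):
    """Write the validators out, replacing the old file in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(validators, f)
    os.replace(tmp, path)


def remember(validators, url, etag, last_modified):
    """Record the validators from the most recent successful fetch."""
    validators[url] = {"etag": etag, "last_modified": last_modified}
```

On startup the reader would call `load_validators` and use the saved values for its first poll of each feed, instead of waiting until it has seen a changed feed in the current session.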
The 'nice try, but...' award goes to:
- Rojo 1.0, who support ETag but unfortunately make up their own
timestamps for If-Modified-Since, and send both headers. This doesn't
work, for reasons explained here and here.
- BlogSearch, which sends
If-None-Match but stripped of the quotes that DWiki's ETag value
has. (This may be RFC-compliant, in which case I need to fix DWiki.)
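
To make both failures concrete, here is a sketch of a strict server-side check. It is not DWiki's actual code; it assumes the server compares `If-Modified-Since` for exact equality with the `Last-Modified` value it sent, and it follows the RFC 2616 rule that when a request carries both headers, the server must not answer 304 unless both agree. The function name `is_not_modified` is hypothetical, and splitting `If-None-Match` on commas is a simplification.

```python
def is_not_modified(request_headers, current_etag, current_last_modified):
    """Return True if a 304 Not Modified response is appropriate.

    Every validator the client sends must match; a request with no
    validators at all always gets the full feed.
    """
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is None and ims is None:
        return False
    if inm is not None:
        # Simplified parsing: assumes no commas inside an ETag value.
        candidates = [tag.strip() for tag in inm.split(",")]
        if current_etag not in candidates and "*" not in candidates:
            return False
    if ims is not None and ims != current_last_modified:
        # Assumption: exact string match against our own Last-Modified.
        return False
    return True
```

Under these assumptions, a correct ETag paired with an invented timestamp fails the second test, and an ETag sent without its quotes fails the first, so both fetchers get the full feed every time:

```python
etag = '"v1"'
modified = "Sat, 01 Jan 2005 00:00:00 GMT"

# Rojo-style: right ETag, made-up timestamp.
is_not_modified(
    {"If-None-Match": '"v1"',
     "If-Modified-Since": "Sun, 02 Jan 2005 12:00:00 GMT"},
    etag, modified,
)  # False

# BlogSearch-style: ETag stripped of its quotes.
is_not_modified({"If-None-Match": "v1"}, etag, modified)  # False
```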
A number of syndication feed fetchers don't support conditional GET;
they don't even bother to send If-Modified-Since headers, and always
wind up re-fetching my syndication feeds (when they fetch the main
one, this is 300K or so a shot). They are:
- everyone's friend MSNbot, who is by far
the most active fetcher of my Atom feeds.
- 'madicon RSS Reader', which appears to be a syndication feed reader
addon for Lotus Notes. Working in the Notes environment may make it
difficult to store the per-feed information necessary to support
conditional GET.
- 'Waggr_Fetcher)', http://www.waggr.com/; this appears to be a
web-based feed reader.
- kinjabot, another web-based aggregator
thing.
- FeedFetcher-Google and 'Googlebot/2.1' (fetching as a browser);
these surprised me, because I expected Google to do better.
- BlogPulse, although to be fair it only
visited three times in the last week. (It's an interesting blog
search engine; I wish it indexed WanderingThoughts
more. Unfortunately they want an email address to submit blog URLs,
which is an immediate turnoff these days.)