Google Feedfetcher is still fetching feeds, and a User-Agent caution

November 9, 2013

Feedfetcher was Google's feed fetching backend for Google Reader, which as you may remember was shut down on July 1st this year (to generally mixed feelings). At the time of that shutdown Google was pretty definite about how the service was gone, its data was not being retained, and there would be no recovery or resumption possible. One would normally expect that the feed fetching backend would also be shut down at the same time.

Well, no, of course not. This is Google, after all, the new home of 'we don't care because we don't have to'. Google Feedfetcher is still pulling my feeds more than four months after the shutdown of Google Reader. In fact it's worse than that; the claimed readership numbers listed in its User-Agent have barely budged from the time when Google Reader was running (this is what is known as a flat-out lie). As irritating things involving Google go, this is a drop in the bucket. Still, I've recently decided that I've had enough, so I've blocked their User-Agent. It turns out that this exposes a little issue that you may want to think about when you create User-Agent strings.

Here is the User-Agent header for Google Feedfetcher:

Feedfetcher-Google; (+; 445 subscribers; feed-id=1422824070729197911)

Here is the User-Agent header for Feedly:

Feedly/1.0 (+; like FeedFetcher-Google)

If you block Google Feedfetcher using a case-independent match you'll probably also block Feedly unless your User-Agent parser is really smart. It would be easy to miss this when you set up blocks unless you make a habit of monitoring what they match (and I suspect that many people don't do that, any more than they have a fancy User-Agent parser instead of a general regexp engine).
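To make the overlap concrete, here is a minimal sketch in Python. It assumes the block is done with an ordinary regular expression; the pattern names and the is_blocked helper are made up for illustration and are not taken from any particular web server or from my actual configuration. The two User-Agent values are copied exactly as quoted above.

```python
import re

# User-Agent values exactly as quoted in the text above.
GOOGLE_UA = "Feedfetcher-Google; (+; 445 subscribers; feed-id=1422824070729197911)"
FEEDLY_UA = "Feedly/1.0 (+; like FeedFetcher-Google)"

# Naive block (hypothetical): case-insensitive match anywhere in the header.
naive = re.compile(r"feedfetcher-google", re.IGNORECASE)

# Narrower block (hypothetical): the name must start the header,
# with Google's own capitalization.
anchored = re.compile(r"^Feedfetcher-Google\b")


def is_blocked(pattern, user_agent):
    """Return True if the blocking pattern matches this User-Agent."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for name, ua in (("Google", GOOGLE_UA), ("Feedly", FEEDLY_UA)):
        print(name, "naive:", is_blocked(naive, ua),
              "anchored:", is_blocked(anchored, ua))
```

The naive pattern catches both fetchers because Feedly's header contains the same name with different capitalization. Anchoring the match to the start of the header is one way to hit only Google here, although it is only as good as your assumption about how Google formats its User-Agent.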

By the way, if this happens to you I would argue that it is more or less Feedly's fault. There are quite a lot of feed fetchers that do not feel the need to drop Google Feedfetcher's name in their User-Agent header, and the way that Feedly does this, combined with Google's own User-Agent formatting, makes it very easy for a match to hit both. If Feedly wanted to communicate the similarity to webmasters reading their logs, they could have used a different phrasing that would not run this risk.

(Of course I rather suspect that Feedly actively wanted their feed fetcher to be mistaken for Google Feedfetcher by automated code; it's just that when they planned this they expected that it was going to be a good thing.)

Last modified: Sat Nov 9 00:04:40 2013