Wandering Thoughts archives

2013-06-24

How to get your syndication feed fetcher at least temporarily banned here

In the spirit of a previous series, here's how to get me to ban, at least temporarily, a syndication feed fetcher that may well be legitimate. I don't like doing this because it can cut off people who actually want to read Wandering Thoughts, but this case is so bad and so questionable that I'm doing it anyway, at least for now.

So here's the procedure:

  • Make a lot of requests for the same feed. For example, request the main feed here once every ten minutes like clockwork (despite the fact that it doesn't change anywhere near that often).

  • Don't use any form of conditional GET, so you fetch the full feed every time.

  • Don't support gzip encoding, so you fetch nearly half a megabyte every ten minutes.

  • Insert bogus Cookie headers into the request. In this case the feed fetcher appears to be leaking cookies set by other sites into requests to here, including some badly formed cookies that cause the standard Python cookie parser to throw errors (which get logged by DWiki, which is why I noticed all of this in the first place).

  • Don't have any meaningful reverse DNS and have a User-Agent: header of:
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2; Feeder.co) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.43 Safari/537.31

    This is not a proper User-Agent for an automated feed fetcher. A proper User-Agent clearly identifies the organization responsible and that this is a robotic agent making the request. This is instead an almost complete imitation of a real web browser's User-Agent, with only an inconspicuous 'Feeder.co' to perhaps identify the actual responsible party (there really is a 'feeder.co' and they appear to do feed fetching).

  • Of course the Feeder.co website exposes almost no contact information and especially doesn't have a 'contact us here if our feed fetcher is doing something odd' page.
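For contrast, here is a minimal sketch of what the request side of a better behaved fetcher might look like, using only the Python standard library. The `ExampleFeedFetcher` name, the `fetcher.example` URL, and the function names are all made up for illustration; the point is only that conditional GET, gzip, an honest User-Agent, and no stray cookies take very little code.

```python
import gzip
import urllib.error
import urllib.request

# Hypothetical identity: a real fetcher would name its own operator and
# a page where people can report problems.
USER_AGENT = "ExampleFeedFetcher/1.0 (+https://fetcher.example/about; feed robot)"


def build_request(url, etag=None, last_modified=None):
    """Build a feed request that supports conditional GET and gzip."""
    headers = {
        "User-Agent": USER_AGENT,
        "Accept-Encoding": "gzip",
    }
    # Send back the validators saved from the previous successful fetch.
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    # Note that no Cookie header is ever added here.
    return urllib.request.Request(url, headers=headers)


def decode_body(body, content_encoding):
    """Undo gzip content encoding if the server applied it."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if unchanged."""
    req = build_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
            return body, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep the old validators and skip reprocessing.
            return None, etag, last_modified
        raise
```

A caller would store the returned `etag` and `last_modified` values and pass them back in on the next poll, so that an unchanged feed costs a small 304 response instead of a full download.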

Under normal circumstances I would continue to allow this feed fetcher to pull my feed and send the people running it email about the problems, in the hope that they'd fix them. But the User-Agent here smells very much like what spammers do, and with everything else going on I have no idea if Feeder.co is even responsible for this or whether someone is abusing their vaguely good name. Certainly I don't feel like trusting them with any of my email addresses; even at best they are running a seriously bad feed fetcher and have made a number of extremely questionable decisions in operating it. It doesn't help that some of their program bugs are drastically polluting my logs (due to the complaints about the malformed cookies).
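On the server side, the log noise from bad cookies could in principle be contained by parsing the Cookie header defensively. The following is a rough sketch with Python 3's `http.cookies` module; it is not DWiki's actual code, `parse_cookies_quietly` is a hypothetical helper, and exactly which inputs make the parser raise an error varies between Python versions.

```python
from http.cookies import CookieError, SimpleCookie


def parse_cookies_quietly(header_value):
    """Parse a Cookie header, returning {} instead of raising on bad input."""
    jar = SimpleCookie()
    try:
        jar.load(header_value)
    except CookieError:
        # Malformed cookies from a misbehaving client: treat as "no cookies".
        return {}
    return {name: morsel.value for name, morsel in jar.items()}
```

Swallowing the error this way hides the symptom, though. It would also have hidden the log messages that exposed this fetcher in the first place.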

(If you do not support conditional GET, you have absolutely no business polling feeds at a rate anywhere near once every ten minutes. Never. Ever.)

(It's not just that the spammers have thoroughly poisoned the well for reaching out to random people on the Internet that you don't have any real knowledge of. It's also that telling people that their software has serious problems is sometimes an excellent way of sparking a great deal of drama (with a capital D). Especially if they are a commercial company.)

PS: I may reluctantly change my opinions here in a few days. I really don't like cutting Wandering Thoughts readers off, even if they are using a service with major problems.

(I've considered redirecting these requests to a very small Atom feed with a single entry that just says 'this feed fetcher is broken and not getting actual content, please switch software or report this to the operators', but that would require creating such a feed somehow. I suppose it wouldn't be too hard. Right now the feed requests are just getting 403 responses (and they are still coming in every ten minutes, which is another failure).)
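As a rough idea of how little such a placeholder feed needs, here is a sketch that builds a single-entry Atom feed with Python's standard `xml.etree.ElementTree`. The function name, the titles, the 'Site operator' author, and the entry id scheme are all assumptions made for illustration, not anything DWiki actually does.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def broken_fetcher_feed(feed_url, message, now=None):
    """Return a tiny Atom feed (as a str) with one explanatory entry.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Serialize Atom elements with a default xmlns instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)

    def add(parent, tag, text=None, **attrs):
        el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        el.text = text
        return el

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    add(feed, "title", "Feed unavailable to this fetcher")
    add(feed, "id", feed_url)
    add(feed, "updated", stamp)
    add(feed, "link", rel="self", href=feed_url)
    author = add(feed, "author")
    add(author, "name", "Site operator")

    entry = add(feed, "entry")
    add(entry, "title", "This feed fetcher is broken")
    add(entry, "id", feed_url + "#broken-fetcher-notice")
    add(entry, "updated", stamp)
    add(entry, "content", message, type="text")

    return ET.tostring(feed, encoding="unicode")
```

The output could be generated once and served as a static file, since nothing in it needs to change from request to request.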

web/HowToGetYourFeedFetcherBanned written at 00:40:31

