Another case of someone being too clever in their User-Agent field

June 5, 2017

Every so often, something prompts me to look at the server logs for Wandering Thoughts in some detail to see what things are lurking under the rocks. One area I wind up looking at is what User-Agents are fetching my syndication feeds; often interesting things pop out (by which I mean things that make me block people). In a recent case, I happened to spot the following User-Agent:

Mozilla/5.0 (compatible) AppleWebKit Chrome Safari

That's clearly bogus, in a way that smells of programming by superstition. Someone has heard that mentioning other user-agents in your User-Agent string is a good idea, but they don't quite understand the reason why or the format that people use. So instead of something that looks valid, they've sprayed in a random assortment of browser and library names.
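To make "the format that people use" concrete, here is a rough sketch of my own, not anything my server actually runs. It splits a User-Agent into product tokens and parenthesized comments and reports which products carry no version. The function names are made up for illustration, and the "no version" test is only a heuristic. As far as I know the HTTP grammar lets a product omit its version, so this measures how far a string strays from what browsers send, not whether it is technically valid.

```python
import re

# Token characters as defined for HTTP header tokens ("tchar").
_TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
_PRODUCT = re.compile(rf"({_TOKEN})(?:/({_TOKEN}))?")
# Simplification (an assumption of this sketch): only non-nested comments.
_COMMENT = re.compile(r"\([^()]*\)")


def split_user_agent(ua):
    """Return (products, comments).

    products is a list of (name, version) pairs, with version set to
    None when the product has no "/version" part.
    """
    comments = _COMMENT.findall(ua)
    rest = _COMMENT.sub(" ", ua)
    products = []
    for part in rest.split():
        m = _PRODUCT.fullmatch(part)
        if m:
            products.append((m.group(1), m.group(2)))
    return products, comments


def unversioned_products(ua):
    """Heuristic: list product names that appear without any version."""
    products, _ = split_user_agent(ua)
    return [name for name, version in products if version is None]


if __name__ == "__main__":
    print(unversioned_products(
        "Mozilla/5.0 (compatible) AppleWebKit Chrome Safari"))
```

Run against the string above, this flags AppleWebKit, Chrome, and Safari as bare names, which is exactly the part that looks sprayed in. A browser-style string that gives every product a version comes back with nothing flagged.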

As with the first too-clever User-Agent, my initial reaction was to block this user agent entirely. It didn't help that it was coming from random IPs and making no attempt to use conditional GET. After running this way for a few days and seeing the fetch attempts continue, I got curious enough to do an Internet search for this exact string to see if I could turn up someone who'd identified what particular spider this was.

I didn't find that. Instead, I found the source code for this, which comes from Flym, an Android feed reader (or maybe from spaRSS, a fork of it). So, contrary to how this User-Agent makes it look, this is actually a legitimate feed reader (or as legitimate a feed reader as it can be if it doesn't do conditional GET, which is another debate entirely). Once I found this out, I removed my block of it, so however many people are using Flym and spaRSS can now read my feed again.

(Flym is apparently based on Sparse-RSS, but the current version of that sends a User-Agent of just "Mozilla/5.0", which looks a lot less shady because it's a lot more generic. Claiming to be just 'Mozilla/5.0' is the 'I'm not even trying' of User-Agents. Interestingly, I do appear to have a number of people pulling Wandering Thoughts feeds with this User-Agent, but it's so generic that I have no idea if they're using Sparse-RSS or something else.)

In the past I've filed bugs against open source projects over this sort of issue, but sadly Flym doesn't appear to accept bug reports through GitHub and at the moment I don't feel energetic enough to even consider something more than that. I admit that part of it is the lack of conditional GET; if you don't put that into your feed reader, I have to assume that you don't care too much about HTTP issues in general.

(See my views on what your User-Agent header should include and why. Flym, spaRSS, and Sparse-RSS all fall into the 'user agent' case, since they're used by individual users.)

PS: Mobile clients should really, really support conditional GET, because mobile users often pay for bandwidth (either explicitly or through monthly bandwidth limits) and conditional GET on feeds holds out the potential of significantly reducing it, especially for places with big feeds like Wandering Thoughts. But this is not my problem.
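For anyone who hasn't implemented it, conditional GET is not much work. What follows is a minimal sketch in Python using only the standard library. It is not what Flym or any other real reader does. The function names and the "ExampleReader/0.1" User-Agent are placeholders I made up. The idea is to remember the ETag and Last-Modified values from the last successful fetch, send them back as If-None-Match and If-Modified-Since, and treat a 304 as "nothing new".

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers from whatever the last fetch gave us."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None,
               user_agent="ExampleReader/0.1"):  # placeholder name
    """Fetch a feed, returning (body, etag, last_modified).

    body is None when the server answers 304 Not Modified, in which
    case the caller keeps using the validators it already had.
    """
    headers = conditional_headers(etag, last_modified)
    headers["User-Agent"] = user_agent
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as e:
        # urllib reports 304 as an HTTPError; it is not a failure here.
        if e.code == 304:
            return None, etag, last_modified
        raise
```

A reader would store the returned validators next to the feed's URL and pass them in on the next poll. When body comes back as None, it can skip parsing entirely.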


Comments on this page:

Besides bandwidth, don’t forget battery life. A 304 is much cheaper than parsing a large XML file, looping over all the feed entries, and comparing each against your database, only to find that none are new or changed. Over and over again.

Written on 05 June 2017.


Last modified: Mon Jun 5 01:23:34 2017