Another case of someone being too clever in their User-Agent field
Every so often, something prompts me to look at the server logs for
Wandering Thoughts in some detail to see what things are
lurking under the rocks. One area I wind up looking at is what
User-Agents are fetching my syndication feeds; often interesting
things pop out (by which I mean things that make me block people).
In a recent case, I happened to spot the following User-Agent:
Mozilla/5.0 (compatible) AppleWebKit Chrome Safari
That's clearly bogus, in a way that smells of programming by
superstition. Someone
has heard that mentioning other user-agents in your User-Agent
string is a good idea, but they don't quite understand the reason
why or the format that people use. So instead of something that
looks valid, they've sprayed in a random assortment of browser
and library names.
As with the first too-clever User-Agent, my initial reaction was to block
this user agent entirely. It didn't help that it was coming from
random IPs and making no attempt to use conditional GET. After running this way for a few days and
seeing the fetch attempts continue, I got curious enough to do an
Internet search for this exact string to see if I could turn up
someone who'd identified what particular spider this was.
I didn't find that. Instead, I found the source code for this,
which comes from Flym, an Android feed reader (or maybe from spaRSS, a fork of it). So, contrary to how this
User-Agent makes it look, this is actually a legitimate feed
reader (or as legitimate a feed reader as it can be if it doesn't
do conditional GET, which is another debate entirely). Once
I found this out, I removed my block of it, so however many people
are using Flym and spaRSS can now read my feed again.
(Flym is apparently based on Sparse-RSS, but the current version of
that sends a User-Agent of just "Mozilla/5.0",
which looks a lot less shady because it's a lot more generic. Claiming
to be just 'Mozilla/5.0' is the 'I'm not even trying' of User-Agents.
Interestingly, I do appear to have a number of people pulling Wandering
Thoughts feeds with this User-Agent, but it's so generic that I have
no idea if they're using Sparse-RSS or something else.)
In the past I've filed bugs against open
source projects over this sort of issue, but sadly Flym doesn't appear to accept bug
reports through GitHub and at the moment I don't feel energetic
enough to even consider something more than that. I admit that
part of it is the lack of conditional GET; if you don't
put that into your feed reader, I have to assume that you don't
care too much about HTTP issues in general.
(See my views on what your User-Agent
header should include and
why. Flym, spaRSS, and Sparse-RSS all fall
into the 'user agent' case, since they're used by individual users.)
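As a rough illustration of what an identifiable feed reader User-Agent could look like, here is a minimal Python sketch. It assumes the common convention of a `name/version` product token followed by a comment with a project URL; the function name, the reader name, and the URL are all hypothetical, and this is not necessarily the exact format the linked views call for.

```python
def make_user_agent(name, version, info_url=None):
    """Build a User-Agent that names the actual program.

    Assumed convention: "Name/Version (+URL)". The URL is optional and
    gives server operators somewhere to look the client up.
    """
    ua = f"{name}/{version}"
    if info_url:
        ua += f" (+{info_url})"
    return ua


# Hypothetical feed reader identifying itself honestly.
print(make_user_agent("ExampleReader", "1.2", "https://example.org/reader"))
```

A string like this tells a server operator what the software is, which is exactly what a pile of borrowed browser names fails to do.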
PS: Mobile clients should really, really support conditional GET,
because mobile users often pay for bandwidth (either explicitly or
through monthly bandwidth limits) and conditional GET on feeds
holds out the potential of significantly reducing it. Especially
for places with big feeds, like Wandering Thoughts. But this
is not my problem.
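For anyone wondering how much work conditional GET is for a feed reader, here is a minimal Python sketch using only the standard library. It assumes the reader has saved the `ETag` and `Last-Modified` values from its previous fetch; the function names and the `ExampleFeedReader/1.0` User-Agent are hypothetical, and this is not how Flym or any other particular reader does it.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None,
                              user_agent="ExampleFeedReader/1.0"):
    """Build a GET request that carries the validators from the last fetch."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    if etag:
        # Ask the server to skip the body if the ETag still matches.
        req.add_header("If-None-Match", etag)
    if last_modified:
        # Same idea, based on the Last-Modified timestamp we saw before.
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if nothing changed."""
    req = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: no body was sent, keep the old validators.
            return None, etag, last_modified
        raise
```

On an unchanged feed the server answers with a bodiless 304, so the only cost is the request and response headers rather than the whole feed.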