
2023-04-26

Putting the 'User-Agent' in your web crawler's User-Agent

In the "that's not how you do it" category, here are two HTTP User-Agent values that I saw on Wandering Thoughts recently:

User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0

That's right, these User-Agents have 'User-Agent' in them (at the start). This is not exactly a new development in crawler user-agents, since I saw it long ago when Wandering Thoughts was new, and possibly even before then (but if so, I didn't bother writing it down).

One possibility is that people are a little unclear on what should go in a User-Agent string. Another possibility, brought to mind by the 'User-Agent=' version, is that some crawler software is confusing about how you configure the user-agent it will use, so that people treat a pure string field as something that needs a key=value form (or maybe a 'label: value' form, for the second user-agent). Since a pure string configuration field generally accepts either of these forms, these people's configurations 'work' in the sense that the software runs.
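As a rough illustration of this second theory, here is a minimal Python sketch of how such a configuration could 'work'. The function build_header_line() and the two setting values are hypothetical; it assumes a crawler that treats its user-agent setting as an opaque string and adds the header name itself, which is a guess about how such software behaves, not a description of any particular crawler.

```python
def build_header_line(configured_value: str) -> str:
    """Return the header line a crawler would send for this setting.

    Hypothetical helper: it assumes the crawler treats the setting as an
    opaque string and always prepends the header name itself.
    """
    return "User-Agent: " + configured_value


# What the operator presumably meant to configure.
intended = "Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0"
# What they typed instead, with the header name included in the value.
mistaken = "User-Agent: " + intended

print(build_header_line(intended))
print(build_header_line(mistaken))
```

Nothing in this sketch rejects the second value, so the crawler runs and the doubled 'User-Agent' only shows up in the server's logs.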

In the traditional way of software configuration, the people running the software could also be copying examples around out of superstition. This would make extra sense for the first user-agent, since Firefox 12.0 has not been an even vaguely likely actual browser for just over a decade (it was apparently released April 24, 2012, which is more recently than I would have thought).

(Because both of these are obviously forged user-agents, I've followed my usual practice and arranged to block them from further access to Wandering Thoughts. In fact I've gotten around to blocking access for all user-agents that start with 'User-Agent'. Not that I expect it to make any real difference in the rain of stealth crawlers that poke at things here, but one does what one can, or at least what one feels grumpy about.)
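A check for this sort of user-agent can be very small. The following Python sketch is not the code actually used on Wandering Thoughts; the function name has_user_agent_prefix() is made up for illustration, and the choice to ignore case and leading whitespace is an assumption.

```python
def has_user_agent_prefix(user_agent: str) -> bool:
    """Report whether a User-Agent value itself starts with 'User-Agent'.

    Assumption: the match ignores case and leading whitespace, so it
    catches both the 'User-Agent=' and 'User-Agent:' variants.
    """
    return user_agent.lstrip().lower().startswith("user-agent")


samples = [
    "User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0",
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0",
]
for ua in samples:
    print(has_user_agent_prefix(ua), ua)
```

The first two samples are the values quoted earlier; the third is the same Firefox 103 string without the stray prefix, which a check like this lets through.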

web/UserAgentInUserAgent written at 22:49:59

