Putting the 'User-Agent' in your web crawler's User-Agent

April 26, 2023

In the "that's not how you do it" category, here are two HTTP User-Agent values that I saw on Wandering Thoughts recently:

User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0

That's right, these User-Agents have 'User-Agent' in them (at the start). This is not exactly a new development in crawler user-agents, since I saw it long ago when Wandering Thoughts was new, and possibly even before then (but if so, I didn't bother writing it down).

One option is that this is people being a little bit unclear on the concept of what should go in a User-Agent string. Another option, brought to mind by the 'User-Agent=' version, is that some crawler software is confusing about how you should configure the user-agent it will use, such that people are taking a pure string field as something that needs a key=value form (or maybe a 'label: value' form, for the second user agent). Since a pure string configuration field generally accepts either of these versions, these people's configurations 'work' in the sense that the software runs.
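To illustrate how such a configuration can 'work', here is a minimal sketch using Python's standard urllib. I don't know what software these crawlers actually use, so the configuration value and the build_request helper are hypothetical; the point is only that a typical HTTP client takes the string as-is and would send it as the header value.

```python
from urllib.request import Request

# Hypothetical configuration value: someone copied the header name along
# with the value into what is really a plain string field.
CONFIGURED_USER_AGENT = (
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) "
    "Gecko/20100101 Firefox/103.0"
)

def build_request(url: str, user_agent: str) -> Request:
    # The value is used verbatim as the header value; nothing here checks
    # that it looks like a sensible User-Agent.
    return Request(url, headers={"User-Agent": user_agent})

# Building the Request does not touch the network.
req = build_request("http://example.org/", CONFIGURED_USER_AGENT)
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

The header value that comes out still has 'User-Agent: ' on the front, which is what then shows up in server logs.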

In the traditional way of software configuration, the people running the software could be copying examples around via superstition. This would make extra sense for the first user-agent, since Firefox 12.0 has not been an even vaguely likely actual browser for just over a decade (it was apparently released April 24, 2012, which is more recently than I would have thought).

(Because both of these are obviously forged user-agents, I've followed my usual practice and arranged to block them from further access to Wandering Thoughts. In fact I've gotten around to blocking access for all user-agents that start with 'User-Agent'. Not that I expect it to make any real difference in the rain of stealth crawlers that poke at things here, but one does what one can, or at least what one feels grumpy about.)
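A check for this sort of user-agent can be very small. The following is a sketch only, not the actual blocking used here; the function name is made up, and ignoring case and leading whitespace are my assumptions about what you'd want.

```python
def starts_with_header_name(user_agent: str) -> bool:
    """Report whether a User-Agent value itself begins with 'User-Agent'.

    Assumption: the match ignores case and leading whitespace, so that
    'user-agent: ...' and ' User-Agent=...' are caught too.
    """
    return user_agent.lstrip().lower().startswith("user-agent")

# The two values from this entry, plus an ordinary one for contrast.
samples = [
    "User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0",
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0",
]
for ua in samples:
    print(starts_with_header_name(ua), ua)
```

Because this only looks at the start of the string, it catches both the 'User-Agent=' and the 'User-Agent: ' forms without caring what follows them.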


Comments on this page:

By Ivan at 2023-04-27 06:56:13:

Have you seen bots with long strings of spaces in their User-Agents? I thought they were tabs, but no, they are actually clusters of 17 spaces:

Mozilla/5.0 (Macintosh;                 Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML,                 like Gecko) Chrome/39.0.2171.95 Safari/537.36

Yo, dawg, I heard you like UserAgents so I put a UserAgent in your UserAgent so you can... I don't know, I'm not Xzibit and my coffee hasn't kicked in.

Unrelated: I'm glad I clicked through the historical links. I got to re-experience fun conversations that I long forgot I had. Thanks for blogging all of these years!


Last modified: Wed Apr 26 22:49:59 2023