I'm not sure what I feel about this web spider's User-Agent value
Every so often I do the unwise thing of turning over rocks in the web logs for this blog. Today, one of the things that I found under there was a web spider with the claimed User-Agent of:
BuckyOHare/1.4 (Googlebot/2.1; +https://hypefactors.com/webcrawler)
The requests all came from AWS IP address space, so I have no idea if this actually belongs to the people that it claims to. As is typical for these spiders, it got my attention primarily by attempting to access URLs that no crawler should.
The bit that raised my eyebrows a lot is the mention of Googlebot. On the one hand, there is a long tradition of browsers including the names of other browsers in their User-Agents in order to persuade web sites to serve them the right content. On the other hand, the biggest reason that I can think of to claim to be Googlebot is so that web sites that give Googlebot special allowances for crawling things will extend those allowances to you, and that's a rather different kind of fakery.
(Ironically this backfired for these people because I already had Googlebot blocked off from almost all of the URLs that they tried to access. It does raise my eyebrows again that almost all of the pages they tried to access were Atom feeds or 'write a comment' pages. For now I've decided that I don't trust these people enough to allow them any access to Wandering Thoughts, so they're now totally blocked.)
I wouldn't be surprised if other web spider operators have already experimented with this clever idea. Even if they haven't, I rather suspect that more people will try it in the future. Given that there are websites that are willing (or reluctantly forced) to allow Google(bot) access but would rather block everyone else, more than a few of them are probably deciding who is Googlebot by User-Agent matching instead of anything more sophisticated.
(Partly this is because more sophisticated methods, such as verifying that a request's IP address really belongs to Google, are some combination of more work to maintain and more time to check in the web server itself.)
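To make the difference concrete, here is a rough Python sketch (hypothetical, not taken from any particular web server) of a naive User-Agent check next to the reverse-then-forward DNS check that Google documents for verifying Googlebot. The function names are made up, and the list of Google domains is an assumption drawn from Google's documentation, so check the current version before relying on it. The naive check happily accepts this spider's User-Agent, while the DNS check would reject it, since an AWS address won't reverse-resolve to a Google name that maps back to it.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# The User-Agent from this blog's logs, and one of Googlebot's own.
SPIDER_UA = "BuckyOHare/1.4 (Googlebot/2.1; +https://hypefactors.com/webcrawler)"
REAL_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Assumption: the domains Google's documentation lists for Googlebot.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def naive_is_googlebot(user_agent: str) -> bool:
    """Cheap check: does 'Googlebot' appear anywhere in the User-Agent?"""
    return "googlebot" in user_agent.lower()


def _reverse(ip: str) -> str:
    # PTR lookup; raises an OSError subclass if there is no reverse DNS.
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> list[str]:
    # Every IPv4 and IPv6 address the name resolves to.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def verify_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Hypothetical reverse-then-forward DNS check that an IP is Google's."""
    try:
        target = ipaddress.ip_address(ip)
        host = reverse(ip).rstrip(".").lower()
    except (ValueError, OSError):
        return False
    # The PTR name must fall under one of Google's crawler domains...
    if not host.endswith(GOOGLE_DOMAINS):
        return False
    # ...and must resolve back to the same address, since whoever controls
    # the reverse DNS for their own IPs can make them claim any name.
    try:
        return any(ipaddress.ip_address(a) == target for a in forward(host))
    except (ValueError, OSError):
        return False


if __name__ == "__main__":
    print(naive_is_googlebot(SPIDER_UA))  # True: the naive check is fooled
    print(naive_is_googlebot(REAL_UA))    # True
```

The cost is visible too: each new client IP needs two DNS lookups, so a real server would want to cache the results, and the alternative of matching against Google's published IP ranges trades that time for keeping a list up to date.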