Microsoft's Bingbot crawler is on a relative rampage here
For some time, people in various places have been reporting that Microsoft Bing's web crawler is hammering them; for example, Discourse has throttled Bingbot. It turns out that Wandering Thoughts is no exception, so I thought I'd generate some numbers on what I'm seeing.
Over the past 11 days (including today), Bingbot has made 40998 requests, amounting to 18% of all requests. In that time it has asked for only 14958 different URLs, which works out to an average of about 2.7 requests per URL. Obviously many pages have been requested multiple times, including pages with no changes; the most popular unchanging page was requested almost 600 times. Quite a lot of unchanging pages have been requested several times over this interval (which isn't surprising, since most pages here change only very rarely).
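Numbers like these can be pulled out of an ordinary access log. The following is a minimal sketch, not the tooling actually used here; it assumes the Apache/nginx "combined" log format, and the function name `summarize_agent` and its `needle` parameter are made up for illustration. It counts requests and distinct URLs for one user-agent, but it does not know which pages are unchanging; that needs information the log doesn't carry.

```python
import re
from collections import Counter

# Assumption: Apache/nginx "combined" log format. Other formats need a
# different pattern.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<url>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_agent(lines, needle="bingbot"):
    """Count requests and distinct URLs for user-agents containing `needle`."""
    total = 0          # all parseable requests, from any client
    matched = 0        # requests whose user-agent contains `needle`
    per_url = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue   # skip lines that aren't in the expected format
        total += 1
        if needle in m.group("agent").lower():
            matched += 1
            per_url[m.group("url")] += 1
    return {
        "total": total,
        "matched": matched,
        "distinct_urls": len(per_url),
        "top_urls": per_url.most_common(3),
    }
```

Feeding it an open log file (`summarize_agent(open(path))`, with whatever path your server logs to) gives the request count, the share of all requests, and the most-requested URLs for that crawler. Matching on the user-agent string trusts whatever the client claims to be.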
Over this time, Bingbot is the single largest source by user-agent (second place goes to a bot that is completely banned; after that come some syndication feed fetchers). For scale, Googlebot has made only 2800 requests over the past 11 days.
Traffic fluctuates from day to day but there is clearly a steady volume. Traffic for the last 11 days is, going backward from today, 5154 requests, then 2394, 2664, 3855, 1540, 2021, 3265, 7575, 2516, 3592, and finally 6432 requests.
As far as bytes transferred go, Bingbot came in at 119.8 Mbytes over those 11 days. Per-day volume, again going backward from today, is 14.9 Mbytes, then 6.9, 7.3, 11.5, 4.6, 5.8, 8.8, 22.9, 6.7, 10.8, and finally 19.4 Mbytes. On the one hand, the total Bingbot volume by bytes is only 1.5% of my total traffic. On the other hand, syndication feed fetches are about 94% of my volume, and if you ignore them and look only at the volume from regular web pages, Bingbot jumps up to 26.9% of the total bytes.
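The jump from 1.5% to 26.9% is just a change of denominator, and it can be checked with a little arithmetic. This is only a sanity check on the rounded percentages quoted above; the function names are invented for the example.

```python
def share_of_non_feed(bot_share_of_all, feed_share_of_all):
    """Bot's share of bytes once feed traffic is removed from the total."""
    return bot_share_of_all / (1.0 - feed_share_of_all)

def implied_feed_share(bot_share_of_all, bot_share_of_non_feed):
    """Feed share of all bytes implied by the bot's two quoted shares."""
    return 1.0 - bot_share_of_all / bot_share_of_non_feed

# With feeds at a rounded 94%, Bingbot's 1.5% becomes roughly 25% of the rest.
approx = share_of_non_feed(0.015, 0.94)

# Working backward from the quoted 26.9% puts feeds at roughly 94.4% of all
# bytes, which is consistent with "about 94%".
feeds = implied_feed_share(0.015, 0.269)

print(f"{approx:.1%} of non-feed bytes; implied feed share {feeds:.1%}")
```

The small gap between 25% and 26.9% comes entirely from rounding in the 1.5% and 94% figures.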
I think that all of this crawling is excessive. It's one thing to want current information; it's another thing to be hammering unchanging pages over and over again. Google has worked out how to get current information with far fewer repeat visits to fewer pages (in part by pulling my syndication feed, presumably using it to drive further crawling). The difference between Google and Bing is especially striking considering that far more people seem to come to Wandering Thoughts from Google searches than come from Bing ones.
(Of course, people coming from Bing could be hiding their Referers far more than people coming from Google do, but I'm not sure I consider that very likely.)
I'm not going to ban Bing(bot), but I certainly do wish I had a useful way to answer their requests very, very slowly, in order to discourage them from visiting so much and to push them to be smarter about what they do visit.
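For illustration, the crude version of "answer slowly" looks something like the following WSGI middleware. This is a sketch under assumptions: it presumes the site is served by a Python WSGI application, and the class name `SlowDownAgent` and its parameters are hypothetical. It is also a good demonstration of why this isn't a particularly useful approach: every delayed request ties up a worker for the whole delay, so on a server with a limited number of workers the crawler's slowness becomes your problem too.

```python
import time

class SlowDownAgent:
    """WSGI middleware that stalls requests from a matching user-agent.

    Hypothetical sketch: matching is a simple substring test on the
    User-Agent header, which trusts what the client claims to be.
    """

    def __init__(self, app, needle="bingbot", delay=5.0, sleep=time.sleep):
        self.app = app
        self.needle = needle.lower()
        self.delay = delay      # seconds to stall before answering
        self.sleep = sleep      # injectable so it can be replaced in tests

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if self.needle in agent:
            # Blocks this worker for the whole delay.
            self.sleep(self.delay)
        return self.app(environ, start_response)
```

It would be used by wrapping the existing application object, for example `application = SlowDownAgent(application, delay=10.0)`. Whether a crawler actually backs off in response to slow answers is up to the crawler.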