The MSN search spider has gone crazy

I'm not the first person to notice this, but the MSN search spider ('msnbot' if you want to search for it in the user agent portion of weblogs, or just look for requests from the …) has gone crazy. Specifically, MSNbot is repeatedly and aggressively crawling completely unchanging URLs while paying much less attention to changing ones. On this server, the past 29 days' worth of logs show:
Overall, MSNbot requested over 5,100 pages from us; judging from Referer logs, exactly 72 MSN searches brought visitors here. On a second system, where I have logs going back to May 29th, MSNbot has since requested 364,000 pages from the website, while about 5,000 MSN searches brought people to us. The pages MSNbot most often requests there are again completely crazy.
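Figures like these come from cross-referencing the User-Agent and Referer fields of the web server's access log. Below is a minimal sketch of that in Python. It assumes Apache's "combined" log format. The function name `summarize` and the referrer host `search.msn.com` are hypothetical placeholders, so check your own Referer logs for the host that MSN searches actually arrive from.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; adjust for your server):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, bot="msnbot", search_host="search.msn.com"):
    """Count a crawler's requests per URL and visits referred by a search site.

    Hypothetical helper: `search_host` is a placeholder for whatever host
    MSN search referrals really come from in your Referer logs.
    """
    per_url = Counter()   # bot requests per requested path
    missing = Counter()   # bot requests that got a 404 (nonexistent URLs)
    referrals = 0         # human visits whose Referer is the search site
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        if bot in m["agent"].lower():
            per_url[m["path"]] += 1
            if m["status"] == "404":
                missing[m["path"]] += 1
        elif search_host in m["referer"]:
            referrals += 1
    total = sum(per_url.values())
    # Crawled pages per referred visitor; infinite if nobody was referred.
    ratio = total / referrals if referrals else float("inf")
    return per_url, missing, referrals, ratio
```

`per_url.most_common(10)` then gives a most-requested list like the one described here, and `missing` shows how many of those hits were for URLs that no longer exist. Applied to the figures above, the ratio works out to roughly 71 crawled pages per referred visitor on this server (5,100 / 72) and about 73 on the second system (364,000 / 5,000).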
On both websites, many of the most requested URLs don't exist. Periodically checking whether nonexistent URLs that people are still linking to have reappeared is a good idea, but I don't see why it should be MSNbot's most popular activity.

It's clear that how often the MSN spider fetches a web page has very little to do with how often the page changes. For example, the index page for WanderingThoughts, which changes at least every day, was fetched only 14 times over 29 days (and in a completely uneven pattern, with several gaps). Meanwhile, my top-level home page, unchanged since May 2001, was fetched 45 times (more than once a day).

Fortunately the University of Toronto has lots of bandwidth to spare, and neither web server is exactly straining under the load. (The good news is that this issue may reach the ears of some MSN Search people, and they'll hopefully fix whatever is wrong. If so, I'll update this entry with the relevant information.)

Update, September 9th or so: some people from MSN Search have been in contact with me and now have various details (like specific URLs). There have been no other developments (including no particularly apparent change in MSNbot's crawling patterns).

Update, September 30th: the MSN Search people have gotten back in contact with me. Unfortunately, MSNbot continues to have various problems with how it crawls us, including significantly excessive transfers of ISO images. Since the MSN Search people are talking with me, I am not currently planning to take aggressive action against MSNbot.

Update, November 14th: with no contact from MSN Search in over a month and continued bad MSNbot behavior, I have given up and banned MSNbot from crawling our website. See BanningMSNBot.
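The details of the ban are left to BanningMSNBot. One common mechanism, and it is only an assumption that anything like it was used here, is a robots.txt entry naming the crawler. Python's standard `urllib.robotparser` can check such a file before it goes live. The `build_parser` helper below is hypothetical, written just for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: shut out msnbot entirely, leave other crawlers alone.
# (Not necessarily the rule this site actually used.)
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Hypothetical helper: parse robots.txt text without fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() also marks the rules as loaded
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("msnbot/1.0 (+http://search.msn.com/msnbot.htm)", "/"))  # → False
print(parser.can_fetch("Googlebot/2.1", "/"))  # → True
```

robots.txt is advisory. It only keeps out crawlers that choose to read and obey it, so a crawler that ignores it would have to be blocked at the web server instead.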
This is part of CSpace, and is written by ChrisSiebenmann.