2011-07-02

Dear Googlebot: SMTP is not HTTP

From the logs of an SMTP server here:

    32301# remote from [66.249.67.36]
    32301r GET /robots.txt HTTP/1.1
    32301w 550 Syntax error
    32301r Host: 128.100.3.51:25
    32301w 550 Unknown command 'Host'
    32301r Connection: Keep-alive
    32301w 550 Syntax error
    32301r Accept: text/plain,text/html
    32301w 550 Syntax error
    32301r From: googlebot(at)googlebot.com
    32301w 550 Unknown command 'From'
    32301# aborted: session terminated

(The abort is from my server, which drops connections after too many syntax errors.)

Then it immediately tries the same thing with '…'

Yes, I'm sure that somewhere there is something that looks like an HTTP link to port 25 on this IP address (although Google doesn't know about it; I've tried the obvious web search). But this is still a failure on Google's part, because they should be much more careful than this with any 'url' that involves a port known to be used for another protocol. Sure, someone could be running a web server on port 25 against all expectations, but the odds are far better that someone has created a bad or malicious link. And certainly when Googlebot has been receiving SMTP replies for years, it should stop trying to crawl it altogether.

The other failure is that Googlebot should not have made the second query for …

PS: I'm aware that part of the blame falls on my MTA for being so old that it doesn't immediately disconnect Googlebot for illegal pipelining (I assume that that's what's happening here).
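As a rough illustration of the kind of early hang-up the PS wishes for, here is a minimal Python sketch of a server-side drop policy. It does not detect pipelining itself (that would need the server to notice input arriving before its replies have gone out); instead it uses two simpler signals: a first line shaped like an HTTP request line, and a running count of unrecognised commands similar to the syntax-error limit the author's server applies. The function name `should_drop`, the verb list, and the threshold of five are assumptions for illustration, not taken from any real MTA.

```python
import re

# Hypothetical threshold: the post does not say how many syntax errors
# the author's MTA tolerates before dropping the connection.
MAX_SYNTAX_ERRORS = 5

# An HTTP/1.x request line, e.g. "GET /robots.txt HTTP/1.1".
HTTP_REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ HTTP/\d\.\d$")

# Illustrative subset of SMTP command verbs a plain server would accept.
SMTP_VERBS = {"HELO", "EHLO", "MAIL", "RCPT", "DATA",
              "RSET", "NOOP", "QUIT", "VRFY", "HELP"}


def should_drop(lines):
    """Return True if a session that sent `lines` should be disconnected."""
    errors = 0
    for i, line in enumerate(lines):
        # An HTTP client talking to port 25: hang up on the first line.
        if i == 0 and HTTP_REQUEST_LINE.match(line):
            return True
        # Count lines whose first word is not a known SMTP verb.
        verb = line.split(" ", 1)[0].upper()
        if verb not in SMTP_VERBS:
            errors += 1
            if errors >= MAX_SYNTAX_ERRORS:
                return True
    return False
```

Fed the client lines from the log above, `should_drop` returns True on the very first line, before any of the header lines are read; an ordinary `EHLO`/`MAIL`/`QUIT` session is left alone.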
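The post argues that a crawler should be wary of any HTTP 'url' naming a port that belongs to another protocol, and should give up once it starts getting SMTP replies. A minimal Python sketch of both checks follows, assuming a hand-picked and deliberately incomplete table of well-known non-HTTP ports. `should_skip`, `looks_like_smtp_reply`, and `NON_HTTP_PORTS` are hypothetical names; this is not how Googlebot actually works.

```python
import re
from urllib.parse import urlsplit

# Illustrative, not exhaustive: well-known ports assigned to non-HTTP protocols.
NON_HTTP_PORTS = {
    21: "ftp", 22: "ssh", 23: "telnet", 25: "smtp",
    110: "pop3", 119: "nntp", 143: "imap", 465: "smtps",
    587: "submission", 993: "imaps", 995: "pop3s",
}

# SMTP replies start with a three-digit code followed by a space or hyphen.
SMTP_REPLY = re.compile(r"^[2-5][0-9]{2}[ -]")


def should_skip(url):
    """True if an http(s) URL names an explicit port known to belong to
    another protocol, or a port that cannot be parsed at all."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # not an HTTP(S) URL; outside this check's scope
    try:
        port = parts.port  # None when the URL has no explicit port
    except ValueError:
        return True  # non-numeric or out-of-range port: not worth fetching
    return port in NON_HTTP_PORTS


def looks_like_smtp_reply(first_line):
    """True if the first line the server sent back is an SMTP reply
    (e.g. a 220 greeting or a 550 error) rather than an HTTP status line."""
    return bool(SMTP_REPLY.match(first_line))
```

One plausible design is for a crawler to remember the second signal per host and port rather than per URL, so that a single SMTP greeting suppresses every later link to the same service.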
This is part of CSpace, and is written by ChrisSiebenmann.