Exploring some spamblogs
I have a certain interest in the behavior of MSNbot, the MSN Search web spider. I'd like to keep track of what other people are saying about it in blogs; the obvious way is a date-based [msnbot] search using Google Blogsearch.
If you do the search you can see why the results leave me less than enthused: they are dominated by a cluster of spamblogs, mostly hosted at blogspot.com. These show up because they mechanically include articles about search engine behavior and search engine optimization that they appear to pull from ezinearticles.com, most of which appear to have originally been written by Mike Banks Valentine of website101.com (for example, this article has been quite popular).
If we look at a representative posting, we can see that threaded through the web page are images and carefully keyworded captions that link to redirectors under 'clickbank.net' or on 'tietie.ru'; which one is used seems to depend on the page. (Also present are links to other spamblogs in the cluster, URLs from the original articles, and a few outbound links that may be attempts to persuade Google that they're not spamblogs.)
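As a rough illustration of how one might pick these redirector links out of a page mechanically, here is a minimal Python sketch. It assumes you already have the page's HTML as a string; the names LinkCollector, is_redirector, and redirector_links are made up for this example, and the only domains it knows about are the two mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Redirector domains named in the post; extend the tuple as needed.
REDIRECTOR_DOMAINS = ("clickbank.net", "tietie.ru")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_redirector(url):
    """True if the URL's host is a redirector domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in REDIRECTOR_DOMAINS)


def redirector_links(html):
    """Return the links in the page that go through a known redirector."""
    parser = LinkCollector()
    parser.feed(html)
    return [url for url in parser.links if is_redirector(url)]
```

Matching on the parsed hostname rather than searching the raw URL text avoids false hits on unrelated hosts that merely contain 'clickbank.net' as a substring.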
The images are common across all of the blogs and appear to be stock photos fetched from 'static.sxc.hu', which bills itself as 'the leading free stock photo site'. It's not clear why the spammers use images; they may be attempting to hit Google Image searches too, or maybe Google weights words in image captions more heavily than words in ordinary text.
Clickbank.net is 'Click Sales Inc', with a primary website at clickbank.com; they seem to be a merchant backend for e-books, software, and other purely digital products. They offer charming services such as having their '100,000 affiliates' drive traffic to your website, and seem to be popular with people who sell things like '33 Days to Online Profits 2004 Edition'. (They also seem popular with people who spam Usenet and Google Groups.)
The tietie.ru URLs are just redirectors to the clickbank.net URLs. I'm not sure why the spammers want to cloak the presence of clickbank.net URLs, but evidently they do.
Another form of Google Blogsearch spam is all of the keywordblogger.net subdomains that show up in the [msnbot] search. While keywordblogger.net (aka pre-views.net, aka preview-search.com) is nominally in the blog searching and indexing business, their real purpose is to generate ad revenue for themselves (ironically including through Google Adwords) by drawing visitors to pages that are loaded with ads.
Keywordblogger.net seems to operate by copying entries from syndication feeds, 'indexing' them to find various common words like 'database' or 'website', and then re-presenting the search and indexing results as pseudo-blogs in subdomains that they then get Google Blogsearch to index. The syndication feeds from these pseudo-blogs then draw readers to keywordblogger.net web pages full of ads (unlike an honest blog aggregator, their RSS feeds don't point to the original URL for the entries).
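To make the 'indexing' step concrete, here is a minimal Python sketch of what grouping feed entries by common keywords could look like. This is only a guess at the general technique, not their actual code; the function name index_entries, the (url, text) entry format, and the two-word keyword list are all assumptions for illustration.

```python
import re
from collections import defaultdict

# The two example keywords from the post; a real list would be much longer.
KEYWORDS = {"database", "website"}


def index_entries(entries, keywords=KEYWORDS):
    """Map each keyword to the URLs of entries whose text contains it.

    `entries` is an iterable of (url, text) pairs (an assumed format).
    """
    index = defaultdict(list)
    for url, text in entries:
        # Lowercase and split into alphabetic words for whole-word matching.
        words = set(re.findall(r"[a-z]+", text.lower()))
        for keyword in sorted(set(keywords) & words):
            index[keyword].append(url)
    return dict(index)
```

Each keyword's list of entries would then be what gets re-presented as one pseudo-blog.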
You can see how little importance they attach to the real blog entries by looking at how they're presented on the web pages: in plain text in small blue type on a gray background, well down the page past all of the ads.
(Presumably keywordblogger.net is going to all of this effort so that they can say that they are a blog search company, and 'just' running ads like all of the rest. I can hope that this is not going to fool Google.)
Update: they're also keywordblogger.com and show up under that name in some Google Blogsearch searches.