Today's comment spammer trick: regurgitated comments
I log the contents of some attempted spam comments here on Wandering Thoughts (the concise summary of when is when the spammer seems to be trying hard). Usually this doesn't get anything, but today my trawl through the logs turned up a succession of bizarre and odd comment attempts. The text had misspellings and typos but it generally made sense and most of the comment attempts were even about technical things that are vaguely on topic for here. But they were invariably attempts to comment on very inapplicable entries.
When I looked at the logs in detail, one of the most striking was a series of comment attempts that looked very much like a conversation between two or more people about using git on home directories. This was very odd since none of the comments were being posted, yet the people were pretty clearly replying to each other; I began to develop all sorts of theories about disturbingly intelligent content auto-generation. Finally I noticed something in one of the comment texts and the penny dropped:
[...] Possibly related posts: (automatically generated)Heroku, the Rails app.
There is a really simple way to get this text into a spam comment: you can be scraping content from existing blog posts and/or blog comments. So my new theory is that the would-be comment spammer is is scraping comment text from other blogs, mangling them somewhat, and then spam-posting them on other blogs (including mine).
The mangled text doesn't seem to have any links or other spam-relevant text so I'm not sure why the spammers are doing this. Maybe they're fishing to see what blogs will allow their comments through moderation and will follow up with more active content on blogs where this works.
Sidebar: source details and other things
So far 30 different IP addresses have tried this here today; most IP addresses have made only one attempt each. The IP addresses cover a large range of source networks. A few of them are CBL listed but that's pretty much it as far as DNBLs are concerned. Four of the IP addresses actually belong to Microsoft (126.96.36.199, 188.8.131.52, 184.108.40.206, and 220.127.116.11; all four are currently listed on the CBL). I'm assuming that these are compromised machines, VPS servers, or both.
Many of the IP addresses also made a burst of
GET requests for various
other URLs here. Maybe they're scraping text from Wandering Thoughts
for use in their corpus for their next spam run somewhere else.