An update on comment spammer behavior here on WanderingThoughts (once again)
WanderingThoughts has been getting comment spam attempts for what is now a rather long time; I last wrote about the state of affairs two years ago. Since then nothing much has changed (at least at one level), but I've come to a general conclusion about how my current comment spammers operate.
(I'm not sure I can generalize to all comment spammers, since I only have one blog data point. But I would be rather surprised if my comment spammers were atypical.)
The big thing I've concluded is that the spammers are what I will call 'form stuffers': they robotically find likely-looking HTML forms, stuff values into every text field, and automatically submit them. They seem to have basically no intelligence about the process or its results, which I suspect means that on enough blogs they don't need any.
(Here on WanderingThoughts this behavior runs the spammers into two anti-comment-spam precautions at once. Their form stuffing fills in my invisible honeypot text field, and their indifference to the results of a form submission means that they never get past the requirement to preview a comment before posting it (this general indifference seems to be somewhat new since last time). On its own, the honeypot text field is still the single most effective anti-spam precaution I've ever added to DWiki.)
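To make the two precautions concrete, here is a minimal sketch of how a comment handler might apply them. It is not DWiki's actual code: the honeypot field name, the HMAC-signed preview token, and the return values are all hypothetical choices made for illustration.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical server-side secret used to sign preview tokens.
SECRET = b"example-secret"

# Hypothetical name of the honeypot field. It is hidden from people with
# CSS, so a real browser submits it empty; a form stuffer fills it in.
HONEYPOT_FIELD = "website2"


def preview_token(comment: str) -> str:
    """Token embedded in the preview page, tied to the exact comment text."""
    return hmac.new(SECRET, comment.encode("utf-8"), hashlib.sha256).hexdigest()


def handle_comment(form: Dict[str, str]) -> str:
    """Decide what to do with one comment form submission.

    Returns 'reject' (honeypot filled in), 'preview' (the comment must be
    previewed first) or 'post' (the comment is accepted).
    """
    # Check 1: anything in the invisible honeypot field means a robot.
    if form.get(HONEYPOT_FIELD, ""):
        return "reject"

    comment = form.get("comment", "")
    token = form.get("token", "")
    # Check 2: only accept a post that carries the token from a preview of
    # this same text; everything else is just (re)shown as a preview.
    expected = preview_token(comment)
    if form.get("action") == "post" and hmac.compare_digest(
        token.encode("utf-8"), expected.encode("utf-8")
    ):
        return "post"
    return "preview"
```

A form stuffer that fills in every text field and submits straight away is rejected by the first check. One that somehow leaves the honeypot empty still only gets a preview page back, and since it ignores the response it never resubmits with the token.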
For reasons that don't fit in this entry, I've also been looking at the IP addresses that comment spammers use. Once upon a time such IP addresses were generally open proxies or compromised end-user machines. Now it appears that most of them are hosted servers in datacenters; I see very few consumer IP address ranges any more and a great deal of hosting provider ranges, often but not always in Europe. Some of the hosting providers involved are fairly big and theoretically reputable operations, too.
(I suspect that there may be layers of resellers involved.)
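Sorting comment-spam source addresses into hosting-provider ranges and everything else is a simple CIDR membership test. The following sketch uses Python's standard ipaddress module; the provider labels and ranges are placeholders taken from the RFC 5737 and RFC 3849 documentation blocks, and a real list would have to be built from something like WHOIS or routing (ASN) data.

```python
import ipaddress
from collections import Counter
from typing import Iterable

# Placeholder ranges from documentation address blocks, standing in for
# real hosting-provider allocations.
HOSTING_RANGES = {
    "example-hoster-a": [ipaddress.ip_network("192.0.2.0/24")],
    "example-hoster-b": [
        ipaddress.ip_network("198.51.100.0/25"),
        ipaddress.ip_network("2001:db8::/32"),
    ],
}


def classify(addr: str) -> str:
    """Return the hosting-provider label for addr, or 'other'."""
    ip = ipaddress.ip_address(addr)
    for label, networks in HOSTING_RANGES.items():
        # 'in' is simply False when the IP versions differ (v4 vs v6).
        if any(ip in net for net in networks):
            return label
    return "other"


def summarize(addrs: Iterable[str]) -> Counter:
    """Count spam source addresses per provider label."""
    return Counter(classify(a) for a in addrs)


sample = ["192.0.2.10", "198.51.100.200", "2001:db8::1", "203.0.113.5"]
print(summarize(sample))
```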
That this happens and in fact is the dominant IP address source is a bit depressing; it implies that hosting providers are either ignoring the spam hosting issues or have ineffectual procedures for dealing with the problem. Or both.
(Somewhat to my surprise I have seen relatively few comment spam attempts from places like Amazon's cloud. Either Amazon is too expensive for spammers, Amazon has a really effective set of abuse procedures, or spammers just assume that the AWS IP ranges are completely useless because they're widely blocked to start with.)
PS: I believe I've seen online slide decks from Etsy about their security work that mentioned seeing a significant amount of bad activity from datacenter IP address ranges. So I suspect that this is a general shift in bad guy behavior. And why not? If you don't get terminated, a hosted server is much more convenient for all sorts of things than a random collection of flaky compromised machines.