Wandering Thoughts archives

2012-06-20

An update on comment spammer behavior here on WanderingThoughts (once again)

WanderingThoughts has been getting comment spam attempts for what is now a rather long time; I last wrote about the state of affairs two years ago. Since then nothing much has changed (at least at one level), but I've come to a general conclusion about how my current comment spammers operate.

(I'm not sure I can generalize to all comment spammers, since I only have one blog data point. But I would be rather surprised if my comment spammers were atypical.)

The big thing I've concluded is that the spammers are what I will call 'form stuffers'; they robotically find likely looking HTML forms, stuff values into all of the text fields, and automatically submit them. They seem to have basically no intelligence about the results or the process, which I suspect means that they don't need any in order to succeed on sufficiently many blogs.

(Here on WanderingThoughts the spammers run into two anti-comment-spam precautions at once with this behavior. Their form stuffing hits my invisible honeypot text field and then their indifference to the form submission results means that they fail to get past having to preview the comment before posting it (this general indifference seems to be a bit new since last time). All by itself the honeypot text field is still the single most effective anti-spam precaution I've ever added to DWiki.)
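As a rough illustration of how these two precautions combine against a form stuffer, here is a minimal Python sketch. It is not DWiki's actual code; the field names `website` and `previewed` and the function `classify_submission` are hypothetical, and it assumes the honeypot is an ordinary text field hidden from human visitors.

```python
# Hypothetical field names; DWiki's real form fields are not shown in this entry.
HONEYPOT_FIELD = "website"    # assumed to be hidden from people, so humans leave it empty
PREVIEW_FIELD = "previewed"   # assumed to be set only by the post button on the preview page


def classify_submission(form):
    """Classify a dict of submitted form fields as 'reject', 'preview' or 'accept'."""
    # A form stuffer fills in every text field, including the invisible one.
    if form.get(HONEYPOT_FIELD, "").strip():
        return "reject"
    # Anything that has not been through the preview step is shown a preview
    # instead of being posted; a client that ignores the response stops here.
    if form.get(PREVIEW_FIELD) != "yes":
        return "preview"
    return "accept"
```

A stuffed form is rejected on the honeypot alone. A client that leaves the honeypot empty but never reads the response only ever reaches the preview step.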

For reasons that don't fit in this entry, I've also been looking at the IP addresses that comment spammers use. Once upon a time such IP addresses were generally open proxies or compromised end-user machines. Now it appears that most of them are hosted servers in datacenters; I see very few consumer IP address ranges any more and a great many hosting provider ranges, often but not always in Europe. Some of the hosting providers involved are fairly big and theoretically reputable operations, too.

(I suspect that there may be layers of resellers involved.)
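One way to do this sort of tally is to sort spam source addresses into hosting provider ranges versus everything else. The sketch below uses Python's standard `ipaddress` module. The networks in `HOSTING_RANGES` are placeholders taken from the reserved documentation blocks, not real provider ranges, and the names `classify_ip` and `tally_sources` are made up for illustration.

```python
import ipaddress
from collections import Counter

# Placeholder networks (reserved documentation ranges), standing in for
# whatever list of hosting provider ranges you have actually assembled.
HOSTING_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def classify_ip(addr):
    """Return 'hosting' if addr falls inside a known hosting range, else 'other'."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in HOSTING_RANGES):
        return "hosting"
    return "other"


def tally_sources(addrs):
    """Count how many source addresses fall into each category."""
    return Counter(classify_ip(a) for a in addrs)
```

The answer is only as good as the range list. With layers of resellers involved, an address can belong to a hosting operation without appearing under that operation's name.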

That this happens and in fact is the dominant IP address source is a bit depressing; it implies that hosting providers are either ignoring the spam hosting issues or have ineffectual procedures for dealing with the problem. Or both.

(Somewhat to my surprise I have seen relatively few comment spam attempts from places like Amazon's cloud. Either Amazon is too expensive for spammers, or it has a really effective set of abuse procedures, or spammers just assume that the AWS IP ranges are completely useless and widely blocked to start with.)

PS: I believe I've seen online slide decks from Etsy on their security work that mentioned that they see a significant amount of bad activity from datacenter IP address ranges. As a result, I suspect that this is a general shift in bad guy behavior. And why not? If you don't get terminated, a hosted server is much more convenient for all sorts of things than a random collective of flaky compromised machines.

CommentSpammerBehaviorIV written at 00:57:39

2012-06-10

The anti-spam implications of email being multiple things in one

One immediate corollary of there being lots of different sorts of email, with no reliable way of telling them apart, is that different people can wind up using completely different flavours of email. For example, some people in your organization may almost never use email for real time conversations, at least not with outside people, while others may need to use it this way all the time. An important consequence of this is that at least some anti-spam precautions on incoming email are intrinsically dependent on the receiver; whether or not they're usable depends on what sort of 'email' the receiver actually uses out of all of the various options.

Let me elaborate on that. I used to feel that anti-spam precautions could be one size fits all, and that the main reason to offer users options was that not all of them had been persuaded that our recommended set of options was fully trustworthy and the right answer. This is clearly wrong for at least some anti-spam options; the obvious example here is greylisting, which implicitly uses the heuristic that a new sender can't be sending real time email. This is more or less correct for many people but is also clearly incorrect for some people, who as a result intrinsically can't use greylisting.

(The argument that greylisting advocates might advance here is that someone sending you an email for the first time has no idea how fast you'll respond; only after you respond immediately do things become real-time. This is wrong because it ignores knowledge and expectations that the sender may have from stuff outside the email system.)
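To make the greylisting heuristic concrete, here is a minimal Python sketch of the usual scheme: the first delivery attempt from an unknown (client IP, sender, recipient) triplet gets a temporary failure, and a retry is accepted only after a minimum delay. This is an illustration under stated assumptions, not any particular mailer's implementation. The `Greylist` class, the 300 second default, and the `exempt_recipients` set are all hypothetical.

```python
import time


class Greylist:
    """Toy in-memory greylister keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, exempt_recipients=(), clock=time.time):
        self.min_delay = min_delay            # seconds a new triplet must wait
        self.exempt = set(exempt_recipients)  # recipients who need real time email
        self.clock = clock                    # injectable so the delay can be tested
        self.first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return 'accept' or 'tempfail' for one delivery attempt."""
        # Recipients who have opted out never have their mail delayed.
        if recipient in self.exempt:
            return "accept"
        key = (client_ip, sender, recipient)
        now = self.clock()
        # Remember when this triplet was first seen; later attempts reuse that time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"
```

The delay applies to every new sender regardless of how urgently the recipient needs the message. That is why the sketch includes a per-recipient exemption rather than a single global setting.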

Thus our need to offer our users at least some options for anti-spam processing is essentially intrinsic and always going to be there. Because different people may well use email differently, there is no one size fits all set of anti-spam precautions (unless we can somehow find a lowest common denominator of precautions among everyone and then offer only that); different people need different options and we need to offer them the choice.

(This casts an interestingly different light on the question of what options we need to offer; it's not just how people think about spam filtering, it's what they need and want out of email from external people. Since I just came to this realization recently I don't have any answers, just something to think about.)

EmailDifferenceImplication written at 00:53:57
