Wandering Thoughts archives

2009-11-20

Spam and the attraction of reach

Here is a thesis: the larger or more standardized the environment for sending messages, the more spam you should expect to get in or through it. Accordingly, email is heavily abused because it is hugely standardized.

The spammer's motivation for abusing larger, standardized environments is obvious; the larger the environment, the more people you can reach with a single technique, approach, or system. Larger environments have better return on effort, since generally (but not always) most of the effort in spamming in an environment is figuring out how to do it well.

(This ties in to how spammers are lazy but not stupid, at least not in the aggregate.)

This is depressing because it implies that any well-used service that allows push messages is going to have spam no matter what you do. If you build such a service or protocol and it gets popular, you'll get spam. (In fact, the degree of spam is not a bad metric for the degree of popularity. And if the spammers abandon you, well, worry.)

It is tempting to say that one important way to discourage spammers is to shift the relative costs so that as much effort as possible is per-message effort; if nothing else, this might make you less attractive than the next target. However, I think that the general history of people's anti-spam efforts in new systems shows that this ultimately doesn't work; if you're attractive enough for regular users, you're easy enough for spammers.
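To make the cost argument concrete, here is a minimal sketch of the arithmetic, not a model of any real spam operation. The function name and the numbers are made up for illustration; the only assumption is that a spammer's effort splits into a one-time cost of figuring out the environment plus a cost per message sent.

```python
def effort_per_message(fixed_effort: float, per_message_effort: float, reach: int) -> float:
    """Average effort per message when a one-time effort is spread over `reach` recipients.

    All three inputs are hypothetical quantities in arbitrary 'effort units'.
    """
    if reach <= 0:
        raise ValueError("reach must be positive")
    # The fixed effort is amortized over everyone reached; the per-message
    # effort is paid in full every time.
    return fixed_effort / reach + per_message_effort


# Same technique, same per-message cost, two environments of different size.
small = effort_per_message(fixed_effort=1000.0, per_message_effort=0.001, reach=1_000)
large = effort_per_message(fixed_effort=1000.0, per_message_effort=0.001, reach=1_000_000)
print(small, large)
```

As reach grows, the fixed term shrinks toward nothing and only the per-message term is left, which is why raising the per-message cost looks like the natural lever to pull.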

(See also DeterringAbuseProblem on this general issue.)

StandardizedSpam written at 01:01:48

2009-11-19

The corollary for effective anti-spam heuristics

Last time I mentioned that spammers were perfectly capable of adapting their practices to defeat anti-spam heuristics like requiring a valid EHLO or reverse DNS, and so such heuristics were, if effective and widely adopted, at best a temporary fix. This raises an obvious corollary about good anti-spam heuristics.
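For readers who have not seen one, a reverse DNS heuristic of this kind can be sketched roughly as follows. This is an illustration under assumptions, not the check any particular mail server uses: the function name and the injectable lookup parameters are hypothetical, and the default lookups (`socket.gethostbyaddr` and `socket.gethostbyname_ex` from the Python standard library) only cover the simple IPv4 case.

```python
import socket


def has_confirmed_reverse_dns(ip, reverse_lookup=socket.gethostbyaddr,
                              forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` has a reverse DNS name that resolves back to `ip`.

    The lookup functions are parameters (a choice made for this sketch) so
    the logic can be exercised without touching the network.
    """
    try:
        # Reverse lookup: IP address -> claimed hostname.
        hostname, _aliases, _addrs = reverse_lookup(ip)
        # Forward lookup: claimed hostname -> list of IP addresses.
        _name, _aliases, forward_addrs = forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # any lookup failure counts as "no valid reverse DNS".
        return False
    return ip in forward_addrs
```

Note how little the check actually demands: anyone who controls both the reverse and forward DNS for their sending address can pass it, which is exactly why it only works as long as spammers do not bother.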

Since spammers will adapt when it is both useful and possible, a good anti-spam heuristic is some characteristic of the message or of how it is transmitted that the spammer cannot easily change. While people have made various stabs at this in the past (and will no doubt continue to do so in the future), the problem for anti-spam efforts is that such characteristics have been hard to find, partly because spammers have proven to be very ingenious about finding ways to change them.

(For a small example, are anti-spam systems matching on the characteristic phrases of your advance fee frauds in email? No problem, just put your pitches in file attachments. I await with resignation the day the spammers start sending PDFs, not just Word .doc files, since a sufficiently ingenious spammer can make a PDF that is very hard to analyse.)

I am not convinced that it's even theoretically possible to come up with good (under this definition) anti-spam heuristics in any sort of general environment, partly for reasons that run up against the fundamental spam problem.

(While current heuristics are effective, my strong impression is that they are a laboriously maintained and ever-evolving collection of more or less ad-hoc rules. This doesn't necessarily scale, and it's expensive.)

HeuristicsCorollary written at 00:47:39



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.