Wandering Thoughts archives

2012-03-28

Ultimately, abuse issues have to be handled by humans

Time and time again, people have tried to create entirely automated systems for detecting, identifying, and dealing with spam on their services. Time and time again, they've ultimately failed; their systems may stop a great deal of spam, but enough gets through despite it.

(Not infrequently the spam that gets through looks, from the outside, as if it should be trivial to recognize. I think there is a deep reason for this, which we'll get to.)

There is a shallow and a deep reason for this failure. The shallow reason is that humans (and spammers are humans) will relentlessly game any set of automated rules until they find weaknesses, and then drive as many trucks as possible through whatever weaknesses they've found. If your service is at all popular, there will be far more smart spammers trying to game the automation than there are smart people writing it, which puts your automation writers in an arms race they almost certainly cannot win. The deep reason is that you are guaranteed to have weaknesses, because the fundamental problem of spam is stopping bad content while letting good content through, and it's essentially impossible to make automated rules as smart as they need to be to do that. Even deciding what counts as 'bad' and 'good' is hard, which is one reason you need people.

(As for why spam that gets through automated systems often looks obvious to people: once spammers have gotten past the automated systems, they have no reason to add variety. Their spam can be blindingly obvious to a human so long as it evades the automation.)

All of this means that places really do need humans to handle their abuse issues; automation can help by catching the obvious things, but it will never entirely replace humans paying attention. The corollary is that places need not just some people but enough people for the volume of abuse they get. This is an extremely unpopular view, since abuse is a cost center and everyone loves the idea of automating their cost centers away, but by this point we have plenty of experience showing that this just doesn't work for abuse.

(A further corollary is that anyone who relies on automation instead of staffing up their abuse department to adequate levels is not actually serious about spam, regardless of what they say. They may not be actively in favor of spam and spammers on their service, but, to use the fine George Orwell phrase, they are objectively pro-spam. Application to various Silicon Valley firms is left as an exercise for the reader.)

HumanAbuseHandling written at 00:52:53

2012-03-11

A CBL false positive reveals a significant issue with the CBL

We were notified today that one of our IPs, 128.100.1.90, had been listed on the CBL (and thus had been pulled in by Spamhaus in their XBL and Zen DNSBLs). There's only one problem with this: there's no machine at that IP address and never has been, and even if there was such a machine it would not have been allowed to do any external traffic by our firewall.

(This subnet is only present on a couple of switches in our machine room and is not exposed outside of it; it's not even carried on our general inside-department backbone.)
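For readers who haven't looked at how a DNSBL such as Zen is consulted: a mail server reverses the octets of the IP address it wants to check, appends the list's DNS zone, and does an ordinary A record lookup; any answer (conventionally an address in 127.0.0.0/8) means the IP is listed, while NXDOMAIN means it isn't. The following is a minimal Python sketch of that convention. The function names are hypothetical, and a real mail server would use its own resolver setup rather than `socket.gethostbyname`.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name a DNSBL client looks up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the DNSBL zone has an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN means 'not listed', but other lookup failures also land
        # here; this sketch treats both the same way.
        return False
    return True


print(dnsbl_query_name("128.100.1.90"))
# → 90.1.100.128.zen.spamhaus.org
```

Calling `is_listed()` performs a live DNS query, so only the name construction is shown running here. Note that nothing in this lookup carries any evidence; the answer is only as good as whatever put the IP into the list.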

However, there is a long-standing issue where some people out there in the world are using addresses in 128.100.0.* and 128.100.1.* on their internal networks. These addresses leak into Received: headers and provoke spam complaints when machines at these companies are compromised and used to send spam. Now they apparently also cause CBL listings.

(Back when I first saw this it was primarily from machines in Europe, but this time it appears to be a bad machine and organization in Brazil.)

Unfortunately, this is very bad. The only way for the CBL to pick up these IP addresses is for CBL feeders to parse the Received: headers in the mail they receive. Let me repeat that: the CBL is listing IP addresses based on parsing Received: headers from untrusted third party machines. And demonstrably this parsing can be and has been fooled into false positives, listing machines that are not spam sources.
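To make the problem concrete, here is a minimal Python sketch of pulling IP addresses out of Received: headers. It assumes a single in-house MTA, so only the topmost header (the one our own server added) records a connection we actually saw. The function name, the `trusted_hops` parameter, and the sample message (which uses documentation addresses apart from 128.100.1.90) are all hypothetical; nothing here describes how the CBL or its feeders actually parse headers.

```python
import email
import ipaddress
import re

# Matches the bracketed connecting IP that many MTAs record, e.g. "[192.0.2.7]".
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message, trusted_hops=1):
    """Return (ip, trusted) pairs from Received: headers, newest first.

    Only the first `trusted_hops` headers were written by our own MTAs;
    everything after that is text written by machines we don't control
    and can name any address at all.
    """
    msg = email.message_from_string(raw_message)
    results = []
    for hop, header in enumerate(msg.get_all("Received", [])):
        for candidate in BRACKETED_IP.findall(header):
            try:
                ip = ipaddress.IPv4Address(candidate)
            except ipaddress.AddressValueError:
                continue  # e.g. "999.1.1.1" is not a real address
            results.append((str(ip), hop < trusted_hops))
    return results


# Hypothetical spam: the remote relay's internal network reuses 128.100.1.*.
SAMPLE = """\
Received: from relay.example.net (relay.example.net [203.0.113.25])
\tby mx.example.org with ESMTP; Sun, 11 Mar 2012 00:00:00 -0500
Received: from pc12.example.net ([128.100.1.90])
\tby relay.example.net with SMTP; Sun, 11 Mar 2012 04:59:00 +0000
Subject: hello

body
"""

for ip, trusted in received_ips(SAMPLE):
    print(ip, "trusted" if trusted else "untrusted")
# 203.0.113.25 trusted
# 128.100.1.90 untrusted
```

A parser that treats every bracketed address as evidence will report 128.100.1.90 here, even though the only connection anyone on our side observed came from 203.0.113.25.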

What we are seeing here is only one demonstration of what can go horribly wrong when you do this. As far as I am concerned, this significantly lowers the trustworthiness of CBL results. It used to be that I could trust that everything in the CBL was listed because CBL honeypots had direct experience with bad behavior from that IP. Now it is clear that for some or perhaps many listed IPs, the CBL has at best indirect 'evidence', evidence that can easily be wrong. Probably the CBL is still mostly correct and this sort of thing is rare, but I had previously thought that this sort of false positive was actively impossible in the CBL.

CBLFalsePositiveProblem written at 00:41:23
