Anti-spam content scanning systems need to scan more

August 24, 2009

It's long past the time when anti-spam content scanning systems should have started decoding and scanning all of the encoded attachments of email messages, especially encoded plaintext ones. Most content scanning systems have always been willing to decode base-64 encoded inline text and HTML (it's sort of a basic requirement), but I don't think very many of them scan attachments. The predictable result is that spammers have caught on that putting their spam in a base-64 encoded attachment works, and it shouldn't.

(And these are not sophisticated spams from sophisticated operations; they are advance fee fraud and the like. I've been receiving an increasing number of these of late, and many of them have been getting through the commercial system that we use.)
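
To make the point about plaintext attachments concrete, here is a minimal sketch of how little work the decoding takes, using only Python's standard `email` package. The names `scannable_text_parts`, `flag_message`, and `SUSPICIOUS_PHRASES` are made up for illustration, and the phrase list stands in for whatever real content checks a scanner applies; this is not how any particular commercial system works.

```python
import email
from email import policy

# Hypothetical stand-in for a real scanner's content rules.
SUSPICIOUS_PHRASES = ("transfer of funds", "next of kin", "send money")


def scannable_text_parts(raw_bytes):
    """Yield (filename, text) for every text/* part, inline or attached."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        # Containers have no content of their own; their children are visited.
        if part.is_multipart():
            continue
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 or quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown or bogus charset label; fall back rather than skip the part.
            text = payload.decode("utf-8", errors="replace")
        # The filename is None for ordinary inline body parts.
        yield part.get_filename(), text


def flag_message(raw_bytes):
    """Return the parts (by filename) whose decoded text matches a phrase."""
    hits = []
    for filename, text in scannable_text_parts(raw_bytes):
        lowered = text.lower()
        if any(phrase in lowered for phrase in SUSPICIOUS_PHRASES):
            hits.append(filename)
    return hits
```

The important design choice is that the loop does not look at the Content-Disposition at all: a `text/plain` part is decoded and checked whether it is marked inline or as an attachment, which is exactly the distinction the spammers are exploiting.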

The sophisticated version of this is to embed the spam in a Microsoft Word .doc file, so pretty soon content scanning systems are going to need to be able to extract text from those too. I'm sure that spammers will try to obfuscate the text, just like they try to obfuscate the text in HTML messages today, but such obfuscation makes a good signature all on its own.

(Yes, accepting random .doc attachments from strangers has its own risks, but in most environments it's probably not politically acceptable to just refuse all of them, however tempting it sometimes is.)

