2009-07-28 Spammers are quite dedicated in their address scraping

This is one of those entries that require some apparently irrelevant background. The Atom syndication feed format requires that each entry have a unique
identifier assigned to it (the While you can read all the gory details here,
the simple version of
The authorityName is normally a domain; however, the spec says that you can use
'<id>@<domain>' as well. For reasons beyond the scope of this entry,
I decided to use the second format for the

(In brief: the advantage of this format is that you don't have to invent a new subdomain for everything you host; you use one domain and have a unique identifier as the <id> bit.)

You can see where this is going. A bit over a month after I started using this format for Atom IDs, I started seeing attempts to send email to 'cspace@<domain>' (which were rejected; there is no requirement that such authorityNames actually be email addresses, and the domain I used doesn't even accept email to start with).

After talking about this with some people, the general speculation is
not that spammers are scraping Atom feeds for
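
To make the mechanics concrete, here is a minimal sketch of how an '<id>@<domain>' authorityName can look like an email address to a scraper. It assumes the tag: URI layout tag:<authorityName>,<date>:<specific>; the domain example.org, the date, the path, and the deliberately naive regular expression are all hypothetical stand-ins, not what any real spammer is known to run.

```python
import re


def make_tag_uri(authority: str, date: str, specific: str) -> str:
    """Build a tag: URI of the form tag:<authorityName>,<date>:<specific>."""
    return f"tag:{authority},{date}:{specific}"


# A deliberately naive address pattern (an assumption for illustration):
# anything shaped like local-part@dotted.domain counts as an address.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scrape_addresses(text: str) -> list[str]:
    """Return every substring that the naive pattern treats as an address."""
    return NAIVE_EMAIL.findall(text)


# Hypothetical IDs: one with a plain domain authority, one with '<id>@<domain>'.
domain_style = make_tag_uri("blog.example.org", "2009", "/entries/one")
email_style = make_tag_uri("cspace@example.org", "2009", "/entries/one")

print(scrape_addresses(domain_style))
print(scrape_addresses(email_style))
```

The domain-style ID gives the pattern nothing to match, while the '<id>@<domain>' form yields something that looks exactly like a mailbox, even though nothing requires it to be one.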
2009-07-20 Minimalistic spam, another annoyance to worry about

I've started getting advance fee fraud spam whose entire contents are something like this:
At first I was amused by the minimalism and lack of effort on the spammer's part; it'd be hard to get an advance fee fraud attempt into fewer words. But the more I think about it, the more I think this may be more clever than it looks (whether or not it's deliberate).

Modern anti-spam filters are quite good at analyzing text and detecting signs of spam. But tiny, minimal messages like this give them a problem (and indeed this one passed the spam filters with a low score), because there's almost no text for anti-spam tools to sink their teeth into. The less text there is for textual analysis, the more you're going to have to rely on some sort of meaning analysis, which has problems.

(I am relatively convinced of the existence of a general trend of giving
anti-spam tools less text to work on. I've been seeing spam where the
real payload was a PDF or

This text still has markers that could sort of be matched on, and a pure Bayesian approach would probably work well (since there are a number of words in there that probably don't normally appear in your email). But I'm not convinced that either will hold up in the long term; smarter spammers can eliminate the obvious markers, and there's probably a lot of room for rephrasing the message and using a less distinct set of words.
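
As a rough illustration of why short messages are awkward for this sort of filtering, here is a minimal sketch of naive Bayesian scoring. Everything in it is an assumption for the sake of the example: the word list, the probabilities, and the sample messages are invented, and real filters tokenize and weight things in far more elaborate ways.

```python
import math

# Hypothetical (P(word | spam), P(word | ham)) pairs; these numbers are
# invented for illustration, not measured from any real mail.
WORD_PROBS = {
    "inheritance": (0.20, 0.001),
    "urgent": (0.15, 0.01),
    "contact": (0.10, 0.05),
    "meeting": (0.005, 0.08),
}


def spam_log_odds(message: str, prior_spam: float = 0.5) -> float:
    """Add up log-likelihood ratios for known words; unknown words add nothing."""
    score = math.log(prior_spam / (1.0 - prior_spam))
    for word in message.lower().split():
        if word in WORD_PROBS:
            p_spam, p_ham = WORD_PROBS[word]
            score += math.log(p_spam / p_ham)
    return score


# Two made-up messages: one with distinctive words, one rephrased blandly.
distinct = "urgent inheritance contact me"
bland = "contact me"

print(spam_log_odds(distinct))
print(spam_log_odds(bland))
```

The score is only a sum over the words that are actually present, so a tiny message built from distinctive words can still stand out, but one rephrased into ordinary words leaves the filter with very little evidence either way.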
MinimalisticSpam written at 00:01:54
This is part of CSpace, and is written by ChrisSiebenmann.