Wandering Thoughts archives

2013-08-25

On classifying phish spam as malware

As I noted recently, our commercial anti-spam filter counts at least some varieties of phish spam as 'viruses', by which it means malware in general. I find myself with divided opinions on this.

On the one hand, phish spam does not fit the traditional definition of malware. There is no executable (however well disguised) that will do bad things to your machine; all of the bad things that phish spam does happen in the human being in front of the computer. In theory the purpose of an anti-spam and anti-virus system stripping malware from email is partly that such malware is extremely damaging and all but impossible for people to detect themselves (if they even get a chance). Phish spam doesn't have this clearly damaging property.

On the other hand, phish spam does clearly have a very bad effect on your computing environment. You would block a trojan that passively stole passwords; well, phish spam is that trojan, just without an executable. Instead it gets your users to simply give their passwords to the attacker. If your anti-virus filter's job is to prevent damage to your computer systems, classifying phish spam as a form of malware and stripping it from inbound email makes a decent amount of sense.

Does this issue matter in practice? It may. The problem is user expectations and especially false positives in an environment where some users do not want the mail system to do spam filtering for them.

(My feeling is that false positives on phish spam are both more likely and more dangerous than for other sorts of malware because phish spam doesn't involve code, just natural language. Lots of normal, legitimate email is natural language; very little involves executable code. Of course a lot depends on how narrow or broad the 'phish as malware' detection is, ranging from known phish attacks all the way out to things that score as sufficiently phish-like.)

PhishAsMalware written at 23:26:16

2013-08-23

Looking at how many viruses we've seen in email recently

Once upon a time people were very worried about viruses being spread through email and devoted a lot of time and effort to eradicating them (sometimes going so far as to refuse all zipfiles and the like). The last time I looked at this we had very few viruses being recognized, but that was a couple of years ago and today I was curious to see if things had changed.

(Technically what I am actually looking at is the amount of detected malware. Viruses are only one of the types of malware that can be spread through email.)

Because our email system does two stages of filtering I have to give two sets of numbers. All of these are over the last 30 days because I decided that that was a good time range for 'current activity'. First, in our SMTP-time milter-based filtering, which only covers some email, we checked 44,000 messages and found 316 'viruses'. This is actually highly misleading because our commercial black box spam+AV filter classifies some phish messages as viruses instead of plain spam. It turns out that most of the detected viruses were in fact phishing messages; 232 out of 316, leaving 84 real viruses.

The main anti-spam processing (which every accepted email goes through) handled 503,000 messages and found 2,445 viruses. Again this includes some phishing messages but this time a lot fewer, only 913. That leaves 1,532 real viruses or a detected virus rate of 0.3% of our incoming email.
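
The subtraction behind both sets of numbers can be written out as a small Python sketch. The function and parameter names are made up for illustration; the counts are the ones quoted above.

```python
def real_virus_stats(checked, flagged, phish):
    """Return (real_viruses, percent_of_checked) once phish hits are removed."""
    real = flagged - phish
    return real, 100.0 * real / checked


# SMTP-time milter filtering: 44,000 checked, 316 flagged, 232 of them phish.
smtp_real, smtp_rate = real_virus_stats(44_000, 316, 232)

# Main anti-spam processing: 503,000 handled, 2,445 flagged, 913 of them phish.
main_real, main_rate = real_virus_stats(503_000, 2_445, 913)

print(f"SMTP time: {smtp_real} real viruses, {smtp_rate:.2f}% of checked mail")
print(f"Main pass: {main_real} real viruses, {main_rate:.2f}% of handled mail")
```

Keeping the phish count as a separate argument makes it easy to report both the vendor's 'virus' number and the narrower real malware number from the same log data.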

Actual malware is potentially very damaging, so I'm glad we have the anti-virus filtering even if we don't see many of them. I might feel differently if we paid any significant amount of money for it (although there are free options if we ever need them).

(I was going to say something about classifying phish spam as malware but my thoughts on this are long enough that I want to put them in a separate entry.)

EmailVirusCount-2013-08 written at 00:16:08

2013-07-30

Phish spam and outside events

I wrote my advance fee fraud spam aphorism about how advance fee spammers take advantage of world events for their come-ons. It strikes me as interesting that I've never seen phish spammers do that. I actually thought I had a case from this weekend and I was going to write it up here, but on looking at the phish spam again it's merely trying to get my Apple ID, not my Apple Developer ID (the latter would be topical given the commotion with the Apple Developer Center security issue).

(I don't have either and I don't think there's any suggestion anywhere that I do. But then as far as I know I've never gotten particularly targeted phish spam in general.)

Assuming that I'm not just missing out on phish spam that refers to current events, I wonder why phish spammers don't seem to do this in the same way that advance fee fraud spammers do. Possibly it's because current events are harder to exploit for phish spam because the results the spammers want are more focused and narrow. If you're not interested in Apple Developer IDs, for example, the recent security issues there are totally useless for you. By contrast advance fee fraud is always after the same thing (you giving them money) and can use many hooks to justify it.

Even with that I'm still a bit surprised that I haven't seen much or any phish spam that said something like 'in light of recent security incident <X> we're asking all of our users to ...'. Perhaps phish spammers also just don't want to remind their targets of security issues lest the targets think twice about the spam itself.

PhishEvents written at 00:06:38

2013-07-02

Today's question: are anti-spam statistics useful for us?

In the postscript of my recent DNS blocklist stats I basically raised a question in passing: are the anti-spam stats I can generate here actually useful, or are they just vaguely interesting? In the jargon, are they actionable information?

When I put it this way, the answer is pretty much no. As I see it, there are two possible reasons anti-spam stats could be actionable here: they could point out some problem in our anti-spam filtering or they could help us allocate limited system resources to anti-spam things with the highest payoff (so we could, for example, eliminate an expensive anti-spam step if it wasn't doing us any good). But neither of these actually applies to us because our anti-spam stuff is basically a black box that we don't tune and the machines involved in this show no signs of being anywhere close to running out of resources.

(Arguably we should monitor our use of DNS blocklists to see if they're doing us any good. But it seems very unlikely that either the CBL or zen.spamhaus.org will stop being effective any time soon and if they do temporarily get quiet, it's not like it does any harm to have them present.)

There are somewhat actionable statistics, but they aren't really accessible. What really matters is the amount of mis-classification that's going on, ie spam that's missed and non-spam that's incorrectly tagged as spam. However we have no way of telling this; only the users can (if they bother to check) and we don't currently have any way to collect information on this.

(We assume that we would hear about it if there was a significant amount of either going on. This may be optimistic, and given that the core of our anti-spam system is a vendor black box there isn't necessarily anything we could do about it anyways.)

I'm a bit sad about this because I find these sorts of statistics to be interesting and so I'd like it if they were also useful. It also means that it doesn't really make sense to spend much time doing things like improving the mail system's logging to help out statistics gathering.

AreSpamStatsUseful written at 00:26:52

2013-06-30

Some very basic DNS blocklist hit information for the last 30 days

Our inbound mail gateway anti-spam stuff logs when a connection is from something listed in the CBL or in zen.spamhaus.org (and yes, we know that that's sort of redundant, it's a long story). Because of how it's implemented, we only check zen.spamhaus.org if we don't find the IP in the CBL.
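
As a rough illustration of that lookup order (not our actual mailer configuration), here is a minimal Python sketch. It assumes IPv4 addresses, the standard DNSBL convention of querying the reversed octets under the list's zone, and `cbl.abuseat.org` as the CBL's zone name; the function names and the injectable `resolve` parameter are made up for the example.

```python
import socket

CBL_ZONE = "cbl.abuseat.org"   # assumption: the CBL's usual public zone name
ZEN_ZONE = "zen.spamhaus.org"


def dnsbl_name(ip, zone):
    """Build the query name: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the zone returns a 127.x.x.x answer for this IP."""
    try:
        answer = resolve(dnsbl_name(ip, zone))
    except OSError:
        return False  # no such name (or lookup failure): treat as not listed
    # Spamhaus documents 127.255.255.x as error codes rather than listings.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")


def first_listing(ip, resolve=socket.gethostbyname):
    """Check the CBL first and only fall through to Zen if the CBL misses."""
    for zone in (CBL_ZONE, ZEN_ZONE):
        if is_listed(ip, zone, resolve):
            return zone
    return None
```

Calling `first_listing("192.0.2.1")` would do at most two DNS queries and stop at the first hit, which is why an IP that is in both lists only ever shows up as a CBL hit in this kind of setup.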

(It turns out that the log message I'm looking at only fires when we accept an RCPT TO from such an IP address and I think it may fire multiple times for multiple RCPT TOs. This makes me think that I need better logging, although I've already seen that spam filter stats can be complicated.)

Over the last 30 days, we accepted RCPT TOs from 90,000 different IP addresses that were in one or the other (some were detected as being in both at different times). The CBL is the dominant source, at 77,000 or so; Zen is good for another 15,000 or so. I also have stats for RCPT TOs that we rejected due to the source IP being in one of the DNS blocklists; over the same 30 day period we rejected 13,500 different IPs (for a total of 92,000 rejected RCPT TOs), again almost all specifically due to a CBL listing (12,000 to 1,500). Roughly 8,500 of these IPs also had some RCPT TOs accepted.
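
Since one IP can produce many log lines, counts like these need de-duplication by IP. The Python sketch below shows one way to do that; the log line format and all of the names in it are hypothetical, not what our mailer actually logs.

```python
import re
from collections import defaultdict

# Hypothetical log format, for example:
#   "rcpt accepted from 192.0.2.10 listed in cbl"
#   "rcpt rejected from 192.0.2.11 listed in zen"
LINE_RE = re.compile(
    r"rcpt (?P<action>accepted|rejected) from (?P<ip>\S+) listed in (?P<dnsbl>\S+)"
)


def summarize(lines):
    """Count distinct IPs and total RCPT TO events per (action, DNSBL)."""
    ips = defaultdict(set)     # (action, dnsbl) -> set of source IPs
    events = defaultdict(int)  # (action, dnsbl) -> number of log lines
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not one of the lines we care about
        key = (m["action"], m["dnsbl"])
        ips[key].add(m["ip"])
        events[key] += 1

    def distinct(action):
        """All IPs seen with this action, regardless of which list hit."""
        found = set()
        for (act, _), addrs in ips.items():
            if act == action:
                found |= addrs
        return found

    accepted, rejected = distinct("accepted"), distinct("rejected")
    return {
        "distinct_per_list": {key: len(addrs) for key, addrs in ips.items()},
        "events_per_list": dict(events),
        "accepted_ips": len(accepted),
        "rejected_ips": len(rejected),
        "both": len(accepted & rejected),
    }
```

Using sets means an IP that shows up under both lists at different times is counted once in the overall totals, which is how per-list figures can add up to more than the combined total.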

(For scale on the RCPT TO rejections, over the same time period we fully accepted somewhere around 540,000 RCPT TOs (counting email that got all the way to the end of DATA).)

Generating ad-hoc stats like this makes me think that I should work out what stats are interesting in advance and then make sure that we're logging enough information to reconstruct them. Maybe I should also put together scripts to generate stats automatically on demand (which would mean that I might look at them more).

(The advanced version is having logstash or some equivalent digest all of the logs and provide real-time versions of the stats. But while that might look pretty, it's not really useful; there is nothing actionable in these stats (to use the jargon), just things of vague interest.)

CSLabDNSBLHits2013-06-29 written at 01:09:30

2013-06-27

How much of our incoming email is checked at SMTP DATA time

One of our anti-spam steps is to check some messages for signs of spam at SMTP DATA time. To qualify for checking, a message must have only (accepted) RCPT TOs of people who've opted in to enough checking to make this worthwhile. I have previously done figures on how many recipients each average inbound email has, but I haven't looked directly at how much of a workout this DATA time check is getting.
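
The qualification rule is simple enough to state as code. This is only a sketch of the rule as described above, with made-up names and addresses; how the real system stores opt-ins is not shown here.

```python
def qualifies_for_data_check(accepted_rcpts, opted_in):
    """True if there is at least one accepted recipient and all are opted in."""
    return bool(accepted_rcpts) and all(r in opted_in for r in accepted_rcpts)


# Hypothetical addresses for illustration only.
opted_in = {"alice@example.org", "bob@example.org"}

print(qualifies_for_data_check(["alice@example.org"], opted_in))
print(qualifies_for_data_check(["alice@example.org", "carol@example.org"], opted_in))
```

The first call qualifies and the second does not: a single recipient who has not opted in is enough to exempt the whole message, since a DATA-time rejection applies to every recipient at once.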

Over the past 30 days we've accepted 487,000 messages and run 49,000 through SMTP DATA checks. Over roughly the same amount of time we rejected about 21,000 of those checked messages; about 190 of those rejections were detected as 'viruses' (which includes some phishing attempts because that's how the commercial filtering system we use works).

At first I was all set to be depressed about this low ratio of email checking. Then I actually looked at how many email addresses had opted in to some degree of DATA time filtering and, well, it's tiny. We have about 300 local addresses enrolled in this checking, while over the same past 30 days we've had messages sent to about 1700 different local addresses. It turns out that less than 120 local addresses have rejected any spam at SMTP DATA time over the past 30 days and thus are responsible for those 21,000 rejections.

(As you might guess, a few heavily spammed local addresses are disproportionately responsible for rejections. The most spammed address rejected over 30% of the messages, although after that the remaining very active addresses drop to the 5% level.)

Since I just generated the stats to check my work: it looks like only somewhat less than half of those enrolled addresses actually had email sent to them that went through SMTP DATA checks. If my crude log crunching is accurate there are only about 25 local addresses that did SMTP DATA checks but did not reject any spam at DATA time. I guess this makes sense; if our users bother to go out of their way to enroll themselves in this, it's because they need it.

(This does imply that the enrolled users are not getting a significantly disproportionate amount of our incoming email. About 8.5% of the destination addresses are enrolled and about 10% of the incoming email gets checked at DATA time; this is a bit higher than a completely fair distribution but not that much off for crude measurements.)
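
To put a number on 'a bit higher than a completely fair distribution', the two shares can be compared directly. This is a back-of-the-envelope Python sketch using the rounded figures quoted in this entry; the variable names are made up.

```python
checked_messages = 49_000
accepted_messages = 487_000
enrolled_address_share = 0.085  # "about 8.5%" as quoted above

# Share of all accepted mail that went through the DATA-time check.
checked_share = checked_messages / accepted_messages

# How much more mail enrolled addresses get than an even split would give them.
over_representation = checked_share / enrolled_address_share

print(f"checked share of mail: {checked_share:.1%}")
print(f"over-representation:   {over_representation:.2f}x")
```

A ratio close to 1 means enrolled addresses get roughly their fair share of the mail; with inputs this rough, anything in that neighbourhood is within the noise.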

OurMilterVolumeLevel written at 01:12:44

2013-05-26

Empirically, modern mailing list services are spam senders

I still run a mailer on my office workstation, handling email to addresses that I've had for a very long time and which I used to use a lot in public (back in the days when the Internet was a much nicer place). To a very good approximation the only email that gets sent to it any more is spam.

(I have systematically transferred all legitimate email to other addresses elsewhere and I no longer subscribe to mailing lists and so on from it.)

Which leads to the punchline: I think I've gotten spam email sent to this machine by most if not all major providers of mailing list services. Many of them keep trying to send email to the machine over time, too.

This is what I mean when I say that empirically modern mailing list services are spam senders: they send spam. To me, from my particular vantage point, their spam sending activities outnumber their legitimate activities directed at me. These companies can protest all they want that they have plenty of legitimate customers too, but for me it is a ratio of all spam and no ham.

By the way, of course I don't bother to send complaints to these companies. It's a waste of time. From a global perspective sending complaints to these companies is what's called 'list washing'; I'll maybe get removed from this particular list or this particular spammer's collection of lists (because the spamming customer gets canceled) but they'll be back to sending me spam next week or next month or next year on behalf of the next spammer that they sign up. The only effective cure for me is to block them entirely, so that's what I do.

(I've touched on this issue before but not quite in these blunt terms. Extensions to the morality of running a mailing list service provider are left as an exercise for the reader.)

(This rant was sparked by a recent conversation with someone I know.)

MailingServicesAreSpammers written at 01:03:13

2013-05-19

Today's comment spammer trick: regurgitated comments

I log the contents of some attempted spam comments here on Wandering Thoughts (in short, I do it when the spammer seems to be trying hard). Usually this doesn't turn up anything, but today my trawl through the logs turned up a succession of bizarre comment attempts. The text had misspellings and typos but it generally made sense and most of the comment attempts were even about technical things that are vaguely on topic for here. But they were invariably attempts to comment on very inapplicable entries.

When I looked at the logs in detail, one of the most striking was a series of comment attempts that looked very much like a conversation between two or more people about using git on home directories. This was very odd since none of the comments were being posted, yet the people were pretty clearly replying to each other; I began to develop all sorts of theories about disturbingly intelligent content auto-generation. Finally I noticed something in one of the comment texts and the penny dropped:

[...] Possibly related posts: (automatically generated)Heroku, the Rails app.

There is a really simple way to get this text into a spam comment: you can be scraping content from existing blog posts and/or blog comments. So my new theory is that the would-be comment spammer is scraping comment text from other blogs, mangling it somewhat, and then spam-posting it on other blogs (including mine).

The mangled text doesn't seem to have any links or other spam-relevant text so I'm not sure why the spammers are doing this. Maybe they're fishing to see what blogs will allow their comments through moderation and will follow up with more active content on blogs where this works.

Sidebar: source details and other things

So far 30 different IP addresses have tried this here today; most IP addresses have made only one attempt each. The IP addresses cover a large range of source networks. A few of them are CBL listed but that's pretty much it as far as DNSBLs are concerned. Four of the IP addresses actually belong to Microsoft (168.63.43.185, 168.63.62.182, 168.63.76.184, and 168.63.84.217; all four are currently listed on the CBL). I'm assuming that these are compromised machines, VPS servers, or both.

Many of the IP addresses also made a burst of GET requests for various other URLs here. Maybe they're scraping text from Wandering Thoughts for use in their corpus for their next spam run somewhere else.

RegurgitatedCommentSpam written at 22:45:29

2013-04-27

Some theories on why DNSBLs may be dwindling away

In yesterday's entry I mentioned that I had some theories about why anti-spam DNS blocklists might be diminishing (beyond the obvious one that running a good DNSBL is a pain in the rear and people get tired of those after a while). I don't claim to have the answer and I'm not sure that any of these theories are right, just possibly interesting. First off I think we can rule out the idea that DNSBLs are going away because email spam itself is diminishing. Put simply, email spam isn't (and probably never will until email itself dies). The anti-spam forces may win tactical victories from time to time, but long-lasting general ones seem elusive.

But the historical evolution of spam and anti-spam efforts does make a good lead in to two related theories about this. My first theory is that the kind of obvious bad sources that used to provoke such arguments (and get listed or not listed depending on the aggression of the DNSBL operator) have basically gone away. In the beginning, many spammers basically painted target signs on themselves by using static IP ranges (beyond 'bulletproof hosting' and various other disguises) and there were a lot of places that would give them connectivity. These days, well, not so much in most places. The overall effect is to leave little room for a new DNSBL because there really isn't much to argue about any more in potential listings. Either they're one of a small clutch of bad actors (and almost certainly already in Spamhaus) or you're detecting them based on more or less automated technical criteria and things like the CBL and the Spamhaus CSS have that area well covered by now.

My second theory is that DNSBLs were first generation anti-spam technology that has now been significantly supplanted. In the early days of the anti-spam fight there were very few defenses, especially easily deployed ones. DNSBLs were easy to put together, worked fairly well on first-generation spam, and DNSBL lookups and connection rejection were really easy to add to mailers, so people reached for the available solution. But that's not the case any more. Spammers got more and more sophisticated while people developed more elaborate anti-spam systems (some free, some commercial). The result is DNSBLs are nowhere near as important as they used to be which makes starting a new one much less interesting. If you want to make your mark on the anti-spam world today, a DNSBL is probably not the place to do it.

(This is certainly the case here. Our main anti-spam precaution is a commercial anti-spam system that's more or less a black box as far as we're concerned; our DNSBL usage is functionally a fallback measure.)

Sidebar: One remaining area that maybe could use a DNSBL

In short, so called 'snowshoe spam'. Our stats strongly suggest that there are active ranges used by snowshoe spammers, but the Spamhaus CSS only does automated single-IP listings that expire relatively rapidly. This seems ripe for someone to watch for patterns and then start preemptively blocking active snowshoe ranges.

(It's possible that this doesn't work and that, eg, the CSS is so good at picking up new snowshoe emitter IPs in bad ranges that they get blocked before they can really spam anyone anyways.)
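
One very simple form of 'watching for patterns' would be to group individually listed IPs by network and flag networks where many distinct IPs have shown up. The Python sketch below does that; the /24 grouping, the threshold, and the function name are all assumptions picked for illustration, not a tested snowshoe detector.

```python
import ipaddress
from collections import defaultdict


def busy_ranges(listed_ips, prefix_len=24, min_ips=3):
    """Return networks containing at least min_ips distinct listed addresses."""
    per_net = defaultdict(set)
    for ip in listed_ips:
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        per_net[net].add(ip)
    return sorted(
        (net for net, ips in per_net.items() if len(ips) >= min_ips),
        key=lambda net: int(net.network_address),
    )


# Documentation-range addresses standing in for single-IP listings.
seen = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.7", "192.0.2.3"]
print([str(net) for net in busy_ranges(seen)])
```

A real version would also need to age out old sightings and deal with ranges that are shared with legitimate senders, which is presumably where most of the work (and the arguments) would be.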

DiminishingDNSBLTheories written at 01:52:38

2013-04-26

Are there less anti-spam DNS blocklists than there used to be?

Once upon a time, back when I paid a fair amount of attention to anti-spam stuff, there were quite a lot of DNS blocklists. Some had good reputations and some were more colourful, some were conservative and slow-moving while others were much more aggressive and fast to block, but there were any number of them that many people looked at. I did enough in this area that I wrote a script to look up IP(s) in all of the worthwhile DNSBLs that I knew about.
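
For readers who have not seen one, a multi-DNSBL lookup script is conceptually tiny. The Python sketch below is not the script mentioned above; the function names are invented, the zones are supplied by the caller, and any successful answer is treated as a listing without interpreting the return code.

```python
import ipaddress
import socket


def query_name(ip, zone):
    """Build the DNSBL query name: reversed address labels plus the zone."""
    # reverse_pointer gives e.g. "2.0.0.127.in-addr.arpa"; drop the last two labels.
    reversed_labels = ipaddress.ip_address(ip).reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


def listings(ips, zones, resolve=socket.gethostbyname):
    """Map each IP to the zones that return an answer for it."""
    found = {}
    for ip in ips:
        hits = []
        for zone in zones:
            try:
                resolve(query_name(ip, zone))
            except OSError:
                continue  # no answer: treated as "not listed"
            hits.append(zone)
        found[ip] = hits
    return found
```

Unlike a mailer, which usually stops at the first hit, a script like this checks every zone for every IP, since the point is to see the whole picture for an address.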

Over the past few years I've been steadily removing DNSBL after DNSBL from this script (and sometimes from the mailer configuration on my office workstation), most recently NJABL. And it doesn't seem like new DNSBLs are replacing them in a transfer of the guard from the tired to the new and eager; instead the collection of worthwhile DNSBLs just seems to have been diminishing.

(I confirmed this with someone I know who is more in touch with email anti-spam than I am these days; his view was that it was basically down to Spamhaus (and its data sources). I'm still slightly broader than that, as I sometimes look at SURBL for spam website names.)

Now I'll admit that this may be somewhat illusory in that I'm not looking in the right place for modern DNSBL discussions; after all, one of the reasons I stopped paying attention to the field is that my usual information sources turned into sewers. I have some indications that other sites out there use additional DNSBLs (although none that I really consider worthwhile ones and none that are all that new).

I have some theories on what this diminishment of DNSBLs may mean but I'll save them for another entry (partly because I want to think about them some more).

DiminishingDNSBLs written at 01:25:53

