Wandering Thoughts: Recent Entries

2013-05-19

Today's comment spammer trick: regurgitated comments

I log the contents of some attempted spam comments here on Wandering Thoughts (the concise summary of when I do this is 'when the spammer seems to be trying hard'). Usually this doesn't turn up anything interesting, but today my trawl through the logs turned up a succession of bizarre comment attempts. The text had misspellings and typos but it generally made sense, and most of the comment attempts were even about technical things that are vaguely on topic for here. But they were invariably attempts to comment on entries where they were completely inapplicable.

When I looked at the logs in detail, one of the most striking was a series of comment attempts that looked very much like a conversation between two or more people about using git on home directories. This was very odd since none of the comments were being posted, yet the people were pretty clearly replying to each other; I began to develop all sorts of theories about disturbingly intelligent content auto-generation. Finally I noticed something in one of the comment texts and the penny dropped:

[...] Possibly related posts: (automatically generated)Heroku, the Rails app.

There is a really simple way to get this text into a spam comment: scrape content from existing blog posts and/or blog comments. So my new theory is that the would-be comment spammer is scraping comment text from other blogs, mangling it somewhat, and then spam-posting it on other blogs (including mine).
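If scraped text tends to drag along page boilerplate like the 'Possibly related posts' fragment quoted above, one cheap check is to look for such markers in incoming comments. Here's a minimal sketch, assuming the comment text is available as a plain string before moderation; the function name is hypothetical and only the one marker actually seen in the logs is included, so the list is meant to be extended with whatever else shows up locally.

```python
import re

# Boilerplate that suggests a comment was scraped from another blog.
# Only this pattern comes from the logs described above.
SCRAPE_MARKERS = [
    re.compile(r"possibly related posts:\s*\(automatically generated\)",
               re.IGNORECASE),
]


def looks_regurgitated(comment_text: str) -> bool:
    """Return True if the comment contains known scraped-page boilerplate."""
    return any(p.search(comment_text) for p in SCRAPE_MARKERS)


if __name__ == "__main__":
    sample = "[...] Possibly related posts: (automatically generated)Heroku, the Rails app."
    print(looks_regurgitated(sample))                              # True
    print(looks_regurgitated("Using git on home directories works."))  # False
```

This only catches the sloppiest scraping; mangled text without such markers will sail right past it.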

The mangled text doesn't seem to have any links or other spam-relevant text so I'm not sure why the spammers are doing this. Maybe they're fishing to see what blogs will allow their comments through moderation and will follow up with more active content on blogs where this works.

Sidebar: source details and other things

So far 30 different IP addresses have tried this here today; most of them have made only one attempt each. The IP addresses cover a large range of source networks. A few of them are CBL listed but that's pretty much it as far as DNSBLs are concerned. Four of the IP addresses actually belong to Microsoft (168.63.43.185, 168.63.62.182, 168.63.76.184, and 168.63.84.217; all four are currently listed on the CBL). I'm assuming that these are compromised machines, VPS servers, or both.

Many of the IP addresses also made a burst of GET requests for various other URLs here. Maybe they're scraping text from Wandering Thoughts for use in their corpus for their next spam run somewhere else.

RegurgitatedCommentSpam written at 22:45:29

2013-04-27

Some theories on why DNSBLs may be dwindling away

In yesterday's entry I mentioned that I had some theories about why anti-spam DNS blocklists might be diminishing (beyond the obvious one that running a good DNSBL is a pain in the rear and people get tired of those after a while). I don't claim to have the answer and I'm not sure that any of these theories are right, just possibly interesting. First off I think we can rule out the idea that DNSBLs are going away because email spam itself is diminishing. Put simply, email spam isn't diminishing (and probably never will be until email itself dies). The anti-spam forces may win tactical victories from time to time, but long-lasting general ones seem elusive.

But the historical evolution of spam and anti-spam efforts does make a good lead-in to two related theories about this. My first theory is that the kind of obvious bad sources that used to provoke such arguments (and get listed or not listed depending on the aggression of the DNSBL operator) have basically gone away. In the beginning, many spammers basically painted target signs on themselves by using static IP ranges (beyond 'bulletproof hosting' and various other disguises) and there were a lot of places that would give them connectivity. These days, well, not so much in most places. The overall effect is to leave little room for a new DNSBL because there really isn't much to argue about any more in potential listings. Either they're one of a small clutch of bad actors (and almost certainly already in Spamhaus) or you're detecting them based on more or less automated technical criteria, and things like the CBL and the Spamhaus CSS have that area well covered by now.

My second theory is that DNSBLs were first generation anti-spam technology that has now been significantly supplanted. In the early days of the anti-spam fight there were very few defenses, especially easily deployed ones. DNSBLs were easy to put together, worked fairly well on first-generation spam, and DNSBL lookups and connection rejection were really easy to add to mailers, so people reached for the available solution. But that's not the case any more. Spammers got more and more sophisticated while people developed more elaborate anti-spam systems (some free, some commercial). The result is that DNSBLs are nowhere near as important as they used to be, which makes starting a new one much less interesting. If you want to make your mark on the anti-spam world today, a DNSBL is probably not the place to do it.

(This is certainly the case here. Our main anti-spam precaution is a commercial anti-spam system that's more or less a black box as far as we're concerned; our DNSBL usage is functionally a fallback measure.)

Sidebar: One remaining area that maybe could use a DNSBL

In short, so-called 'snowshoe spam'. Our stats strongly suggest that there are active ranges used by snowshoe spammers, but the Spamhaus CSS only does automated single-IP listings that expire relatively rapidly. This seems ripe for someone to watch for patterns and then start preemptively blocking active snowshoe ranges.

(It's possible that this doesn't work and that, eg, the CSS is so good at picking up new snowshoe emitter IPs in bad ranges that they get blocked before they can really spam anyone anyways.)
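As a rough illustration of what 'watching for patterns' might mean, here's a minimal sketch that groups individually listed IPs into networks and flags networks with several distinct listed IPs. The /24 prefix, the threshold of four, and the function name are all hypothetical choices, and the sample addresses are RFC 5737 documentation addresses, not real data.

```python
import ipaddress
from collections import defaultdict


def flag_snowshoe_ranges(listed_ips, min_distinct=4, prefix=24):
    """Group single listed IPs into /prefix networks and return the
    networks containing at least min_distinct different listed IPs."""
    by_net = defaultdict(set)
    for ip in listed_ips:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        by_net[net].add(ipaddress.ip_address(ip))
    return sorted(net for net, ips in by_net.items() if len(ips) >= min_distinct)


if __name__ == "__main__":
    sample = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13", "198.51.100.7"]
    print(flag_snowshoe_ranges(sample))  # [IPv4Network('192.0.2.0/24')]
```

A real version would also want to care about time (how recently the IPs were listed) so that it tracks active ranges rather than historical ones.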

DiminishingDNSBLTheories written at 01:52:38

2013-04-26

Are there fewer anti-spam DNS blocklists than there used to be?

Once upon a time, back when I paid a fair amount of attention to anti-spam stuff, there were quite a lot of DNS blocklists. Some had good reputations and some were more colourful, some were conservative and slow-moving while others were much more aggressive and fast to block, but there were any number of them that many people looked at. I did enough in this area that I wrote a script to look up IP(s) in all of the worthwhile DNSBLs that I knew about.
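The mechanics of such a lookup script are simple: reverse the IPv4 address's octets, append the DNSBL's zone, and look up an A record. An answer means listed; NXDOMAIN means not listed. Here's a minimal sketch under those assumptions. Only zen.spamhaus.org comes from these entries; the function names are hypothetical and it handles only IPv4. Some DNSBLs may refuse or misanswer queries arriving via large public resolvers, so it's best run against your own resolver.

```python
import ipaddress
import socket
import sys

# DNSBL zones to check; add or remove zones to taste.
ZONES = ["zen.spamhaus.org"]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def lookup(ip: str, zones=ZONES) -> dict:
    """Return {zone: [A records]} for each zone that lists the IP."""
    hits = {}
    for zone in zones:
        try:
            _, _, addrs = socket.gethostbyname_ex(dnsbl_query_name(ip, zone))
        except (socket.gaierror, socket.herror):
            continue  # NXDOMAIN or lookup failure: treat as not listed
        hits[zone] = addrs
    return hits


if __name__ == "__main__":
    for ip in sys.argv[1:]:
        print(ip, lookup(ip) or "not listed")
```

Treating lookup failures as 'not listed' is the usual fail-open choice for mail filtering; a reporting script might prefer to print the error instead.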

Over the past few years I've been steadily removing DNSBL after DNSBL from this script (and sometimes from the mailer configuration on my office workstation), most recently NJABL. And it doesn't seem like new DNSBLs are replacing them in a changing of the guard from the tired to the new and eager; instead the collection of worthwhile DNSBLs just seems to have been diminishing.

(I confirmed this with someone I know who is more in touch with email anti-spam than I am these days; his view was that it was basically down to Spamhaus (and its data sources). My own usage is still slightly broader than that, as I sometimes look at SURBL for spam website names.)

Now I'll admit that this may be somewhat illusory in that I'm not looking in the right place for modern DNSBL discussions; after all, one of the reasons I stopped paying attention to the field is that my usual information sources turned into sewers. I have some indications that other sites out there use additional DNSBLs (although none that I really consider worthwhile ones and none that are all that new).

I have some theories on what this diminishment of DNSBLs may mean but I'll save them for another entry (partly because I want to think about them some more).

DiminishingDNSBLs written at 01:25:53

2013-03-24

Looking at how many external recipients inbound email goes to

My data on how many recipients our average inbound email has is in practice incomplete. It's quite possible for a single address here to expand into multiple destinations; some addresses are mailing lists and some people just forward their email to more than one place. So an interesting companion question is how many external recipients a typical email has. To make this more applicable to what I'm interested in, I'm looking at this only for email from the outside world.
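Counting this needs the final, post-expansion delivery addresses for each message rather than its SMTP envelope recipients. Here's a minimal sketch, assuming those addresses have already been pulled out of the logs (how depends on the mailer, which isn't covered here); LOCAL_DOMAINS and the function names are hypothetical placeholders, and subdomains must be listed explicitly.

```python
from collections import Counter

# Hypothetical set of domains considered internal; substitute your own.
LOCAL_DOMAINS = {"example.org", "cs.example.org"}


def is_external(address: str) -> bool:
    domain = address.rsplit("@", 1)[-1].lower()
    return domain not in LOCAL_DOMAINS


def external_recipient_histogram(messages):
    """messages: iterable of lists of final delivery addresses, one list
    per inbound message. Returns (Counter of external-recipient counts
    for messages with any, number of messages with none)."""
    hist = Counter()
    no_external = 0
    for deliveries in messages:
        n = sum(1 for addr in set(deliveries) if is_external(addr))
        if n == 0:
            no_external += 1
        else:
            hist[n] += 1
    return hist, no_external
```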

As before, this covers 89 days of logs (but because it's a slightly different 89 days, the stats don't necessarily match up exactly). The first number is that out of 1.3 million inbound emails, only 30% had any external recipients at all; the remaining 70% went (directly or indirectly) only to internal recipients. The recipient count breaks down this way:

1 recipient 91.7%
2 recipients 4.6%
3 recipients 1.9%
4 recipients 0.5%
5 recipients 0.4%

As you might expect in an environment with mailing lists, some messages had very high external recipient counts. The champions were emails with between 247 and 266 external recipients, all of which seem to have been messages to department-wide mailing lists (which of course go to a whole lot of people who forward their email to outside addresses). But there weren't very many such emails; only 0.4% of the messages had 10 or more external recipients.

Unlike the inbound email case, there doesn't seem to be any particular pattern in significant numbers of external recipients. This is what I'd expect, given that the mapping between the number of inbound recipients and the number of external recipients is a fairly random one (since it depends on exactly who the email goes to).

RecipientsDistributionII written at 01:57:21

2013-03-22

Looking at how many recipients our average inbound email has

One of the niggling problems of SMTP in the modern world (at least for us) is the mixed address problem, the fact that at DATA time your answer applies to all recipients. It would be much more convenient if all email messages had only a single recipient; then you could always apply just that recipient's content filtering views and enable much more rejection at SMTP time. Which leads to the question: how many recipients does an average message here have, especially inbound messages?

(Inbound messages are the most interesting ones, because those are the ones that all of our anti-spam stuff is applied to.)
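To make the mixed address problem concrete: since SMTP gives only one reply to DATA, a message can be rejected outright only when every accepted recipient would reject it. Here's a minimal sketch of that decision; the Verdict type, the function name, and the reply texts are illustrative assumptions, not any particular mailer's API.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def data_time_response(per_recipient: dict) -> str:
    """per_recipient maps each accepted RCPT TO address to the verdict
    its own content filtering preferences give this message.

    One DATA reply covers everyone, so reject only if every recipient
    would reject; otherwise accept and handle the unwilling recipients
    some other way (tagging, filing, discarding)."""
    if set(per_recipient.values()) == {Verdict.REJECT}:
        return "550 message rejected as spam"
    return "250 accepted"
```

With single-recipient messages the set always has one element, so each recipient's preference can be honoured at SMTP time; that's why the recipient count matters.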

Today, I decided to answer that question for our external MX gateway. The answer turns out to be that the overwhelming majority of email has only one recipient. The stats break down like this:

1 recipient 93%
2 recipients 3.6%
3 recipients 1.2%
4 recipients 0.6%
5 recipients 0.4%
6 recipients 0.2%
10 recipients 0.2%

(I think I'll stop there.)

This is from 89 days of logs, totaling 1.29 million messages received. It counts only actual accepted recipients so some of these messages may have had some of their RCPT TOs rejected already (I suspect that this is not a really big factor but I haven't looked).
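Once the logs have been reduced to one accepted-recipient count per message (the log parsing depends on the mailer and isn't shown), turning that into a percentage table like the one above is a few lines. A minimal sketch; the function name is hypothetical.

```python
from collections import Counter


def recipient_distribution(recipient_counts):
    """recipient_counts: one integer per accepted message, the number of
    RCPT TOs accepted for it. Returns {count: percent of messages}."""
    counts = Counter(recipient_counts)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    for n, pct in recipient_distribution([1, 1, 1, 2]).items():
        print(f"{n} recipients {pct:.1f}%")
```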

The largest number of (accepted) recipients for a single message is 82 (for one message). There is a similar handful of other messages with large recipient counts. Interestingly, the most common of these large recipient counts is 20 recipients (but it's still only 0.09% of all messages). There seems to be a hard break at 20 recipients; only 98 messages out of the 1.29 million had more recipients than that.

This has been interesting. Before I did these stats I would not have expected single-recipient messages to be so totally dominant (even though I'm familiar with things like VERP that strongly bias some traffic towards that). Possibly much more of our inbound email is mailing lists (including spam lists) than I expected.

Sidebar: detailed message counts for 6-20 recipients

This actually forms an interesting pattern so I'm going to give you the raw data:

 messages  recipients
  1210   20
   641   19
   372   18
   184   17
   136   16
   113   15
   153   14
   173   13
   289   12
   820   11
  2081   10
  1428   9
  1568   8
  1925   7
  2156   6

(For 2 through 7 recipients there is a steady dropoff.)

My guess is that a bunch of mailing list software really prefers to cut things at nice even (small) numbers of recipients.
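One way to check the 'nice round numbers' guess mechanically is to look for recipient counts whose message count is higher than both neighbours. Here's a minimal sketch using the counts from the table above (the function name is hypothetical):

```python
# Message counts by recipient count, copied from the table above.
COUNTS = {
    20: 1210, 19: 641, 18: 372, 17: 184, 16: 136, 15: 113, 14: 153,
    13: 173, 12: 289, 11: 820, 10: 2081, 9: 1428, 8: 1568, 7: 1925, 6: 2156,
}


def interior_peaks(counts: dict) -> list:
    """Return recipient counts with more messages than both neighbours."""
    keys = sorted(counts)
    return [
        k for prev, k, nxt in zip(keys, keys[1:], keys[2:])
        if counts[k] > counts[prev] and counts[k] > counts[nxt]
    ]


if __name__ == "__main__":
    print(interior_peaks(COUNTS))  # [10]
```

20 can't be an interior peak here because it's the top of the table, but since only 98 messages had more than 20 recipients, the count for 21 is at most 98, so 20 is effectively a peak too.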

RecipientsDistribution written at 17:08:05

2013-02-28

Looking at whether Zen-listed IPs keep trying to send us email

Here's a question: when an IP address listed in the Spamhaus Zen gets rejected, does it come back later or are most visits a one-time thing? This time I pulled 90 days' worth of logs, extracted each day's rejections from Zen-listed IPs, and checked to see how many IPs showed up in more than one day's logs.

(Because an IP could be trying to deliver stuff right when the logs roll, the safe question is how many IPs show up in more than two days' worth of logs.)
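The counting step is simple once each day's rejected IPs are in a set. Here's a minimal sketch, assuming the per-day extraction from the logs has already been done; the function name is hypothetical, and min_days=3 encodes the 'more than two days' rule above.

```python
from collections import defaultdict


def repeat_ips(daily_rejections, min_days=3):
    """daily_rejections: {day: set of rejected IPs that day}.
    Returns [(ip, days_seen)] for IPs seen on at least min_days days,
    most persistent first."""
    days_seen = defaultdict(int)
    for ips in daily_rejections.values():
        for ip in ips:  # a set, so each IP counts at most once per day
            days_seen[ip] += 1
    repeats = [(ip, n) for ip, n in days_seen.items() if n >= min_days]
    return sorted(repeats, key=lambda item: (-item[1], item[0]))
```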

The first answer is that we have some persistent IPs but not anything that is really hammering on us. Well, at least if you look at the data this way. Here, have a table:

212.174.85.130 24 days SBL107558
89.204.63.228 20 days SBL168886 and the PBL
189.112.34.215 18 days SBL153384
82.165.159.34 15 days web.de; SBL175032
82.165.159.35 13 days web.de and SBL175032 again
82.165.159.3 10 days web.de but now SBL175030, which is basically the same as SBL175032; web.de is clearly good at getting SBL-listed.
217.133.203.34 10 days SBL157999
115.93.88.50 10 days In the PBL
82.165.159.2 9 days web.de yet again, SBL175030
218.38.136.79 9 days SBL146938
216.104.35.85 9 days No longer listed.
216.104.35.86 9 days No longer listed.
216.104.35.90 9 days No longer listed.
200.68.99.196 9 days SBL CSS
186.1.192.23 9 days SBL172432

(This table probably doesn't look that nice in the syndication feed.)

Now things get interesting, because I noticed a pattern and went digging. All of the IPs from 216.104.35.83 through 216.104.35.94 got rejected by us at various times in the 90 days, and all of them were rejected on multiple days. Even more interesting, the rejections stretch from day 11 through day 90 (although not continuously).

(The gaps in rejections could be either because they stopped sending to email addresses that were rejecting them, because they dropped out of Zen temporarily, or both of the above.)

This prompted me to look at /24-based recurrence, and there things get more interesting:

173.242.121.0/24 46 days One IP still in the SBL CSS
198.64.159.0/24 45 days 13 of 23 IPs still in the SBL CSS
216.104.35.0/24 43 days Nothing still listed out of the 12 IPs we rejected from this
82.165.159.0/24 30 days web.de, mentioned above; all four IPs still in their SBL listings
177.47.102.0/24 27 days SBL136747, a /24 listing dating from August 14, 2012
212.174.85.0/24 26 days SBL107558; one of the single IPs made it into the single-IP list
178.210.168.0/24 25 days Multiple IPs still in the SBL CSS
216.229.59.0/24 22 days Multiple IPs still in the SBL CSS

I'm going to stop here because the next '/24' is actually due to a single IP (89.204.63.228) so we're reaching the crossover point (besides, I'm doing this all more or less by hand).
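The /24 version of the count differs from the per-IP one in a single step: map each day's IPs to their /24 first, so a network counts once per day however many of its IPs were rejected. A minimal sketch, with the same assumption that per-day IP sets already exist; the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict


def repeat_networks(daily_rejections, prefix=24):
    """Count, for each /prefix network, on how many days at least one
    IP in it was rejected. daily_rejections: {day: set of IP strings}.
    Returns [(network, days)] sorted by days, most first."""
    days_seen = defaultdict(int)
    for ips in daily_rejections.values():
        nets = {ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips}
        for net in nets:
            days_seen[net] += 1
    return sorted(days_seen.items(), key=lambda item: -item[1])
```

As noted above, this stops being informative once you reach networks where only a single IP was ever seen; tracking the number of distinct IPs per network alongside the day count would make that crossover visible.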

What really surprises me from looking at the by-/24 breakdown is how active the SBL CSS clearly is. If someone told me that the SBL CSS was now the largest single contributor for spam rejections, I wouldn't be surprised.

(I can't verify that without changing our mail configuration to add more logging (since SBL CSS listings expire, we'd have to capture the Zen results at the time of the actual rejection). Sadly my curiosity is not worth that.)

(This is kind of a followup to looking to see if IP addresses persist in Zen.)

Sidebar: a way in which these results may not be representative

We do Zen-based rejections only for some email addresses (only those that have opted in to it). So a Zen-listed sending IP wouldn't necessarily see continuous rejections if they kept sending to us. It depends on what email addresses they are sending to that day and they could have a day with no rejections.

I haven't tried to dig into the raw logs to see if this is happening for some of these IPs, or in general if these IPs saw a mix of successful deliveries and rejections or if they saw uniform rejections. I don't know if I'll ever do this level of analysis, since it's going past what I can easily bash together with shell scripts and awk. Past the land of shell scripts lies the land of real work.

ZenRepeatHits-2013-02 written at 00:56:59

2013-02-26

Looking at whether (some) IP addresses persist in zen.spamhaus.org

After writing my entry on the shifting SBL I started to wonder how many IP addresses we reject for being SBL listed stop being SBL listed after a (moderate) while. I can't answer that directly, because we actually use the combined Zen Spamhaus list and we don't log the specific return codes, but I can answer a related question: how many Zen-listed IP addresses seem to stay in the Zen lists?

To check this, I pulled 10 days of records from January 18th through January 27th, extracted all of the distinct IPs that we found listed in zen.spamhaus.org, and re-queried Zen now to see how many of them are still there. Over that ten day period we had 613 Zen-listed IP addresses; today, 534 of them are still in the Zen. So a fairly decent number stay present for 30 days or more.

(Technically some of them could have disappeared and then reappeared.)
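The re-check itself is just set arithmetic around whatever lookup you have. A minimal sketch; the function name is hypothetical, and is_listed stands for any callable that answers 'is this IP listed right now?' (for instance a wrapper around a DNS lookup).

```python
def still_listed(old_ips, is_listed):
    """Re-check previously listed IPs with the is_listed callable.
    Returns (still listed, no longer listed) as two sets."""
    still = {ip for ip in old_ips if is_listed(ip)}
    return still, set(old_ips) - still


if __name__ == "__main__":
    # The numbers from this entry: 534 of 613 still listed, 79 dropped.
    print(f"{534 / 613:.1%}")  # 87.1%
```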

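I also pulled specific return codes for all of those IP addresses, so I can now give you a breakdown of why those 534 addresses are still present (the categories overlap, since one IP can be in several of Zen's component lists at once, so the counts add up to more than 534):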

  • 420 of them are in Spamhaus-maintained PBL data. There's no single really big source, but 46 of them are from Beltelecom in Belarus (AS6697) and 23 are from Chinanet (AS4134).

  • 70 of them are in the XBL, specifically in the CBL.

  • 56 are in the SBL. There's no really big source, but five IPs are from 177.47.102.0/24 aka SBL136747, four are from 5.135.106.0/27 aka SBL173923, and two are from 212.174.85.0/24 aka SBL107558.

    (Two of those SBL listings are depressingly old, not that I am really surprised by long-term SBL listings by this point.)

  • 47 of them are in ISP-maintained PBL data.
  • 9 of them are in the SBL CSS, which is pretty impressive and depressing because SBL CSS listings expire fairly fast.
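Getting a breakdown like this means decoding the A records that a Zen query returns. Here's a minimal sketch; the return-code mapping is my understanding of what Spamhaus documents and should be checked against their current documentation, and the function names are hypothetical. Each IP is counted once in every list it appears in, matching how the breakdown above is tallied.

```python
from collections import Counter

# Zen return codes as I understand Spamhaus's documentation (an
# assumption of this sketch; verify before relying on it).
ZEN_CODES = {
    "127.0.0.2": "SBL",
    "127.0.0.3": "SBL CSS",
    "127.0.0.4": "XBL",
    "127.0.0.5": "XBL",
    "127.0.0.6": "XBL",
    "127.0.0.7": "XBL",
    "127.0.0.10": "PBL (ISP-maintained)",
    "127.0.0.11": "PBL (Spamhaus-maintained)",
}


def classify(answers):
    """Map the A records from one Zen query to sorted list names."""
    return sorted({ZEN_CODES.get(a, f"unknown ({a})") for a in answers})


def breakdown(results):
    """results: {ip: [A records]}. Counts IPs per component list."""
    tally = Counter()
    for answers in results.values():
        tally.update(classify(answers))
    return tally
```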

An equally interesting question is how many of those 79 now-unlisted IPs are listed in some other DNS blocklist. The answer turns out to be a fair number; 60 are still listed on some DNS blocklist that I have in my program to check IPs against a big collection of DNSBLs. Many but not all of the hits are for b.barracudacentral.org (which is not a DNSBL that I consider to be really high quality; it seems to be more of a hair-trigger lister).

(I'm out of touch with what's considered a high-quality DNSBL versus lower-quality ones so I'm not going to offer further reporting or opinions.)

ZenPersistence-2013-02 written at 00:00:07

2013-01-31

The shifting SBL, as experienced here

I still sort of run a mail server which gets a low enough connection volume that I can monitor the logs directly. This MTA rejects connections from SBL listed IPs, at a sufficiently low volume that I almost always look into the actual SBL listing (partly because I may want to apply my own blocks, including IP-level ones).

In the beginning, the volume of SBL hits was low but most of the actual SBL listings were for network ranges (not just single IPs) owned by what I privately characterized as 'the worst of the worst'. These were the people and organizations who spammed so many people so often that they finally convinced the SBL that they were very definitely dirty. Hits were rare partly because there never were really large numbers of these people, partly because I and other DNS blocklists blocked such people before the SBL, and perhaps partly because these people just didn't target me very often.

(I and a fair number of other people felt that the SBL was far too conservative and gave spammers way too many chances, but the SBL had its standards and that was it.)

I'm not sure when things started shifting, but this is not the pattern that I see today. The modern SBL experience is that most SBL hits are from single IPs that are listed as probably compromised or, to a lesser extent, from IPs that are on the SBL CSS. Hits from genuine SBL listed dirty blocks seem to be rare.

Out of curiosity I pulled eight days of records from the department's main mail gateway and looked through them for SBL rejections. Of the 80 IPs that (still) had SBL listings, the SBL CSS accounts for 35, 177.47.102.0/24's SBL136747 listing for four, and a random sampling of everything else shows single (compromised) IPs.

(Yesterday is a bit different. There are 27 IPs that are still SBL listed, with 21 of them on the SBL CSS. Of the remaining six, two were for bad netblocks and one IP was listed for spammer hosting. The other three were the usual single compromised machine pattern.)

I don't know what this means, if anything; I just find it interesting.

(I can come up with all sorts of potential theories but I will spare you all; they're generally obvious anyways. Just in case there's any doubt, I should note that I'm all for the SBL listing all sorts of spam sources and so I have no objection to the apparent new inclusion of compromised machines that are spewing advance fee fraud and phish spam and so on.)

ShiftingSBL written at 23:11:20

2013-01-05

What I think changed to make spam deliveries not cost-free

As I covered in my entry on why stupid spamming is wasteful, I used to think that spam deliveries were basically free (and so spammers shotgunned everything because, well, why not) and now I feel otherwise. This is not just a shift of my view; I actually feel that the situation itself changed. Which raises the obvious question of what changed to do this.

My tentative answer is that spamming became commercialized, and specifically that it became a sophisticated business. As it did so, we saw it increasingly segment into subfields with specialists and services as people realized both that you could make money selling the specialized services and that it made more sense to buy the services than do the work yourself (or alternatively, the existence of buyable services drew people into spamming who previously would not have done so). In particular, one thing that happened is that people began to rent out and sell spam sending capacity in various forms; as the spam business became sophisticated, people could buy and sell so much time on so many compromised proxies or so many delivery attempts or the like. This put a value on sending capacity, even if it was your own organically developed sending capacity (since you could always make money by renting it out to other people instead of trying to send out your own spam).

I also think that sending may have gotten harder and more expensive (in terms of time and lost opportunities). Back in the early parts of the 00s, things were in a sense really bad; there were oceans of open proxies (and before them oceans of open relays), ISPs generally didn't care, anti-spam precautions were relatively undeveloped (even at large providers), and so on. Since then many things have shifted quite far. The open proxy problem has gotten much better on many fronts (ISP cooperation, effective DNS blocklists, etc), anti-spam precautions have gotten more sophisticated in ways that hinder rapid sending, and so on.

(One inobvious but important shift is that many mailers will now drop your SMTP connection if you try to do unauthorized pipelining. Back at the height of the open proxy era spam senders simply blasted an entire SMTP conversation at you in one go, ignoring return codes and speeding up their lives. Now that doesn't really work (and spammers have by and large stopped trying to do it as a result).)
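The check behind this is straightforward: unless the server has advertised PIPELINING, a client must wait for each reply before sending its next command, and it must always wait for the 220 greeting. Real mailers do this inside their SMTP servers (Postfix has reject_unauth_pipelining, Exim has smtp_enforce_sync). Here's a minimal sketch over a recorded session transcript; the function name and transcript format are hypothetical, message body lines after DATA are assumed to be left out, and RFC 2920's finer rules about which commands may end a pipelined group are ignored.

```python
def talks_out_of_turn(events, pipelining_advertised=False):
    """events: ordered ("server", line) / ("client", line) tuples for one
    SMTP session. Returns True if the client sent a command while a
    server reply was still owed and sending ahead wasn't allowed."""
    owed = 1  # the server owes the client its 220 greeting
    greeted = False
    for who, line in events:
        if who == "server":
            # "250-..." is a continuation line; "250 ..." (or a bare
            # "250") is the last line of a reply.
            if len(line) < 4 or line[3] != "-":
                owed = max(owed - 1, 0)
                greeted = True
        else:
            if owed and (not greeted or not pipelining_advertised):
                return True
            owed += 1
    return False
```

The old open-proxy style of blasting the entire conversation at the server trips this on the very first command, because it arrives before the greeting.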

WhyExpensiveSpamToday written at 02:25:17

2012-12-29

Why I think that stupid spamming is actively wasteful

In reaction to my last entry, a commenter wrote:

You assume it's more cost efficient for the spammer to fix his system rather than just have a slightly higher percentage of broken addresses in his list than otherwise. I'd guess the broken addresses cost the spammer virtually nothing in resources or time.

I used to feel this way, that spamming was basically free, but I've shifted my views over time. My current belief is that in today's Internet environment, sending spam to addresses is not so cheap that it's pointless to measure and I actually suspect that modern spammers are often email-rate-limited and so sending to bad addresses directly displaces email that could go to potentially good addresses.

First off, let's take an easy case, that of people exploiting webmail systems via compromised accounts (as happened with us). Whether the spammers are using 'mules' to enter things by hand or they're driving the webmail systems by automation, it seems extremely likely that the spammer will have a relatively low sending rate limit (either the mules can only type and click so fast, or the webmail server software can and will only respond so fast). Thus, every clearly bad email address emailed to is a possibly good email address not mailed to.

(I'm making what I feel is the safe assumption that spammers have basically an infinite supply of potentially good email addresses they could spam.)

But let's suppose that the spammer has no message submission problems; they can stuff the queue with as much email to as many addresses as they want. The next limitation is the sending mailer itself. Spammers very often use compromised machines with whatever MTA setup the machine already has, a setup that is extremely unlikely to be set up for high sending volumes. The MTA will likely only be able to do DNS lookups and route messages so fast and make so many simultaneous delivery attempts at once, either through software limits or through machine capacity limits. Here again, bad addresses clearly displace potentially good ones.

(It's not uncommon for me to connect to the SMTP port on a machine that's sending out spam and have it report a temporary failure because of resources exceeded.)
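The displacement argument is simple arithmetic: if sending capacity is fixed, the attempts available for potentially good addresses shrink in direct proportion to the fraction spent on bad ones. A minimal sketch with purely hypothetical numbers (the function name is also hypothetical):

```python
def good_deliveries(send_rate_per_hour, bad_fraction, hours=1.0):
    """Attempts left for potentially good addresses under a fixed
    sending capacity; each attempt on a bad address displaces one."""
    if not 0.0 <= bad_fraction <= 1.0:
        raise ValueError("bad_fraction must be between 0 and 1")
    return send_rate_per_hour * hours * (1.0 - bad_fraction)


if __name__ == "__main__":
    # Hypothetical: 1000 attempts an hour, a quarter to bad addresses.
    print(good_deliveries(1000, 0.25))  # 750.0
```

This understates the cost at large providers, where bad addresses also push the sending IP towards being choked off entirely.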

Finally we have the actual delivery. Ignoring greylisting, I've seen clear evidence that large mail providers pay attention to delivery volumes and especially delivery volumes to bad addresses. Even here we've periodically seen temporary SMTP failures from the likes of GMail with messages to the effect of 'slow down, you're trying to send us too much too fast'. Every address a spammer tries to send to at such providers is one more point in their internal scoring systems for 'this IP is probably sending spam', and probably even more so for bad addresses; again bad addresses are displacing potentially productive ones and pushing the sending IP that much closer to when the provider will choke it off. Greylisting has similar but smaller effects (since it won't necessarily choke off future potentially good email addresses, just delay things). The effects of all of this are going to be magnified if the spammer is hijacking a compromised machine with a normal MTA that's set up for normal mail volumes.

You can build very custom infrastructure that has no problems with all of this (although you're still going to run into issues with destinations choking you off for too much volume). But I don't think most spammers these days are using anything that sophisticated, so all of those spammers are very likely to be email-rate-limited in their spamming.

SpamAttemptsAndWaste written at 01:59:18

These are my WanderingThoughts


This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.