Some quick SMTP connection statistics

April 7, 2006

Recently I've been wondering about the usage pattern of zombie machines. Do spammers typically make only a few connections from each zombie and move on, or do they use the same machines over and over?

Through my weekly spam stats I know that some machines that we reject at connection time try again and again. But what's the distribution like? For example, do most IP addresses get refused once or twice and then go away? So I grabbed our logs and started looking.

All of these figures are for the past 28 (full) days, and for IP addresses that made at least two connections to us at least five seconds apart (so we're already dealing with machines with some retrying or reuse).
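As a rough illustration of that selection and bucketing, here is a minimal Python sketch. It is not the script I actually ran; the event format (a tuple of IP address, Unix timestamp and a 'refused' flag) and all of the function names are assumptions made up for the example, and it assumes that a 'try' in the tables means one refused connection.

```python
from collections import Counter, defaultdict

# Bucket labels mirror the table columns used in this entry.
BUCKETS = ["1 try", "2 tries", "3 tries", "4 tries", "5-10 tries", "more"]


def bucket_for(count):
    """Map a per-IP refusal count (>= 1) to its table column."""
    if count <= 4:
        return BUCKETS[count - 1]
    if count <= 10:
        return "5-10 tries"
    return "more"


def refusal_distribution(events, min_gap=5):
    """Return (number of IPs, {bucket: percentage of those IPs}).

    events is an iterable of (ip, unix_timestamp, refused) tuples;
    this format is hypothetical, not our real log format.
    """
    times = defaultdict(list)
    refusals = Counter()
    for ip, ts, refused in events:
        times[ip].append(ts)
        if refused:
            refusals[ip] += 1

    dist = Counter()
    for ip, stamps in times.items():
        # Keep only IPs with two connections at least min_gap seconds apart.
        if max(stamps) - min(stamps) < min_gap:
            continue
        # Only IPs that were refused at least once belong in the table.
        if refusals[ip] == 0:
            continue
        dist[bucket_for(refusals[ip])] += 1

    total = sum(dist.values())
    percents = {b: (100.0 * dist[b] / total if total else 0.0) for b in BUCKETS}
    return total, percents
```

Checking the first and last connection times is enough for the 'at least five seconds apart' test, since any two connections that far apart imply the extremes are too.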

What             Different IPs  1 try  2 tries  3 tries  4 tries  5-10 tries  more
all refused             46,583    60%      17%     7.4%     4.1%          8%  3.7%
'dynamic'               25,430    59%      17%     7.7%     4.2%        8.2%  3.3%
bad reverse DNS         15,582    63%      17%     6.6%     3.4%        6.3%  3.3%
CBL                      4,237    49%      19%       9%     6.2%         12%  4.3%

'CBL' is the IP addresses we rejected for being CBL listed. Unfortunately for my nice neat stats, we only check DNS blocklists after doing 30 minutes of greylisting (or more, for people with bad DNS information). So these are the cream of the crop of CBL listed IP addresses, which explains the relatively high persistence. It also makes the 49% 'only rejected once' interesting; I theorize that spammers are now using at least some zombie handling programs that don't give up after 4xx series SMTP replies, but do after 5xx ones.

Right now, 7,511 of the 'bad reverse DNS' IP addresses and 11,518 of the 'dynamic' IP addresses are in the CBL (since the CBL ages things out, it's possible that more of them were originally listed). Broken apart into 'in the CBL' and 'not currently in the CBL' sets, we get:

What     Different IPs  1 try  2 tries  3 tries  4 tries  5-10 tries  more
'CBL'           19,022    56%      18%     8.3%     4.7%        9.3%  4.2%
non-CBL         21,969    65%      17%     6.4%     3.2%        5.9%  2.5%
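The 'is it currently in the CBL' check behind this split can be sketched as an ordinary DNS blocklist lookup: reverse the IPv4 octets, append the blocklist zone, and see whether the name resolves. This is only a sketch, not the code I used; the zone name cbl.abuseat.org and the function names are assumptions for the example.

```python
import ipaddress
import socket

# Assumed DNS zone for the CBL; substitute whatever zone you actually query.
CBL_ZONE = "cbl.abuseat.org"


def dnsbl_query_name(ip, zone=CBL_ZONE):
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone=CBL_ZONE):
    """True if the query name resolves, which is how a DNSBL signals a listing.

    Any resolution failure is treated as 'not listed', so a resolver
    outage would silently undercount listings.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


def split_by_listing(ips, check=is_listed):
    """Partition IPs into (listed, unlisted) using the given check function."""
    listed, unlisted = [], []
    for ip in ips:
        (listed if check(ip) else unlisted).append(ip)
    return listed, unlisted
```

Passing the check in as a parameter makes it possible to try the split against a canned set of addresses without doing any DNS queries.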

I don't have any really clever theories about the difference in persistence. It does make me want to move the CBL check to early on in our processing so I can generate better numbers. (Prior experience suggests that most of the IP addresses we reject will be in the CBL.)

Written on 07 April 2006.