Some quick SMTP connection statistics
Recently I've been wondering about the usage pattern of zombie machines. Do spammers typically make only a few connections from each zombie and move on, or do they use the same machines over and over?
Through my weekly spam stats I know that some machines that we reject at connection time try again and again. But what's the distribution like? For example, do most IP addresses get refused once or twice and then go away? So I grabbed our logs and started looking.
All of these figures cover the past 28 full days, and only IP addresses that connected to us at least twice, with the connections at least five seconds apart (so we're already dealing with machines that show some retrying or reuse).
What | Different IPs | 1 try | 2 tries | 3 tries | 4 tries | 5-10 tries | more |
all refused | 46,583 | 60% | 17% | 7.4% | 4.1% | 8% | 3.7% |
'dynamic' | 25,430 | 59% | 17% | 7.7% | 4.2% | 8.2% | 3.3% |
bad reverse DNS | 15,582 | 63% | 17% | 6.6% | 3.4% | 6.3% | 3.3% |
CBL | 4,237 | 49% | 19% | 9% | 6.2% | 12% | 4.3% |
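A minimal sketch of how a distribution like the one in this table could be computed, assuming the columns count refusals per IP address and that the log has already been reduced to one IP string per refused connection. The function names and the input shape are hypothetical; this is not the script that produced the numbers above.

```python
from collections import Counter

# Bucket labels matching the table columns.
BUCKETS = ["1 try", "2 tries", "3 tries", "4 tries", "5-10 tries", "more"]


def bucket_for(count):
    """Map a per-IP refusal count (>= 1) to its table column."""
    if count <= 4:
        return BUCKETS[count - 1]
    if count <= 10:
        return "5-10 tries"
    return "more"


def refusal_distribution(refused_ips):
    """Summarize refusals per IP.

    refused_ips: iterable of IP strings, one entry per refused connection
    (assumed to be pre-filtered to the IPs of interest).
    Returns (number of distinct IPs, {bucket label: percentage of IPs}).
    """
    per_ip = Counter(refused_ips)
    total = len(per_ip)
    if total == 0:
        return 0, {b: 0.0 for b in BUCKETS}
    buckets = Counter(bucket_for(n) for n in per_ip.values())
    return total, {b: 100.0 * buckets[b] / total for b in BUCKETS}
```

Feeding it one list per rejection reason (all refused, 'dynamic', bad reverse DNS, CBL) would give one table row each.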
'CBL' is the set of IP addresses we rejected for being CBL listed. Unfortunately for my nice neat stats, we only check DNS blocklists after doing 30 minutes of greylisting (or more, for people with bad DNS information). So these are the cream of the crop of CBL listed IP addresses, which explains the relatively high persistence. It also makes the 49% 'only rejected once' interesting; I theorize that spammers are now using at least some zombie-handling programs that don't give up after 4xx series SMTP replies, but do give up after 5xx ones.
At the moment, 7,511 of the 'bad reverse DNS' IP addresses and 11,518 of the 'dynamic' IP addresses are in the CBL (since the CBL ages things out, it's possible that more of them were originally listed). Broken apart into 'in the CBL' and 'not currently in the CBL' sets, we get:
What | Different IPs | 1 try | 2 tries | 3 tries | 4 tries | 5-10 tries | more |
'CBL' | 19,022 | 56% | 18% | 8.3% | 4.7% | 9.3% | 4.2% |
non-CBL | 21,969 | 65% | 17% | 6.4% | 3.2% | 5.9% | 2.5% |
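The split into 'in the CBL' and 'not currently in the CBL' sets can be sketched roughly as follows. This assumes the usual DNSBL convention (reverse the IPv4 octets, append the list's zone, and treat any successful lookup as 'listed'); the zone name `cbl.abuseat.org` and all function names here are assumptions for illustration, not taken from the scripts behind these numbers.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="cbl.abuseat.org"):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone.

    The default zone is an assumption; pass whatever zone you actually use.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="cbl.abuseat.org"):
    """Return True if the DNSBL lookup for ip resolves.

    Any resolver failure (including timeouts) is treated as 'not listed',
    which is a simplification.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def split_by_listing(ips, listed_check=is_listed):
    """Partition ips into (listed, unlisted) using the given check."""
    listed, unlisted = [], []
    for ip in ips:
        (listed if listed_check(ip) else unlisted).append(ip)
    return listed, unlisted
```

Because the check runs against the list's current contents, it only says what is listed now, not what was listed when the connection was refused.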
I don't have any really clever theories about the difference in persistence. It does make me want to move the CBL check earlier in our processing so I can generate better numbers. (Prior experience suggests that most of our rejections will be in the CBL.)