2007-09-30
Understanding Exim's weird way of doing retries
First, some terminology. A top level address is an address that a
message starts out being sent to; for example, every (accepted) RCPT TO
in SMTP creates a top level address for the message. A destination
is a place that a message is ultimately going to be delivered to, and
may include things like files. A top level address may turn into more
than one destination through means like .forward files, aliases, and
mailing list files. At a conceptual level, all MTAs have two main jobs:
mapping top level addresses to destinations, and then delivering to
destinations.
The MTA we use now takes what I think of as the straightforward two-phase approach to this. Top level addresses are mapped to destinations in a process called 'routing', the MTA tries to deliver to all destinations, and any destinations that had some temporary delivery failure are remembered to be retried later. When the MTA does retries, it looks up all still undelivered destinations and tries to deliver to them again.
(This ignores retry times, bounces, incomplete DNS lookups, and so on.)
The important effect of the two-phase approach is that a given message's destinations never change; they are determined once when it is received and then frozen.
Exim does not operate this way. Instead of remembering undelivered destinations and retrying them directly, it remembers delivered destinations and whether or not a top level address was completely delivered. Then each time Exim retries a message, it re-maps any top level address that has not been completely delivered to destinations, throws away any destination that has already been delivered to, and (re)attempts deliveries on any remaining destinations. (If there are no remaining destinations, the top level address is done.)
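To make the difference concrete, here is a minimal Python sketch of the two retry models as described above. It is only an illustration of the idea, not how either MTA is implemented; the names two_phase_retry, exim_style_retry, route, and deliver are all made up for this example, and it ignores retry times, bounces, and so on.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical callbacks, assumed for this sketch only.
Route = Callable[[str], List[str]]   # top level address -> current destinations
Deliver = Callable[[str], bool]      # True on success, False on temporary failure


def two_phase_retry(pending: Set[str], deliver: Deliver) -> Set[str]:
    """Frozen-destination model: retry exactly the destinations that failed.

    Returns the destinations that are still undelivered afterwards.
    """
    return {dest for dest in pending if not deliver(dest)}


def exim_style_retry(unfinished: Iterable[str], delivered: Set[str],
                     route: Route, deliver: Deliver) -> Set[str]:
    """Re-routing model: re-map every unfinished top level address.

    'delivered' is the per-message record of destinations already delivered
    to; it is updated in place. Returns the top level addresses that are
    still not completely delivered.
    """
    still_unfinished: Set[str] = set()
    for address in unfinished:
        # Routing is redone on every retry, so it sees current aliases,
        # .forward files, and mailing list files.
        remaining = [d for d in route(address) if d not in delivered]
        for dest in remaining:
            if deliver(dest):
                delivered.add(dest)
            else:
                still_unfinished.add(address)
    return still_unfinished
```

In the second function nothing records which destination failed; only the top level address survives between retries, and the set of destinations is whatever route() returns at the time of the retry.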
So a given message's destinations can change during retries. The
consequence of this is that messages being retried will pick up changes
to .forward files, file-based mailing lists, and so on; as the Exim
documentation notes, this can result in things like new subscribers to
a mailing list receiving messages that were sent to the mailing list
before they actually subscribed.
(Exim has a one_time option for redirect-based routers that
will turn destination addresses into top-level addresses. But because
top-level addresses have to be real addresses, Exim has to outlaw pipe
and file destinations if you turn this on, and this is not an option for
us.)
This approach does let you correct a routing problem on the fly; you
don't need to change a routing rule and then manually change the
destinations of a pile of stalled messages. But it makes it hard to
see what destination is causing messages to stall (and what the error
message is), since undelivered destinations only exist during retries;
mailq and so on will only tell you what top level addresses haven't
been completely delivered (and what destinations have been delivered).
(Technically the information can be dug out of the logs with sufficient work.)
(This is one of those entries I write to make sure that I understand the issue myself.)
Weekly spam summary on September 29th, 2007
This week, we:
- got 11,909 messages from 265 different IP addresses.
- handled 26,934 sessions from 2,995 different IP addresses.
- received 297,885 connections from at least 101,029 different IP addresses.
- hit a highwater of 16 connections being checked at once.
Volume is a bit up from last week. Looking at the numbers, I am reminded of how striking the number of different IP addresses is; the average connection source made fewer than three connections to us, while the average session source had about nine sessions (and the average mail source probably did even better, since that is an average of about 45 messages per IP).
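The per-source averages can be checked straight from this week's totals. This is just a quick arithmetic sketch in Python; the dictionary and variable names are made up for the example.

```python
# (total, distinct source IPs) taken from this week's summary figures.
counts = {
    "connections": (297_885, 101_029),
    "sessions": (26_934, 2_995),
    "messages": (11_909, 265),
}

# Average number of events per distinct source IP, rounded to one decimal.
averages = {
    name: round(total / sources, 1)
    for name, (total, sources) in counts.items()
}

for name, avg in averages.items():
    print(f"{name}: {avg} per source IP")
```

Note that the connection figure is a count of "at least" that many IPs, so the true connections-per-source average may be somewhat lower.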
Day | Connections | different IPs |
Sunday | 40,875 | +14,708 |
Monday | 39,537 | +16,197 |
Tuesday | 38,779 | +14,952 |
Wednesday | 59,611 | +17,304 |
Thursday | 49,560 | +14,939 |
Friday | 37,500 | +10,877 |
Saturday | 32,023 | +12,052 |
Apparently the spammers are back to abusing us on Wednesdays.
Kernel level packet filtering top ten:
Host/Mask           Packets   Bytes
72.249.13.64/26       19977   1096K   otcpicknews.com
213.180.130.0/24      17928   1076K   onet.pl
89.18.190.60          13567    814K
68.168.78.0/24        11478    551K   adelphia.net
213.29.7.0/24         10808    648K   centrum.cz
66.15.119.165          9019    422K
68.230.240.0/23        8400    408K   cox.net
139.55.101.14          8287    421K
202.5.93.20            8082    388K
212.170.236.211        6257    375K
Volume is significantly up from last week.
- 89.18.190.60 returns from last week.
- 66.15.119.165 kept trying to send us bad HELOs and returns from a previous appearance in February.
- 139.55.101.14 is something we consider a dynamic IP.
- 202.5.93.20 is an APNIC IP address with broken reverse DNS.
- 212.170.236.211 kept trying with a bad HELO.
(It warms the black cockles of my heart to see that throwing otcpicknews.com's other netblock straight into our kernel filters was absolutely the right thing to do.)
Connection time rejection stats:
83117 total
41427 bad or no reverse DNS
35442 dynamic IP
 4001 class bl-cbl
  332 class bl-dsbl
  291 acceleratebiz.com
  261 class bl-pbl
  255 class bl-sdul
  188 class bl-sbl
  125 qsnews.net
   86 class bl-njabl
   42 officepubs.com
   24 verticalresponse.com
Perversely, volume is down here compared to last week. The highest source of SBL rejections this week was SBL58952 with 66 rejections (a recent listing for a spam source), followed by last week's leaders, SBL53319 with 25 rejections and SBL48694 with 23 rejections. (Better luck next time, you two! Oh wait, what am I saying? Please drop off the Internet.)
Seventeen of the top 30 most rejected IP addresses were rejected
100 times or more this week; the leader is 124.157.174.227 (1,412
rejections), followed by 203.134.218.225 (1,375 rejections) and
61.7.132.40 (301 rejections). Five are currently in the CBL, two are
currently in bl.spamcop.net, six are currently in the PBL, and a grand
total of (only) eight are in zen.spamhaus.org. I don't know why these
numbers are so low.
(Locally, 20 were rejected for bad or missing reverse DNS, 8 for being dynamic IP addresses, one for being in the NJABL, one for being in the DSBL. Two of those have since changed their status and would not be blocked now.)
This week, Hotmail had:
- 4 messages accepted.
- no messages rejected because they came from non-Hotmail email addresses.
- 27 messages sent to our spamtraps.
- no messages refused because their sender addresses had already hit our spamtraps.
- 1 message refused due to its origin IP address being from the Cote d'Ivoire.
And the final numbers:
what | # this week | (distinct IPs) | # last week | (distinct IPs) |
Bad HELOs | 5489 | 399 | 1379 | 190 |
Bad bounces | 1521 | 1115 | 287 | 200 |
Ah. Well. That would explain a certain amount of everything; we seem to
have been forged as a spam origin in a big way, judging by how these
numbers have jumped so dramatically. The leading source of bad HELOs
this week was 64.109.69.81 (218 attempts), followed by 84.12.142.111
(89 attempts), 202.134.71.85 (83 attempts), and then a lot more.
Bad bounces were sent to 1,421 different bad usernames this week, with
the most popular one being grabes with 19 attempts, followed by
NortonPinero with 10. SHOUGEE returns from last week with 3
attempts, mixed in with all sorts of others that I am not going to try
to pick through, including ex-users.
My pick for the most ironic source of bad bounces this week has to be
AntiSpam.Awesome.net. (No and no, respectively.)