2013-03-06
How we make Exim cut off bounce loops
Under certain circumstances it's possible to get bounce loops; a message bounces, the bounce for the message bounces, the bounce of the bounce then bounces, and you repeat endlessly (possibly until something explodes). When this happened to us, we decided to fix our Exim configuration so that it would detect and suppress these bounce loops (or at least as many of them as possible).
How we do it is a close relative of how we make Exim discard bounces of spam. First, we created a custom bounce message by setting bounce_message_file to an appropriately formatted file. This lets us add headers to bounce messages, and those headers can refer to headers in the message being bounced. So we added a very special header:
X-CS-Bounce-Count: I$h_x-cs-bounce-count:
The little bit of magic is the header's value, 'I$h_x-cs-bounce-count:'. What this does is count how many bounce-of-bounces we've seen by giving every such bounce one more 'I' than the previous bounce had (if there's no previous bounce, the header is empty and we start out with one 'I').
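To make the expansion concrete, here is a small Python sketch (not Exim itself) of what that template line effectively does each time a bounce is generated. The function name next_bounce_count is made up for illustration; the only things taken from our configuration are the header name and the leading 'I'.

```python
HEADER = "X-CS-Bounce-Count"

def next_bounce_count(bouncing_headers):
    """Return the counter value a newly generated bounce would carry.

    Mimics 'I$h_x-cs-bounce-count:': a literal 'I' followed by whatever
    the message being bounced already had in that header (empty if absent).
    """
    previous = ""
    for name, value in bouncing_headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == HEADER.lower():
            previous = value.strip()
    return "I" + previous

# An ordinary message has no counter; each bounce of a bounce adds one 'I'.
headers = {"Subject": "hello"}
history = []
for _ in range(3):
    value = next_bounce_count(headers)
    history.append(value)
    headers = {HEADER: value}  # the new bounce, which is about to bounce again

print(history)
```

The length of the header value is therefore the number of bounces in the chain so far, which is all the later check needs.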
We could handle detecting and discarding messages with too high a bounce count entirely in an Exim router, but it turns out that it is much easier and more convenient to do most of the work in an Exim filter. So what we have is a simple router that runs all messages through an Exim filter (although now that I look at it, we should make the router conditional on the presence of the header because there's no point otherwise). Thus we get the router:
eat_looping_bounces:
  debug_print = "R: eat_looping_bounces for $local_part@$domain"
  driver = redirect
  file = <dir>/exim-loop-filter
  allow_filter
  allow_freeze
  user = Debian-exim
  no_verify
  no_expn
(This router should be among your very first routers to ensure that it runs before any other router that could conceivably generate a bounce.)
The Exim filter itself (with comments removed) is simply:
logfile /var/log/exim4/discardlog
if ${strlen:$h_x-cs-bounce-count:} is above 4
then
  logwrite "$tod_log junked looping-bounce $message_id from <$sender_address> to <$local_part@$domain> subject: $h_subject:"
  seen finish
endif
(We allow more than one bounce just in case, as a safety measure. When I put all of this together I didn't want to sit down and go through the work to carefully make sure that we could never wind up with a bounce counter greater than one in a legitimate situation.)
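As a rough illustration of where the cutoff falls, here is the filter's test restated in Python. This is only a sketch of the condition, not something Exim runs; the names MAX_COUNT_LENGTH and is_looping_bounce are invented for the example.

```python
MAX_COUNT_LENGTH = 4  # the filter discards when the length "is above 4"

def is_looping_bounce(count_header):
    """True if the filter's condition would match this header value."""
    # A missing header expands to an empty string in Exim, so None is
    # treated as "" here. ${strlen:...} is just the string's length.
    return len(count_header or "") > MAX_COUNT_LENGTH

for value in (None, "I", "IIII", "IIIII"):
    print(repr(value), is_looping_bounce(value))
```

Only the last value is discarded, so the first four bounces in a chain get through and the fifth is the first one to be junked.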
Unlike our discarding of bounces of spam we don't carefully guard this router (and this filter) with conditions to make sure that we're really truly dealing with local bounces. The thing about bounce loops is that they can easily involve outside machines as well, so we want to squelch them whenever they pass through our mail machine.
Sidebar: how bounce loops happen
Some of my readers may now be wondering how on earth you get a bounce loop, since bounces are supposed to be sent using a null sender address (often written as '<>') and all messages to the null sender (bounces included) are just discarded (to stop exactly this sort of loop). The sad answer is that not all programs that resend messages are careful to preserve the null sender address on email they send out. In particular this includes the mail forwarding done by our version of procmail when it's run by a user from their .forward; instead it winds up changing the sender address to the user's email address (because it actually resubmits the email, exactly as if the user had sent it in the first place). This creates an immediate loop if the destination of the forwarding ever refuses some piece of email; the bounce of the refusal goes to the user, their .forward and procmail re-forwards it to the destination, the destination refuses it again, the system generates a new bounce to the user, repeat endlessly.
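The difference between the two forwarding behaviours can be shown with a toy simulation. This is a deliberately simplified model, assuming a destination that refuses everything forwarded to it; the addresses and the count_bounces function are hypothetical.

```python
def count_bounces(rewrites_sender, max_rounds=10):
    """Count the bounces generated for one message sent to a forwarding user.

    rewrites_sender=True models a forwarder that resubmits mail with the
    user as the sender; False models one that preserves the original
    envelope sender (including the null sender on bounces).
    """
    user = "user@example.com"        # hypothetical local user with a .forward
    sender = "outsider@example.org"  # hypothetical original envelope sender
    recipient = user
    bounces = 0
    while bounces < max_rounds:
        if recipient != user:
            break  # delivered somewhere else; no forwarding is involved
        # The user's forwarding resends the message to the destination,
        # which refuses it in this model.
        forwarded_sender = user if rewrites_sender else sender
        if forwarded_sender == "":
            break  # null sender: the failure is discarded, no new bounce
        bounces += 1
        # The new bounce goes to the forwarded sender, from the null sender.
        recipient = forwarded_sender
        sender = ""
    return bounces

print("sender preserved:", count_bounces(False))
print("sender rewritten:", count_bounces(True))
```

With the sender preserved there is a single bounce, which goes back to the original sender. With the sender rewritten the model only stops because of max_rounds, which is the loop in miniature.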
(The incident that saw us discover this issue managed to somehow multiply the email as it looped the bounces around. The result was quite dramatic.)
Turning off delays on failed password authentications
Today I got around to something on my office and home workstations that I should have done years ago: I turned off all delays after you mistype a password, both for ssh logins and for local things like su.
I've been an advocate against network authentication delays for quite some time. Over time I've come to realize that the same logic more or less applied to local authentication delays too. In theory they're there to slow down mass password guessing attacks, but in practice all they were doing was irritating me.
(They generally didn't slow me down because I was well trained that if su wasn't instantly successful, I'd mistyped the password and I should open up another 'su to root' window and do it again.)
As you'd expect (and hope), on Linux this is controlled through PAM. On Fedora 17, it's sufficient to change the 'auth' usage of the pam_unix.so module to have the 'nodelay' parameter; this tells it to not ask the whole PAM system for a standard delay if authentication fails. I had to change both /etc/pam.d/password-auth (apparently used by sshd) and /etc/pam.d/system-auth (used by su). A typical line is now:
auth sufficient pam_unix.so nullok try_first_pass nodelay
(On my Fedora 17 machines both files have big warnings about their contents being autogenerated and they'll get overwritten by authconfig. Since I can't remember the last time that I ran authconfig, I didn't let this worry me.)
This gives you no delay at all. If you'd still like a little bit of delay you need to add a mention of the pam_faildelay.so module. I believe that it goes before pam_unix.so and it should look something like:
auth optional pam_faildelay.so delay=250000
(The delay is given in microseconds, so this is a quarter of a second. See the manpage.)
I haven't tested an Ubuntu system, but inspection shows that it does things a little bit differently. Based on looking at files, it appears that you want to modify /etc/pam.d/common-auth and then either remove the mention of pam_faildelay.so from /etc/pam.d/login or modify the delay time.
Having no delay on local password authentication is a potential security exposure to local users; it allows a local user to automate guessing attacks as fast as a program can run su, passwd, or the like. If this concerns you, use pam_faildelay.so to add a small delay; even a tenth of a second of delay will drastically slow down an attacker.
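To put rough numbers on that, here is a back-of-the-envelope calculation in Python. It assumes the attacker makes one attempt at a time and that each failed attempt costs the nominal delay; the 1000 microsecond figure for an undelayed attempt is purely an assumption for comparison, not a measurement.

```python
MICROSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000

def guesses_per_day(cost_per_attempt_us):
    """Upper bound on sequential guesses per day at a given per-attempt cost."""
    return MICROSECONDS_PER_DAY // cost_per_attempt_us

# pam_faildelay.so takes its delay in microseconds.
for label, cost_us in (
    ("no delay (assumed 1000 us per attempt)", 1_000),
    ("delay=100000 (a tenth of a second)", 100_000),
    ("delay=250000 (a quarter of a second)", 250_000),
):
    print(f"{label}: {guesses_per_day(cost_us):,} guesses/day")
```

Under these assumptions a tenth of a second caps a sequential attacker at 864,000 guesses a day, a hundredth of what the assumed undelayed rate would allow.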
PS: my excuse for not doing anything about network authentication delays on my own systems for so long is that I just use SSH keys, so sshd almost never asks me for a password in the first place.