Wandering Thoughts archives

2013-03-06

How we make Exim cut off bounce loops

Under certain circumstances it's possible to get bounce loops; a message bounces, the bounce for the message bounces, the bounce of the bounce then bounces, and you repeat endlessly (possibly until something explodes). When this happened to us, we decided to fix our Exim configuration so that it would detect and suppress these bounce loops (or at least as many of them as possible).

How we do it is a close relative of how we make Exim discard bounces of spam. First, we created a custom bounce message by setting bounce_message_file to an appropriately formatted file. This lets us add headers to bounce messages and those headers can refer to headers in the message bouncing. So we added a very special header:

X-CS-Bounce-Count: I$h_x-cs-bounce-count:

The little bit of magic is the header's value, 'I$h_x-cs-bounce-count:'. What this does is count how many bounces-of-bounces we've seen by giving every such bounce one more 'I' than the previous bounce had (if there's no previous bounce, the header is empty and we start out with one 'I').
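To make the counting concrete, here is a minimal Python sketch of what that header expansion does across successive bounces. It is not Exim code; the function name next_bounce_count is made up for illustration, and it assumes that a missing header expands to an empty string.

```python
def next_bounce_count(previous: str) -> str:
    """Mimic expanding 'I$h_x-cs-bounce-count:' for a newly generated bounce.

    `previous` is the X-CS-Bounce-Count value of the message being bounced,
    or "" if that message has no such header (an assumption of this sketch).
    """
    return "I" + previous


# Follow one message through three generations of bounces.
count = ""  # the original message carries no X-CS-Bounce-Count header
history = []
for _ in range(3):
    count = next_bounce_count(count)
    history.append(count)

print(history)  # ['I', 'II', 'III']
```

The length of the header value is the bounce generation, which is all that later checks need to look at.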

We could handle detecting and discarding messages with too high a bounce count entirely in an Exim router, but it turns out that it is much easier and more convenient to do most of the work in an Exim filter. So what we have is a simple router that runs all messages through an Exim filter (although now that I look at it, we should make the router conditional on the presence of the header because there's no point otherwise). Thus we get the router:

eat_looping_bounces:
    debug_print = "R: eat_looping_bounces for $local_part@$domain"
    driver = redirect
    file = <dir>/exim-loop-filter
    allow_filter
    allow_freeze
    user = Debian-exim
    no_verify
    no_expn

(This router should be among your very first routers to ensure that it runs before any other router that could conceivably generate a bounce.)

The Exim filter itself (with comments removed) is simply:

logfile /var/log/exim4/discardlog
if ${strlen:$h_x-cs-bounce-count:} is above 4
then
    logwrite "$tod_log junked looping-bounce $message_id from <$sender_address> to <$local_part@$domain> subject: $h_subject:"
    seen finish
endif

(We allow more than one bounce just in case, as a safety measure. When I put all of this together I didn't want to sit down and go through the work to carefully make sure that we could never wind up with a bounce counter greater than one in a legitimate situation.)
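As a rough illustration of where the cutoff lands, here is a small Python simulation of the filter's test. It assumes a simplified loop in which every bounce passes through our machine and always bounces again; the names should_junk and simulate_loop are hypothetical and exist only for this sketch.

```python
MAX_COUNT_LENGTH = 4  # mirrors "is above 4" in the filter


def should_junk(bounce_count_header: str) -> bool:
    """Same test as the filter: junk when the header is longer than 4 characters."""
    return len(bounce_count_header) > MAX_COUNT_LENGTH


def simulate_loop(max_generations: int = 20) -> int:
    """Return how many bounces get through before one is junked."""
    header = ""  # the original message has no X-CS-Bounce-Count header
    delivered = 0
    for _ in range(max_generations):
        header = "I" + header  # a new bounce is generated with one more 'I'
        if should_junk(header):
            break  # the filter discards this bounce, which ends the loop
        delivered += 1
    return delivered


print(simulate_loop())  # 4
```

Under these assumptions, bounces carrying 'I' through 'IIII' are let through and the fifth generation is discarded.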

Unlike our discarding of bounces of spam we don't carefully guard this router (and this filter) with conditions to make sure that we're really truly dealing with local bounces. The thing about bounce loops is that they can easily involve outside machines as well, so we want to squelch them whenever they pass through our mail machine.

Sidebar: how bounce loops happen

Some of my readers may now be wondering how on earth you get a bounce loop, since bounces are supposed to be sent using a null sender address (often written as '<>') and all messages to the null sender (bounces included) are just discarded (to stop exactly this sort of loop). The sad answer is that not all programs that resend messages are careful to preserve the null sender address on email they send out. In particular this includes the mail forwarding done by our version of procmail when it's run by a user from their .forward; instead it winds up changing the sender address to the user's email address (because it actually resubmits the email, exactly as if the user had sent it in the first place). This creates an immediate loop if the destination of the forwarding ever refuses some piece of email: the bounce of the refusal goes to the user, their .forward and procmail re-forward it to the destination, the destination refuses it again, the system generates a new bounce to the user, and this repeats endlessly.

(The incident that saw us discover this issue managed to somehow multiply the email as it looped the bounces around. The result was quite dramatic.)

sysadmin/EximStopBounceLoops written at 22:55:47

Turning off delays on failed password authentications

Today I got around to something on my office and home workstations that I should have done years ago: I turned off all delays after you mistype a password, both for ssh logins and for local things like su.

I've been an advocate against network authentication delays for quite some time. Over time I've come to realize that the same logic more or less applies to local authentication delays too. In theory they're there to slow down mass password guessing attacks, but in practice all they were doing was irritating me.

(They generally didn't slow me down because I was well trained that if su wasn't instantly successful, I'd mistyped the password and I should open up another 'su to root' window and do it again.)

As you'd expect (and hope), on Linux this is controlled through PAM. On Fedora 17, it's sufficient to change the 'auth' usage of the pam_unix.so module to have the 'nodelay' parameter; this tells it to not ask the whole PAM system for a standard delay if authentication fails. I had to change both /etc/pam.d/password-auth (apparently used by sshd) and /etc/pam.d/system-auth (used by su). A typical line is now:

auth   sufficient    pam_unix.so  nullok try_first_pass nodelay

(On my Fedora 17 machines both files have big warnings that their contents are autogenerated and will be overwritten by authconfig. Since I can't remember the last time that I ran authconfig, I didn't let this worry me.)

This gives you no delay at all. If you'd still like a little bit of delay you need to add a mention of the pam_faildelay.so module. I believe that it goes before pam_unix.so and it should look something like:

auth   optional     pam_faildelay.so delay=250000

(This delay is a quarter of a second. See the manpage.)

I haven't tested an Ubuntu system, but inspection shows that it does things a little bit differently. Based on looking at files, it appears that you want to modify /etc/pam.d/common-auth and then either remove the mention of pam_faildelay.so from /etc/pam.d/login or modify the delay time.

Having no delay on local password authentication is a potential security exposure to local users; it allows a local user to automate guessing attacks as fast as a program can run su, passwd, or the like. If this concerns you, use pam_faildelay.so to add a small delay; even a tenth of a second of delay will drastically slow down an attacker.
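For a rough sense of scale, here is a back-of-the-envelope Python calculation of the most guesses a single serial attacker could make when every failure costs a fixed delay. It works in microseconds to match pam_faildelay's delay= unit. The function name is hypothetical, and the sketch assumes each attempt costs nothing beyond the delay and that the attacker does not run attempts in parallel, so treat the numbers as an upper bound for that one case only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_serial_guesses(delay_us: int, period_seconds: int = SECONDS_PER_DAY) -> int:
    """Upper bound on failed guesses in `period_seconds` with a fixed per-failure delay.

    `delay_us` is in microseconds, the same unit as pam_faildelay's delay=.
    """
    if delay_us <= 0:
        raise ValueError("with no delay, the only limit is how fast the program runs")
    return (period_seconds * 1_000_000) // delay_us


print(max_serial_guesses(100_000))  # tenth of a second: 864000 per day
print(max_serial_guesses(250_000))  # quarter of a second: 345600 per day
```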

PS: my excuse for not doing anything about network authentication delays on my own systems for so long is that I just use SSH keys, so sshd almost never asks me for a password in the first place.

linux/NoMorePasswdAuthDelays written at 00:26:14



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.