2012-07-06
Why Exim has a single queue for all email
In a recent entry I mentioned that Exim puts all pending email into a single queue instead of having multiple queues (with one per destination domain or the like). Given that multiple queues make it much easier to ensure that a single slow destination doesn't affect email to other places, you might wonder why Exim has made such a choice. The answer is that a single queue is effectively forced on Exim by a core design decision.
All mailers have two conceptual stages for processing a message: they take the initial top level addresses and map them into destinations, then deliver the email to the various destinations. Most mailers do this mapping process once and then save all of the destination addresses, but Exim instead repeats the mapping process every time it retries a message. And this is what makes the difference.
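Here is a minimal Python sketch that makes the two approaches concrete. The `route()` function and its `ROUTES` table are hypothetical stand-ins for everything a real mailer consults (aliases, forwarding, DNS lookups). This is not how Exim or any other mailer actually structures its code. The point is only that a map-once message carries a frozen list of destinations, while a remap-every-time message recomputes them on each retry and can therefore get a different answer.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: top-level address -> delivery destinations.
# In a real mailer this is aliases, .forward files, DNS, etc., and it can
# change between one delivery attempt and the next.
ROUTES = {"staff@example.org": ["alice@mx1.example.net", "bob@mx2.example.com"]}

def route(address):
    """Map one top-level address to its current destinations."""
    return list(ROUTES.get(address, [address]))

@dataclass
class MapOnceMessage:
    top_level: list
    destinations: list = field(default_factory=list)

    def __post_init__(self):
        # Mapping is done exactly once, when the message is accepted.
        for addr in self.top_level:
            self.destinations.extend(route(addr))

    def retry_targets(self):
        # Retries reuse the saved destinations.
        return list(self.destinations)

@dataclass
class RemapMessage:
    top_level: list

    def retry_targets(self):
        # Mapping is redone on every retry, so the result can change.
        return [d for addr in self.top_level for d in route(addr)]
```

If the routing table changes between the first attempt and a retry, `MapOnceMessage` keeps retrying the old destinations while `RemapMessage` picks up the new ones.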
When you do this mapping process only once, it's easy to have per-destination queues: you determine the destinations at the start, add the message to the appropriate queues, and you're done. You can then work through each queue separately. When the mailer redoes the mappings every time (and the results can change), what you generally wind up with is not queues but limits on how many simultaneous deliveries you can have to any single destination. You can implement this, but it requires much more coordination and complexity than simply starting N delivery processes for a particular queue.
(You wind up with a situation where each delivery process has to check in with something every time it wants to deliver to a new destination.)
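Here is a rough sketch of that 'something'. For simplicity it assumes threads in a single process. A real mailer would need this coordination to work across separate delivery processes, which is harder. `DestinationLimiter` and `deliver_message()` are hypothetical names, and the `send` callback stands in for an actual delivery attempt.

```python
import threading

class DestinationLimiter:
    """Hypothetical coordinator capping simultaneous deliveries per destination."""

    def __init__(self, max_per_destination):
        self.max = max_per_destination
        self.active = {}            # destination -> in-progress delivery count
        self.lock = threading.Lock()

    def try_acquire(self, destination):
        with self.lock:
            n = self.active.get(destination, 0)
            if n >= self.max:
                return False        # at the limit; caller must defer
            self.active[destination] = n + 1
            return True

    def release(self, destination):
        with self.lock:
            self.active[destination] -= 1
            if self.active[destination] == 0:
                del self.active[destination]

def deliver_message(limiter, destinations, send):
    """Deliver to each destination, checking in with the limiter first.

    Returns the destinations deferred because they were at their limit.
    """
    deferred = []
    for dest in destinations:
        if not limiter.try_acquire(dest):
            deferred.append(dest)
            continue
        try:
            send(dest)
        finally:
            limiter.release(dest)
    return deferred
```

The check happens inside the per-destination loop rather than up front, because with remapping a delivery process only learns a message's destinations once it starts working on that message.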
For a mailer that redoes mappings every time, the straightforward approach is what Exim has: a single queue of messages and completely independent delivery processes (with per-message locking). The downside of this is that you can wind up with a bunch of your delivery processes all trying to deliver to the same unresponsive destination.
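A toy Python model makes the downside easy to see. Each `Message` has its own lock, and each 'delivery process' (here just a loop iteration) claims the first message that nobody else holds. This is an illustration with assumed names, not Exim's queue runner. In particular, it doesn't model the order in which real queue runners scan the queue.

```python
import threading

class Message:
    def __init__(self, msg_id, destination):
        self.msg_id = msg_id
        self.destination = destination
        # Per-message lock: at most one delivery process works on it at a time.
        self.lock = threading.Lock()

def claim_next(queue):
    """Scan the single queue and claim the first message no one else holds."""
    for msg in queue:
        if msg.lock.acquire(blocking=False):
            return msg
    return None

def start_workers(queue, n_workers):
    """Each 'delivery process' independently claims one message.

    Nothing here looks at destinations, so if the front of the queue is full
    of mail for one unresponsive host, every worker can end up stuck on it.
    """
    claimed = []
    for _ in range(n_workers):
        msg = claim_next(queue)
        if msg is None:
            break
        claimed.append(msg)
    return claimed
```

With three workers and a queue whose first three messages are for `slow.example.com`, all three workers end up on the slow host while a later message for `fast.example.org` sits unclaimed.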
Exploring an ARP mystery: a little Linux surprise
Lately, some of our OpenBSD machines have been periodically logging kernel messages like the following:
arp: attempt to overwrite entry for IPADDR on bge0 by MAC-ADDR on nfe0
What this message means is that the OpenBSD machine had previously acquired an ARP entry for IPADDR on bge0 but now it was seeing the same IP address advertised in an ARP message from MAC-ADDR on nfe0. There are a number of things that can cause this; most of them are alarming.
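The condition behind the message can be modeled as an ARP cache that is keyed by IP address and also remembers which interface each entry was learned on. The Python sketch below models only the check that the log message describes. It assumes, as the message's wording suggests, that the existing entry is kept. It is not OpenBSD's kernel code, and `ArpCache` is a hypothetical name.

```python
import logging

log = logging.getLogger("arp")

class ArpCache:
    """Toy ARP cache keyed by IP address; each entry remembers its interface."""

    def __init__(self):
        self.entries = {}   # ip -> (mac, interface)

    def update(self, ip, mac, interface):
        """Handle an ARP message claiming `ip` is at `mac`, seen on `interface`.

        Returns True if the cache was updated, False if the update was refused.
        """
        old = self.entries.get(ip)
        if old is not None and old[1] != interface:
            # Same IP showing up on a different interface: keep the old
            # entry and complain, in the spirit of the message quoted above.
            log.warning("attempt to overwrite entry for %s on %s by %s on %s",
                        ip, old[1], mac, interface)
            return False
        self.entries[ip] = (mac, interface)
        return True
```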
Now I need to describe the network topology. This OpenBSD machine is obviously dual-homed on bge0 and nfe0; nfe0 is 'net-3', the primary subnet that most of our servers live on, and bge0 is 'net-5', a secondary subnet that we still have some machines on due to history. IPADDR is the net-5 IP address for our Samba server, which is also dual-homed on net-3 and net-5. Again due to history, the official IP address that everyone uses for the Samba server is IPADDR on net-5, not the Samba server's net-3 IP address.
When we fired up tcpdump on the Samba server's net-3 interface, we observed two things. The first was that it was sending TCP replies to net-3 machines out on net-3 with IPADDR (on net-5) as the source IP address. A bit of thought showed that this was the expected behavior of a traditional dual-homed host; given that outgoing traffic is normally routed based purely on the destination IP address, any traffic to a net-3 host would be routed out the machine's net-3 interface even if it was a reply to something that came in on the net-5 interface to a net-5 IP address.
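This routing behavior can be sketched in Python with the standard `ipaddress` module. The prefixes and interface names are made up, with 198.51.100.0/24 standing in for net-3 and 192.0.2.0/24 for net-5, since the real addresses aren't given here. The output interface is chosen by longest-prefix match on the destination alone. As a result, a reply from the net-5 address to a net-3 client goes out the net-3 interface but keeps the net-5 source address.

```python
import ipaddress

# Hypothetical routing table for a dual-homed host.
ROUTES = [
    (ipaddress.ip_network("198.51.100.0/24"), "eth-net3"),
    (ipaddress.ip_network("192.0.2.0/24"), "eth-net5"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth-net3"),   # default route
]

def output_interface(dst):
    """Pick the outgoing interface by longest-prefix match on the destination only."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, ifname) for net, ifname in ROUTES if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def reply(orig_src, orig_dst):
    """Build a reply: swap the addresses, then route on the new destination.

    The source address is not consulted when choosing the interface.
    """
    src, dst = orig_dst, orig_src
    return {"src": src, "dst": dst, "iface": output_interface(dst)}
```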
(Such asymmetric routing normally only causes problems if you have a firewall in the way on one path, which isn't the case here.)
Second, we saw the Samba server generating ARP requests on its net-3 interface that looked like:
Request who-has <net-3 IP address> tell IPADDR
This was a bit surprising. Normally you would expect a machine to send ARP messages with the reply IP address set to an IP address that is actually on the interface and the subnet that the ARP request is directed to. In this case you'd expect the Samba server to send ARP requests listing its net-3 IP address, not its net-5 one.
(We could easily reproduce these ARP messages and show that they caused the OpenBSD kernel messages by deleting the ARP cache entry for a net-3 machine that had a connection to the Samba server. The next time the Samba server needed to send a reply packet to the net-3 machine, ding, out went an ARP message with IPADDR as the reply IP address.)
My only theory right now is that under some circumstances, Linux will send out ARP requests using not the address of the interface in question but instead the source IP address of the local IP packet that it wants to send (and thus that caused the ARP request to be generated). This is, in a particular view, a sensible thing to do. But as we can see, it's something that can cause other machines to twitch and I think it's at least a little bit surprising. Okay, quite a lot surprising.
(It'll take another entry to try to justify this as a sensible thing in the right view.)
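The difference between the theory and the expected behavior can be shown with two hypothetical policies for choosing the sender ('tell') address in an ARP request, sketched in Python. The addressing is made up: 198.51.100.0/24 stands in for net-3, with the server at 198.51.100.5, and 192.0.2.10 stands in for IPADDR on net-5. Neither function is Linux's actual code. The first models the behavior theorized in this entry, and the second models what the entry says you would normally expect.

```python
import ipaddress

# Hypothetical interface addresses for the dual-homed Samba server.
INTERFACES = {
    "eth-net3": ipaddress.ip_interface("198.51.100.5/24"),
    "eth-net5": ipaddress.ip_interface("192.0.2.10/24"),
}

def arp_sender_from_packet(iface, target, packet_src):
    """Theorized policy: reuse the source IP of the packet that triggered the ARP."""
    return ipaddress.ip_address(packet_src)

def arp_sender_from_interface(iface, target, packet_src):
    """Expected policy: use this interface's own address if the target is
    on its subnet (the fallback otherwise is a hypothetical choice)."""
    addr = INTERFACES[iface]
    if ipaddress.ip_address(target) in addr.network:
        return addr.ip
    return ipaddress.ip_address(packet_src)

def arp_request(policy, iface, target, packet_src):
    """Render the request in the same form as the tcpdump line quoted above."""
    sender = policy(iface, target, packet_src)
    return f"Request who-has {target} tell {sender}"
```

For a reply to a net-3 client at 198.51.100.20, the first policy produces `Request who-has 198.51.100.20 tell 192.0.2.10`, which is the pattern we saw, while the second produces `tell 198.51.100.5`.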