Exploring an ARP mystery: a little Linux surprise
Lately, some of our OpenBSD machines have been periodically logging kernel messages like the following:
arp: attempt to overwrite entry for IPADDR on bge0 by MAC-ADDR on nfe0
What this message means is that the OpenBSD machine had previously acquired an ARP entry for IPADDR on bge0 but now it was seeing the same IP address advertised in an ARP message from MAC-ADDR on nfe0. There are a number of things that can cause this; most of them are alarming.
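To make the check concrete, here is a toy model of an ARP cache that complains the same way. It is a sketch and not OpenBSD's actual code: the class name, the refuse-to-overwrite behavior, and the addresses (documentation-range IP, made-up MACs) are all assumptions for illustration.

```python
class ToyArpCache:
    """Toy ARP cache keyed by IP address; not OpenBSD's implementation."""

    def __init__(self):
        self.entries = {}  # ip -> (mac, interface)

    def learn(self, ip, mac, interface):
        """Record the sender of an ARP message; return a warning or None."""
        known = self.entries.get(ip)
        if known is not None and known[1] != interface:
            # Same IP already known on a different interface: keep the
            # old entry and complain instead of overwriting it.
            return (f"arp: attempt to overwrite entry for {ip} "
                    f"on {known[1]} by {mac} on {interface}")
        self.entries[ip] = (mac, interface)
        return None


cache = ToyArpCache()
# First sighting: IPADDR's stand-in is learned on bge0 (net-5).
cache.learn("198.51.100.10", "00:00:5e:00:53:01", "bge0")
# Later the same IP shows up in an ARP message arriving on nfe0 (net-3).
warning = cache.learn("198.51.100.10", "00:00:5e:00:53:02", "nfe0")
print(warning)
```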
Now I need to describe the network topology. This OpenBSD machine is obviously dual-homed on bge0 and nfe0; nfe0 is 'net-3', the primary subnet that most of our servers live on, and bge0 is 'net-5', a secondary subnet that we still have some machines on due to history. IPADDR is the net-5 IP address for our Samba server, which is also dual-homed on net-3 and net-5. Due to history again, the official IP address that everyone uses for the Samba server is IPADDR on net-5, not the Samba server's net-3 IP address.
When we fired up tcpdump on the Samba server's net-3 interface, we observed two things. The first was that it was sending TCP replies to net-3 machines out on net-3 with IPADDR (on net-5) as the source IP address. A bit of thought showed that this was the expected behavior of a traditional dual-homed host; given that outgoing traffic is normally routed based purely on the destination IP address, any traffic to a net-3 host would be routed out the machine's net-3 interface even if it was a reply to something that came in on the net-5 interface to a net-5 IP address.
(Such asymmetric routing normally only causes problems if you have a firewall in the way on one path, which isn't the case here.)
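The destination-only routing decision can be sketched in a few lines. This is a simplified illustration, not how any real kernel does it; the subnets are documentation ranges standing in for net-3 and net-5, and `pick_interface` is a hypothetical helper.

```python
import ipaddress

# Hypothetical stand-ins for the Samba server's two directly attached subnets.
ROUTES = [
    (ipaddress.ip_network("192.0.2.0/24"), "net-3"),
    (ipaddress.ip_network("198.51.100.0/24"), "net-5"),
]


def pick_interface(dst_ip):
    """Choose the outgoing interface by longest-prefix match on the
    destination address only; the source address is never consulted."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, ifname) for net, ifname in ROUTES if dst in net]
    if not matches:
        return None  # no default route in this toy table
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# A reply whose source is the net-5 address but whose destination is on net-3.
reply = {"src": "198.51.100.10", "dst": "192.0.2.77"}
print(pick_interface(reply["dst"]))
```

Because only `reply["dst"]` is looked at, the packet leaves on net-3 while still carrying the net-5 source address.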
Second, we saw the Samba server generating ARP requests on its net-3 interface that looked like:
Request who-has <net-3 IP address> tell IPADDR
This was a bit surprising. Normally you would expect a machine to send ARP requests with the sender IP address (the 'tell' address, which is where replies go) set to an IP address that is actually on the interface and the subnet that the ARP request is directed to. In this case you'd expect the Samba server to ARP using its net-3 IP address, not its net-5 one.
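For reference, the 'tell' address in tcpdump's output is the sender protocol address field of the ARP packet. Here is a minimal sketch that packs an Ethernet/IPv4 ARP request following the RFC 826 layout and prints it in roughly tcpdump's style. The MAC and IP addresses are made up (documentation ranges), and `build_arp_request` and `describe` are hypothetical helper names.

```python
import socket
import struct

# RFC 826 layout for Ethernet + IPv4: hardware type, protocol type,
# hardware length, protocol length, opcode, then sender MAC, sender IP,
# target MAC, target IP.
ARP_FORMAT = "!HHBBH6s4s6s4s"


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Pack an ARP request payload (without the Ethernet header)."""
    return struct.pack(
        ARP_FORMAT,
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # MAC and IPv4 address lengths
        1,            # opcode 1: request
        sender_mac,
        socket.inet_aton(sender_ip),   # the 'tell' address
        b"\x00" * 6,                   # target MAC is unknown in a request
        socket.inet_aton(target_ip),   # the 'who-has' address
    )


def describe(packet):
    """Render the request roughly the way tcpdump shows it."""
    fields = struct.unpack(ARP_FORMAT, packet)
    sender_ip = socket.inet_ntoa(fields[6])
    target_ip = socket.inet_ntoa(fields[8])
    return f"Request who-has {target_ip} tell {sender_ip}"


mac = bytes.fromhex("00005e005301")
# What we saw: the net-5 address used as sender on the net-3 wire.
pkt = build_arp_request(mac, "198.51.100.10", "192.0.2.77")
print(describe(pkt))
```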
(We could easily reproduce these ARP messages, and show that they caused the OpenBSD kernel messages, by deleting the Samba server's ARP cache entry for a net-3 machine that had a connection to it. The next time the Samba server needed to send a reply packet to the net-3 machine, ding, out went an ARP request with IPADDR as the sender ('tell') IP address.)
My only theory right now is that under some circumstances, Linux will send out ARP requests using not the address of the interface in question but instead the source IP address of the local IP packet that it wants to send (the packet that caused the ARP request to be generated in the first place). From a particular point of view, this is a sensible thing to do. But as we can see, it's something that can cause other machines to twitch and I think it's at least a little bit surprising. Okay, quite a lot surprising.
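The two candidate policies can be put side by side. This only models the theory above and makes no claim about what the Linux kernel actually does internally; the interface table, the addresses, and the `arp_sender_ip` function with its `use_packet_source` flag are all hypothetical.

```python
import ipaddress

# Hypothetical addresses for the Samba server's two interfaces.
INTERFACES = {
    "net-3": ipaddress.ip_interface("192.0.2.10/24"),
    "net-5": ipaddress.ip_interface("198.51.100.10/24"),
}


def arp_sender_ip(out_ifname, packet_src_ip, use_packet_source):
    """Pick the 'tell' address for an ARP request sent on out_ifname.

    use_packet_source=True models the theory: reuse the source address
    of the IP packet that triggered the ARP request.
    use_packet_source=False models the expected behavior: use the
    address configured on the outgoing interface.
    """
    if use_packet_source:
        return packet_src_ip
    return str(INTERFACES[out_ifname].ip)


# A reply from the net-5 address heading out of the net-3 interface.
print(arp_sender_ip("net-3", "198.51.100.10", use_packet_source=True))
print(arp_sender_ip("net-3", "198.51.100.10", use_packet_source=False))
```

Under the first policy the ARP request carries the net-5 address onto net-3, which is exactly what tcpdump showed.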
(It'll take another entry to try to justify this as a sensible thing in the right view.)