A justification for some odd Linux ARP behavior

February 2, 2016

Years ago I described an odd Linux behavior which attached the wrong source IP to ARP requests, and said that I had a justification for why this wasn't quite as crazy as it sounds. The setup is that we have a dual-homed machine on two networks, call them net-3 and net-5. If another machine on net-3 tries to talk to the dual-homed machine's net-5 IP address, the dual-homed machine would send out an ARP request on net-3 of the form:

Request who-has <net-3 client machine IP address> tell <net-5 IP address>

As I said at the time, this was a bit surprising as normally you'd expect a machine to send ARP requests with the 'tell ...' IP address set to an IP address that is actually on the interface that the ARP request is sent out on.

What Linux appears to be doing instead is sending the ARP request with the IP address that will be the source IP of the eventual actual reply packet. Normally this will also be an IP address on the interface the ARP request is sent out on, but in this case we have asymmetric routing going on. The client machine is sending to the dual-homed server's net-5 IP address, but the dual-homed machine is going to just send its replies directly back out its net-3 interface. So the ARP request it makes is done on net-3 (to talk directly to the client) but is made with its net-5 IP address (the IP address that will be on the TCP packet or ICMP reply or whatever).

This makes sense from a certain perspective. The ARP request is caused by some IP packet to be sent, and at this point the IP packet presumably has a source IP attached to it. Rather than look up an additional IP address based on the interface the ARP is on, Linux just grabs that source IP and staples it on. The resulting MAC to source IP address association that many machines will pick up from the ARP request is even valid, in a sense (in that it works).
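As a rough illustration of the choice described above, here is a minimal Python sketch of the two possible ways to pick the 'tell' address. The interface names and the 10.3.0.0/24 and 10.5.0.0/24 addresses are made up for the example (the real net-3 and net-5 ranges aren't given here), and this models the observed behavior, not the actual kernel code.

```python
import ipaddress

# Hypothetical addresses for the dual-homed machine.
LOCAL_ADDRS = {
    "net3-if": ipaddress.ip_interface("10.3.0.10/24"),  # interface on net-3
    "net5-if": ipaddress.ip_interface("10.5.0.10/24"),  # interface on net-5
}


def expected_tell_ip(out_iface):
    """What you'd normally expect: an address on the outgoing interface."""
    return LOCAL_ADDRS[out_iface].ip


def observed_tell_ip(packet_src, out_iface):
    """What Linux appears to do by default: reuse the source IP of the
    packet that triggered the ARP request, if that IP is one of ours."""
    src = ipaddress.ip_address(packet_src)
    if src in {iface.ip for iface in LOCAL_ADDRS.values()}:
        return src
    # Not one of our own addresses (say, a forwarded packet), so fall
    # back to an address on the outgoing interface.
    return LOCAL_ADDRS[out_iface].ip


# A reply from the net-5 IP that is routed directly out the net-3 interface.
print(expected_tell_ip("net3-if"))               # address on net-3
print(observed_tell_ip("10.5.0.10", "net3-if"))  # the net-5 address
```

The two functions only disagree when the packet's source IP lives on a different interface than the one the ARP request goes out on, which is exactly the asymmetric routing case.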

(Client Linux machines on net-3 do pick up an ARP table entry for the dual-homed machine's net-5 IP, but they continue to send packets to it through the net-3 to net-5 gateway router, not directly to the dual-homed machine.)

There is probably a Linux networking sysctl that will turn this behavior off. Some preliminary investigation of the documentation suggests that arp_announce is probably what we want, if we care enough to set any sysctl for this. We probably don't, since the current behavior doesn't seem to be causing problems.

(We also don't have very many dual-homed Linux hosts where this could come up.)
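For anyone who does want to poke at arp_announce, the following Python sketch reads the current settings and models, in a deliberately simplified way, how the three documented levels (0, 1, 2) change the 'tell' address. The level logic is my reading of the kernel's ip-sysctl documentation rather than a copy of the kernel code, and the function names and example addresses are hypothetical.

```python
import ipaddress
from pathlib import Path

CONF_DIR = Path("/proc/sys/net/ipv4/conf")


def read_arp_announce(name, conf_dir=CONF_DIR):
    """Read arp_announce for 'all', 'default', or an interface name."""
    return int((Path(conf_dir) / name / "arp_announce").read_text().strip())


def effective_arp_announce(all_value, iface_value):
    """The documentation says the max of conf/all and conf/<interface> is used."""
    return max(all_value, iface_value)


def tell_address(level, packet_src, out_iface, target, local_ips):
    """Simplified model of which sender IP goes into the ARP request.

    level      -- effective arp_announce value (0, 1 or 2)
    packet_src -- source IP of the packet that triggered the ARP request
    out_iface  -- address/prefix of the interface the ARP goes out on
    target     -- the IP address being ARPed for
    local_ips  -- all IP addresses configured on this host
    """
    src = ipaddress.ip_address(packet_src)
    tgt = ipaddress.ip_address(target)
    iface = ipaddress.ip_interface(out_iface)
    is_local = src in {ipaddress.ip_address(a) for a in local_ips}

    if level == 0 and is_local:
        # Level 0 (default): any local address will do.
        return src
    if level == 1 and is_local and src in iface.network and tgt in iface.network:
        # Level 1: keep the packet's source only if it shares the
        # target's subnet on this interface.
        return src
    # Level 2, or nothing suitable above: use the interface's own address.
    return iface.ip


local = ["10.3.0.10", "10.5.0.10"]  # made-up net-3 and net-5 addresses
for level in (0, 1, 2):
    print(level, tell_address(level, "10.5.0.10", "10.3.0.10/24", "10.3.0.77", local))
```

On a real machine you would feed it something like `effective_arp_announce(read_arp_announce("all"), read_arp_announce("eth0"))`, where `eth0` stands in for whatever the net-3 interface is actually called.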


Comments on this page:

By Ewen McNeill at 2016-02-02 04:50:29:

This is a "feature" of Linux -- its ARP implementation has an unusual (eg, unlike other unixes) interpretation of the ARP RFCs, which will lead to multihomed hosts sending ARP requests with the "wrong" IP address in some instances. And that generally causes great confusion elsewhere. (IIRC the Linux behaviour is claimed to be permitted by the RFCs, and the wording in the RFCs is broad enough to suggest it probably is permitted. It just doesn't seem an especially sane default to me.)

Yes, "arp_announce" is the one you want, from memory. IIRC you want "2" as a value (but "1" might solve the immediate symptoms); "0" is the default. Beware that there are multiple places to set it -- a "default", an "all", and each individual interface. (The sysctl.txt docs you link to claim "The max value from conf/{all,interface}/arp_announce is used.", but generally I end up setting "all", the interface, and "default" to the same value to minimise later surprises.)

While there you may want to set "arp_ignore" (the receive side of "arp_announce", basically), and possibly "arp_filter" depending on your topology. Typically I end up remembering why I had to set all of these every time I set up a multi-homed Linux host...

Ewen

Yep. Linux's default behavior is counterintuitive.

I think Linux's view is that the IP belongs to the host, not the NIC. As such, the traffic for the IP ~> host can come in and go out any interface.

There is a /proc tunable that can be flipped to change this behavior. (I believe one of the other comments mentions more details.) This is just an explanation, as I understand it, of the "why".

Written on 02 February 2016.

Last modified: Tue Feb 2 00:38:13 2016