The puzzle of packets to your host that your host doesn't respond to

March 13, 2015

Today we tried to replace an old machine with a newly built version of it. We built the new version using a temporary name and IP address, then at the transition time shut down the old version, reconfigured the new version to use the real name, primary IP address, and IP aliases, and rebooted it so it would come up with the new configuration. Unfortunately, when it came up it had a very weird problem: machines on the local network could talk to all of its IP addresses, but machines on other networks could only talk to its primary IP address, not any of the IP aliases. From other networks, the IP aliases simply didn't respond to packets.

To make it more mysterious, during the troubleshooting attempt my coworker ran tcpdump on the server itself and actually saw his pings to an IP alias coming in but not being answered:

www.cs # tcpdump -i eth0 "host workstation.cs"
08:16:22.929594 IP workstation.cs > support.cs: ICMP echo request, id 18707, seq 33, length 64
08:16:23.929546 IP workstation.cs > support.cs: ICMP echo request, id 18707, seq 34, length 64

Then after a while (but not a short while) the problem went away; you could ping and otherwise talk to the IP aliases from machines on other networks. Oh, and we could reproduce this (we did it when failing back to the old version, which made us very alarmed).

(There's no firewall involved here, just to cover that.)

What's going on here is the inverse of something I've seen before with outgoing traffic:

You can't tell if packets are really going to your machine without checking the destination Ethernet address. The destination IP alone is not good enough.

Sure, these packets look like they're going to our server. But actually they aren't; they're being sent to the Ethernet address of the old version of the server, not the Ethernet address of the current one. The new version of the server is seeing them for two reasons. First, the switches on the network have aged out the Ethernet address to port association for the old Ethernet address, so the switches have to flood these packets to all ports. Second, tcpdump is running in its default promiscuous mode so it's picking up this flooded traffic (and displaying only the IP level information). The kernel knows better and is quietly ignoring these packets just like it ignores all sorts of other random crud that shows up on the network port.

(If we weren't running tcpdump with the interface in promiscuous mode, the packets probably would be ignored at the hardware level and not even reach the kernel.)
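To make the distinction concrete, here is a minimal Python sketch of the two separate decisions involved: whether the network interface passes a frame up at all, and whether the IP stack then treats it as addressed to this host. The MAC addresses and function names are made up for illustration, and the model is deliberately simplified (it ignores multicast and is not how any particular kernel or driver is written). On a real machine, tcpdump's `-e` option prints the link-level header, which is how you would see the destination Ethernet address.

```python
import struct

# Hypothetical addresses for illustration only.
OLD_SERVER_MAC = bytes.fromhex("020000000001")
NEW_SERVER_MAC = bytes.fromhex("020000000002")
ROUTER_MAC = bytes.fromhex("0200000000fe")
BROADCAST_MAC = bytes.fromhex("ffffffffffff")


def ethernet_destination(frame):
    """Return the destination MAC, the first 6 bytes of the Ethernet header."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    dst, _src, _ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst


def addressed_to_us(frame, own_mac):
    """True if the frame is unicast to us or broadcast (multicast ignored)."""
    dst = ethernet_destination(frame)
    return dst == own_mac or dst == BROADCAST_MAC


def nic_passes_up(frame, own_mac, promiscuous):
    """A promiscuous interface hands every frame to the host (and tcpdump)."""
    return promiscuous or addressed_to_us(frame, own_mac)


def ip_stack_processes(frame, own_mac):
    """The IP stack only acts on frames actually addressed to this host."""
    return addressed_to_us(frame, own_mac)


def make_frame(dst_mac, src_mac, payload=b"ping"):
    """Build a toy IPv4-typed Ethernet frame (ethertype 0x0800)."""
    return dst_mac + src_mac + struct.pack("!H", 0x0800) + payload


# The router still sends to the old server's Ethernet address.
stale_frame = make_frame(OLD_SERVER_MAC, ROUTER_MAC)
print("seen by tcpdump:", nic_passes_up(stale_frame, NEW_SERVER_MAC, promiscuous=True))
print("answered by host:", ip_stack_processes(stale_frame, NEW_SERVER_MAC))
```

In this model the flooded frame is visible to a promiscuous tcpdump but is never answered, which matches what we saw.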

The reason that the packets had the old Ethernet address is that our top level router was caching the IP to Ethernet address association for a surprisingly long time. Hosts on the local network were directly re-ARPing for the IP aliases and getting the new server's Ethernet address, so they could talk to it, but packets from other networks went through the router and the router just used the old Ethernet address it had cached. As for traffic to the server's primary IP working, we think that the Ethernet address for the server's primary IP was getting updated on the router because the server generates outgoing traffic from that IP address, forcing the router to update. The problem went away after a while because the router timed out its cached Ethernet address information, re-ARPed, and finally had the correct new Ethernet addresses.
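The timeline is easier to follow as a toy simulation. This Python sketch models the router's IP to Ethernet address cache as a table whose entries expire after a fixed timeout. The class, the addresses, the timeout value, and the idea that seeing traffic from an IP refreshes its entry are all assumptions for illustration (the last one is only our guess about the real router), not a description of how our router actually works.

```python
# Hypothetical addresses for illustration only.
OLD_MAC = "02:00:00:00:00:01"
NEW_MAC = "02:00:00:00:00:02"
PRIMARY_IP = "192.0.2.10"
ALIAS_IP = "192.0.2.11"


class ArpCache:
    """Toy IP -> MAC cache; entries expire `timeout` seconds after learning."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # ip -> (mac, time the entry was learned)

    def learn(self, ip, mac, now):
        """Record (or refresh) an entry, e.g. after seeing traffic from ip."""
        self.entries[ip] = (mac, now)

    def lookup(self, ip, now, resolve):
        """Return the cached MAC if still fresh, otherwise re-ARP via resolve."""
        entry = self.entries.get(ip)
        if entry is not None and now - entry[1] < self.timeout:
            return entry[0]  # may be stale if the host changed underneath us
        mac = resolve(ip)
        self.learn(ip, mac, now)
        return mac


def demo(timeout=600):
    router = ArpCache(timeout)
    # Before the cutover, both IPs live on the old server.
    router.learn(PRIMARY_IP, OLD_MAC, now=0)
    router.learn(ALIAS_IP, OLD_MAC, now=0)

    # After the cutover, anyone who actually ARPs gets the new server.
    def current_owner(ip):
        return NEW_MAC

    # The new server sends traffic from its primary IP, refreshing that entry.
    router.learn(PRIMARY_IP, NEW_MAC, now=60)

    return {
        "primary_after_cutover": router.lookup(PRIMARY_IP, 120, current_owner),
        "alias_after_cutover": router.lookup(ALIAS_IP, 120, current_owner),
        "alias_after_timeout": router.lookup(ALIAS_IP, timeout, current_owner),
    }


if __name__ == "__main__":
    for name, mac in demo().items():
        print(name, mac)
```

Under these assumptions the alias keeps resolving to the old Ethernet address until its entry times out, while the primary IP is corrected right away. Forcefully purging the entry corresponds to deleting it from `entries` so that the next lookup re-ARPs immediately.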

(Once we went searching on the Internet, we discovered that this is known behavior of our particular make of router. Fortunately there's a way to forcefully purge such a cached entry; unfortunately we're going to have to remember to do this on any migration or manual failover of any machine that has IP aliases. And it's a good thing we're not trying to do automated failover of IP aliases between machines.)

Written on 13 March 2015.