A fun tale of network troubleshooting involving VLANs and MACs

December 18, 2015

The following is not my story; it comes from my co-workers, who had the real fun of trying to figure this one out and then finding a fix.

To start with, let's set the background. We have an (OpenBSD) routing firewall machine that sits on a network segment whose egress router is not under our control. Actually, we have two of them, one active and one as a warm spare (it's on and being updated, but it is not connected to any of the production networks because otherwise it would fight the live firewall for the public IPs). A while back, as part of trying to fail over from the live machine to the warm spare, we discovered that the egress router for the network caches ARP information for a long time. Like, apparently, hours. This was obviously no good for being able to switch over (such as in the case of hardware failure). Since the egress router is not under our control, the only thing we could really do was explicitly set the warm spare to have the same Ethernet address as the active machine.

(This was tested when it was set up and it worked, but we now believe that test was misleading.)

Recently my co-workers wanted to swap from the active machine to the warm spare, because the active machine had been up for literally years (we don't update OpenBSD all that often). Unfortunately, when they made the swap the (ex-)warm spare was not reachable on its public IP, so they failed back to the active machine and took the warm spare off for testing. Testing established that the warm spare was showing 'incomplete' for other machines in its ARP cache, although other machines picked it up fine for their ARP caches. Further, trying to inspect the traffic with tcpdump made the network suddenly work, but things broke again when they stopped tcpdump. Oh, and the problem was specific to using our preferred Intel Ethernet cards; if the warm spare was switched to use non-Intel network hardware, everything worked.

Now, it happens that this machine has a slightly unusual network configuration. Because it needs to talk to a number of external networks, it actually gets all of its external networks as tagged VLANs over a single physical network port. When we changed the machine to use the MAC of the active machine, we had set the Ethernet address on the VLAN for that particular network, because that was the network that mattered; we didn't change the MAC of anything else.

It turned out that this was the problem. Using Intel cards on our (old) version of OpenBSD, when the MAC of the VLAN differed from the MAC of the underlying physical interface and the interface was not in promiscuous mode, ARP (at least) didn't work because the kernel apparently never received the replies to its ARP queries. If you put the interface into promiscuous mode, such as by running tcpdump, things suddenly worked; the kernel received ARP replies and so on. We think that the whole setup worked when tested because we likely tested it with tcpdump running to watch traffic (and verify what MACs were being used).

(The obvious suspect here is hardware level receive filtering; perhaps the hardware is only being set by the driver to recognize the physical port MAC as its MAC. This is a driver and/or hardware issue, but these things happen.)
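As a rough illustration of that suspicion, here is a toy Python model of a unicast receive filter that only knows one MAC. This is not how the Intel driver or OpenBSD actually work internally; the class name, the MAC addresses, and the filter logic are all made up for the example.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class ToyNic:
    """Hypothetical NIC whose hardware filter only knows one MAC."""
    filter_mac: str            # the MAC programmed into the hardware filter
    promiscuous: bool = False  # tcpdump normally turns this on

    def accepts(self, dst_mac: str) -> bool:
        """Return True if a frame addressed to dst_mac reaches the kernel."""
        dst = dst_mac.lower()
        if self.promiscuous or dst == BROADCAST:
            return True
        return dst == self.filter_mac.lower()


# Made-up addresses, not the real machines' MACs.
PHYS_MAC = "00:00:5e:00:53:01"   # the warm spare's own physical MAC
VLAN_MAC = "00:00:5e:00:53:02"   # the active machine's MAC, set on the VLAN

nic = ToyNic(filter_mac=PHYS_MAC)
# Broadcast ARP queries from other machines still get through.
query_seen = nic.accepts(BROADCAST)
# ARP replies are unicast to the MAC that sent the query (the VLAN's MAC).
reply_seen_normally = nic.accepts(VLAN_MAC)

nic.promiscuous = True           # "running tcpdump"
reply_seen_with_tcpdump = nic.accepts(VLAN_MAC)

fixed = ToyNic(filter_mac=VLAN_MAC)  # physical port given the same MAC
reply_seen_after_fix = fixed.accepts(VLAN_MAC)

print(query_seen, reply_seen_normally, reply_seen_with_tcpdump,
      reply_seen_after_fix)
```

In this model the unicast ARP reply is dropped only in the mismatched, non-promiscuous case, which lines up with the symptoms: other machines could learn the warm spare's MAC, but it could not learn theirs.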

Once my co-workers figured out what the problem was, the fix was simple: explicitly set the MACs of both the physical port and all the VLANs on it to the active machine's MAC. But getting there took a whole frustrating and puzzling journey. This wasn't exactly a Heisenbug, but until my co-workers noticed the pattern that running tcpdump made it disappear it did look like one.
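The "set it everywhere" fix can be sketched as a small Python helper that builds the commands to run. It assumes OpenBSD's 'ifconfig IFACE lladdr MAC' form for changing an interface's Ethernet address; the interface names (em0, vlan10, vlan20) and the MAC are hypothetical placeholders, not our real configuration, and the helper only builds command strings rather than running anything.

```python
import re

# Colon-separated, lowercase hex MAC such as 00:00:5e:00:53:01.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def lladdr_commands(mac, parent, vlans):
    """Build ifconfig commands that give the parent port and every VLAN
    on it the same Ethernet address (hypothetical helper)."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    # Physical port first, then each VLAN interface on top of it.
    return [f"ifconfig {ifname} lladdr {mac}" for ifname in [parent, *vlans]]


# Made-up interface names and MAC, purely for illustration.
for cmd in lladdr_commands("00:00:5E:00:53:01", "em0", ["vlan10", "vlan20"]):
    print(cmd)
```

The point of generating the list from one MAC is that the physical port and the VLANs can't drift apart, which was the original mistake.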

(Using 'tcpdump -p', which asks tcpdump not to put the interface into promiscuous mode, is the obvious thing for the future, but I don't know if it would actually have worked in this situation. Still, it's something to try to remember for the next time around. Maybe tcpdump should default to -p these days.)


Comments on this page:

From 80.229.66.11 at 2015-12-19 07:39:21:

Give CARP a try perhaps?

By James (trs80) at 2015-12-19 09:44:54:

The other option is to send a gratuitous ARP on that interface, which should update the ARP cache on the router. Pacemaker (well, Cluster Resource Agents, which it uses - Linux HA is a right mess of intertwined but theoretically independent projects) ships a binary called send_arp for this purpose. Not sure what the BSD option is, apart from CARP, as mentioned.

By dozzie at 2015-12-19 09:57:41:

Give CARP a try perhaps?

Better not. CARP is a protocol implemented in nothing but BSD's daemon. VRRP is a much better choice, especially since it's a standard.

Also, VRRP daemons (or keepalived at least) send ARP updates out by themselves after failover.

By cks at 2015-12-21 11:40:41:

For the record, send_arp turns out to be a version of arping, which may be more widely available and installed. I'm not yet completely confident I understand the right usage, but two versions worked for me:

arping -U -I IFACE -s YOURIP ROUTERIP

arping -U -I IFACE YOURIP

The latter seems preferable, if only because it's shorter.

Thank you, James! This is going to be handy.
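For illustration, a gratuitous ARP is just an ARP packet in which the sender and target IP are both your own IP, sent to the Ethernet broadcast address. Here is a minimal Python sketch that builds such a frame as bytes. It is not what arping or send_arp do internally, it doesn't send anything, and the function name, MAC, and IP are made up for the example.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800


def gratuitous_arp_frame(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request
    announcing that `ip` is at `mac` (hypothetical helper)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source.
    eth = b"\xff" * 6 + hw + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: hardware type 1 (Ethernet), protocol IPv4,
    # 6-byte hardware addresses, 4-byte protocol addresses, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, ETHERTYPE_IPV4, 6, 4, 1)
    arp += hw + addr            # sender MAC and IP
    arp += b"\x00" * 6 + addr   # target MAC left zeroed, target IP = our IP
    return eth + arp


# Made-up documentation addresses.
frame = gratuitous_arp_frame("00:00:5e:00:53:01", "192.0.2.10")
print(len(frame), frame.hex())
```

Anything on the segment that already has an ARP entry for that IP is expected to update it when it sees the frame, which is what you want the router to do on failover.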

Written on 18 December 2015.

Last modified: Fri Dec 18 23:56:37 2015