Wandering Thoughts archives


Today I learned that a syslog server can be very silent on the network

Let's start with my tweet:

TIL that a UDP based syslog server can be so quiet that it falls out of switch MAC tables, causing syslog msgs to it to flood everywhere.

We have a central syslog server that people running 'sandbox' machines point their servers at to aggregate all of their syslog data at a single point (and off individual servers). Mostly because we're old fashioned, it uses the plain old UDP based method of syslog forwarding where client machines simply fire UDP packets in its direction.
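To make the "fire UDP packets in its direction" part concrete, here is a minimal sketch of what a forwarding client does. The function names, the tag and the host name in the usage note are made up for illustration, and the message format is only the bare-bones "<PRI>tag: message" shape; real syslog daemons add timestamps and host names.

```python
import socket


def format_syslog(facility: int, severity: int, tag: str, message: str) -> bytes:
    """Build a minimal BSD-style syslog datagram: "<PRI>tag: message"."""
    pri = facility * 8 + severity  # PRI value combines facility and severity
    return f"<{pri}>{tag}: {message}".encode("utf-8")


def send_syslog(server, payload: bytes) -> None:
    """Fire one datagram at (host, port). Nothing ever comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, server)
```

Usage would be something like send_syslog(("loghost.example.org", 514), format_syslog(1, 6, "demo", "hello")). The thing to notice is that the server side only ever calls recvfrom(); nothing in this exchange requires it to transmit a single packet.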

Today, due to various recent events and questions, I was running tcpdump on one of the machines here that's on the same network as the syslog server, partly to see what kind of crud I would discover swirling around on the network (there is always some). To my surprise I saw a whole burst of syslog traffic that was going to the syslog server (and it wasn't broadcast, either); the traffic was coming from a bunch of machines behind one firewall. I scratched my head for a bit until the penny dropped that the syslog server had fallen out of the switches' MAC to port tables.

The direct reason this could happen is that a UDP based syslog server doesn't naturally send out any packets. Unlike with TCP streams, where it would at least be sending out ACKs and so refreshing the switch MAC tables, receiving UDP streams is entirely passive. The machine does do some things that generate packets periodically (such as NTP), but apparently it was doing them so infrequently that at least some switches timed its MAC entry out and started flooding traffic far enough to get to my observer machine. At the same time, the gateway for the sandbox network that was sending the syslog traffic didn't time out its ARP entry, so it never re-ARPed and thus never provoked the syslog server into generating some packets to re-prime the switches.

(Or perhaps the outgoing packets it did generate didn't flow over enough of the switches involved.)
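The aging behaviour described above can be sketched as a toy model of a single learning switch. The class name, the port numbers and the 300 second aging time are assumptions for illustration, and I've assumed that only frames from a MAC refresh its entry; real switches differ in both timeout and refresh rules.

```python
class LearningSwitch:
    """Toy model of one switch's MAC table with entry aging."""

    def __init__(self, ports, max_age=300.0):
        self.ports = list(ports)
        self.max_age = max_age  # seconds before a silent MAC is forgotten
        self.table = {}         # mac -> (port, time the MAC was last seen as a source)

    def forward(self, now, in_port, src, dst):
        """Learn src, then return the list of ports the frame goes out of."""
        self.table[src] = (in_port, now)  # only source addresses refresh entries
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] <= self.max_age:
            port = entry[0]
            return [port] if port != in_port else []
        # Unknown or aged-out destination: flood to every other port.
        self.table.pop(dst, None)
        return [p for p in self.ports if p != in_port]
```

With the server on port 1, the gateway on port 2 and an observer on port 3, a gateway frame sent 10 seconds after the server last spoke goes only to port 1, while one sent 400 seconds after goes to ports 1 and 3. A single outgoing packet from the server is enough to put things right again.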

There are two lessons I draw from this. First, MAC table timeouts may vary significantly across different devices. They can vary not only in how long they are but also in what keeps a MAC table entry active. Either the switches timed out their MAC entries much faster than the gateway timed out its entry, or the gateway's entries stayed alive when they were used for outgoing traffic while the switches' entries didn't.

(I can come up with at least a justification for why a switch should be fairly aggressive about aging out MACs that it hasn't seen traffic from. Incorrect switch MAC tables can do significant damage, so better safe than sorry if something is silent.)

The second is that it may take only a single switch losing a MAC entry to cause significant flooding. If a machine doesn't generate broadcast traffic, many switches may not have MAC entries for it in the first place (if traffic to or from it never transits them). If a top level switch loses the MAC entry, it will flood the traffic to all of its ports and thus to many of those switches, which then flood it down to all of their ports and so on. The narrower the normal traffic flow is (for example, if it's mostly between one gateway and the machine), the fewer switches there are that have the MAC association in the first place and thus are in a position to stop such a flood.
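Here is a rough sketch of that cascade on a loop-free tree of switches. The topology, the switch names and the LOCAL marker are all invented for illustration; each switch either knows which neighbour leads to the destination MAC or has no entry and floods.

```python
LOCAL = "local-port"  # marker: the destination host hangs directly off this switch


def trace_frame(links, mac_table, switch, came_from=None):
    """Follow one unicast frame; return (visited, flooded) sets of switch names.

    links:     switch -> list of neighbouring switches (must be loop-free)
    mac_table: switch -> next-hop neighbour for the destination MAC, or LOCAL;
               a switch missing from the dict has no entry and floods.
    """
    visited, flooded = {switch}, set()
    next_hop = mac_table.get(switch)
    if next_hop == LOCAL:
        return visited, flooded  # delivered out of a single host port
    if next_hop is not None:
        # Known MAC: one port only (and never back out of the ingress port).
        targets = [next_hop] if next_hop != came_from else []
    else:
        # Unknown MAC: every port except the one the frame arrived on.
        flooded.add(switch)
        targets = [n for n in links[switch] if n != came_from]
    for neighbour in targets:
        v, f = trace_frame(links, mac_table, neighbour, switch)
        visited |= v
        flooded |= f
    return visited, flooded
```

Take a core switch with three edge switches, where the gateway is on edgeA and the syslog server is on edgeB. Normally only edgeA, core and edgeB ever see the traffic, so edgeC has never learned the MAC. If core then forgets it, edgeC receives the flood and, having no entry of its own, passes it on to all of its host ports.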

(There are probably all sorts of interesting dynamics in this situation in terms of where outgoing traffic from the 'mostly silent' machine goes, what switches it passes through, and thus whether or not it will cause all of the relevant switches to pick up MAC entries again. The moral here is that nothing beats forcing the machine to generate some broadcast traffic in some way. There's direct traffic generation with eg arping, or there's just pinging a nonexistent IP every few minutes to force an ARP broadcast.)
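As a sketch of the "force some broadcast traffic" idea, this builds a broadcast ARP request by hand, which is roughly what a ping of a nonexistent IP ends up causing. The function names, the MAC address and the IP addresses in the usage note are hypothetical, and the sending half assumes Linux packet sockets and root privileges; in practice running arping or ping from cron is far less work.

```python
import socket
import struct


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Return a raw Ethernet frame holding a broadcast ARP who-has request."""
    ethernet = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)  # dst, src, EtherType ARP
    # Hardware type 1 (Ethernet), protocol 0x0800 (IPv4), address sizes 6 and 4, op 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src_mac + socket.inet_aton(src_ip)          # sender MAC and IP
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)   # target MAC unknown, target IP
    return ethernet + arp


def send_frame(ifname: str, frame: bytes) -> None:
    """Transmit a raw frame. Assumes Linux (AF_PACKET) and root privileges."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(frame)
```

Something like send_frame("eth0", build_arp_request(my_mac, "192.0.2.10", "192.0.2.254")) run every few minutes would do it. Because the frame is a broadcast, every switch in the broadcast domain sees the source MAC and refreshes its entry, no matter where the machine's normal traffic flows.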

sysadmin/SyslogAndSilence written at 22:43:38

A limitation of tcpdump is that you can't tell in from out

Suppose that you are running tcpdump on a network that's experiencing problems, on a machine which you know is supposed to be sending out broadcast ARP requests. When you do something that provokes an ARP request, you see two or three ARP broadcast packets from the machine in close succession (but not back to back; timestamps say there's a little bit of time between each). That sounds okay, doesn't it? Or at least it's not too crazy. There's a plausible case for rapid repeated ARPs in cases where the first request didn't get an immediate reply, and probably Unixes behave differently here so experience on say Linux doesn't necessarily tell you what to normally expect on OpenBSD or Illumos.

Except that there's a problem here. As far as I know, there is no way to tell from tcpdump output whether you're seeing these packets because the system is transmitting them or because it's receiving them. Of course normally you shouldn't (re-)receive packets that your system initially transmitted, but, well, network loops can happen.
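One crude way to at least flag candidates is to look for byte-identical frames carrying your own source MAC that repeat faster than any plausible retry timer. This is only a heuristic sketch: the function name and the 0.1 second threshold are my own assumptions, and since a legitimate ARP retry is also byte-identical, a hit means "look closer", not "there is a loop".

```python
def suspicious_repeats(frames, own_mac, max_gap=0.1):
    """Flag identical frames from our own MAC that repeat within max_gap seconds.

    frames:  time-ordered iterable of (timestamp_seconds, raw_ethernet_frame_bytes)
    own_mac: our interface's MAC address as 6 raw bytes
    Returns a list of (earlier_timestamp, repeat_timestamp) pairs.
    """
    last_seen = {}  # frame bytes -> timestamp of the most recent copy
    hits = []
    for ts, frame in frames:
        if frame[6:12] != own_mac:  # bytes 6..11 hold the Ethernet source MAC
            continue
        prev = last_seen.get(frame)
        if prev is not None and ts - prev <= max_gap:
            hits.append((prev, ts))
        last_seen[frame] = ts
    return hits
```

You would feed this from whatever capture reader you have available. The threshold has to be set well below the retry interval of the operating system involved, which (as mentioned) probably differs between Unixes.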

Some versions of tcpdump have a switch to control whether it listens to input packets, output packets, or both. On OpenBSD this is -D; on different versions of Linux it is either -P (Ubuntu, CentOS) or -Q (Fedora). FreeBSD doesn't seem to have an option for this. Of course to use this option (if it's available) you have to remember that some sort of echo-back situation might be happening, but at least you can check for it.

This is definitely something that I'm going to have to try to remember for future network troubleshooting. Sadly it's not as simple as always using 'in' initially, because often you want to see both what the machine is sending and what it's getting back; you just would like to be able to tell them apart immediately.

(I believe that this is a limitation of the underlying kernel interfaces that tcpdump uses, in that most or all implementations simply don't tag packets with whether they're 'out' or 'in' packets.)

(This is just one of a number of ways that I've found to be misled by not looking sufficiently closely at what tcpdump seems to be telling me. Eg tcpdump -p versus not, various firewall and IPSec settings causing received packets to be dropped or even IPSec re-materializing packets, and not looking at MAC addresses.)

sysadmin/TcpdumpInOutLimitation written at 00:46:47


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.