One reason why people buy Ethernet taps

October 28, 2008

There are a number of people who will sell you rather expensive network tap boxes (eg here). Since hearing about them and discovering the actual prices, I've felt that traffic monitoring switches with mirroring ports (despite the VLAN issue) and dual-NIC PCs running bridging software made them pointless except for people with very high end needs; they were neat in a theoretical way, but not something we would ever need in practice, since the alternatives were perfectly good.

(There is a lot of elaborate equipment that would be cool to have around but I must reluctantly admit we don't exactly need.)

Let me retract that blithely optimistic view of mine.

We have lately been attempting to debug a switch issue involving performance problems with traffic between a 100 Mbit machine and a gigabit machine, and we think that part of the problem may be related to inter-switch flow control issues (specifically, who does it when). As we've been discovering, the problem with monitoring switches and bridges is that they are not completely transparent; both switches and bridges change the layer 2 behavior of the network, things like how pause frames or STP BPDUs are handled, and often at a level that's too low for you to really monitor or influence.

(At least some switches generate or don't generate pause frames on links based on low-level negotiations with whatever is on the other end of the link; put a bridge in, and you may have just changed what gets negotiated. Plus, pause frames do not pass through bridges, or at least not through the bridge implementations that we've been trying to use.)
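For readers who have not looked at pause frames on the wire, here is a minimal Python sketch of what one looks like, assuming a raw Ethernet frame captured without preamble or FCS. The field values (EtherType 0x8808, opcode 0x0001, a 16-bit pause time counted in quanta of 512 bit times) are from IEEE 802.3x; the function names are made up for illustration. Pause frames are normally addressed to the reserved multicast address 01:80:C2:00:00:01, which falls in the range that standards-compliant bridges do not forward, and that is probably why they vanish at a bridge.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808  # EtherType for MAC control frames
PAUSE_OPCODE = 0x0001           # MAC control opcode for PAUSE
BITS_PER_QUANTUM = 512          # one pause quantum is 512 bit times


def parse_pause_frame(frame: bytes):
    """Return the pause time in quanta if this is a PAUSE frame, else None.

    Assumes `frame` starts at the destination MAC (no preamble, no FCS).
    The destination address (normally 01:80:C2:00:00:01) is not checked.
    """
    if len(frame) < 18:
        return None
    # Bytes 12-17: EtherType, opcode, pause time, all big-endian 16-bit.
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return quanta


def pause_seconds(quanta: int, link_bps: int) -> float:
    """Convert a pause time in quanta to seconds at a given link speed."""
    return quanta * BITS_PER_QUANTUM / link_bps
```

The same pause time means very different things on the two sides of a mixed-speed setup: the maximum value of 65535 quanta is about a third of a second on a 100 Mbit link but only about 34 milliseconds on a gigabit link.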

Most of the time this doesn't matter and you don't think about it. But right now this matters quite a lot to us, and it has been very frustrating to find out that there is basically nothing we can do to monitor our testing to find out what is going on, because anything we add to the test environment changes the behavior (or at least could be doing so, which means that we can't trust the results).

Let me tell you, network taps are looking awfully tempting right about now. (We probably still can't justify the expense, though; this is hopefully a one-time problem.)

Comments on this page:

From at 2008-10-28 13:05:42:

You always have the option of constructing your own network tap. It's not as hard as one might think. I wrote a blog post a while back discussing how I did it for my home environment.

Of course this solution maybe isn't enterprise ready, but it's cheap and it works.

By rdump at 2008-10-28 15:42:58:

In general, for performance, stability and security monitoring, taps are preferable.

First consider that you're using optimized switches because you need to switch traffic quickly. A SPAN is a secondary function that saps switch CPU and internal bandwidth resources. Plus, a SPAN won't be as solid a feature, as it's not the core function of the device.

Second consider bandwidth contention. You're using the same switches to transfer your SPANned data as are carrying the first copy of the traffic. It's not hard to get into a state where the doubling of traffic through a switch or over a link, with only the first half of it able to do TCP congestion backoff, causes outages for both the original and the copy.

Third consider data quality. The switch is necessarily going to prioritize regular traffic over a SPAN when it comes to momentary contention. This means dropped frames on the SPAN that nevertheless went through on the regular ports. Measuring this is often not easy to get right. It can provide extra load itself, but it also has to be done at each point along the data path.

Finally consider what has to be turned off first in the event of problems. Given the contention issues noted above, when debugging a network outage or responding to a high bandwidth security problem it's very often going to be necessary to turn off the SPAN-based monitoring. But that's exactly what you don't want to do when you need the monitoring to assess the problem.
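The bandwidth contention point can be made concrete with a little arithmetic. This is only a sketch with hypothetical numbers, and `shared_link_load` is a made-up helper, not anything a switch provides: it computes how loaded a link is when it carries both the original traffic and a mirrored copy of it.

```python
def shared_link_load(original_bps: int, mirrored_bps: int, link_bps: int) -> float:
    """Fraction of link capacity used when a SPAN copy shares the link.

    A result above 1.0 means the link is oversubscribed and frames
    (original, mirrored, or both) will have to be dropped or delayed.
    """
    return (original_bps + mirrored_bps) / link_bps


# Hypothetical case: 600 Mbit/s of real traffic, fully mirrored over
# the same gigabit inter-switch link.
load = shared_link_load(600_000_000, 600_000_000, 1_000_000_000)
```

In this made-up case the link is asked to carry 1.2 times its capacity, so anything past 50% utilization of the original traffic gets into trouble once it is mirrored over the same path.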

Our answer is to prefer taps with dedicated connections for the data feeds for any permanent installation. This avoids a continuing maintenance hassle for the SPANs. It also avoids contention for switch and bandwidth resources with regular traffic, keeping the monitoring out of the way of what's being monitored.

In the end, the only thing that'll push us to using SPANs instead of taps for our security monitoring is the lack of light budget in existing fiber runs for the splitter. That will likely be solved in the long term with careful engineering as network devices and links are upgraded.

Written on 28 October 2008.

Last modified: Tue Oct 28 01:22:38 2008