Wandering Thoughts archives

2008-10-31

Why realistic UDP bandwidth testing is hard

Since very few real applications just blindly fire UDP packets at a target, when you want to know what UDP bandwidth you can get you are generally interested in the speed at which you can have a UDP-based conversation between two sides. In other words, you need a bidirectional tester, not just the sort of simple unidirectional one you can use with TCP bandwidth testing.

(Of course, TCP is actually bidirectional too. It's just that the conversation inherent in TCP is already handled by your TCP stack.)

Bidirectionality is not hard in and of itself. But when you write your own code to do conversations, you have to explicitly handle all of the aspects of the conversation. This means that you have to decide things like how to behave in the face of delayed or dropped packets (how do you notice? what is your retry policy?) and how much you're going to do in parallel and how (are you going to have an outstanding window size or multiple simultaneous outstanding requests? the two are subtly different).
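To make those decisions concrete, here is a toy request/response tester over localhost UDP with one arbitrary set of answers: a fixed window of outstanding requests, a per-wait timeout, and a bounded retry count. Everything in it (the window policy, the retry policy, the packet format, the function names) is something I made up for illustration, not the One True Way to do it.

```python
import socket
import threading

def echo_server(sock):
    # Toy "remote side": echo every datagram back until traffic stops.
    sock.settimeout(1.0)
    try:
        while True:
            data, addr = sock.recvfrom(2048)
            sock.sendto(data, addr)
    except socket.timeout:
        pass

def windowed_transfer(n_requests, window, timeout=0.25, max_retries=5):
    # One arbitrary set of answers to the questions above: a fixed window
    # of outstanding requests, a per-wait timeout, and a bounded number of
    # retries before a request is abandoned.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(('127.0.0.1', 0))
    threading.Thread(target=echo_server, args=(server,), daemon=True).start()
    dest = server.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    outstanding = {}                # seq -> retries used so far
    next_seq = completed = retries = 0
    while outstanding or next_seq < n_requests:
        # Fill the window; this is the "how much in parallel" decision.
        while len(outstanding) < window and next_seq < n_requests:
            client.sendto(str(next_seq).encode(), dest)
            outstanding[next_seq] = 0
            next_seq += 1
        try:
            data, _ = client.recvfrom(2048)
            seq = int(data)
            if seq in outstanding:   # ignore duplicate replies
                del outstanding[seq]
                completed += 1
        except socket.timeout:
            # The retry policy: resend everything still outstanding,
            # abandoning any request that has used up its retries.
            for seq in list(outstanding):
                if outstanding[seq] >= max_retries:
                    del outstanding[seq]      # counts as lost
                else:
                    outstanding[seq] += 1
                    retries += 1
                    client.sendto(str(seq).encode(), dest)
    return completed, retries
```

Note how many knobs even this toy has; a real application's choices for each of them are exactly what a tester would have to imitate.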

And this is the real problem: you are presumably testing your UDP bandwidth because you are trying to assess how some real application or protocol will perform. Because everyone has to answer these questions themselves when writing UDP-based protocols, different applications can come up with quite different answers. If your tester does not deal with these issues the same way your target application does, your performance numbers may not match the real world, which makes your tester not much good.

So: how do you find out what your target protocol does (or will do) in these situations, so that you can faithfully imitate it in your tester (in all of its possible complexity)?

All I can say is 'good luck with that', because now you know why realistic UDP bandwidth testing is hard. It's not the coding; it's figuring out what to code in order to get meaningful answers.

(Speaking from personal experience, it is very easy to create a UDP bandwidth tester that gives hopelessly optimistic and meaningless answers. And it's probably equally easy to create one that gives hopelessly pessimistic answers.)
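As a concrete sketch of the optimistic failure mode (my own illustration, not taken from any real tester): a successful sendto() only means the datagram left your end of things, so a tester that counts sent bytes reports full throughput even when nothing is ever delivered to anyone.

```python
import socket

def naive_udp_send(n_packets, size=1400):
    # Fire datagrams at a local socket that nobody ever reads; every
    # sendto() still "succeeds", so a tester that only counts sent bytes
    # reports a great number even though nothing is being consumed.
    sink = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sink.bind(('127.0.0.1', 0))      # a real port, but we never recvfrom()
    dest = sink.getsockname()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b'x' * size
    sent = 0
    for _ in range(n_packets):
        sent += sender.sendto(payload, dest)
    return sent
```

Divide the bytes "sent" here by the elapsed time and you get an impressive and completely meaningless bandwidth figure.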

UDPBandwidthTestProblems written at 23:08:23

2008-10-29

Problems I have seen with switch port mirroring

For my sins, I have spent a modest amount of time testing how well two switches did port mirroring. The depressing result was that neither of them had port mirroring that worked really well for our needs, but on the other hand the testing has given me an appreciation for the issues that port mirroring can have.

(We use port mirroring for simple network traffic volume monitoring, and we would like to be able to use it for network debugging; at the time I was investigating if we could use some cheap switches for either purpose, instead of having to dedicate some of our relatively expensive 24-port gigabit switches to the task.)

Two problems stick in my mind:

  • one switch would every so often de-VLAN a frame on the mirroring port; a frame that I knew had gone into the switch with a VLAN tag on it would come out the mirroring port without the VLAN tag, as an untagged frame.

    (Other traffic was mirrored with VLAN tags intact, which is what we need.)

  • one switch choked the total bandwidth of the port being mirrored down to whatever the monitoring port could support; frames that could not be mirrored to the monitoring port were just dropped instead of being passed through. The net result was that the port's total bandwidth (send plus receive) was reduced to half of what it would be without port mirroring turned on.
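The arithmetic behind that halving, assuming full-duplex gigabit ports on both the mirrored and the monitoring side (hypothetical figures; substitute your own port speeds):

```python
# Hypothetical figures, in Mbps: full-duplex gigabit everywhere.
port_tx, port_rx = 1000, 1000
mirror_capacity = 1000              # the monitoring port is also gigabit

normal_total = port_tx + port_rx    # 2000 Mbps of send plus receive
# The mirror port must carry a copy of *both* directions, so if frames
# that can't be mirrored are dropped rather than passed through, the
# mirrored port's usable total is capped by the single mirror port.
mirrored_total = min(normal_total, mirror_capacity)

assert mirrored_total == normal_total / 2   # hence "reduced to half"
```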

This very nicely illustrates the perils of port mirroring, because while the first switch had a bug, the second switch was making a rational design decision. If you were using port mirroring for security monitoring, you would not want an attacker to be able to sneak packets past you because he has saturated your monitoring point, so dropping frames that cannot be mirrored is the right decision. But we have different priorities; preserving our internal NAT firewall's total bandwidth is more important to us than having our traffic monitoring box see every last packet.

SwitchMirroringProblems written at 01:41:40

2008-10-24

A little neat detail of the BitTorrent protocol

To simplify, BitTorrent breaks files up into blocks, checksums each block, and then has all of the clients (the peers) swap blocks back and forth. The checksums for all the blocks are in the .torrent metainfo file that each client has, so you know when you get a bad copy of a block (and as a side effect, a malicious client can't serve you trojan data).

(One might think that trojan data would be relatively harmless. But BitTorrent is used to distribute things that are then executed, such as bootable ISO images of Linux distributions; allowing an attacker to substitute their own version of such things would be very bad.)

Well, sort of. One of the little neat bits of the BitTorrent protocol is that there are actually two different kinds of blocks, with two different block sizes: the metainfo block size and the peer transfer block size. The metainfo block size is set when you create the torrent metainfo file and is usually fairly large; the peer transfer block size is decided by the peers, and is usually much smaller than the metainfo block size.

(The typical peer block size is 16 KB; the typical metainfo block size ranges from 256 KB to 1 MB. I'm using 'block' for both here, but the official term for metainfo blocks is 'pieces'.)

The block size difference is ultimately because the goals of the two sorts of block sizes conflict. You want a big metainfo block size so that you don't have very many checksums and you keep the size of torrent metainfo files down, while you want a small peer block size so that peers do small quick transfers instead of big long ones. The small peer block size has other consequences, including that you can get the parts of a single metainfo block from several peers at once, which is important when peers often have asymmetric bandwidth with slow upstream rates.
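A small sketch of how the two sizes interact, using the typical figures from above (a 256 KB piece fetched as sixteen 16 KB transfer blocks; the data and variable names here are invented for illustration):

```python
import hashlib
import os

PIECE_SIZE = 256 * 1024    # metainfo 'piece' size, fixed in the .torrent
BLOCK_SIZE = 16 * 1024     # typical peer transfer block size

# Pretend file data, and the SHA1 that would live in the metainfo file.
piece = os.urandom(PIECE_SIZE)
metainfo_hash = hashlib.sha1(piece).digest()

# A client fetches the piece as sixteen separate 16 KB blocks, potentially
# each from a different peer, then reassembles and verifies the whole piece.
blocks = [piece[off:off + BLOCK_SIZE]
          for off in range(0, PIECE_SIZE, BLOCK_SIZE)]
assembled = b''.join(blocks)
assert hashlib.sha1(assembled).digest() == metainfo_hash
```

Note that only the piece has a checksum; there is no per-block checksum, so a corrupt transfer block is only detected when the whole assembled piece fails verification.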

BitTorrentBlocksizes written at 00:55:56

2008-10-19

The advantages of iSCSI's MC/S for multipathing

In theory, iSCSI has a feature called 'multiple connections per session', commonly abbreviated as MC/S. In iSCSI terminology, a 'session' is a single logical connection between an iSCSI initiator and an iSCSI target, and a 'connection' is a TCP connection. MC/S lets a session be composed of multiple TCP connections, each of which can use a different set of IP addresses and thus a different network path.

(In practice, apparently very few iSCSI initiators and targets actually support MC/S. I suspect that MC/S is quite complicated at the protocol level, much like the rest of iSCSI, and this has created a disincentive to actually implement it.)

MC/S isn't the only way to do multipathing with iSCSI; an iSCSI client can also do it at a higher level, by creating multiple sessions (each with its own set of network parameters, so it uses a different network path). So why have MC/S at all, and why have I said in the past that MC/S would be the best way?

The simple answer is that MC/S is the best way because when you use MC/S, everyone involved actually knows that there is multipathing going on. The problem with multiple sessions is that at least the iSCSI target has no idea that these two separate sessions are a single object; instead, it treats them as entirely separate.

For example, the two sessions will probably have separate command ordering constraints; if there is a write barrier or command flush on one, it won't affect commands flowing over the other session. The result is that your write barriers are only partial barriers unless the high-level multipathing code in the initiator handles write barriers specially, which may cause heartburn for your filesystems. (You can probably think of other potential problems.)

In theory the initiator's top level multipathing code can cope with this and should; however, there is likely to be at least a performance penalty. Consider how you would have to implement write barriers. It is not good enough to simply send write barriers down both sessions, because there is nothing that forces cross-session synchronization (so that commands on session A cannot go to the disk until after the write barrier on session B has completed); instead you are going to need to do something like send write barriers down both sessions and then send writes to only one session until both barriers are reported as complete.

(Here I'm assuming that you don't have to worry about reads from one session crossing writes from the other session and returning what is from your perspective stale data, since you should be satisfying such reads out of your own local cache. However, there are probably situations where this is not entirely true.)
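The quiescing scheme above can be sketched as a toy simulation (all of the names and the Session abstraction are made up for illustration; this is not real iSCSI or multipathing code):

```python
import queue
import threading

class Session:
    # Toy stand-in for one iSCSI session: commands complete in FIFO order
    # *within* a session, but nothing orders commands across sessions.
    def __init__(self, name, log, lock):
        self.q = queue.Queue()
        self.name, self.log, self.lock = name, log, lock
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            cmd, done = self.q.get()
            with self.lock:
                self.log.append((self.name, cmd))
            done.set()

    def submit(self, cmd):
        done = threading.Event()
        self.q.put((cmd, done))
        return done

def barrier_across(sessions):
    # The scheme from the text: send a barrier down every session, then
    # hold all new writes until every barrier is reported complete.
    for done in [s.submit('BARRIER') for s in sessions]:
        done.wait()

log, lock = [], threading.Lock()
a, b = Session('A', log, lock), Session('B', log, lock)

a.submit('write-1')               # pre-barrier writes, one per session
b.submit('write-2')
barrier_across([a, b])            # quiesce: wait for both barriers
a.submit('write-3').wait()        # only now release post-barrier writes

order = [cmd for _, cmd in log]
assert order.index('write-1') < order.index('write-3')
assert order.index('write-2') < order.index('write-3')
```

The performance penalty is visible even here: every barrier forces a full round trip on every session before any new write can be issued anywhere.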

My understanding is that it is for this reason that documentation for various iSCSI targets I've read strongly suggests not using multiple sessions from the same initiator. One of the reasons that I feel that we can get away with it in our setup is that we are using ZFS, which is already reasonably cautious about disks lying to it.

(Also, we don't have a choice; we have to have connection redundancy, and neither end of our iSCSI setup supports MC/S right now.)

ISCSIMCSAdvantages written at 23:13:38

