The processing flow of a network copying program

April 9, 2010

For my sins, I have dabbled in writing netcat-like programs for some time: things that take standard input, send it off to somewhere over the network, and write to standard output what they get back from the network. In the process I have formed very definite opinions about how these programs should behave in order to be most useful (in scripts and so on), and I feel like writing them down.

For the most part, a network copying program is straightforward; you use select() or poll(), handle buffering carefully, and bounce data back and forth. The tricky things are how you handle various ways that the conversation can end:

  • when you see end of file on standard input, signal to the network that there is no more input coming but keep processing network input. On standard TCP connections, you do a shutdown(fd, SHUT_WR); I don't know what you do with SSL, but I hope there's some equivalent. (You're on your own with UDP.)

    You have to keep reading from the network because it may well still be sending you output, especially in a scripted situation where you are creating an immediate end of file on standard input by doing things like:

    (echo A; echo B) | tcp host port

    You have to tell the network server that you've stopped sending things, because otherwise some network servers will never close the connection; you'll just sit there forever. Also, it makes it simpler to use the program to do certain sorts of checks (eg, to see if a port is answering at all) without needing special program features.

  • when you see end of file on network input, just exit. This is not symmetrical with the handling of standard input, but it turns out to be the most useful in general; there are very few situations where a network server will shut down only the output direction of a TCP conversation. (If you think you're going to run into one of those, feel free to put a switch into your program to control this.)
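Both rules fit in a short select() loop. What follows is only a minimal sketch in Python, not the code of any of my actual programs: the name copy_loop and its parameters are made up for illustration, it assumes a Unix system (where select() works on pipes and terminals as well as sockets), and it cheats on buffering by doing blocking writes.

```python
import os
import select
import socket


def write_all(fd, data):
    """Blocking write of all of data to fd (os.write may write only part)."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def copy_loop(sock, in_fd, out_fd, bufsize=65536):
    """Copy in_fd -> sock and sock -> out_fd until the network side closes.

    copy_loop is a hypothetical name; sock is a connected stream socket,
    in_fd and out_fd are plain file descriptors (normally 0 and 1).
    """
    watching = [sock, in_fd]
    while True:
        readable, _, _ = select.select(watching, [], [])
        if in_fd in readable:
            data = os.read(in_fd, bufsize)
            if data:
                sock.sendall(data)  # cheating: blocking write, no sink buffer
            else:
                # EOF on input: tell the server no more is coming,
                # but keep reading whatever it still has to say.
                sock.shutdown(socket.SHUT_WR)
                watching.remove(in_fd)
        if sock in readable:
            data = sock.recv(bufsize)
            if not data:
                return  # EOF from the network: just stop
            write_all(out_fd, data)
```

Something like the tcp command above would then be roughly copy_loop(socket.create_connection((host, port)), 0, 1), followed by exiting.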

Doing a really correct implementation of buffering in a network copying program is a big pain in the rear end and most of the time you will never need it, so feel free to cheat here. If you really want to do it correctly, my approach was to have a buffer for each sink (network output and standard output) and to stop poll()'ing on the corresponding source (standard input and network input) when a sink's buffer filled up. I guarantee that you will have a bunch of twitch-inducing logic about polling and not polling sinks and sources under various sorts of circumstances.
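To give a feel for the shape of that logic, here is a rough Python sketch of the per-sink buffer idea. It is an illustration under assumptions, not my btcp code: Direction and pump are made-up names, the file descriptors are assumed to be non-blocking (os.set_blocking(fd, False)), it needs a Unix with poll(), and it deliberately leaves out the end-of-conversation handling (the shutdown() and the early exit) to show only the buffering.

```python
import os
import select

READ_EVENTS = select.POLLIN | select.POLLHUP | select.POLLERR
WRITE_EVENTS = select.POLLOUT | select.POLLERR


class Direction:
    """One copy direction: bytes read from src wait in buf until dst takes them."""

    def __init__(self, src, dst, limit=65536):
        self.src, self.dst, self.limit = src, dst, limit
        self.buf = bytearray()
        self.src_eof = False

    def wants_read(self):
        # Stop polling the source once its sink's buffer is full.
        return not self.src_eof and len(self.buf) < self.limit

    def wants_write(self):
        # Only poll the sink when there is something to give it.
        return len(self.buf) > 0

    def done(self):
        return self.src_eof and not self.buf


def pump(directions):
    """Copy until every direction has seen EOF and drained its buffer."""
    while not all(d.done() for d in directions):
        # Work out what we care about right now; one fd may be both
        # a source and a sink (the network socket is).
        interest = {}
        for d in directions:
            if d.wants_read():
                interest[d.src] = interest.get(d.src, 0) | select.POLLIN
            if d.wants_write():
                interest[d.dst] = interest.get(d.dst, 0) | select.POLLOUT
        poller = select.poll()  # rebuilt each pass for simplicity
        for fd, mask in interest.items():
            poller.register(fd, mask)
        ready = dict(poller.poll())
        for d in directions:
            if d.wants_read() and ready.get(d.src, 0) & READ_EVENTS:
                data = os.read(d.src, d.limit - len(d.buf))
                if data:
                    d.buf += data
                else:
                    d.src_eof = True
            if d.wants_write() and ready.get(d.dst, 0) & WRITE_EVENTS:
                n = os.write(d.dst, d.buf)  # may be a partial write
                del d.buf[:n]
```

For a real network copying program you would run two Direction objects at once, one from standard input to the socket and one from the socket to standard output, and then add the end-of-file rules on top. That is where the twitching starts.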

(I know this because I once decided to learn the bstring library and poll() at the same time by writing a neurotically correct buffering network copying program that used bstrings as the buffering mechanism. I am fond of my btcp program (source here) in the abstract, but I never want to have to think about the logic again.)

(PS: in the credit where credit is due department, I learned the shutdown() on stdin EOF trick (a very long time ago) from Marc Moorcroft.)
