The processing flow of a network copying program
For my sins, I have dabbled in writing netcat-like programs for some time, things that take standard input, send it off to somewhere over the network, and write to standard output what they get back from the network. In the process I have formed very definite opinions about how these programs should behave in order to be most useful (in scripts and so on), and I feel like writing them down.
For the most part, a network copying program is straightforward; you use select() or poll(), handle buffering carefully, and bounce data back and forth. The tricky things are how you handle the various ways that the conversation can end:
- when you see end of file on standard input, signal to the network that there is no more input coming but keep processing network input. On standard TCP connections, you do a shutdown(fd, SHUT_WR); I don't know what you do with SSL, but I hope there's some equivalent. (You're on your own with UDP.)
You have to keep reading from the network because it may well still be sending you output, especially in a scripted situation where you are creating an immediate end of file on standard input by doing things like:
(echo A; echo B) | tcp host port
You have to tell the network server that you've stopped sending things, because otherwise some network servers will never close the connection; you'll just sit there forever. Also, it makes it simpler to use the program to do certain sorts of checks (eg, to see if a port is answering at all) without needing special program features.
- when you see end of file on network input, just exit. This is not symmetrical with the handling of standard input, but it turns out to be the most useful in general; there are very few situations where a network server will shut down only the output direction of a TCP conversation. (If you think you're going to run into one of those, feel free to put a switch into your program to control this.)
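As a rough illustration of the two rules above, here is a minimal Python sketch (Python's socket and select modules wrap the same shutdown() and select() calls). It is not taken from any real program; the name copy_loop and its parameters are made up for this example, and it cheats on buffering by using blocking writes.

```python
import os
import select
import socket


def copy_loop(sock, in_fd, out_fd, bufsize=4096):
    """Relay in_fd -> sock and sock -> out_fd until the network side hits EOF.

    copy_loop, in_fd and out_fd are hypothetical names for this sketch.
    """
    stdin_open = True
    while True:
        # Stop watching standard input once it has reached end of file.
        watch = [sock]
        if stdin_open:
            watch.append(in_fd)
        readable, _, _ = select.select(watch, [], [])

        if in_fd in readable:
            data = os.read(in_fd, bufsize)
            if data:
                # Cheating: a blocking write instead of a real output buffer.
                sock.sendall(data)
            else:
                # EOF on input: tell the server we are done sending,
                # but keep reading whatever it still has to say.
                sock.shutdown(socket.SHUT_WR)
                stdin_open = False

        if sock in readable:
            data = sock.recv(bufsize)
            if not data:
                # EOF from the network: just exit.
                return
            os.write(out_fd, data)
```

Something like copy_loop(socket.create_connection((host, port)), 0, 1) would then behave like the tcp command in the example above, assuming host and port come from the command line.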
Doing a really correct implementation of buffering in a network copying program is a big pain in the rear end and most of the time you will never need it, so feel free to cheat here. If you really want to do it correctly, my approach was to have a buffer for each sink (network output and standard output) and to stop poll()'ing on the corresponding source (standard input and network input) when a sink's buffer filled up. I guarantee that you will have a bunch of twitch-inducing logic about polling and not polling sinks and sources under various sorts of circumstances.
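To give a feel for that logic, here is a hedged Python sketch of the per-sink buffer idea using select.poll(). It is only one possible arrangement, not the btcp code: buffered_copy, buf_max and the flag names are assumptions of this example, error handling is omitted, and draining pending standard output before exiting on network EOF is a choice made here rather than something prescribed above.

```python
import os
import select
import socket


def buffered_copy(sock, in_fd, out_fd, buf_max=65536):
    """Relay in_fd -> sock and sock -> out_fd with one bounded buffer per sink."""
    sock.setblocking(False)
    os.set_blocking(out_fd, False)
    net_fd = sock.fileno()
    to_net, to_out = bytearray(), bytearray()   # the two sink buffers
    in_eof = net_eof = wr_closed = False
    readable = select.POLLIN | select.POLLHUP | select.POLLERR

    # After network EOF, only flush what is still queued for standard output.
    while not (net_eof and not to_out):
        # Poll a source only while its sink's buffer has room,
        # and a sink only while its buffer holds data.
        want = {}
        if not net_eof:
            if not in_eof and len(to_net) < buf_max:
                want[in_fd] = select.POLLIN
            if len(to_out) < buf_max:
                want[net_fd] = select.POLLIN
            if to_net:
                want[net_fd] = want.get(net_fd, 0) | select.POLLOUT
        if to_out:
            want[out_fd] = select.POLLOUT
        poller = select.poll()
        for fd, mask in want.items():
            poller.register(fd, mask)

        for fd, ev in poller.poll():
            if fd == in_fd:
                data = os.read(in_fd, buf_max - len(to_net))
                to_net += data
                in_eof = not data
            elif fd == net_fd:
                if want[fd] & select.POLLIN and ev & readable:
                    data = sock.recv(buf_max - len(to_out))
                    to_out += data
                    net_eof = not data
                if to_net and ev & select.POLLOUT and not net_eof:
                    # Partial sends are fine: drop only what was accepted.
                    del to_net[:sock.send(to_net)]
            elif fd == out_fd:
                del to_out[:os.write(out_fd, to_out)]

        # Half-close only once everything read from input has been sent.
        if in_eof and not to_net and not wr_closed and not net_eof:
            sock.shutdown(socket.SHUT_WR)
            wr_closed = True
```

Even this stripped-down version needs three flags and a rebuilt poll set on every pass, and it still ignores things like a server that resets the connection while input is queued.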
(I know this because I once decided to learn the bstring library and poll() at the same time by writing a neurotically correct buffering network copying program that used bstrings as the buffering mechanism. I am fond of my btcp program (source here) in the abstract, but I never want to have to think about the logic again.)
(PS: in the credit where credit is due department, I learned the shutdown() on stdin EOF trick (a very long time ago) from Marc Moorcroft.)