It's long since time for languages to provide sensible network IO

June 20, 2011

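You may not have noticed, but stream-based network IO is famously not like regular file IO, at least for reads: a read on a network stream can return fewer bytes than you asked for even though the connection is still open and more data is on its way. Believing that network reads behave like file reads is a common socket programming error.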

A great deal of the time, programs and programmers do not want to care about this. When they say 'read ten bytes', they want to get back exactly ten bytes (assuming that the stream is not closed on them) and they do not care how long they have to wait for those ten bytes to all show up. In most cases, if you return less than ten bytes all they will do is turn around and immediately try to read more bytes.

(Programmers who actively want short reads tend to set their network sockets to nonblocking mode and do other special things.)

By now it is far too late to change the base behavior of systems. But if your language has a network IO package, it is high time that you provided not just a read() operation but also a readall() operation to go with it. In fact, I would go so far as to say that what you should really provide is read() and readshort(); readshort() would be the current 'read whatever the OS says is there', and read() would do 'read all' unless the socket was in non-blocking mode.
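As a rough sketch of what such a pair of operations could look like in Python: the `StreamReader` class and its method names are made up for illustration (they are not an existing library API), and the sketch assumes that a socket timeout of 0.0 is how you detect non-blocking mode.

```python
import socket


class StreamReader:
    """Hypothetical wrapper giving a stream socket read() and readshort()."""

    def __init__(self, sock):
        self.sock = sock

    def readshort(self, n):
        # One recv() call: returns whatever the OS has ready, up to n bytes.
        return self.sock.recv(n)

    def read(self, n):
        # Python reports non-blocking mode as a timeout of 0.0; in that
        # case keep the short-read behavior, as the text suggests.
        if self.sock.gettimeout() == 0.0:
            return self.readshort(n)
        chunks = []
        remaining = n
        while remaining > 0:
            chunk = self.sock.recv(remaining)
            if not chunk:
                # End of stream: return what we have (possibly fewer than n).
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
```

With this, `StreamReader(sock).read(10)` on a blocking socket comes back short only if the other end closes the stream first.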

(If you have a moral objection to doing this in your basic network support layer, please provide it as a common mixin or simple additional layer that is easy to stack on top of a bare network socket.)
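Python is one place where a stackable layer of roughly this sort already exists: `socket.makefile("rb")` returns a buffered file object, and as far as I know its `read(n)` on a blocking socket keeps reading until it has n bytes or hits end of stream. The `read_exactly()` helper below is a hypothetical addition that turns an early end of stream into an error; treat the whole thing as a sketch under those assumptions.

```python
import socket


def read_exactly(reader, n):
    """Read n bytes from a buffered reader, raising if the stream ends early."""
    data = reader.read(n)
    if len(data) < n:
        raise EOFError(f"stream closed after {len(data)} of {n} bytes")
    return data


def demo():
    # A connected pair of local sockets stands in for a real connection.
    a, b = socket.socketpair()
    reader = b.makefile("rb")  # buffered layer stacked on the bare socket
    a.sendall(b"hel")
    a.sendall(b"lo")
    a.close()
    try:
        return read_exactly(reader, 5)
    finally:
        reader.close()
        b.close()
```

Note that the Python documentation says the socket must be in blocking mode for `makefile()`, which fits the split between blocking 'read all' and non-blocking short reads.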

Such an interface would be an error-shielding interface. It would avoid a class of errors, or at least avoid requiring programmers to repeatedly write the same code to do network re-reading correctly. In almost all cases the overall code would get simpler; either the higher layer wouldn't have to worry about this any more or programmers using the higher layer on network streams wouldn't have to shim something to do re-reads in between the higher layer and the actual network socket.

It's quite possible that making this change would expose the fact that you need additional interfaces, for example a 'read until you see character X' interface for simple (text) line-based network protocols like SMTP.

(Simple servers for such protocols mostly work today because read() almost always returns a full line and only a single line.)
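A 'read until you see X' interface could be sketched along these lines in Python. The `DelimitedReader` name, the `bufsize` parameter, and the default CRLF delimiter are all assumptions made for illustration; a real version would also want a cap on line length, which this sketch omits.

```python
import socket


class DelimitedReader:
    """Hypothetical helper: read from a stream socket up to a delimiter."""

    def __init__(self, sock, bufsize=4096):
        self.sock = sock
        self.bufsize = bufsize
        self.pending = b""  # bytes received but not yet handed out

    def read_until(self, delim=b"\r\n"):
        while True:
            idx = self.pending.find(delim)
            if idx >= 0:
                # Return one line including the delimiter; keep the rest,
                # since a single recv() may contain several lines.
                end = idx + len(delim)
                line, self.pending = self.pending[:end], self.pending[end:]
                return line
            chunk = self.sock.recv(self.bufsize)
            if not chunk:
                # End of stream: hand back any partial data, then b"".
                line, self.pending = self.pending, b""
                return line
            self.pending += chunk
```

The `pending` buffer is what handles both failure cases of the naive approach: a line split across several reads, and several lines arriving in one read.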

A similar thing can and should be done for write() on any platform where a write to a blocking network socket doesn't necessarily write out all of the data. (This is not all of them.)
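For the write side, the loop is short. Here is a minimal Python sketch, where `write_all()` is a name I made up; Python's own `socket.sendall()` already does this job, so this is only to show what such a layer has to do.

```python
import socket


def write_all(sock, data):
    """Keep calling send() until every byte of data has been accepted."""
    view = memoryview(data)  # lets us slice without copying the data
    total = 0
    while total < len(view):
        # send() may accept only part of what it is given.
        sent = sock.send(view[total:])
        total += sent
    return total
```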

(This grump is brought to you by me having to deal with this issue yet again.)

By the way and as a side note, this means that a great many simple examples of network programming in high level languages are wrong. By ignoring this issue (either through ignorance or a desire for simplicity) they lure programmers into writing subtly erroneous network code, code that works most of the time but not always.

(That not ignoring this issue would make your simple network IO examples 'too complex' is in fact a sign that your language needs a simpler interface to this issue.)

Comments on this page:

From at 2011-06-20 14:57:53:

What exactly do you expect this magic readall() function to do? Block until the remote FINs? This isn't a very effective approach if you expect to support long-lived connections.

Your read() operations need to fill a buffer which is then processed by upper layer handlers. If handlers run short on data, read() again. Since read() is effectively a layer 4 operation, it has absolutely no idea where one layer 7 message ends and the next begins.

By cks at 2011-06-20 17:07:34:

I've clearly not been clear enough. The network IO read problem is that when you do read(N), you may get less than N bytes back. readall(N) promises not to return until it has either read N bytes or hit end of stream. In this it mimics the (usual) behavior of read() on files.

From at 2011-06-20 17:28:45:

So something like recv() with the MSG_WAITALL flag. If this is really what you want, it would be trivial to write your own wrapper around the stock socket read function to emulate this.

I think you'll find that those wishing for a simple network IO interface stick to readline type functions. It's not always optimal but often easier to wrap your head around.
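For reference, the MSG_WAITALL approach mentioned in this comment looks roughly like this in Python. The `recv_waitall()` name is hypothetical, and the sketch assumes a platform where `socket.MSG_WAITALL` is defined; even with the flag, the call can still return fewer bytes on end of stream, a signal, or an error, so callers that need exactly n bytes should check the length.

```python
import socket


def recv_waitall(sock, n):
    """Ask the OS to wait for n bytes via MSG_WAITALL (where available)."""
    return sock.recv(n, socket.MSG_WAITALL)
```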

Written on 20 June 2011.

Last modified: Mon Jun 20 13:18:08 2011