2006-04-08
A common socket programming mistake: not handling short IO
With normal file IO, when you do 'read(fd, buf, len)' you'll almost
always get back len bytes unless you hit EOF or a disk IO error. This
breeds a certain sloppiness when filling buffers; an awful lot of code
effectively ignores the return value of read() except to check it for
errors.
This can and will bite you on the rear when writing socket code, because networks only give you so much data at once. Short reads are routine for socket IO; you can't assume that you can get all of what you want in a single read.
The mistake is especially pernicious because the mistaken code almost always works. Usually the lines or transactions you're reading from the network are small; usually you test on a fast local network. Speaking from personal experience, it's easy to forget this and then not notice.
(Today's case was some of my code that assumed it could read all of
an HTTP POST body in one read().)
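The usual defensive pattern is a loop that keeps reading until it has all the bytes it asked for. A minimal sketch in Python (the function name recv_exactly is my own; the same loop works with read() in C):

```python
import socket

def recv_exactly(sock, n):
    """Read exactly n bytes from sock, looping over short reads.
    Raises EOFError if the connection closes before n bytes arrive."""
    buf = b""
    while len(buf) < n:
        # recv() may return fewer bytes than requested; ask only
        # for what is still missing.
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("got %d of %d bytes before EOF" % (len(buf), n))
        buf += chunk
    return buf
```

The important part is that a single recv() (or read()) is never trusted to fill the buffer; the loop only exits once everything has arrived or the other end has closed the connection.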
Whether or not a normal (blocking) write() has similar issues is
probably system dependent. Linux seems to only return from a socket
write() once all data has been pushed out, but I don't know what
other systems do in practice.
(According to the Single Unix Specification page for write(), in
theory you can count on this behavior on any SUS-compliant system.)
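If you'd rather not count on that behavior (or you're using non-blocking sockets, where short writes are routine), the fix is the same loop in the other direction. A sketch, again in Python with a name of my own choosing (Python's own sock.sendall() does this for you):

```python
import socket

def send_all(sock, data):
    """Keep calling send() until all of data has been written,
    handling any short writes along the way."""
    sent = 0
    while sent < len(data):
        # send() returns how many bytes it actually accepted,
        # which may be fewer than we offered.
        n = sock.send(data[sent:])
        sent += n
    return sent
```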
2006-04-04
Why I don't like resorting to caching
One of my little twitches is that I really don't like putting caching in my programs. It's not because cache invalidation is hard, although it is. It's because resorting to caching is an admission of defeat: it's admitting that I can't make the actual code go fast enough.
In other words, I've failed to write code that's fast enough. I don't think any programmer likes to fail; certainly I don't. I don't like it even when it's not entirely my fault; for example, when the underlying language features and libraries aren't fast enough.
(Of course, sometimes the problem is intrinsically slow no matter how much you optimize. I don't feel bad about those; that's just running into a physical limitation.)
Caching is a compromise; it's a tradeoff of a quick fix against the right fix, however much effort the right fix may take. Of course, that's the perfection trap singing its siren song to me.
The net result is that adding caching too often feels like slathering a bright coat of paint on a rickety house and hoping that no one notices. Doing that sort of thing always leaves me feeling grumpy and dissatisfied.
(And of course one has to measure the overhead of caching, cache checking, and cache invalidation, just to make sure that the cache is actually speeding things up. Benchmarking: enough tedium for the whole family.)