Wandering Thoughts archives

2008-01-20

Layering buffering on top of other buffering is usually a bad idea

Here's something that I've seen illustrated more than once: layering buffering on top of buffering is almost always a bad idea.

One reason is that a correct buffering layer is tricky to write. For example, consider all the things that you have to get right if you write a file IO buffering layer on Unix, including:

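  • you must remember that you've seen an EOF, and report it to the next level up without doing another underlying read(). (On a terminal, for instance, another read() after a ^D will simply wait for more input.)
  • you must return short reads, passing up whatever data the underlying read() gave you instead of waiting until you have the full amount that was asked for, unless you're explicitly doing forced buffering.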
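As a rough illustration of those two rules, here is a minimal Python sketch of a buffering layer over a raw Unix file descriptor. `os.read()` is a thin wrapper around read(); the class name `BufferedFDReader` and its `bufsize` parameter are hypothetical, invented for this example, and this is not how any particular library does it.

```python
import os


class BufferedFDReader:
    """Hypothetical minimal read buffer layered over a raw Unix fd."""

    def __init__(self, fd, bufsize=8192):
        self.fd = fd
        self.bufsize = bufsize
        self.buf = b""
        self.saw_eof = False  # sticky: once read() returned b"", never call it again

    def read(self, n):
        """Return up to n bytes; b"" means EOF.

        This returns whatever is available (a short read) instead of
        looping until n bytes have arrived.
        """
        if not self.buf:
            if self.saw_eof:
                # Report EOF again without another underlying read(),
                # which on a terminal would wait for more input.
                return b""
            data = os.read(self.fd, max(n, self.bufsize))
            if not data:
                self.saw_eof = True
                return b""
            self.buf = data
        # Hand back at most n bytes of what we already have; don't try to fill n.
        chunk, self.buf = self.buf[:n], self.buf[n:]
        return chunk
```

With a pipe, asking for 100 bytes when only 5 have been written returns those 5 at once rather than blocking, and after EOF the reader never touches the file descriptor again.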

(If you do forced buffering, you need your own code to handle record-oriented reading, including reading a line at a time; otherwise your layer will be unusable for things like network protocols.)
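Here is a sketch of what that record-oriented code might look like for lines, again in Python over a raw file descriptor. `LineReader` is a hypothetical name. The important property is that it only calls read() when no complete line is already buffered, and it stops as soon as a newline shows up, so a protocol peer gets each line when it arrives instead of waiting for a full buffer.

```python
import os


class LineReader:
    """Hypothetical buffered reader that supplies its own line-at-a-time reading."""

    def __init__(self, fd, bufsize=8192):
        self.fd = fd
        self.bufsize = bufsize
        self.buf = b""
        self.saw_eof = False

    def readline(self):
        """Return one line (newline included), a final unterminated line, or b"" at EOF."""
        # Only read more when we don't already hold a complete line.
        while b"\n" not in self.buf and not self.saw_eof:
            data = os.read(self.fd, self.bufsize)
            if not data:
                self.saw_eof = True
            else:
                self.buf += data
        idx = self.buf.find(b"\n")
        if idx == -1:
            # EOF without a trailing newline: return whatever is left.
            line, self.buf = self.buf, b""
        else:
            # Keep the bytes after the newline buffered for the next call.
            line, self.buf = self.buf[: idx + 1], self.buf[idx + 1 :]
        return line
```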

It can also be hard to understand the performance implications of multiple levels of buffering. A related problem is that another level of buffering can accidentally cover up terrible performance in your own code, simply because the extra buffering means that code usually doesn't get run often enough for you to notice. (I'm not talking about unoptimized code; I'm talking about code that has catastrophically bad performance if tickled the right way.)

Does this mean that Unix's standard IO library is a bad idea, since the kernel already does buffering? No, for two reasons. First, the kernel doesn't guarantee that it will keep those buffers around, so your IO might not be buffered after all. Second, standard IO also 'buffers' system calls: it batches many small reads and writes into a few large read() and write() calls, which turns what would otherwise be very expensive operations into cheap ones.

(For example, reading only a single line without buffering means that you need to read each character separately, because any bytes you read past the end of the line would have nowhere to go. That is one system call per character.)
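To make the system call count concrete, here is a small Python sketch that counts the underlying `os.read()` calls needed to read one 40-byte line, first unbuffered and then through a buffer. Both function names are hypothetical, and the 8192-byte buffer size is just an assumed value.

```python
import os


def read_line_unbuffered(fd):
    """Read one line with no buffering: one read() per byte.

    We can't ask for more than one byte, because bytes past the newline
    would be consumed with nowhere to keep them.
    Returns (line, number_of_read_calls).
    """
    line = bytearray()
    calls = 0
    while True:
        ch = os.read(fd, 1)
        calls += 1
        if not ch:
            break  # EOF
        line += ch
        if ch == b"\n":
            break
    return bytes(line), calls


def read_line_buffered(fd, buf, bufsize=8192):
    """Read one line through a caller-supplied bytearray buffer.

    Leftover bytes past the newline stay in buf for the next call.
    Returns (line, number_of_read_calls).
    """
    calls = 0
    while b"\n" not in buf:
        data = os.read(fd, bufsize)
        calls += 1
        if not data:
            break  # EOF
        buf += data  # extends the caller's bytearray in place
    idx = buf.find(b"\n")
    end = len(buf) if idx == -1 else idx + 1
    line = bytes(buf[:end])
    del buf[:end]
    return line, calls
```

With three 40-byte lines waiting in a pipe, the unbuffered version makes 40 read() calls for the first line. The buffered version makes one read() for the second line, picking up the third line along the way, and then zero read() calls for the third.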

programming/BufferingOnBuffering written at 23:39:46

