Read IO is generally synchronous (unlike write IO)

October 17, 2010

One of the important general differences between read IO and write IO is that read IO is generally synchronous (in its normal form), and not just because the operating system interfaces are usually synchronous. Read IO is synchronous in general because the program is almost always going to immediately look at and do something with the data that it's read, and it can't go on until it's done this. It's a relatively rare program that doesn't look at the data but instead either discards it or turns around and hands it to another system interface.
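
To make this concrete, here's a minimal C sketch of the usual pattern; process() is a hypothetical stand-in for whatever the program actually does with the bytes:

    #include <unistd.h>

    extern void process(const char *buf, size_t n);  /* hypothetical consumer */

    /* The classic synchronous pattern: the program blocks in read()
       because the very next statement needs the data. */
    ssize_t read_and_use(int fd, char *buf, size_t len)
    {
        ssize_t n = read(fd, buf, len);
        if (n > 0)
            process(buf, (size_t)n);
        return n;
    }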

(Okay, it used to be a relatively rare program that did this. In the modern web world there are a number of cases where the program will indeed never look at a lot of the data itself and will instead just pass it along to somewhere else.)

This implies that regardless of the interface you offer to programs and regardless of how you implement it behind the scenes, one way or another you have to get that data off those rusty platters before the program can go on to do anything else. Cleverness in mmap() or zero-copy read IO or the like only gets you so far, because you have to materialize the data either way. The only way to get out of this is to somehow read the data off the platters before the program tries to look at it by using some sort of readahead; whether this is done by the program (with asynchronous IO or something similar) or by the operating system (by noticing access patterns) is somewhat of an implementation detail.
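
As an illustration, on POSIX systems a program can explicitly nudge the operating system into reading ahead with posix_fadvise(); this is a sketch of a hint, and the kernel is free to ignore it:

    #include <fcntl.h>

    /* Tell the kernel we'll read this file sequentially so it can
       ramp up readahead, and also ask for the first megabyte to be
       pulled in now; both calls are advisory only. */
    void hint_readahead(int fd)
    {
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
        (void)posix_fadvise(fd, 0, 1 << 20, POSIX_FADV_WILLNEED);
    }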

Explicit asynchronous read IO interfaces are still useful for two reasons. First, the program may know more about its access patterns than the operating system can feasibly deduce. Second, there are cases where the program is working on several things at once and could proceed on whichever one of them gets off the disk first.
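
For instance, with POSIX AIO a program can start reads on two files at once and then deal with whichever one the disk delivers first. This is a minimal sketch with error checking omitted; handle() is a hypothetical consumer:

    #include <sys/types.h>
    #include <aio.h>
    #include <errno.h>
    #include <string.h>

    extern void handle(char *buf, ssize_t n);  /* hypothetical consumer */

    void race_two_reads(int fd1, int fd2, char *b1, char *b2, size_t len)
    {
        struct aiocb cb1, cb2;
        memset(&cb1, 0, sizeof cb1);
        memset(&cb2, 0, sizeof cb2);
        cb1.aio_fildes = fd1; cb1.aio_buf = b1; cb1.aio_nbytes = len;
        cb2.aio_fildes = fd2; cb2.aio_buf = b2; cb2.aio_nbytes = len;

        /* Start both reads; neither call blocks waiting for the disk. */
        aio_read(&cb1);
        aio_read(&cb2);

        /* Wait until at least one read has completed, then process
           whatever is ready. */
        const struct aiocb *pending[] = { &cb1, &cb2 };
        aio_suspend(pending, 2, NULL);

        if (aio_error(&cb1) != EINPROGRESS)
            handle(b1, aio_return(&cb1));
        if (aio_error(&cb2) != EINPROGRESS)
            handle(b2, aio_return(&cb2));
    }

(On older Linux systems you may need to link this with -lrt.)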

In theory write IO is equally synchronous, in that programs often want to reuse the buffers that they just wrote without changing what they wrote. In practice this is usually effectively asynchronous IO, because programs usually don't have to wait until the data gets all of the way down to the spinning rust; it's enough that the operating system has taken it off their hands.
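
A quick sketch of what this looks like in C: in the normal buffered case, write() returns as soon as the kernel has copied the data out of the program's buffer, and only an explicit fsync() actually waits for the disk:

    #include <unistd.h>

    int write_then_reuse(int fd, char *buf, size_t len)
    {
        /* Normally returns once the kernel has copied buf into its
           buffer cache; buf can be reused immediately afterwards. */
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;

        /* Only this call waits for the data to reach the actual
           disk, and most programs never make it. */
        return fsync(fd);
    }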


Comments on this page:

From 89.27.51.67 at 2010-10-18 01:52:43:

I thought you were talking about I/O in general rather than disk I/O until you mentioned rusty platters somewhere around the middle of the text.

By cks at 2010-10-18 08:16:30:

This is true of IO in general, but I can't think of many other sorts of IO where you can even potentially defer reads behind the back of the program.

It's an interesting question why this is so. I think it's that disk IO (or plain file IO in general) is the only case where you definitely know that the data exists and how much there is before you actually have it in hand.
