2024-12-05
Buffered IO in Unix before V7 introduced stdio
I recently read Julia Evans' Why pipes sometimes get "stuck": buffering.
Part of the reason is that almost every Unix program does some
amount of buffering for what it prints (or writes) to standard
output and standard error. For C programs, this buffering is built
into the standard library, specifically into stdio, which includes
familiar functions like printf(). Stdio is one of the many things
that appeared first in Research Unix V7.
This might leave you wondering if this sort of IO was buffered in
earlier versions of Research Unix and if it was, how it was done.
The very earliest version of Research Unix is V1, and in V1 there is putc.3 (at that point entirely about assembly, since C was yet to come). This set of routines allows you to set up and then use a 'struct' to implement IO buffering for output. There is a similar set of buffered functions for input, in getw.3, and I believe the memory blocks the two sets of functions use are compatible with each other. The V1 manual pages note it as a bug that the buffer wasn't 512 bytes, but also note that several programs would break if the size was changed; the buffer size was increased to 512 bytes by V3.
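To make the "you supply the struct" idea concrete, here is a minimal sketch in modern C rather than the original assembly interface. The names (struct obuf, obuf_init(), obuf_putc(), obuf_flush()) and the field layout are hypothetical and are not taken from the V1 manual pages; the 512-byte size is the later buffer size mentioned above. The point is only that the caller owns the memory and the library merely fills it in.

```c
#include <stddef.h>
#include <unistd.h>

#define OBUF_SIZE 512   /* assumed size; see the note on V1 vs V3 above */

/* Hypothetical caller-owned output buffer (not the historical layout). */
struct obuf {
    int    fd;               /* file descriptor to write to */
    size_t used;             /* bytes currently held in data[] */
    char   data[OBUF_SIZE];  /* the buffer itself */
};

/* Initialize memory that the caller has already set aside. */
void obuf_init(struct obuf *b, int fd)
{
    b->fd = fd;
    b->used = 0;
}

/* Write out everything buffered so far; returns 0 on success, -1 on error.
   On error the buffer contents are left untouched. */
int obuf_flush(struct obuf *b)
{
    size_t off = 0;

    while (off < b->used) {
        ssize_t n = write(b->fd, b->data + off, b->used - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    b->used = 0;
    return 0;
}

/* Buffer one character, flushing first if the buffer is full. */
int obuf_putc(struct obuf *b, char c)
{
    if (b->used == OBUF_SIZE && obuf_flush(b) < 0)
        return -1;
    b->data[b->used++] = c;
    return 0;
}
```

A caller would declare a struct obuf itself, call obuf_init() on it with an open file descriptor, and has to remember to call obuf_flush() before exiting, or whatever is still sitting in the buffer is lost.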
In V2, I believe we still have putc and getw, but we see the first appearance of another approach, in putchr.s. This implements putchar(), which is used by printf() and which (from later evidence) uses an internal buffer (under some circumstances) that has to be explicitly flush()'d by programs. In V3, there are manual pages for putc.3 and getc.3 that are very similar to the V1 versions, which is why I expect these were there in V2 as well. In V4, we have manual pages for both putc.3 (plus getc.3) and putch[a]r.3, and there is also a getch[a]r.3 that's the input version of putchar(). Since we have a V4 manual page for putchar(), we can finally see the somewhat tangled way it works, rather than having to read the PDP-11 assembly. I don't have links to V5 manuals, but the V5 library source says that we still have both approaches to buffered IO.
(If you want to see how the putchar() approach was used, you can look at, for example, the V6 grep.c, which starts out with the 'fout = dup(1);' that the manual page suggests for buffered putchar() usage, and then periodically calls flush().)
In V6, there is a third approach that was added, in /usr/source/iolib, although I don't know if any programs used it. Iolib has a global array of structs that are statically associated with a limited number of low-numbered file descriptors; an iolib function such as cflush() would be passed a file descriptor and use that to look up the corresponding struct. One innovation iolib implicitly adds is that its copen() effectively 'allocates' the struct for you, in contrast to putc() and getc(), where you supply the memory area and fopen()/fcreat() merely initialize it with the correct information.
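The "global array indexed by file descriptor" design can be sketched in modern C as follows. This is not iolib's actual code or API: slot_attach(), slot_putc(), slot_flush(), the table size, and the buffer size are all hypothetical, and unlike copen() this sketch attaches to an already-open file descriptor instead of opening a file itself. What it does show is that the caller only ever handles a small integer, and the library finds the buffer on its own.

```c
#include <stddef.h>
#include <unistd.h>

#define NSLOTS     10    /* assumed limit on buffered file descriptors */
#define SLOT_BUFSZ 512   /* assumed per-descriptor buffer size */

/* One statically allocated buffer per low-numbered file descriptor. */
struct slot {
    int    in_use;
    size_t used;
    char   data[SLOT_BUFSZ];
};

static struct slot slots[NSLOTS];

/* Look up the struct for fd, or NULL if fd is out of range or unattached. */
static struct slot *slot_for(int fd)
{
    if (fd < 0 || fd >= NSLOTS || !slots[fd].in_use)
        return NULL;
    return &slots[fd];
}

/* 'Allocate' the struct for an open descriptor; returns fd or -1. */
int slot_attach(int fd)
{
    if (fd < 0 || fd >= NSLOTS || slots[fd].in_use)
        return -1;
    slots[fd].in_use = 1;
    slots[fd].used = 0;
    return fd;
}

/* Write out whatever is buffered for fd; returns 0 or -1. */
int slot_flush(int fd)
{
    struct slot *s = slot_for(fd);
    size_t off = 0;

    if (s == NULL)
        return -1;
    while (off < s->used) {
        ssize_t n = write(fd, s->data + off, s->used - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    s->used = 0;
    return 0;
}

/* Buffer one character for fd, flushing first if the buffer is full. */
int slot_putc(int fd, char c)
{
    struct slot *s = slot_for(fd);

    if (s == NULL)
        return -1;
    if (s->used == SLOT_BUFSZ && slot_flush(fd) < 0)
        return -1;
    s->data[s->used++] = c;
    return 0;
}
```

The obvious cost of this design is the fixed table: any file descriptor at or above NSLOTS simply can't be buffered, which matches the "limited number of low-numbered file descriptors" restriction described above.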
Finally, V7 introduces stdio and sorts all of this out, at the cost
of some code changes. There's still getc() and putc(), but now they
take a FILE * instead of their own structure, and you get the
FILE * from things like fopen() instead of supplying it yourself
and having a stdio function initialize it. Putchar() (and getchar())
still exist but are now redone to work with stdio buffering instead
of their own buffering, and 'flush()' has become fflush() and takes
an explicit FILE * argument instead of implicitly flushing
putchar()'s buffer, and generally it's not necessary any more. The
V7 grep.c still uses printf(), but now it doesn't explicitly flush
anything by calling fflush(); it just trusts in stdio.
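For contrast with the earlier approaches, here is a small sketch of the stdio style using today's C library and POSIX (not V7 itself). The helper names file_size() and stdio_demo() are made up for illustration, and fileno() and fstat() are POSIX calls, not part of stdio. The function writes a message with putc() to a FILE * obtained from tmpfile(), then records how many bytes the underlying file holds before and after fflush(). On typical implementations the "before" size is 0 because the bytes are still in stdio's buffer, but that is an implementation detail, not a guarantee.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size in bytes of the file behind f as the kernel sees it, or -1. */
long file_size(FILE *f)
{
    struct stat st;

    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write msg through stdio to a temporary file and report the file's
   size before and after fflush(). Returns 0 on success, -1 on error. */
int stdio_demo(const char *msg, long *before, long *after)
{
    FILE *f = tmpfile();   /* stdio hands us the FILE *, buffer included */
    const char *p;

    if (f == NULL)
        return -1;
    /* Ask for full buffering explicitly so the behavior is less
       dependent on the library's defaults. */
    setvbuf(f, NULL, _IOFBF, BUFSIZ);

    for (p = msg; *p != '\0'; p++)
        putc(*p, f);

    *before = file_size(f);
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    *after = file_size(f);

    fclose(f);
    return 0;
}
```

Note that the caller never declares a buffer structure or a buffer size; everything hangs off the FILE *, and fclose() (or a normal exit) would have flushed the data even without the explicit fflush().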