Buffered IO in Unix before V7 introduced stdio

December 5, 2024

I recently read Julia Evans' Why pipes sometimes get "stuck": buffering. Part of the reason is that almost every Unix program does some amount of buffering for what it prints (or writes) to standard output and standard error. For C programs, this buffering is built into the standard library, specifically into stdio, which includes familiar functions like printf(). Stdio is one of the many things that appeared first in Research Unix V7. This might leave you wondering if this sort of IO was buffered in earlier versions of Research Unix and if it was, how it was done.

The very earliest version of Research Unix is V1, and in V1 there is putc.3 (at that point entirely about assembly, since C was yet to come). This set of routines allows you to set up and then use a 'struct' to implement IO buffering for output. There is a similar set of buffered functions for input, in getw.3, and I believe the memory blocks the two sets of functions use are compatible with each other. The V1 manual pages note it as a bug that the buffer wasn't 512 bytes, but also notes that several programs would break if the size was changed; the buffer size will be increased to 512 bytes by V3.

In V2, I believe we still have putc and getw, but we see the first appearance of another approach, in putchr.s. This implements putchar(), which is used by printf() and which (from later evidence) uses an internal buffer (under some circumstances) that has to be explicitly flush()'d by programs. In V3, there's manual pages for putc.3 and getc.3 that are very similar to the V1 versions, which is why I expect these were there in V2 as well. In V4, we have manual pages for both putc.3 (plus getc.3) and putch[a]r.3, and there is also a getch[a]r.3 that's the input version of putchar(). Since we have a V4 manual page for putchar(), we can finally see the somewhat tangled way it works, rather than having to read the PDP-11 assembly. I don't have links to V5 manuals, but the V5 library source says that we still have both approaches to buffered IO.

(If you want to see how the putchar() approach was used, you can look at, for example, the V6 grep.c, which starts out with the 'fout = dup(1);' that the manual page suggests for buffered putchar() usage, and then periodically calls flush().)

In V6, there is a third approach that was added, in /usr/source/iolib, although I don't know if any programs used it. Iolib has a global array of structs, that were statically associated with a limited number of low-numbered file descriptors; an iolib function such as cflush() would be passed a file descriptor and use that to look up the corresponding struct. One innovation iolib implicitly adds is that its copen() effectively 'allocates' the struct for you, in contrast to putc() and getc(), where you supply the memory area and fopen()/fcreate() merely initialize it with the correct information.

Finally V7 introduces stdio and sorts all of this out, at the cost of some code changes. There's still getc() and putc(), but now they take a FILE *, instead of their own structure, and you get the FILE * from things like fopen() instead of supplying it yourself and having a stdio function initialize it. Putchar() (and getchar()) still exist but are now redone to work with stdio buffering instead of their own buffering, and 'flush()' has become fflush() and takes an explicit FILE * argument instead of implicitly flushing putchar()'s buffer, and generally it's not necessary any more. The V7 grep.c still uses printf(), but now it doesn't explicitly flush anything by calling fflush(); it just trusts in stdio.

Written on 05 December 2024.
« Sorting out 'PCIe bifurcation' and how it interacts with NVMe drives
Common motherboards are supporting more and more M.2 NVMe drive slots »

Page tools: View Source.
Search:
Login: Password:

Last modified: Thu Dec 5 23:16:54 2024
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.