V7 Unix programs are often not written the way you would expect

April 19, 2019

Yesterday I wrote that V7 ed read its terminal input in cooked mode a line at a time, which was an efficient, low-CPU design that was important on V7's small and low-power hardware. Then in comments, frankg pointed out that I was wrong about part of that, namely about how ed read its input. Here, straight from the V7 ed source code, is how ed read input from the terminal:

	if (read(0, &c, 1) <= 0)
		return(lastc = EOF);
	lastc = c&0177;

	while ((c = getchr()) != '\n') {

(gettty() reads characters from getchr() into a linebuf array until end of line, EOF, or it runs out of space.)

In one way, this is surprising; it's very definitely not how we'd write this today, and if you did, many Unix programmers would immediately tell you that you're being inefficient by making so many calls to read() and you should instead use a buffer, for example through stdio's fgets(). Very few modern Unix programs do character at a time reads from the kernel, partly because on modern machines it's not very efficient.

(It may have been comparatively less inefficient on V7 on the PDP-11, if for example the relative cost of making a system call was lower than it is today. My impression is that this may have been the case.)
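
To make the cost concrete, here is a minimal sketch of this style of input in modern C. It is not the V7 source: getchr_fd(), getline_fd(), and the read_calls counter are names made up for illustration, and silently truncating over-long lines is a simplification of my own.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Number of read() calls made so far; kept only for illustration. */
static long read_calls = 0;

/* Fetch one character from fd in the style of V7 ed's getchr():
   one read() system call per character. Returns -1 on EOF or error. */
static int getchr_fd(int fd)
{
	unsigned char c;

	read_calls++;
	if (read(fd, &c, 1) <= 0)
		return -1;
	return c & 0177;	/* strip the high bit, as the V7 code does */
}

/* Collect one line (without its newline) into buf, NUL-terminated.
   Returns the line length, or -1 if EOF arrives before a newline.
   Over-long lines are truncated (an assumption of this sketch). */
static int getline_fd(int fd, char *buf, size_t cap)
{
	size_t n = 0;
	int c;

	assert(cap > 0);
	while ((c = getchr_fd(fd)) != '\n') {
		if (c < 0)
			return -1;
		if (n + 1 < cap)
			buf[n++] = (char)c;
	}
	buf[n] = '\0';
	return (int)n;
}
```

Reading the line "hello" and its newline this way takes six read() calls, one per character, where a buffered reader would normally need only one.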

V7 had stdio in more or less its modern form, complete with fgets(). V6 had a precursor version of stdio and buffered IO (see eg the manpage for getc()). However, many V7 and V6 programs didn't necessarily use them; instead they used more basic system calls. This is one of the things that often gives the code for early Unix programs (V7 and before) an unusual feel, along with the short variable names and the lack of comments.
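
For contrast, here is a rough sketch of what going through stdio does to the underlying file descriptor. The helper name leftover_after_fgets() is hypothetical, and how far stdio reads ahead is up to the implementation, so treat the behavior described here as typical rather than guaranteed.

```c
#define _POSIX_C_SOURCE 200809L	/* for fdopen() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read one line from fd through stdio's fgets(), then ask how many
   bytes a plain read() on the same descriptor can still get.
   Returns that count, or -1 if the line could not be read.
   Note that this closes fd (through fclose()) when it is done. */
static long leftover_after_fgets(int fd, char *line, int cap)
{
	char rest[256];
	FILE *fp;
	long n;

	assert(cap > 0);
	fp = fdopen(fd, "r");
	if (fp == NULL)
		return -1;
	if (fgets(line, cap, fp) == NULL) {
		fclose(fp);
		return -1;
	}
	/* Whatever stdio pulled into its own buffer is gone from fd. */
	n = (long)read(fd, rest, sizeof rest);
	fclose(fp);
	return n;
}
```

If a pipe holds two short lines and the writer has closed its end, I would expect the usual stdio implementations to take both lines in a single read(), so that fgets() returns the first line and the direct read() afterward finds nothing left. That is one system call instead of one per character, at the price of stdio holding input that nothing else reading the descriptor will see.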

The situation with ed is especially interesting, because in V5 Unix, ed appears to have still been written in assembly; see ed1.s, ed2.s, and ed3.s in 's1' of the V5 sources. In V6, ed was rewritten in C to create ed.c (still in a part of the source tree called 's1'), but it still used the same read() based approach that I think it used in the assembly version.

(I haven't looked forward from V7 to see if later versions were revised to use some form of buffering for terminal input.)

Sidebar: An interesting undocumented ed feature

Reading this section of the source code for ed taught me that it has an interesting, undocumented, and entirely characteristic little behavior. Officially, ed commands that have you enter new text have that new text terminated by a . on a line by itself:

$ ed newfile
a
this is new text that we're adding.
.

This is how the V7 ed manual documents it and how everyone talks about it. But how the actual ed source code implements this on input is, from that gettty() function:

if (linebuf[0]=='.' && linebuf[1]==0)
	return(EOF);

In other words, it turns a single line with '.' into an EOF. The consequence of this is that if you type a real EOF at the start of a line, you get the same result, thus saving you one character (you use Control-D instead of '.' plus newline). This is very V7 Unix behavior, including the lack of documentation.

This is also a natural behavior in one sense. A proper program has to react to EOF here in some way, and it might as well do so by ending the input mode. It's also natural to go on to try reading from the terminal again for subsequent commands; if this was a real and persistent EOF, for example because the pty closed, you'll just get EOF again and eventually quit. V7 ed is slightly unusual here in that it deliberately converts '.' by itself to EOF, instead of signaling this in a different way, but in a way that's also the simplest approach; if you have to have some signal for each case and you're going to treat them the same, you might as well have the same signal for both cases.
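
The shape of this can be sketched in a few lines of modern C. This is an illustration under my own assumptions, not the V7 code: gettty_sketch(), append_sketch(), and LINESZ are made-up names, the lines are only counted instead of stored, and over-long lines are simply truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>	/* for EOF */
#include <unistd.h>

#define LINESZ 512	/* arbitrary line buffer size for this sketch */

static char linebuf[LINESZ];

/* Read one line from fd into linebuf, one character per read().
   Returns 0 for an ordinary line, and EOF both for a real end of
   file and for a line that holds only ".". */
static int gettty_sketch(int fd)
{
	size_t n = 0;
	char c;

	assert(fd >= 0);
	for (;;) {
		if (read(fd, &c, 1) <= 0)
			return EOF;	/* real EOF, such as Control-D */
		if (c == '\n')
			break;
		if (n < LINESZ - 1)	/* truncate over-long lines */
			linebuf[n++] = c;
	}
	linebuf[n] = '\0';
	if (linebuf[0] == '.' && linebuf[1] == 0)
		return EOF;		/* "." alone is turned into EOF */
	return 0;
}

/* Input mode: take lines until gettty_sketch() reports EOF and return
   how many were taken. There is one exit condition, so the caller
   cannot tell "." from a real EOF. */
static int append_sketch(int fd)
{
	int nlines = 0;

	while (gettty_sketch(fd) != EOF)
		nlines++;
	return nlines;
}
```

With this structure the input loop needs no special case for '.' at all. After it returns, the caller goes back to reading commands from the same descriptor, and a persistent EOF just shows up again there.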

Modern versions of ed appear to faithfully reimplement this convenient behavior, although they don't appear to document it. I haven't checked OpenBSD, but both FreeBSD ed and GNU ed work like this in a quick test. I haven't checked their source code to see if they implement it the same way.

Comments on this page:

By frankg at 2019-04-20 02:38:50:

Thanks for the shout out.

Although ed reads 1 char at a time the kernel in cooked mode will not return from the read until a full line has been typed. See the function canon in tty.c:

  * transfer raw input list to canonical list,
  * doing erase-kill processing and handling escapes.
  * It waits until a full line has been typed in cooked mode,
  * or until any character has been typed in raw mode.

canon(tp) register struct tty *tp;

So you are right line buffering is being done. It's just in the kernel not in ed. The last known version of ed from labs was in Plan9 and the code looks almost unchanged but does use a buffered I/O library to implement getchar. GNU ed while being significantly different uses the getchar stdio function which is also buffered.

By Blissex at 2019-04-24 18:58:51:

That read-character-at-a-time is likely an oversight, but 'getchr()' really needs to read just 1 character, even in line buffered mode, to handle Ctrl-D.

Try the following: command 'a', newline, type "x", newline, type Ctrl-D, type "p", newline. The "p" gets executed as a command and prints "x". Note the absence of newline between Ctrl-D and type "p".

Other than that as to documentation the Edition 7 documentation, however terse and somewhat incomplete, was amazingly and still is amazingly good, and the lack of comment was due to the code being really mostly self-explanatory and short. A 192KiB PDP-11/34 could run 2-3 timesharing users editing with 'vi' and compiling with 'cc'.

By cks at 2019-04-24 21:43:35:

Reading character at a time isn't necessary to handle Ctrl-D the way ed does; in cooked mode terminal input, you would get the same results if you tried to read in a whole buffer at a time (well, with the right structure to your program's code). This is because of what typing Ctrl-D really does on Unix.

(Correctly handling EOF from a cooked mode terminal with buffering in the picture does require some care, but it can be done.)

For what it's worth, the read(1) command in modern FreeBSD's sh(1) uses a similar read(2) character-at-a-time method because sh(1) still uses plain fds instead of stdio for input. It would require some care to retrofit sh to use stdio, if it is even possible, because of syntax like "( read f; command_foo ) < bar". (A subshell strips off the first line, then the rest of input is passed to command_foo.) This would require forwarding any buffered input past the first line to the invoked command, instead of just having command_foo inherit stdin, as it does today.

That particular hurdle probably isn't insurmountable, but there might be other requirements I'm forgetting.

By cks at 2019-04-28 20:03:38:

I think that read in shells is genuinely best implemented with character at a time read() calls. I think that you could make buffered reading of standard input for read work with enough effort, but some of the time it would require the shell to materialize (or retain) extra processes simply to forward input. I expect that the resulting complexity would not be worth it.

(I also suspect that this complexity would create bugs, since I've already seen something similar to that with output buffering in a fun Bash bug.)

Written on 19 April 2019.

Last modified: Fri Apr 19 23:49:59 2019