2019-04-19
V7 Unix programs are often not written the way you would expect
Yesterday I wrote that V7 ed
read its terminal input in cooked
mode a line at a time, which was an
efficient, low-CPU design that was important on V7's small and
low-power hardware. Then in comments, frankg pointed out that I was
wrong about part of that, namely about how ed
read its input.
Here, straight from the V7 ed
source code,
is how ed read input from the terminal:
    getchr()
    {
    	[...]
    	if (read(0, &c, 1) <= 0)
    		return(lastc = EOF);
    	lastc = c&0177;
    	return(lastc);
    }

    gettty()
    {
    	[...]
    	while ((c = getchr()) != '\n') {
    		[...]
    	}
(gettty() reads characters from getchr() into a linebuf array
until end of line, EOF, or it runs out of space.)
In one way, this is surprising; it's very definitely not how we'd
write this today, and if you did, many Unix programmers would
immediately tell you that you're being inefficient by making so
many calls to read() and that you should instead use a buffer, for
example through stdio's fgets(). Very few modern Unix programs
do character at a time reads from the kernel, partly because on
modern machines it's not very efficient.
(It may have been comparatively less inefficient on V7 on the PDP-11, if for example the relative cost of making a system call was lower than it is today. My impression is that this may have been the case.)
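To make the difference concrete, here is a minimal sketch that reads a line both ways and counts the read() calls each one makes. This is not V7 code; line_by_char(), struct rdbuf, buf_getc(), and line_buffered() are hypothetical names made up for illustration, and the 512-byte buffer size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one read() per byte, in the style of the getchr()
   loop quoted above. Stores the line without its newline and returns its
   length, or -1 at EOF. *calls is incremented once per read() call. */
long line_by_char(int fd, char *buf, size_t size, long *calls)
{
	size_t n = 0;
	char c;

	assert(size > 0);
	for (;;) {
		(*calls)++;
		if (read(fd, &c, 1) <= 0) {
			if (n == 0)
				return -1;
			break;
		}
		if (c == '\n')
			break;
		if (n + 1 < size)	/* silently truncate overlong lines */
			buf[n++] = c;
	}
	buf[n] = '\0';
	return (long)n;
}

/* Hypothetical user-space buffer, roughly the idea behind a stdio FILE. */
struct rdbuf {
	int fd;
	char data[512];
	ssize_t len, pos;
	long calls;		/* number of read() calls made so far */
};

/* Return the next byte, refilling the buffer with one read() when empty. */
int buf_getc(struct rdbuf *b)
{
	if (b->pos >= b->len) {
		b->calls++;
		b->len = read(b->fd, b->data, sizeof b->data);
		b->pos = 0;
		if (b->len <= 0)
			return -1;
	}
	return (unsigned char)b->data[b->pos++];
}

/* Same contract as line_by_char(), but drawing bytes from the buffer. */
long line_buffered(struct rdbuf *b, char *buf, size_t size)
{
	size_t n = 0;
	int c;

	assert(size > 0);
	while ((c = buf_getc(b)) != -1 && c != '\n')
		if (n + 1 < size)
			buf[n++] = (char)c;
	buf[n] = '\0';
	return (c == -1 && n == 0) ? -1 : (long)n;
}
```

On a terminal in cooked mode a single read() returns at most one line, so the buffered version would make about one system call per line, where the character at a time version makes one per character typed.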
V7 had stdio in more or less its modern form, complete with fgets().
V6 had a precursor version of stdio and buffered IO (see eg the
manpage for getc()).
However, many V7 and V6 programs didn't necessarily use them; instead
they used more basic system calls. This is one of the things that often
gives the code for early Unix programs (V7 and before) an unusual feel,
along with the short variable names and the lack of comments.
The situation with ed is especially interesting, because in V5
Unix, ed appears to have still been written in assembly; see
ed1.s, ed2.s, and ed3.s here in 's1' of the V5 sources.
In V6, ed was rewritten in C to create ed.c
(still in a part of the source tree called 's1'), but it still used
the same read()-based approach that I think it used in the assembly
version.
(I haven't looked forward from V7 to see if later versions were revised to use some form of buffering for terminal input.)
Sidebar: An interesting undocumented ed feature
Reading this section of the source code for ed taught me that it
has an interesting, undocumented, and entirely characteristic little
behavior. Officially, ed commands that have you enter new text
have that new text terminated by a '.' on a line by itself:
    $ ed newfile
    a
    this is new text that we're adding.
    .
This is how the V7 ed manual documents it and
how everyone talks about it. But how the actual ed source code
implements this on input is, from that gettty() function:
    if (linebuf[0]=='.' && linebuf[1]==0)
    	return(EOF);
    return(0);
In other words, it turns a single line with '.' into an EOF. The consequence of this is that if you type a real EOF at the start of a line, you get the same result, thus saving you one character (you use Control-D instead of '.' plus newline). This is very V7 Unix behavior, including the lack of documentation.
This is also a natural behavior in one sense. A proper program has to react to EOF here in some way, and it might as well do so by ending the input mode. It's also natural to go on to try reading from the terminal again for subsequent commands; if this was a real and persistent EOF, for example because the pty closed, you'll just get EOF again and eventually quit. V7 ed is slightly unusual here in that it deliberately converts '.' by itself to EOF, instead of signaling this in a different way, but in a way that's also the simplest approach; if you have to have some signal for each case and you're going to treat them the same, you might as well have the same signal for both cases.
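As a rough illustration of why sharing one signal is the simplest approach, here is a small sketch of a text-input loop that cannot tell the two cases apart. It is written for this note and is not the V7 source; get_text_line() is a hypothetical stand-in for gettty(), and count_text_lines() is a hypothetical stand-in for whatever loop collects the new text.

```c
#include <assert.h>
#include <stdio.h>	/* for EOF and size_t */
#include <unistd.h>

/* Hypothetical stand-in for gettty(): reads one line from fd a character
   at a time. Returns 0 for an ordinary text line, and EOF both for a real
   end of file and for a line consisting of a single '.'. In this sketch a
   partial line cut off by a real EOF is discarded. */
int get_text_line(int fd, char *linebuf, size_t size)
{
	size_t n = 0;
	char c;

	assert(size > 1);
	for (;;) {
		if (read(fd, &c, 1) <= 0)
			return EOF;	/* real EOF (or a read error) */
		if (c == '\n')
			break;
		if (n + 1 < size)	/* silently truncate overlong lines */
			linebuf[n++] = c;
	}
	linebuf[n] = '\0';
	if (linebuf[0] == '.' && linebuf[1] == '\0')
		return EOF;		/* "." alone is reported as EOF too */
	return 0;
}

/* Hypothetical input-mode loop: accepts text lines until EOF is reported
   and returns how many lines were accepted. It has exactly one exit
   condition, whichever way the text was ended. */
int count_text_lines(int fd)
{
	char linebuf[512];
	int lines = 0;

	while (get_text_line(fd, linebuf, sizeof linebuf) == 0)
		lines++;
	return lines;
}
```

Because each character is read separately, anything after the '.' line is left unread for the next caller. Given the input "one", "two", ".", "three" and then end of file, the first call to count_text_lines() returns 2 and a second call returns 1, ended this time by the real EOF.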
Modern versions of ed
appear to faithfully reimplement this
convenient behavior, although they don't appear to document it. I
haven't checked OpenBSD, but both FreeBSD ed and GNU ed work like
this in a quick test. I haven't checked their source code to see
if they implement it the same way.
Links: A Practitioner's Guide to System Dashboard Design (with a bonus)
A Practitioner's Guide to System Dashboard Design is a four article series on system dashboard design by Cory Watson of One Mo' Gin. The parts are:
If you like these (and I did), you probably also want to read Cory's The CASE Method: Better Monitoring For Humans, and perhaps peruse the full articles index for additional things to read.
(Via somewhere that I've now forgotten and can't find again. Perhaps it was Twitter or Mastodon.)