The original Unix ed(1) didn't load files being edited into memory

October 20, 2018

These days almost all editors work by loading the entire file (or files) that you're editing into memory, either into a flat buffer or two or into some more sophisticated data structure, and then working on them there. This approach to editing files is simple, straightforward, and fast, but it has an obvious limitation; the file you want to edit has to fit into memory. These days this is generally not much of an issue.

V7 was routinely used on what are relatively small machines by modern standards, and those machines were shared by a fairly large number of people, so system memory was a limited resource. Earlier versions of Research Unix had to run on even smaller machines, too. On one of those machines, loading the entire file you wanted to edit into memory was somewhere between profligate and impossible, depending on the size of the file and the machine you were working on. As a result, the V7 ed does not edit files in memory.

The V7 ed manpage says this explicitly, although it's tempting to regard this as hand waving. Here's the quote:

Ed operates on a copy of any file it is editing; [...]. The copy of the text being edited resides in a temporary file called the buffer.

The manual page is being completely literal. If you started up V7 in a PDP-11 emulator and edited a file with ed, you would find a magic file in /tmp (called /tmp/e<something>, the name being created by the V7 mktemp()). That file is the buffer file, and you will find much or all of the file you're editing in it (in some internal format that seems to have null bytes sprinkled around through it).

(V6 is substantially the same, so you can explore this interactively here. I was surprised to discover that V6 doesn't seem to have sed.)

I've poked through the V7 ed.c to see if I could figure out what ed is doing here, but I admit the complete lack of comments has mostly defeated me. What I think it's doing is only allocating and storing some kind of index to where every line is located, then moving line text in and out of a few small 512-byte buffers as you work on them. As you add text or move things around, I believe that ed writes new copies of the line(s) you've touched to new locations in the buffer file, rather than try to overwrite the old versions in place. The buffer file has a limit of 256 512-byte blocks, so if you do enough editing of a large enough file I believe you can run into problems there.

(This agrees with the manpage's section on size limitations, where it says that ed has a 128 KB limit on the temporary file and the limit on the number of lines you can have is the amount of core, with each line taking up one PDP-11 'word' (in the code this is an int).)

Exploring the code also led me to discover how ed handled errors internally, which is by using longjmp() to jump back to main() and re-enter the main command loop from there. This is really sort of what I should have expected from a V7 program; it's straight, minimal, and it works. Perhaps it's not how we'd do that today, but V7 was a different and smaller place.

PS: If you're reading the V7 ed source and want to see where this is, I think it runs through getline(), putline, getblock(), and blkio(). I believe that the tline variable is the offset that the next line will be written to by putline(), and it gets stored in the dynamically allocated line buffer array that is pointed to by zero. The more I look at it, the more that the whole thing seems pretty clever in an odd way.

(My looking into this was sparked by Leah Neukirchen's comment on my entry on why you should be willing to believe that ed is a good editor. Note that even if you don't hold files in memory, editing multiple files at once requires more memory. In ed's scheme, you would need to have multiple line-mapping arrays, one for each file, and probably you'd want multiple buffer files and some structures to keep track of them. You might also be more inclined to do more editing operations in a single session and so be more likely to run into the size limit of a buffer file, which I assume exists for a good reason.)

Written on 20 October 2018.
« Some things on delays and timings for Prometheus alerts
Some tradeoffs of having a Certificate Authority in your model »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Oct 20 01:17:05 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.