2019-08-03
Sharing file descriptors with child processes is a clever Unix decision
One of the things that happens when a Unix process clones itself
and executes another program is that the first process's open file
descriptors are shared with the child (well, apart from the ones
that are marked 'close on exec'). This is not just sharing in the
sense that the new process has the same files or IO streams open,
the way that it would if it open()'d them independently; this
shares the actual kernel-level open files behind the descriptors
(what POSIX calls open file descriptions). This full sharing means
that if one process changes the properties of file descriptors,
those changes are experienced by the other processes as well.
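Here is a minimal Python sketch of the difference between inheriting a descriptor and open()ing the file independently. It is Unix only because it uses os.fork(), and the function name and test data are hypothetical choices of mine. The child reads from a file in two ways: through the descriptor it inherited, and through its own fresh open(). Only the first read moves the offset that the parent sees.

```python
import os
import tempfile

def parent_offset_after_child(data: bytes) -> int:
    """Fork a child that reads the file two ways, then return the offset
    the parent sees on the descriptor the child inherited."""
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.lseek(fd, 0, os.SEEK_SET)       # rewind: the offset is now 0
    pid = os.fork()
    if pid == 0:
        try:
            # An independent open() creates a new kernel-level open file
            # with its own offset, so this read cannot affect the parent.
            own = os.open(path, os.O_RDONLY)
            os.read(own, 3)
            # Reading through the inherited fd advances the shared offset.
            os.read(fd, 4)
        finally:
            os._exit(0)                # never fall back into parent code
    os.waitpid(pid, 0)
    offset = os.lseek(fd, 0, os.SEEK_CUR)  # query the current offset
    os.close(fd)
    os.unlink(path)
    return offset

if __name__ == "__main__":
    print(parent_offset_after_child(b"0123456789"))  # prints 4
```

The parent's offset is 4: it moved by the four bytes the child read through the inherited descriptor. The three bytes read through the child's own open() had no effect on it.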
(This inheritance of file descriptors sometimes has not entirely
desirable consequences, as does the fact that file descriptor
properties are shared. Running a program that leaves standard input
set to O_NONBLOCK is often still a reliable way to get your shell
to exit immediately after the program finishes. Many shells reset
the TTY properties, but often don't think of O_NONBLOCK.)
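Here is a minimal Python sketch of the O_NONBLOCK problem. It assumes a pipe as a stand-in for the terminal on standard input, and the function name is hypothetical. A child sets O_NONBLOCK on an inherited descriptor and exits without undoing it. The parent then finds the flag set, and its next read fails at once with EAGAIN instead of waiting. This is roughly what a shell runs into when it next tries to read its input.

```python
import os

def nonblock_leaks_to_parent() -> tuple[bool, bool]:
    """Return (parent sees O_NONBLOCK, parent's read fails with EAGAIN)."""
    r, w = os.pipe()            # stand-in for the terminal on stdin
    pid = os.fork()
    if pid == 0:
        try:
            # A "program" that sets O_NONBLOCK and never clears it.
            os.set_blocking(r, False)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    # O_NONBLOCK belongs to the shared open file, not to the child's
    # copy of the descriptor, so the parent now sees it too.
    leaked = not os.get_blocking(r)
    try:
        os.read(r, 1)           # would normally wait for input...
        got_eagain = False
    except BlockingIOError:     # ...but now fails immediately (EAGAIN)
        got_eagain = True
    os.close(r)
    os.close(w)
    return leaked, got_eagain

if __name__ == "__main__":
    print(nonblock_leaks_to_parent())  # prints (True, True)
```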
This full sharing is probably easier to implement in the kernel
than making an independent copy of the file descriptor (unless you
also changed how dup() works). But it has another important
property that makes it a clever choice for Unix, which is that the
file offset is part of what is shared and this means that the
following subshell operation can work as intended:
(sed -e 10q -e 's/^/a: /'; sed -e 10q -e 's/^/b: /') <afile
(Let's magically assume that sed doesn't use buffered reads and so
will read exactly ten lines each time. This isn't true in practice.)
If the file offset wasn't shared between all children, it's not clear how this would work. You'd probably have to invent some sort of pipe-like file descriptor that either shared the file offset or was a buffer and didn't support seeking, and then have the shell use it (probably along with some other programs).
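The subshell example can be imitated with a small Python sketch that builds the "magic" assumption in directly. Each hypothetical stand-in for sed reads its input one byte at a time, so it never consumes more than the lines it processes. The two children run one after the other, as the subshell runs its commands, and both use the descriptor inherited from the parent. All names here are illustrative.

```python
import os
import tempfile

def read_line_unbuffered(fd: int) -> bytes:
    """Read one line a byte at a time, so nothing past the newline is consumed."""
    line = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b:
            break
        line += b
        if b == b"\n":
            break
    return bytes(line)

def run_subshell(data: bytes, nlines: int) -> list[bytes]:
    """Mimic (sed 10q ...; sed 10q ...) <afile with unbuffered 'sed's."""
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.lseek(fd, 0, os.SEEK_SET)   # like the shell's "<afile"
    r, w = os.pipe()               # collects the children's output
    for prefix in (b"a: ", b"b: "):
        pid = os.fork()
        if pid == 0:
            try:
                # The child inherits fd, and with it the shared offset.
                for _ in range(nlines):
                    line = read_line_unbuffered(fd)
                    if not line:
                        break
                    os.write(w, prefix + line)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)         # run the children one after another
    os.close(w)
    os.close(fd)
    os.unlink(path)
    # Draining only at the end is fine for small inputs; a large output
    # would fill the pipe and block the children.
    out = b""
    while chunk := os.read(r, 4096):
        out += chunk
    os.close(r)
    return out.splitlines()

if __name__ == "__main__":
    data = b"".join(b"line %d\n" % i for i in range(1, 6))
    for line in run_subshell(data, 2):
        print(line.decode())
```

With five input lines and nlines=2, this prints a: line 1, a: line 2, b: line 3, b: line 4. The second child picks up exactly where the first stopped, because the offset lives in the shared open file.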
Sharing the file offset is also the natural way to handle multiple processes writing standard output (or standard error) to a file, as in the following example:
(program1; program2; program3) >afile
If the file offset wasn't shared, each process would start writing
at the start of afile and they'd overwrite each other's results.
Again, you'd need some pipe-like trick to make this work.
(Once you have O_APPEND, you can use it for this, but O_APPEND
appears to postdate V7 Unix; it's not in the V7 open(2) manpage.)
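The write cases can be compared in one hypothetical Python sketch. Three children each write one line, run one after another like (program1; program2; program3) >afile. The modes differ only in which descriptor each child writes through:

- In "shared" mode, each child writes through the descriptor inherited from the parent.
- In "separate" mode, each child does its own open() without O_APPEND, which models a Unix without shared offsets.
- In "append" mode, each child's own open() uses O_APPEND.

The mode names are my own labels.

```python
import os
import tempfile

def write_from_children(mode: str) -> bytes:
    """Three children, run in sequence, each write one line; return the file."""
    if mode not in ("shared", "separate", "append"):
        raise ValueError(f"unknown mode {mode!r}")
    fd, path = tempfile.mkstemp()      # the parent's ">afile" (empty)
    for msg in (b"one\n", b"two\n", b"three\n"):
        pid = os.fork()
        if pid == 0:
            try:
                if mode == "shared":
                    out = fd           # inherited descriptor and offset
                elif mode == "separate":
                    # Fresh open file: its offset starts at 0.
                    out = os.open(path, os.O_WRONLY)
                else:
                    # Fresh open file, but every write goes to the end.
                    out = os.open(path, os.O_WRONLY | os.O_APPEND)
                os.write(out, msg)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
    os.close(fd)
    with open(path, "rb") as f:
        contents = f.read()
    os.unlink(path)
    return contents

if __name__ == "__main__":
    for mode in ("shared", "separate", "append"):
        print(mode, write_from_children(mode))
```

"shared" and "append" both produce b'one\ntwo\nthree\n'. "separate" leaves only b'three\n', because each child started writing at offset 0 and overwrote what came before.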
PS: The implementation of shared file descriptors across processes in old Unixes is much simplified by the fact that they're uniprocessor environments, so the kernel has no need to worry about locking for updating file offsets (or much of anything else to do with them). Only one process can be in the kernel manipulating them at any given time.
Link: ASCII table and history (Or, why does Ctrl+i insert a Tab in my terminal?)
This ASCII table page (via) answers the question it poses in its title, and the answer is quite interesting. I have a long-standing interest in this area, and this page's table explains things like why Ctrl-@ is a common way to generate a 0 byte. The table also makes it clear that at least one case is handled specially, that of Ctrl-? often being DEL. So now that I look at it, the table is interesting reading, not just the history.
(The straightforward implementation of Ctrl masks off bit 7, or perhaps bits 6 and 7, counting the seven ASCII bits from 1 at the low end. This turns @, binary 10 00000, into binary 00 00000 and thus gives you NUL. But you cannot go from ?, binary 01 11111, to DEL, binary 11 11111, by masking off bits; you have to turn on a bit instead. And notice that it is common to make Ctrl-_ generate byte 31, so we have binary 10 11111 turning into binary 00 11111 through masking; this is not a general special treatment for characters whose low five bits are 11111.)
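The masking rule and its one exception can be written out as a short Python function. This is a sketch of common terminal behavior for plain ASCII input, not the behavior of any particular terminal. The name ctrl is hypothetical.

```python
def ctrl(ch: str) -> int:
    """Byte a terminal commonly sends for Ctrl-<ch> (ASCII only)."""
    if ch == "?":
        # The special case: 0x3F -> 0x7F needs a bit turned *on*,
        # which no amount of masking can do.
        return 0x7F
    # Keep the low five bits, clearing 0x40 and 0x20 ("bits 7 and 6"
    # counting from 1). This also makes Ctrl-i and Ctrl-I the same.
    return ord(ch) & 0x1F

if __name__ == "__main__":
    for ch in "@iI[_?":
        print(f"Ctrl-{ch} -> {ctrl(ch)}")
```

This gives 0 for Ctrl-@, 9 (Tab) for both Ctrl-i and Ctrl-I, 27 (ESC) for Ctrl-[, 31 for Ctrl-_, and 127 for Ctrl-?. Without the special case, masking would turn ? (0x3F) into 31, the same byte as Ctrl-_.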
PS: There's also this version of a four column ASCII table, but it doesn't have the history and doesn't look as nice as this new one.