2022-01-02
Why "process substitution" is a late feature in Unix shells
A while ago, I read Julia Evans' Teaching by filling in knowledge gaps and hit the section that uses Evans' shell brackets cheat sheet as an example. One of the uses of brackets in Bash and other shells is "process substitution" (also Wikipedia), where you can use a redirection with a process instead of a file as an argument to commands:
diff <(rpm -qa) <(ssh server2 "rpm -qa")
Process substitution is a great little feature and it feels very Unixy, but it took a surprisingly long time to appear in Unix and in shells. This is because it needed a crucial innovation, namely names in the filesystem for file descriptors, names that you can open() to be connected to the file descriptor.
Standard input, standard output, and so on are file descriptors, which (from the view of Unix processes) are small integers that refer to open files, pipes, network connections, and other things that fall inside the Unix IO model. File descriptors are specific to each process and are an API between processes and the kernel, where the process tells the kernel that it wants to read from (eg) file descriptor zero and the kernel provides it whatever is there. Conventionally, Unix processes are started with three file descriptors already open, those being standard input (fd 0), standard output (fd 1), and standard error (fd 2). However, you can start processes with more file descriptors already open and connected to something if you want to.
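As a small illustration of the mechanics in Bash (the file name here is just an example), you can start a child process with an extra file descriptor already open and have something inside it consume that descriptor:
# fd 3 is opened on /etc/hostname before the child starts; the child's
# 'cat' then reads it by duplicating fd 3 onto its standard input
bash -c 'cat <&3' 3</etc/hostname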
Normal Unix programs don't expect to be passed any extra file descriptors and there's no standard approach in Unix for telling them that they have been given extra file descriptors and that they should read or write to them for some purpose. Instead, famously, Unix programs like diff expect to be provided file names as arguments, and then they open the file names themselves. Some programs accept a special file name (often '-', a single dash) to mean that they should read from standard input or write to standard output, but this is only a convention; there's no actual '-' filename that you can open yourself.
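For example, diff's own handling of '-' is what makes a pipeline like this work; diff itself chooses to read its first "file" from standard input (the saved package list path is just an illustration):
# compare the current package list against a previously saved one, read via '-'
rpm -qa | diff - /tmp/old-package-list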
To implement process substitution, the shell needs to bridge these two different worlds. The process substitution commands will write to their standard output, but the overall command must be given file names as input. There are two ways to implement this, the inefficient one that's been possible since the beginning of Unix, and the efficient one that became possible later. The inefficient way is to write the output of the commands to a file, turning the whole thing into something like this:
rpm -qa >/tmp/file-a.$$
ssh server2 "rpm -qa" >/tmp/file-b.$$
diff /tmp/file-a.$$ /tmp/file-b.$$
rm /tmp/file-a.$$ /tmp/file-b.$$
I believe that some Unix shells may have implemented this, but it was never very popular for various reasons (especially since this was back in the days when /tmp was generally on a slow hard disk).
Once named FIFOs were available on Unixes, you could use them instead of actual files, which improved the efficiency but still had some issues.
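A sketch of the named FIFO variant looks like this (the temporary paths are illustrative); each writer blocks until diff opens its FIFO, so no data has to land on disk:
mkfifo /tmp/fifo-a.$$ /tmp/fifo-b.$$
rpm -qa >/tmp/fifo-a.$$ &
ssh server2 "rpm -qa" >/tmp/fifo-b.$$ &
diff /tmp/fifo-a.$$ /tmp/fifo-b.$$
rm /tmp/fifo-a.$$ /tmp/fifo-b.$$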
The best way is to have filesystem names for file descriptors, so that when you open the filename, you're connected to the file descriptor (you may or may not get that file descriptor returned by the kernel from open()). Then the shell can start the diff process with some extra file descriptors open that are the input sides of the pipes that the two process substitution commands are writing their output to, and it can provide the filesystem names for these file descriptors as command line arguments to diff.
Diff thinks it's operating on files (although odd ones, since they're not seekable among other issues), and generally it will be happy. Everything is automatically cleaned up when things exit and it's about as efficient as you could ask for. The conventional modern filesystem name for file descriptors is /dev/fd/N (for file descriptor N).
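You can see this at work for yourself; in Bash on a system with /dev/fd, process substitution expands to exactly such a name (the descriptor number will vary):
echo <(true)                    # prints something like /dev/fd/63
cat /dev/fd/0 </etc/hostname    # opening /dev/fd/0 is another path to standard input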
I think every modern Unix has a /dev/fd of some sort (although the implementations vary), but coming up with the idea of /dev/fd, having it implemented, and then having it spread widely enough that shells could reliably use it took a while. My impression is that process substitution in shells didn't start to be common until then, and even today isn't necessarily in wide use.
(Unfortunately I'm not sure where /dev/fd was first invented and introduced. It's possible that it comes from later versions of Research Unix, since the V10 version of rc apparently had this and I can't imagine the Bell Labs people implementing it with named FIFOs. /dev/fd itself took some Unix innovations after V7, but that's for another entry.)
PS: Considering that Bash apparently had process substitution no later than 1994, my standards for a 'late shell feature' may be a bit off from many people's. However, I think process substitution is still not in the shell section of the current version of POSIX, although named FIFOs are.
Why I'm not interested in rolling back to snapshots of Linux root filesystems
One perpetual appeal of advanced filesystems like btrfs and ZFS is the idea of making a snapshot of your root filesystem, trying an upgrade, and then reverting to the snapshot if you feel that things have gone wrong. In my entry yesterday on why I use ext4 for my root filesystems, I mentioned that I didn't expect doing this to work as well as you'd like, and Aristotle Pagaltzis expressed interest in an elaboration of this. Well, never let it be said that I don't take requests.
(I covered some of the general ground in an old entry on rollbacks versus downgrades, but today I'll be more specific.)
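For concreteness, the appealing workflow looks roughly like this on ZFS (a sketch; the dataset name rpool/ROOT/default is just an assumption about your pool layout, and a rollback discards everything written since the snapshot):
zfs snapshot rpool/ROOT/default@pre-upgrade
# ... perform the upgrade and test things ...
zfs rollback rpool/ROOT/default@pre-upgrade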
The first problem is that Linux doesn't separate out the different types of things that are in /var; it contains a mess of program data, user data, and logs. You must roll back anything containing program data along with /usr, because your upgrade may have done things like changed the database format or updated your package database. But this will lose the new log data in /var/log (and perhaps elsewhere) and perhaps user data in /var/mail and anywhere else it may be lurking.
(For example, you might have mail flowing through your system under /var/spool. If you sent an email message but it hasn't been fully delivered yet, you don't really want it to vanish in a rollback.)
Then there is the problem of /etc, which contains a mixture of manually maintained files, manually updated package files, automatically maintained state files, and automatically updated package files. Much like /var, you must roll back /etc along with /usr and that will cost you anything you've done by hand since the upgrade, or any state updates for things that live outside of the root filesystem.
(In some environments, state files are potentially significant. For example, ZFS normally maintains state information about your active pools in /etc/zfs/zpool.cache.)
On Linux, rolling back the root filesystem basically requires a reboot, making it a relatively high impact operation on top of everything else. Some of this is simply the general problem that running programs will no longer have the right versions of shared libraries, configuration files, databases, and so on in the filesystem. Some of this is because the Linux kernel contains internal data structures for active files (and known files more generally) that it doesn't entirely expect to be yanked out from underneath it.
These problems are all magnified if you don't notice problems right away, and if you make routine use of the system before noticing problems. The longer the post-snapshot system is running in normal use, the more user data, changes, and running programs you will have accumulated. The more things that have accumulated, the more disruptive any rollback will be.
Given that you're balancing the disruption, loss, and risks of a rollback against the disruption, loss, and risks of whatever is wrong after the upgrade, it may not take too long before the second option is less disruptive. A related issue is that if you can solve your problems by reverting to an older version of one or more packages, it's basically guaranteed to be less disruptive than a root filesystem rollback. This means that root filesystem rollbacks are only worthwhile in situations with a lot of changes all at once that you can't feasibly roll back, like distribution version upgrades. These are the situations where maintaining a snapshot takes the most amount of disk space, since so much changes.
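As a quick sketch of what reverting a single package looks like (the package name and version are purely illustrative, and the commands vary by distribution):
dnf downgrade somepackage           # Fedora and relatives
apt install somepackage=1.2.3-1     # Debian/Ubuntu, naming the older version explicitly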
(In addition, pragmatically things don't go majorly wrong with major upgrades all that often, especially if you wait a while before doing them to let other people discover the significant issues. And remember to read the upgrade notes.)
A very carefully managed system can avoid all of these problems. If you move all user data into a separate filesystem, change the system through automation (also stored in a separate filesystem), push logs to a separate area, do significant testing after an upgrade before putting things in production, and can reboot with low impact, rollbacks could work great. But this is not very much like typical Linux desktop systems; it's more like a "cattle" style cloud server. Very little in a typical Fedora, Debian, or Ubuntu system will help you naturally manage it this way.
(There are other situations where rollbacks are potentially useful. For example, if you have a test system with no user data, no important logs, and no particular manually maintained changes that you frequently try out chancy updates on. If everything on the system is basically expendable anyway, reverting to a snapshot is potentially the fastest way to return it to service.)
Sidebar: Snapshots by themselves can be valuable
A snapshot by itself provides you an immediate, readily accessible backup of the state of things before you made the change. This can be useful and sometimes quite valuable in ordinary package updates. For example, if you upgrade a package and the package updates the format of its database in /var so that you can't revert back to an older package version, a snapshot potentially lets you fish out the old pre-upgrade database and drop it into place to go with the older package version.
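With ZFS, for example, snapshot contents are visible read-only under the filesystem's hidden .zfs/snapshot directory, so recovering the old database is just a copy (this assumes /var is its own ZFS filesystem; the snapshot name and application path are illustrative):
cp /var/.zfs/snapshot/pre-upgrade/lib/someapp/app.db /var/lib/someapp/app.db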