2025-01-12
The history and use of /etc/glob in early Unixes
One of the innovations that the V7 Bourne shell introduced was built-in shell wildcard globbing, which is to say expanding things like *, ?, and so on. Of course Unix had shell wildcards well before V7, but in V6 and earlier, the shell didn't implement globbing itself; instead this was delegated to an external program, /etc/glob (this affects things like looking into the history of Unix shell wildcards, because you have to know to look at the glob source, not the shell).
As covered in places like the V6 glob(8) manual page, the glob program was passed a command and its arguments (already split up by the shell). It went through the arguments to expand any wildcards it found, then exec()'d the command with the now-expanded arguments. The shell operated by scanning all of the arguments for (unescaped) wildcard characters. If any were found, the shell exec()'d /etc/glob with the whole show; otherwise, it directly exec()'d the command with its arguments. Quoting wildcards relied on a hack, which I'll get to later.
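As a rough illustration of the shell's side of this decision, here is a minimal C sketch. It is not the V6 shell's code: the names has_wildcard() and build_exec_argv() are hypothetical, it treats "*", "?" and "[" as the wildcard characters, and it ignores quoting entirely. Rather than actually calling exec(), it only builds the argument vector that would be exec()'d.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does this word contain a wildcard character?
   This sketch ignores quoting entirely. */
static int has_wildcard(const char *s)
{
    return strpbrk(s, "*?[") != NULL;
}

/* Hypothetical helper: fill out[] with the argument vector the shell
   would exec().  If any word has a wildcard, /etc/glob is run with the
   whole command line as its arguments; otherwise the command's own
   argv is used as-is.  out must have room for argc + 2 pointers.
   Returns the number of entries placed in out (excluding the NULL). */
static int build_exec_argv(int argc, char **argv, const char **out)
{
    int i, n = 0, need_glob = 0;

    assert(argc >= 1);  /* there is always a command name */
    for (i = 0; i < argc; i++)
        if (has_wildcard(argv[i]))
            need_glob = 1;
    if (need_glob)
        out[n++] = "/etc/glob";
    for (i = 0; i < argc; i++)
        out[n++] = argv[i];
    out[n] = NULL;
    return n;
}
```

With this sketch, "ls *.c" becomes "/etc/glob ls *.c", while "ls -l" is passed through unchanged; /etc/glob would then be responsible for expanding "*.c" and exec()'ing ls itself.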
This basic /etc/glob behavior goes all the way back to Unix V1, where in sh.s we can see the shell invoking /etc/glob. In V2, glob is one of the programs that had been rewritten in C (glob.c), and in V3 we have a sh.1 that mentions /etc/glob and has an interesting BUGS note about it:
If any argument contains a quoted "*", "?", or "[", then all instances of these characters must be quoted. This is because sh calls the glob routine whenever an unquoted "*", "?", or "[" is noticed; the fact that other instances of these characters occurred quoted is not noticed by glob.
This section has disappeared in the V4 sh.1 manual page, which suggests that the V4 shell and /etc/glob had acquired the hack they use in V5 and V6 to avoid this particular problem.
Escaping wildcards in the V5 and V6 shell works by restricting all characters in commands and arguments to seven-bit ASCII. The shell and /etc/glob both use the 8th bit to mark quoted characters, which means that such quoted characters don't match their unquoted versions and won't be seen as wildcards by either the shell (when it's deciding whether or not it needs to run /etc/glob) or by /etc/glob itself (when it's deciding what to expand). However, obviously neither the shell nor /etc/glob can pass such 'marked as quoted' characters to actual commands, so each of them strips the high bit from all characters before exec()'ing actual commands.
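Here is a small C sketch of this high-bit quoting scheme, assuming 8-bit unsigned characters. The names QUOTE_BIT, CHAR_MASK, mark_quoted(), has_unquoted_wildcard() and strip_quotes() are hypothetical and only loosely modelled on the description above; this is not the V5 or V6 source. Marking a character sets bit 0200, the wildcard check only looks for the plain seven-bit "*", "?" and "[", and stripping ANDs each character with 0177 before the word would be handed to a command.

```c
#include <assert.h>

#define QUOTE_BIT 0200  /* high bit: "this character was quoted" (sketch) */
#define CHAR_MASK 0177  /* the seven bits of the actual ASCII character */

/* Mark every character in s as quoted by setting its high bit.
   Input is assumed to be plain seven-bit ASCII. */
static void mark_quoted(unsigned char *s)
{
    for (; *s != '\0'; s++) {
        assert((*s & QUOTE_BIT) == 0);
        *s |= QUOTE_BIT;
    }
}

/* True if s contains an unquoted '*', '?' or '['.  A quoted '*' is
   0252 rather than '*' (052), so it is never seen as a wildcard. */
static int has_unquoted_wildcard(const unsigned char *s)
{
    for (; *s != '\0'; s++)
        if (*s == '*' || *s == '?' || *s == '[')
            return 1;
    return 0;
}

/* Drop the quote marks before the word is passed to a real command. */
static void strip_quotes(unsigned char *s)
{
    for (; *s != '\0'; s++)
        *s &= CHAR_MASK;
}
```

With a word like a*b* where only the first "*" is quoted, has_unquoted_wildcard() still reports a wildcard because of the second "*", but the quoted one can no longer be mistaken for a wildcard. That is the sort of mixed-quoting case the V3 BUGS note warned about.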
(This is clearer in the V5 glob.c source; look for how cat() ANDs every character with octal 0177 (0x7f) to drop the high bit. You can also see it in the V5 sh.c source, where you want to look at trim(), and also the #define for 'quote' at the start of sh.c and how it's used later.)
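To show what the high-bit marks mean for /etc/glob's own matching, here is a simplified recursive matcher in C. It is not the glob.c matching code from V5 or V6: match(), QUOTE_BIT and CHAR_MASK are hypothetical names, it only handles "*" and "?" (no [...] classes), and it assumes file names are plain seven-bit ASCII while the pattern may carry quoted characters with bit 0200 set. A quoted character is compared against the file name with its high bit masked off, so it can only match itself literally.

```c
#include <assert.h>

#define QUOTE_BIT 0200  /* high bit marks a quoted character (sketch) */
#define CHAR_MASK 0177  /* strips the quote mark */

/* Simplified glob matcher.  Only unquoted '*' and '?' are wildcards;
   any other pattern character, quoted or not, must equal the file
   name's character once its high bit is masked off.  [...] classes
   are left out to keep the sketch short. */
static int match(const unsigned char *name, const unsigned char *pat)
{
    for (; *pat != '\0'; pat++, name++) {
        if (*pat == '*') {
            /* Try every possible length for the span '*' covers. */
            for (;;) {
                if (match(name, pat + 1))
                    return 1;
                if (*name == '\0')
                    return 0;
                name++;
            }
        }
        if (*name == '\0')
            return 0;              /* pattern left over, name exhausted */
        if (*pat == '?')
            continue;              /* '?' matches any one character */
        if ((*pat & CHAR_MASK) != *name)
            return 0;              /* literal (possibly quoted) mismatch */
    }
    return *name == '\0';          /* both must end together */
}
```

So the pattern *.c matches glob.c, while the same pattern with its "*" quoted only matches a file literally named *.c.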
PS: I don't know why expanding shell wildcards used a separate program in V6 and earlier, but part of it may have been to keep the shell smaller and more minimal so that it required less memory.
PPS: See also Stephen R. Bourne's 2015 presentation from BSDCan [PDF], which has a bunch of interesting things on the V7 shell and confirms that /etc/glob was there from V1.