What the original 4.2 BSD csh hashed (which is not what I thought)
Recently, Unix shells keeping track of where they'd found commands came up on the Fediverse again, as it does every so often; for instance, last year I advocated for doing away with the whole thing. As far as I know, (Unix) shell command hashing originated with BSD Unix's csh, which added command hashing and a 'rehash' builtin. However, if you actually read the 4.2 BSD csh(1) manual page, it says something a bit odd (emphasis mine):
rehash
: Causes the internal hash table of the contents of the directories in the path variable to be recomputed. This is needed if new commands are added to directories in the path while you are logged in. [...]
The way command hashing typically works in modern shells is that
the shell remembers the specific full path to a given command (or
sometimes that the command doesn't exist).
This is explicitly described in the Bash manual, which says (for
example) 'Bash uses a hash table to remember the full pathnames of
executable files'. In this case, if you or someone else adds a new
command to something in $PATH and you've never run that command
before (because it didn't used to exist), you're fine and don't need
to rehash; your shell will automatically go looking for a new command
in $PATH.
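As a rough illustration of the modern approach, here is a minimal Python sketch of a per-command cache. It is a simplified model and not Bash's actual implementation: `find_command`, `listing`, and `cache` are hypothetical names, `listing` is an in-memory stand-in for the real file system, and this version deliberately does not remember misses (some shells do).

```python
def find_command(name, path_dirs, listing, cache):
    """Return the full path for name, consulting the cache first."""
    if name in cache:
        return cache[name]              # remembered from an earlier lookup
    for d in path_dirs:                 # otherwise walk $PATH in order
        if name in listing.get(d, ()):
            cache[name] = d + "/" + name
            return cache[name]
    return None                         # misses are not cached in this sketch


# Hypothetical directory contents standing in for the file system.
path_dirs = ["/bin", "/usr/bin"]
listing = {"/bin": {"ls"}, "/usr/bin": {"awk"}}
cache = {}

find_command("ls", path_dirs, listing, cache)    # caches /bin/ls
listing["/usr/bin"].add("newtool")               # someone installs a new command
print(find_command("newtool", path_dirs, listing, cache))
```

Because nothing was ever cached for `newtool`, the lookup falls through to a fresh walk of the path and finds it without any rehash.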
It turns out that the 4.2 BSD csh did not hash commands this way. Instead, well, let's quote a comment from sh.exec.c:
Xhash is an array of HSHSIZ chars, which are used to hash execs. If it is allocated, then to tell whether ``name'' is (possibly) present in the i'th component of the variable path, you look at the i'th bit of xhash[hash("name")]. This is setup automatically after .login is executed, and recomputed whenever ``path'' is changed.
To translate that, csh does not 'hash' where commands are found the
way modern shells do. Instead of looking up commands and then
remembering where it found them, it scans all of the directories
on your $PATH and remembers the hash values of the names it saw
in each of them. When csh tries to run a command, it gets the hash
value of the command name, looks it up in the hash table, and skips
all $PATH entries that hash value definitely isn't in. If you run
a newly added command, the odds are very low that its name will hash
to a hash value that has the right bit set in its hash table entry.
There can be hash value collisions between different command names and
if you have more than 8 $PATH entries, more than one entry can set
the same bit, so finding a set bit merely means that potentially the
command is there. So this is not as good as remembering exactly where
the command is, but on the other hand it takes up a lot less memory; the
default csh hash size is 511 bytes. It also means that you definitely
want to do 'rehash' when you or someone else modifies any directory
on your $PATH, because the odds are very high that any new additions
won't be properly recognized.
(What 'rehash' does is that it re-runs the code that sets up this
hash table, which is also run when $PATH is changed and so on.)
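To make the bitmap scheme concrete, here is a small Python sketch of the idea as described in the comment quoted above. It is a model under stated assumptions, not a port of the csh source: `name_hash` is a hypothetical stand-in (the real csh hash function is not reproduced here), `listing` is an in-memory substitute for reading directories, and mapping path entry i to bit `i % 8` is an assumption based on each table slot being a single char.

```python
HSHSIZ = 511  # table size in bytes, the default size mentioned above


def name_hash(name):
    """Hypothetical stand-in hash; NOT the function csh actually uses."""
    h = 0
    for byte in name.encode():
        h = (h * 31 + byte) % HSHSIZ
    return h


def build_xhash(path_dirs, listing):
    """Scan every path directory and record which names it might contain."""
    xhash = bytearray(HSHSIZ)
    for i, d in enumerate(path_dirs):
        for name in listing.get(d, ()):
            # Assumed mapping: the i'th path entry owns bit i % 8 of each slot.
            xhash[name_hash(name)] |= 1 << (i % 8)
    return xhash


def candidate_dirs(xhash, path_dirs, name):
    """Return the path entries that could hold name; all others are skipped."""
    bits = xhash[name_hash(name)]
    return [d for i, d in enumerate(path_dirs) if bits & (1 << (i % 8))]


# Hypothetical directory contents standing in for the file system.
path_dirs = ["/bin", "/usr/bin", "/usr/local/bin"]
listing = {"/bin": ["ls", "cat"], "/usr/bin": ["awk"], "/usr/local/bin": []}
xhash = build_xhash(path_dirs, listing)

listing["/usr/local/bin"].append("newtool")       # a new command appears
print(candidate_dirs(xhash, path_dirs, "newtool"))  # stale table

xhash = build_xhash(path_dirs, listing)            # the effect of 'rehash'
print(candidate_dirs(xhash, path_dirs, "newtool"))
```

With the stale table, `/usr/local/bin` is not offered as a candidate for `newtool`, because nothing in that directory had set its bit when the table was built; after rebuilding the table, it is. A set bit only ever means "possibly here", so a real shell would still have to try executing the command in each candidate directory.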