2023-11-25
Unix shells and the current directory
Famously, Unix has the concept of a process's 'current directory', including for your shell processes. Recently, I saw an interesting Fediverse discussion on some aspects of the current directory which aren't necessarily obvious, partly because both Unix kernels and Unix shells have become more complicated over time.
The Unix kernel keeps track of your current directory not as a text path but as a reference to a kernel object, normally the directory's inode. In the old days this was all the kernel actually knew about the current directory, but today Linux (and perhaps other Unixes) has developed kernel caches of the mappings between names and inodes; in Linux, these are dentries (I believe for 'directory entry'). Linux's dentries mean that the kernel almost always knows the name of your current directory, if it has one.
(Your current directory may not have a name, for example because the directory has been removed. If you do 'mkdir /tmp/example; cd /tmp/example; rmdir /tmp/example' your current directory still exists in some sense, but it's lost its name and most everything else.)
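Here is a rough Python sketch of that sequence for anyone who wants to poke at it. It assumes a Linux-like system, where stat() on '.' keeps working after the rmdir but asking for the directory's name fails; other Unixes may report this differently. The function name is made up for illustration.

```python
import os
import tempfile

def removed_cwd_demo():
    """cd into a scratch directory, remove it, and see what is left."""
    original = os.getcwd()
    scratch = tempfile.mkdtemp()
    try:
        os.chdir(scratch)
        os.rmdir(scratch)          # the directory loses its name here
        try:
            os.stat(".")           # the inode itself is still reachable
            still_there = True
        except OSError:
            still_there = False
        try:
            name = os.getcwd()     # assumption: this fails once the name is gone
        except OSError:
            name = None
        return still_there, name
    finally:
        os.chdir(original)         # go back to somewhere that has a name
```

The function returns whether '.' could still be stat()'d and whatever name (if any) the system was willing to report for it.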
One of the ways that Linux uses its knowledge of the file (path)
names of things is with /proc/*/cwd and more generally all of
the file names in /proc/*/fd. Another way is that the Linux
kernel has a getcwd() system call that returns this information
to you, which is what getcwd(3)
normally uses. Interested parties can see this system call being
used in the depths of 'strace /bin/pwd'. On systems that don't have
a getcwd() system call, how /bin/pwd works is much more brute
force. The traditional implementation is to stat() '.', the current
directory, and then read through '..', the parent directory, until
you find the name for the current directory (which you can identify
by its inode number). Then repeat for the next level up until you've
reached the root directory, and put all of the name components
together into the path.
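The brute force approach can be sketched in Python roughly as follows. This is an illustration of the general technique, not the code of any particular /bin/pwd; the function name is hypothetical, and it assumes you can read every directory between '.' and the root. It compares device numbers as well as inode numbers, and lstat()s each entry instead of trusting the inode number in the directory listing, because those can disagree at mount points.

```python
import os

def brute_force_pwd(start="."):
    """Build the path of 'start' by searching each parent for our name."""
    components = []
    here = start
    here_stat = os.stat(here)
    while True:
        parent = os.path.join(here, "..")
        parent_stat = os.stat(parent)
        here_id = (here_stat.st_dev, here_stat.st_ino)
        if here_id == (parent_stat.st_dev, parent_stat.st_ino):
            break                      # '..' is the same as '.', so this is the root
        for name in os.listdir(parent):
            try:
                entry = os.lstat(os.path.join(parent, name))
            except OSError:
                continue               # entry vanished or is unreadable; skip it
            if (entry.st_dev, entry.st_ino) == here_id:
                components.append(name)
                break
        else:
            # No entry in the parent matches, e.g. the directory was removed.
            raise FileNotFoundError("no name found for directory")
        here, here_stat = parent, parent_stat
    return "/" + "/".join(reversed(components))
```

On a system where everything is readable and nothing has been removed, this should agree with what the kernel reports through getcwd(), only with a lot more system calls.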
(It appears that FreeBSD also has a similar system call, based on what 'truss /bin/pwd' reports, and the OpenBSD getcwd(3) manual page says that OpenBSD does too. FreeBSD appears to implement this with a name cache in the kernel. I haven't looked at the OpenBSD kernel source.)
Shells complicate this picture. For a long time, many shells
have kept track of a name for their current directory themselves,
often materializing this in the '$PWD' environment variable. The
shell has to keep track of this name as a text string or the rough
equivalent, which makes it potentially less accurate than the
kernel's version. However, it has some advantages, because unlike
the kernel, the shell knows what name you typed in order to get to
the directory, which may not be the actual filesystem name of the
directory because of things like symbolic links. Shells often use
this knowledge so that names like '..' and even '.' work on the
text version, not the filesystem version.
(Sometimes people then write shell scripts and other code that
assumes '$PWD' is accurate if it's present, which is not necessarily
true. Sadness often ensues. Because '$PWD' is a regular environment
variable, it's not automatically updated when someone's code does
a chdir() call.)
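One defensive approach is to believe '$PWD' only if it actually names the directory you're in. The Python sketch below is a simplified version of that idea (it is roughly the sort of check POSIX describes for 'pwd -L', minus some details); the function name is hypothetical and this is one possible way of doing it, not how any particular shell does.

```python
import os

def trusted_pwd():
    """Return $PWD if it really is the current directory, else ask the kernel."""
    pwd = os.environ.get("PWD")
    if pwd and os.path.isabs(pwd):
        try:
            # samefile() compares device and inode numbers, so a stale
            # $PWD left over from before a chdir() will not match '.'.
            if os.path.samefile(pwd, "."):
                return pwd
        except OSError:
            pass                   # $PWD names something that no longer exists
    return os.getcwd()             # fall back to the filesystem view
```

This keeps the 'logical' name (symbolic links and all) when it's still valid, and quietly switches to the filesystem path when some code has called chdir() without updating the environment.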
That sounds abstract so here's an example. If you typed 'cd /u/cks' and /u/cks is actually a symbolic link to /h/281/cks, the kernel only knows your current directory as '/h/281/cks'. If you ask the kernel to change directory to '..', the parent, you will wind up in /h/281. If you ask the shell to 'cd ..', the shell can put you in /u, the textual parent of '/u/cks', the path you typed to get to your current directory. Shells with this behavior usually have a 'pwd' builtin that prints their 'logical' text view of your current directory, as opposed to the filesystem view that /bin/pwd will print.
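To make the two answers concrete, here is a small Python sketch of both ways of computing a parent directory. The function names are made up for illustration, and the 'logical' version is deliberately naive (a real shell handles more cases than chopping off the last component).

```python
import os

def logical_parent(typed_path):
    """The shell's view of '..': drop the last component of the text."""
    return os.path.dirname(typed_path.rstrip("/")) or "/"

def physical_parent(typed_path):
    """The kernel's view of '..': resolve symbolic links, then go up."""
    return os.path.realpath(os.path.join(typed_path, ".."))
```

With the layout from the example, where /u/cks is a symbolic link to /h/281/cks, logical_parent('/u/cks') gives '/u' while physical_parent('/u/cks') gives '/h/281'.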
All of this shell behavior (and Unix kernel behavior) isn't necessarily well known (or perhaps clearly documented, although all of the pieces are there if you read enough things). Usually this doesn't matter because everything works well enough and does what people expect.
(Some people have strong reactions to shells using their 'logical path' instead of the filesystem path for things like 'cd ..'. These people are less happy with the current state of affairs. If you're one of these people and use Bash, you want 'set -P' or 'set -o physical'.)
(This elaborates on what I said on the Fediverse.)