Unix shells and the current directory

November 25, 2023

Famously, Unix has the concept of a process's 'current directory', including for your shell processes. Recently, I saw an interesting Fediverse discussion on some aspects of the current directory which aren't necessarily obvious, partly because both Unix kernels and Unix shells have become more complicated over time.

The Unix kernel keeps track of your current directory not as a text path but as a reference to a kernel object, normally the directory's inode. In the old days this was all the kernel actually knew about the current directory, but today Linux (and perhaps other Unixes) has developed kernel caches of the mappings between names and inodes; in Linux, these are dentries (short for 'directory entries'). Linux's dentries mean that the kernel almost always knows the name of your current directory, if it has one.

(Your current directory may not have a name, for example because the directory has been removed. If you do 'mkdir /tmp/example; cd /tmp/example; rmdir /tmp/example' your current directory still exists in some sense, but it's lost its name and most everything else.)
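Here is a minimal Python sketch of that nameless state. It assumes Linux behavior, where asking the kernel for the name of a removed current directory fails with ENOENT; the function name cwd_after_removal is made up for illustration, and a temporary directory stands in for /tmp/example.

```python
import os
import tempfile


def cwd_after_removal():
    """Remove our own current directory, then ask the kernel what it is called."""
    original = os.getcwd()
    doomed = tempfile.mkdtemp()   # stand-in for /tmp/example
    try:
        os.chdir(doomed)
        os.rmdir(doomed)          # the directory we are in no longer has a name
        try:
            return os.getcwd()
        except FileNotFoundError as err:
            # On Linux the getcwd() system call reports ENOENT here.
            return f"no name: {err.strerror}"
    finally:
        os.chdir(original)        # go back somewhere that still exists


if __name__ == "__main__":
    print(cwd_after_removal())
```

The process is still 'in' the removed directory until it changes directory somewhere else; it just can't be told a path for it.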

One of the ways that Linux uses its knowledge of the file (path) names of things is with /proc/*/cwd and more generally all of the file names in /proc/*/fd. Another way is that the Linux kernel has a getcwd() system call that returns this information to you, which is what getcwd(3) normally uses. Interested parties can see this system call being used in the depths of 'strace /bin/pwd'.

On systems that don't have a getcwd() system call, /bin/pwd has to work by brute force. The traditional implementation is to stat() '.', the current directory, and then read through '..', the parent directory, until you find the entry for the current directory (which you can identify by its inode number, plus its device number if you've crossed a mount point). That entry's name is the last component of the path. Then you repeat this for the next level up until you've reached the root directory, and put all of the name components together into the path.
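The brute force approach can be sketched in Python roughly as follows. This is an illustration of the general technique, not the code of any particular /bin/pwd; the function name brute_force_getcwd is made up, and the choices to lstat() every entry and to compare device numbers along with inode numbers are assumptions made so that the sketch copes with mount points.

```python
import os


def brute_force_getcwd():
    """Rebuild the current directory's path by walking up through '..'."""
    components = []
    here = "."                       # relative path to the directory being named
    here_st = os.stat(here)
    while True:
        parent = os.path.join(here, "..")
        parent_st = os.stat(parent)
        # The root directory is its own parent, so this is where we stop.
        if (parent_st.st_dev, parent_st.st_ino) == (here_st.st_dev, here_st.st_ino):
            break
        # Scan the parent for the entry that refers to the directory below it.
        for name in os.listdir(parent):
            try:
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue             # entry vanished or is not accessible
            if (st.st_dev, st.st_ino) == (here_st.st_dev, here_st.st_ino):
                components.append(name)
                break
        else:
            raise FileNotFoundError("directory has no name in its parent")
        here, here_st = parent, parent_st
    return "/" + "/".join(reversed(components))


if __name__ == "__main__":
    print(brute_force_getcwd())
```

In ordinary situations this should agree with os.getcwd(). It fails if some parent directory can't be read, or if the current directory has been removed and so has no entry in its parent.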

(It appears that FreeBSD also has a similar system call, based on what 'truss /bin/pwd' reports, and the OpenBSD getcwd(3) manual page says that OpenBSD does too. FreeBSD appears to implement this with a name cache in the kernel. I haven't looked at the OpenBSD kernel source.)

Shells complicate this picture. For a long time, many shells have kept track of a name for their current directory themselves, often materializing this in the '$PWD' environment variable. The shell has to keep track of this name as a text string or the rough equivalent, which makes it potentially less accurate than the kernel's version. However, it has some advantages, because unlike the kernel, the shell knows what name you typed in order to get to the directory, which may not be the actual filesystem name of the directory because of things like symbolic links. Shells often use this knowledge so that names like '..' and even '.' work on the text version, not the filesystem version.

(Sometimes people then write shell scripts and other code that assumes '$PWD' is accurate if it's present, which is not necessarily true. Sadness often ensues. Because '$PWD' is a regular environment variable, it's not automatically updated when someone's code does a chdir() call.)
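One defensive approach is to trust '$PWD' only if it still refers to the same directory as '.', and otherwise ask the kernel. The Python sketch below does this by comparing device and inode numbers; the helper name trusted_pwd is made up, and this is one possible check rather than what any specific shell or program does.

```python
import os


def trusted_pwd():
    """Return $PWD if it still names the current directory, else the kernel's path."""
    candidate = os.environ.get("PWD", "")
    if os.path.isabs(candidate):
        try:
            env_st = os.stat(candidate)
            dot_st = os.stat(".")
            # Same device and inode means $PWD is still a valid name for '.'.
            if (env_st.st_dev, env_st.st_ino) == (dot_st.st_dev, dot_st.st_ino):
                return candidate
        except OSError:
            pass                     # stale or unreadable $PWD; fall through
    return os.getcwd()


if __name__ == "__main__":
    os.chdir("/")                    # os.chdir() does not touch os.environ["PWD"]
    print(os.environ.get("PWD"), trusted_pwd())
```

When '$PWD' passes the check you keep the logical name (symbolic links and all); when it doesn't, you fall back to the filesystem view.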

That sounds abstract, so here's an example. If you typed 'cd /u/cks' and /u/cks is actually a symbolic link to /h/281/cks, the kernel only knows your current directory as '/h/281/cks'. If you ask the kernel to change directory to '..', the parent, you will wind up in /h/281. If you ask the shell to 'cd ..', the shell can put you in /u, the textual parent of '/u/cks', the path you typed to get to your current directory. Shells with this behavior usually have a 'pwd' builtin that prints their 'logical' text view of your current directory, as opposed to the filesystem view that /bin/pwd will print.
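The same situation can be recreated in Python under a temporary directory. In this sketch the function name demo_logical_vs_physical is made up, the u and h/281/cks layout mirrors the example above but lives under a scratch directory, and the 'shell' side is imitated by plain text manipulation of the path that was typed.

```python
import os
import shutil
import tempfile


def demo_logical_vs_physical():
    """Return (kernel cwd, shell-style parent, kernel parent), relative to a scratch root."""
    original = os.getcwd()
    base = os.path.realpath(tempfile.mkdtemp())
    real_home = os.path.join(base, "h", "281", "cks")
    os.makedirs(real_home)
    os.mkdir(os.path.join(base, "u"))
    os.symlink(real_home, os.path.join(base, "u", "cks"))

    logical = os.path.join(base, "u", "cks")       # the path 'typed' to cd
    try:
        os.chdir(logical)
        kernel_view = os.getcwd()                  # kernel follows the symlink
        shell_parent = os.path.dirname(logical)    # textual '..', as a shell might do
        os.chdir("..")                             # kernel's idea of the parent
        kernel_parent = os.getcwd()
    finally:
        os.chdir(original)
        shutil.rmtree(base)

    def strip(path):
        return path[len(base):]                    # drop the scratch root prefix

    return strip(kernel_view), strip(shell_parent), strip(kernel_parent)


if __name__ == "__main__":
    print(demo_logical_vs_physical())
```

The first and third results are the filesystem view (/h/281/cks and its parent /h/281), while the middle one is the logical /u that a shell working on the text would give you.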

All of this shell behavior (and Unix kernel behavior) isn't necessarily well known (or perhaps clearly documented, although all of the pieces are there if you read enough things). Usually this doesn't matter because everything works well enough and does what people expect.

(Some people have strong reactions to shells using their 'logical path' instead of the filesystem path for things like 'cd ..'. These people are less happy with the current state of affairs. If you're one of these people and use Bash, you want 'set -P' or 'set -o physical'.)

(This elaborates on what I said on the Fediverse.)

Written on 25 November 2023.