Wandering Thoughts archives

2023-12-02

Why Unix kernels have grown caches for directory entries ('name caches')

An interesting feature of modern Unix kernels is that they generally know the names of things like current directories and open files. Traditionally the only thing Unix knew about open files, current directories, active memory mapped files, and so on was their inode (as an in-kernel data structure, including pointers to the inode's mount point and so on). However, some time back various Unixes added in kernel caches of directory entry names and associated data (in Linux these are dentries and the dcache; in FreeBSD there is the name cache). Once a Unix kernel had such a general cache, it could pin all of the entries for active file and directory objects and so generally be able to supply their names, either for system monitoring purposes (such as Linux's /proc/<pid>/fd subdirectory) or so they could support a system call to return the name of the current directory if it had one.

The reason that several Unixes all added these name caches is straightforward; running Unix systems generally do a lot of directory name lookups. The steady addition of shared libraries (which may live in a number of different places), data files for locales and timezones, lots of $PATH entries, and so on didn't improve the situation. Before name caches, each of these lookups had to call into the specific filesystem, which would generally check through whatever the on-disk data structure for directories was; hopefully the actual disk blocks for these directories would already be in the kernel's disk cache, so they didn't have to be read in.

A kernel name cache provides a fast path for all of these lookups. This cache is especially useful for looking up things that are almost certainly already in active use, such as /bin/sh, the core shared library loader, or the C shared library. These are almost always in memory already, so with the right efficient in-memory data structures for name caches, the kernel can go from "/bin/sh" to an inode quite efficiently (and directly, without having to do a bunch of indirection through things like its Virtual Filesystem Switch.

An explicit kernel name cache also has the additional benefit that it can store negative entries (in Linux, negative dentries), which say that a particular name isn't present. There are a fair number of situations on modern Unixes where programs will attempt to find a file in a succession of directories; with negative entries, those checks of all of the directories that the file isn't in can still be pretty efficient. Without some sort of support for 'this name is definitely not here' in the name cache, the kernel would have no choice but to ask the filesystem to search the on-disk directory for the name.

I don't know if there are performance studies for current name caches in current Unix kernels, but I'm sure that they make a real difference (both in lookup speed and in reducing kernel CPU usage). Even in the late 1980s, name lookups were a quite common thing and they relied very heavily on high hit rates in the kernel block cache (I was once involved in studying this in a BSD derived kernel, and I remember hit rates in the high 90%s).

(An interesting read on kernel name translation overhead and optimizing it is the relevant sections in the 4.4 BSD Lite "System Performance" paper.)

PS: Since I looked it up, all of Linux, FreeBSD, OpenBSD, and NetBSD have some form of kernel name caches. I don't know about Illumos or the few surviving commercial Unixes.

Sidebar: The (potential) names of filesystem objects

A directory in a conventional Unix filesystem has either one name or no name (if it's been removed). Because of this, the kernel's name cache can always know the directory's current name if it has one. If it wants to, the name cache can go further and provide the last name that the directory was known by before it was deleted, along with a mark that it was deleted.

A file can have no name (if it's been removed since it was opened or mmap()'d), it can have one name, or it can have several names because there are several hardlinks to it. Because of this the kernel name cache may not necessarily know the current name of an open file. If it started out having multiple hardlinks, was opened through one hardlink, and then that hardlink was removed, the name cache may not know the name of the other remaining hardlink(s).

Even if the name cache does know other names for the file, it's a policy decision if the name cache should provide them or if it should return the original name the file was opened under, along with an indication that the name was removed. In at least some implementations of /proc/<pid>/fd or the equivalent, you can still read the data of now-deleted files, so you don't need a current name to do this and knowing the original now-deleted name the program used may be more useful than knowing a current alternate name.

unix/KernelNameCachesWhy written at 22:51:48;


Page tools: See As Normal.
Search:
Login: Password:

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.