Wandering Thoughts

2024-03-05

A peculiarity of the X Window System: Windows all the way down

Every window system has windows as an entity. Usually we think of these as being used for, well, windows and window-like things: application windows, those extremely annoying pop-up modal dialogs that are always interrupting you at the wrong time, perhaps even things like pop-up menus. In its original state, X has more windows than that. Part of how and why it does this is that X allows windows to nest inside each other, in a window tree, which you can still see today with 'xwininfo -root -tree'.

One of the reasons that X has copious nested windows is that X was designed with a particular model of writing X programs in mind, and that model made everything into a (nested) window. Seriously, everything. In an old fashioned X application, windows are everywhere. Buttons are windows (or several windows if they're radio buttons or the like), text areas are windows, menu entries are each a window of their own within the window that is the menu, visible containers of things are windows (with more windows nested inside them), and so on.

This copious use of windows allows a lot of things to happen on the server side, because various things (like mouse cursors) are defined on a per-window basis, and also windows can be created with things like server-set borders. So the X server can render sub-window borders to give your buttons an outline and automatically change the cursor when the mouse moves into and out of a sub-window, all without the client having to do anything. And often input events like mouse clicks or keys can be specifically tied to some sub-window, so your program doesn't have to hunt through its widget geometry to figure out what was clicked. There are more tricks; for example, you can get 'enter' and 'leave' events when the mouse enters or leaves a (sub)window, which programs can use to highlight the current thing (ie, subwindow) under the cursor without the full cost of constantly tracking mouse motion and working out what widget is under the cursor every time.
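
To make this concrete, here is a minimal sketch of the old style in plain Xlib C (no toolkit; the names, sizes, and cursor choice are made up for illustration): a top-level window containing a 'button' that is itself a sub-window, with its own server-drawn border, its own mouse cursor, and its own enter/leave and click events.

    /* Build with 'cc subwin.c -o subwin -lX11'. */
    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/cursorfont.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }
        int scr = DefaultScreen(dpy);
        Window top = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 200,
                                         1, BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        /* The 'button' is itself an X window nested inside the top-level window;
           the server draws its one-pixel border for us. */
        Window button = XCreateSimpleWindow(dpy, top, 20, 20, 100, 30,
                                            1, BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        /* Per-window cursor: the server changes the pointer shape on its own
           whenever the pointer is over the button. */
        XDefineCursor(dpy, button, XCreateFontCursor(dpy, XC_hand2));
        /* Select enter/leave and clicks on the button only; the event's window
           field tells us it was the button, with no geometry search needed. */
        XSelectInput(dpy, button, EnterWindowMask | LeaveWindowMask | ButtonPressMask);
        XMapWindow(dpy, button);
        XMapWindow(dpy, top);
        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == EnterNotify)
                printf("pointer entered the button sub-window\n");
            else if (ev.type == LeaveNotify)
                printf("pointer left the button sub-window\n");
            else if (ev.type == ButtonPress && ev.xbutton.window == button)
                printf("the button was clicked\n");
        }
    }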

The old, classical X toolkits like Xt and the Athena widget set (Xaw) heavily used this 'tree of nested windows' approach, and you can still see large window trees with 'xwininfo' when you apply it to old applications with lots of visible buttons; one example is 'xfontsel'. Even the venerable xterm normally contains a nested window (for the scrollbar, which I believe it uses partly to automatically change the X cursor when you move the mouse into the scrollbar). However, this doesn't seem to be universal; when I look at one Xaw-based application I have handy, it doesn't seem to use subwindows despite having a list widget of things to click on. Presumably in Xaw and perhaps Xt it depends on what sort of widget you're using, with some widgets using sub-windows and some not. Another program, written using Tk, does use subwindows for its buttons (with them clearly visible in 'xwininfo -tree').

This approach fell out of favour for various reasons, but certainly one significant one is that it's strongly tied to X's server side rendering. Because these subwindows are 'on top of' their parent (sub)windows, they have to be rendered individually; otherwise they'll cover what was rendered into the parent (and naturally they clip what is rendered to them to their visible boundaries). If you're sending rendering commands to the server, this is just a matter of what windows they're for and what coordinates you draw at, but if you render on the client, you have to ship over a ton of little buffers (one for each sub-window) instead of one big one for your whole window, and in fact you're probably sending extra data (the parts of all of the parent windows that get covered up by child windows).

So in modern toolkits, the top level window and everything in it is generally only one X window with no nested subwindows, and all buttons and other UI elements are drawn by the client directly into that window (usually with client side drawing). The client itself tracks the mouse pointer and sends 'change the cursor to <X>' requests to the server as the pointer moves in and out of UI elements that should have different mouse cursors, and when it gets events, the client searches its own widget hierarchy to decide what should handle them (possibly including client side window decorations (CSD)).

(I think toolkits may create some invisible sub-windows for event handling reasons. Gnome-terminal and other Gnome applications appear to create a 1x1 sub-window, for example.)
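
For contrast, here is a rough sketch of the modern single-window approach, again in bare Xlib with made-up geometry: there's no sub-window for the button, so the client asks for raw pointer motion, does its own hit-testing against an in-client rectangle, and explicitly tells the server to change the cursor as the pointer crosses it.

    /* Build with 'cc onewin.c -o onewin -lX11'. */
    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/cursorfont.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;
        int scr = DefaultScreen(dpy);
        Window top = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 200,
                                         1, BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        XSelectInput(dpy, top, ExposureMask | PointerMotionMask | ButtonPressMask);
        XMapWindow(dpy, top);

        Cursor hand = XCreateFontCursor(dpy, XC_hand2);
        GC gc = DefaultGC(dpy, scr);
        XSetForeground(dpy, gc, BlackPixel(dpy, scr));
        int bx = 20, by = 20, bw = 100, bh = 30;    /* the client-side 'button' */
        int inside = 0;

        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == Expose) {
                /* The client draws the button itself; it isn't a window. */
                XDrawRectangle(dpy, top, gc, bx, by, bw, bh);
            } else if (ev.type == MotionNotify) {
                int in = ev.xmotion.x >= bx && ev.xmotion.x < bx + bw &&
                         ev.xmotion.y >= by && ev.xmotion.y < by + bh;
                if (in && !inside)
                    XDefineCursor(dpy, top, hand);   /* client-driven cursor change */
                else if (!in && inside)
                    XUndefineCursor(dpy, top);
                inside = in;
            } else if (ev.type == ButtonPress && inside) {
                printf("the client's own hit testing says: button clicked\n");
            }
        }
    }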

As a side note, another place you can still find this many-window style is in some old fashioned X window managers, such as fvwm. When fvwm puts a frame around a window (such as the ones visible on windows on my desktop), the specific elements of the frame (the title bar, any buttons in the title bar, the side and corner drag-to-resize areas, and so on) are all separate X sub-windows. One thing I believe this is used for is to automatically show an appropriate mouse cursor when the mouse is over the right spot. For example, if your mouse is in the right side 'grab to resize right' border, the mouse cursor changes to show you this.

(The window managers for modern desktops, like Cinnamon, don't handle their window manager decorations like this; they draw everything as decorations and handle the 'widget' nature of title bar buttons and so on internally.)

XWindowsAllTheWayDown written at 21:26:30

2024-03-04

An illustration of how much X cares about memory usage

In a comment on yesterday's entry talking about X's server side graphics rendering, B. Preston mentioned that another reason for this was to conserve memory. This is very true. In general, X is extremely conservative about requiring memory, sometimes going to lengths that we now consider extreme, and there are specific protocol features (or limitations) related to this.

The modern approach to multi-window graphics rendering is that each window renders into a buffer that it owns (often with hardware assistance) and then the server composites (appropriate parts of) all of these buffers together to make up the visible screen. Often this compositing is done in hardware, enabling you to spin a cube of desktops and their windows around in real time. One of the things that clients simply don't worry about (at least for their graphics) is what happens when someone else's window is partially or completely on top of their window. From the client's perspective, nothing happens; they keep drawing into their buffer and their buffer is just as it was before, and all of the occlusion and stacking and so on are handled by the composition process.

(In this model, a client program's buffer doesn't normally get changed or taken away behind the client's back, although the client may flip between multiple buffers, only displaying one while completely repainting another.)

The X protocol specifically does not require such memory-consuming luxuries as a separate buffer for each window, and early X implementations did not have them. An X server might have only one significant-sized buffer, that being screen memory itself, and X clients drew right on to their portion of the screen (by sending the X server drawing commands, because they didn't have direct access to screen memory). The X server would carefully clip client draw operations to only touch the visible pixels of the client's window. When you moved a window to be on top of part of another window, the X server simply threw away (well, overwrote) the 'under' portion of the other window. When the window on top was moved back away again, the X server mostly dealt with this by sending your client a notification that parts of its window had become visible and the client should repaint them.

(X was far from alone with this model, since at the time almost everyone was facing similar or worse memory constraints.)

The problem with this 'damage and repaint' model is that it can be janky; when a window is moved away, you get an ugly result until the client has had the time to do a redraw, which may take a while. So the X server had some additional protocol-level features, called 'backing store' and 'save-under(s)'. If a given X server supported these (and it didn't have to), the client could request (usually during window creation) that the server maintain a copy of the obscured bits of the new window when it was covered by something else ('backing store') and separately that when this window covered part of another window, the obscured parts of that window should be saved ('save-under', which you might set for a transient pop-up window). Even if the server supported these features in general, it could specifically stop doing them for you at any time it felt like it, and your client had to cope.

(The X server can also give your window backing store whether or not you asked for it, at its own discretion.)
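
As an illustration, here is a small Xlib sketch (the window contents are just a placeholder string) of a client politely asking for these features while still being prepared to repaint on Expose events, since the server is always free to say no.

    /* Build with 'cc bstore.c -o bstore -lX11'. */
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;
        int scr = DefaultScreen(dpy);

        XSetWindowAttributes attrs;
        attrs.background_pixel = WhitePixel(dpy, scr);
        attrs.backing_store = WhenMapped;   /* "please remember my obscured bits" */
        attrs.save_under = True;            /* "please save what I obscure" */
        attrs.event_mask = ExposureMask;
        Window win = XCreateWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 200, 1,
                                   DefaultDepth(dpy, scr), InputOutput,
                                   DefaultVisual(dpy, scr),
                                   CWBackPixel | CWBackingStore | CWSaveUnder | CWEventMask,
                                   &attrs);
        XMapWindow(dpy, win);

        GC gc = DefaultGC(dpy, scr);
        XSetForeground(dpy, gc, BlackPixel(dpy, scr));
        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            /* The damage-and-repaint fallback: the server tells us what became
               visible and we redraw it (here, crudely, everything). */
            if (ev.type == Expose && ev.xexpose.count == 0)
                XDrawString(dpy, win, gc, 40, 100, "hello, backing store", 20);
        }
    }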

All of this was to allow an X server to flexibly manage the amount of memory it used on behalf of clients. If an X server had a lot of memory, it could give everything backing store; if it started running short, it could throw some or all of the backing store out and reduce things down to (almost) a model where the major memory use was the screen itself. Even today you can probably arrange to start an X server in a mode where it doesn't have backing store (the '-bs' command line option, cf Xserver(1), which you can try in Xnest or the like, and also '-wm'). I have a vague memory that back in the day there were serious arguments about whether or not you should disable backing store in order to speed up your X server, although I no longer have any memory of why that would be so (but see).

As far as I know all X servers normally operate with backing store these days. I wouldn't be surprised if some modern X clients would work rather badly if you ran them on an X server that had backing store forced off (much as I suspect that few modern programs will cope well with PseudoColor displays).

PS: Now that I look at 'xdpyinfo', my X server reports 'options: backing-store WHEN MAPPED, save-unders NO'. I suspect that this is a common default, since you don't really need save-unders if everything has backing store enabled when it's visible (well, in X mapped is not quite 'visible', cf, but close enough).

XServerBackingStoreOptional written at 22:02:53

2024-03-03

X graphics rendering as contrasted to Wayland rendering

Recently, Thomas Adam (of fvwm fame) pointed out on the FVWM mailing list (here, also) a difference between X and Wayland that I'd been vaguely aware of before but hadn't actually thought much about. Today I feel like writing it down in my own words for various reasons.

X is a very old protocol (dating from the mid to late 1980s), and one aspect of that is that it contains things that modern graphics protocols don't. From a modern point of view, it isn't wrong to describe X as several protocols in a trenchcoat. Two of the largest such protocols are one for what you could call window management (including event handling) and a second one for graphics rendering. In the original vision of X, clients used the X server as their rendering engine, sending a series of 2D graphics commands to the server to draw things like lines, rectangles, arcs, and text. In the days of 10 Mbit/second local area networks and also slow inter-process communication on your local Unix machine, this was a relatively important part of both X's network transparency story and X's performance in general. We can call this server (side) rendering.

(If you look at the X server drawing APIs, you may notice that they're rather minimal and generally lack features that you'd want for modern graphics. Some of this was semi-fixed in X protocol extensions, but in general the server side X rendering APIs are rather 1980s.)
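
As a concrete illustration of what server (side) rendering looks like from the client's point of view, here is a minimal Xlib sketch where the client only sends drawing requests; the actual rasterizing of the line, rectangle, arc, and text happens in the X server. The specific shapes and text are just for illustration.

    /* Build with 'cc serverdraw.c -o serverdraw -lX11'. */
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;
        int scr = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 320, 240,
                                         1, BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);

        GC gc = XCreateGC(dpy, win, 0, NULL);
        XSetForeground(dpy, gc, BlackPixel(dpy, scr));

        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == Expose) {
                /* Each of these is a small request on the wire; the pixels are
                   produced in the server, not in this program. */
                XDrawLine(dpy, win, gc, 10, 10, 310, 10);
                XDrawRectangle(dpy, win, gc, 10, 30, 100, 60);
                XFillArc(dpy, win, gc, 150, 30, 60, 60, 0, 360 * 64);
                XDrawString(dpy, win, gc, 10, 130, "drawn by the X server", 21);
            } else if (ev.type == KeyPress) {
                break;
            }
        }
        XFreeGC(dpy, gc);
        XCloseDisplay(dpy);
        return 0;
    }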

However, X clients didn't have to do their rendering in the server. Right from the beginning they could render to a bitmap on the client side and then shove the bitmap over to the server somehow (the exact mechanisms depend on what X extensions are available). Over time, more and more clients started doing more and more client (side) rendering, where they rendered everything under their own control using their own code (well, realistically a library or a stack of them, especially for complex things like rendering fonts). Today, many clients and many common client libraries are entirely or almost entirely using client side rendering, in part to get modern graphics features that people want, and these days clients even do client side (window) decoration (CSD), where they draw 'standard' window buttons themselves.

(This tends to make window buttons not so standard any more, especially across libraries and toolkits.)
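
And here is a matching sketch of simple client (side) rendering over the core protocol: the client fills an ordinary memory buffer itself (here with a trivial pattern; real toolkits would use Cairo, Skia, OpenGL, and so on for this step) and then ships the finished pixels to the server with XPutImage. This assumes a common TrueColor display where four bytes per pixel is enough.

    /* Build with 'cc clientdraw.c -o clientdraw -lX11'. */
    #include <stdlib.h>
    #include <X11/Xlib.h>

    #define W 320
    #define H 240

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;
        int scr = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, W, H, 1,
                                         BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        XSelectInput(dpy, win, ExposureMask);
        XMapWindow(dpy, win);

        /* The client's own pixel buffer; XCreateImage just wraps it in metadata. */
        char *data = malloc((size_t)W * H * 4);
        XImage *img = XCreateImage(dpy, DefaultVisual(dpy, scr), DefaultDepth(dpy, scr),
                                   ZPixmap, 0, data, W, H, 32, 0);
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                /* 'Rendering', done entirely in the client. */
                XPutPixel(img, x, y, ((unsigned long)(x ^ y) & 0xff) * 0x010101UL);

        GC gc = DefaultGC(dpy, scr);
        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == Expose && ev.xexpose.count == 0)
                /* Ship the finished pixels to the server in one go. */
                XPutImage(dpy, win, gc, img, 0, 0, 0, 0, W, H);
        }
    }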

As a protocol designed relatively recently, Wayland is not several protocols in a trenchcoat. Instead, the (core) Wayland protocol is only for window management (including event handling), and it has no server side rendering. Wayland clients have to do client side rendering in order to display anything, using whatever libraries they find convenient for this. Of course this 'rendering' may be a series of OpenGL commands that are drawn on to a buffer that's shared with the Wayland server (what is called direct rendering (cf), which is also the common way to do client side rendering in X), but this is in some sense a detail. Wayland clients can simply render to bitmaps and then push those bitmaps to a server, and I believe this is part of how waypipe operates under the covers.

(Since Wayland was more or less targeted at environments with toolkits that already had their own graphics rendering APIs and were already generally doing client side rendering, this wasn't seen as a drawback. My impression is that these non-X graphics APIs were already in common use in many modern clients, since they include things like Cairo. One reason that people switched to such libraries and their APIs even before Wayland is that the X drawing APIs are, well, very 1980s, and don't have a lot of features that modern graphics programming would like. And you can draw directly to a Wayland buffer if you want to, cf this example.)

One implication of this is that some current X programs are much easier to port (or migrate) to Wayland than others. The more an X program uses server side X rendering, the less it can simply be re-targeted to Wayland, because it needs a client side library to substitute for the X server side rendering functionality. Generally such programs are either old or were deliberately written to be minimal X clients that didn't depend on toolkits like Gtk or even Cairo.

(Substituting in a standalone client side drawing library is probably not a small job, since I don't think any of them so far are built to be API-compatible with the relevant X APIs. It also means taking on additional dependencies for your program, although my impression is that some basic graphics libraries are essentially standards by now.)

XRenderingVsWaylandRendering written at 22:56:12

2024-02-15

(Some) X window managers deliberately use off-screen windows

I mentioned recently that the X Window System allows you to position (X) windows so that they're partially or completely off the screen (when I wrote about how I accidentally put some icons off screen). Some window managers, such as fvwm, actually make significant use of this X capability.

To start with, windows can be off screen in any direction, because X permits negative coordinates for window locations (both horizontally and vertically). Since the top left of the screen is 0, 0 in the coordinate system, windows with a negative X are often said to be off screen to the left, and ones with a negative Y are off screen 'above', to go with a large enough positive X being 'to the right' and a positive Y being 'below'. If a window is completely off the screen, its relative location is in some sense immaterial, but this makes it easier to talk about some other things.

(Windows can also be partially off screen, in which case it does matter that negative Y is 'above' and negative X is 'left', because the bottom or the right part of such a window is what will be visible on screen.)

Fvwm has a concept of a 'virtual desktop' that can be larger than your physical display (or displays added together), normally expressed in units of your normal monitor configuration; for example, my virtual desktop is three wide by two high, creating six of what Fvwm calls pages. Fvwm calls the portion of the virtual desktop that you can see the viewport, and many people (me included) keep the viewport aligned with pages. You can then talk about things like flipping between pages, which is technically moving the viewport to or between pages.

When you change pages or in general move the viewport, Fvwm changes the X position of windows so that they are in the right (pixel) spot relative to the new page. For instance, if you have a 1280 pixel wide display and a window positioned with its left edge at 0, then you move one Fvwm page to your right, Fvwm changes the window's X coordinate to be -1280. If you want, you can then use X tools or other means to move the window around on its old page, and when you flip back to the page Fvwm will respect that new location. If you move the window to be 200 pixels away from the left edge, making its X position -1080, when you change back to that page Fvwm will put the window's left edge at an X position of 200 pixels.

This is an elegant way to avoid having to keep track of the nominal position of off-screen windows; you just have X do it for you. If you have a 1280 x 1024 display and you move one page to the left, you merely add 1280 pixels to the X position of your (X) windows. Windows on the old page will now be off screen, while windows on the new page will come back on screen.
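
Here is a tiny sketch of that arithmetic as a standalone program (this is merely an illustration of the idea, not how fvwm actually structures its code): it shifts one window's position by a page-sized delta, just as a viewport move would.

    /* Usage: pageshift window-id dx dy
       e.g. 'pageshift 0x1400002 1280 0' to mimic flipping one 1280-pixel page
       to the right (the window id can come from 'xwininfo').
       Build with 'cc pageshift.c -o pageshift -lX11'. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <X11/Xlib.h>

    int main(int argc, char **argv)
    {
        if (argc != 4)
            return 1;
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;
        Window win = (Window)strtoul(argv[1], NULL, 0);
        int dx = atoi(argv[2]);
        int dy = atoi(argv[3]);

        XWindowAttributes wa;
        if (!XGetWindowAttributes(dpy, win, &wa)) {
            fprintf(stderr, "cannot get attributes for that window\n");
            return 1;
        }
        /* Moving the viewport right by dx pixels is the same as moving the
           window left by dx pixels: a window at x=0 ends up at x=-1280 when
           you flip one 1280-pixel page to the right.  (The position here is
           relative to the window's parent, which under a reparenting window
           manager is its frame rather than the root window.) */
        XMoveWindow(dpy, win, wa.x - dx, wa.y - dy);
        XFlush(dpy);
        XCloseDisplay(dpy);
        return 0;
    }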

I think most X desktop environments and window managers have moved away from this simple, brute force approach to handling windows that are off screen because you've moved your virtual screen or workspace or whatever the environment's term is. I did a quick test in Cinnamon, and it didn't seem to change window positions this way.

(There are other ways in X to make windows disappear and reappear, so Cinnamon is using one of them.)

XOffscreenWindowsUse written at 22:51:41

2024-02-08

Accidentally making windows vanish in my old-fashioned Unix X environment

One of the somewhat odd things about my old fashioned X Window System environment is that when I 'iconify' or 'minimize' a window, it (mostly) winds up as an actual icon on my root window (what in some environments would be called the desktop), in contrast to the alternate approach where the minimized window is represented in some sort of taskbar. I have strong opinions about where some of these icons should go, and some tools to automatically arrange this for various windows, including the GNU Emacs windows I (now) use for reading email.

Recently I started a new MH-E GNU Emacs session from home, did some stuff, iconified it, and GNU Emacs disappeared entirely. There were no windows and no icons. I scratched my head, killed the process, started it up again, and the same thing happened all over again. Only when I was starting to go through the startup process a third time did I realize what was going on and the mistake I'd made. You see, I'd told my window manager to put the GNU Emacs icons off screen (to the right) and my window manager had faithfully obliged me. Normally I could have recovered from this by moving my virtual screen over to the right of where it had previously been, but I'd also told my window manager to position the icons for GNU Emacs relative to the current virtual screen, not the one it had been iconified on.

(In fvwm terms, I'd set GNU Emacs to have a 'sticky' icon, which normally means that it stays on your screen as you move around between virtual screens.)

How I could do this starts with how I was setting the icon position for the GNU Emacs session I was reading email in. Unlike (some) X programs, GNU Emacs doesn't take an icon position as a command line argument (as far as I know), but it does support setting icon positions through Lisp. However, I use GNU Emacs on one of our servers to read my email (with X forwarding) from both my work desktop and my home desktop, and they have different display configurations; work has two side by side 4K displays, with the GNU Emacs icons on the right display, and at home I have a single display (and I make more use of multiple virtual screens). Since the icons are positioned at different spots, I have two Lisp functions to set the icon position ('home-icon-position' and 'work-icon-position', more or less).

So that morning, what I did was start GNU Emacs from my home machine and run 'work-icon-position', which told my window manager (via GNU Emacs) that I wanted the icon to have a left position of 5070 pixels. Since I was using a single display that is only 3840 pixels wide, fvwm dutifully carried out my exact instructions and put the icon 1230 pixels or so off to the right of my actual display.

(And then fvwm kept the icon 1230 pixels off the right side when I switched virtual screens, because that's also what I'd told fvwm to do.)

Icons are (little) windows and X is perfectly happy to let you position windows off screen (in any direction; you can put windows at negative coordinates if you want). As you'd expect, a window that is positioned entirely off screen isn't visible. So the actual mechanics of this icon position setting were no problem, and fvwm isn't the kind of program that second-guesses you when you position an icon off screen. So when I positioned the GNU Emacs icons off screen, fvwm put them off screen and they disappeared.

PS: I could have recovered the iconified Emacs in various ways, for example by locating it and having it deiconify, or by explicitly moving its icon back onto the screen. It was just simpler and faster, in my state that morning, to terminate an Emacs I hadn't done much with and try again.

XOffscreenIconMistake written at 23:13:17

2024-01-11

An old Unix mistake you could make when signaling init (PID 1)

Init is the traditional name for the program that is run to be process ID 1, which is the ultimate ancestor of all Unix processes and historically in charge of managing the system. Process ID 1 is sufficiently crucial to the system that either it can't be killed or the system will reboot if it exits (or both, and this reboot is a hack). These days on Linux, PID 1 often isn't literally a binary and process called 'init', but the *BSDs have stuck with an 'init' binary.

Historically there have been a number of reasons for the system administrator to send signals to init, which you can still see documented for modern Unixes in places like the FreeBSD init(8) manual page. One of them was to make init reread the list of serial ports to offer login prompts on, and often in the process to re-offer logins on any ports init had given up on (for example because the serial getty on them was starting and exiting too fast). Traditionally and even today, this is done by sending init a SIGHUP signal.

The kill program has supported sending signals by name for a long time, but sysadmins are lazy and we tend to have memorized that SIGHUP is signal 1 (and signal 9 is SIGKILL). So it was not unusual to type this as 'kill -1 1', sending signal 1 (SIGHUP) to process ID 1 (init). However, this version is a bit dangerous, because it's one extra repeated character away from a version with very different effects:

kill -1 -1

This is only one accidental unthinking repetition of '-1' (instead of typing '1') away from the version you want. Unfortunately the change is very bad.

(My view is that using 'kill -HUP 1' makes this much less likely because now you can't just repeat the '-1', although you can still reflexively type a '-' in front of both arguments.)

The destination process ID '-1' is very special, especially if you're root at the time. In both kill(1) and the kill(2) system call, using -1 as root means '(almost) all processes on the system'. So the addition of one extra character, a repeat of one you were just using, has turned this from sending a SIGHUP signal to init to sending a SIGHUP to pretty much every user and daemon process that's currently running. Some of them will have harmless reactions to this, like re-reading configuration files or re-executing themselves, but many processes will exit abruptly, including some number of daemon processes.
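
The difference is just as stark at the C level; here is a small illustrative sketch of the two kill(2) calls involved, with the dangerous one deliberately commented out.

    #include <signal.h>
    #include <stdio.h>

    int main(void)
    {
        /* 'kill -1 1': send SIGHUP to process ID 1 (init) and nothing else. */
        if (kill(1, SIGHUP) == -1)
            perror("kill(1, SIGHUP)");      /* EPERM unless you're root */

        /* 'kill -1 -1': as root, send SIGHUP to (almost) every process on the
           system.  One repeated character, a very different result.  Left
           commented out on purpose. */
        /* kill(-1, SIGHUP); */
        return 0;
    }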

Back in the days when you were more likely to be SIGHUP'ing init in the first place, doing this by accident was not infrequently a good way to have to reboot your system. Even as recently as a decade ago, doing a 'kill -1 -1' as root by accident (for another reason) was a good way to have to reboot.

(At this point I can't remember if I ever accidentally made this mistake back in the old days, although I have typed 'kill -1 -1' in the wrong context.)

InitOldSignalMistake written at 23:06:11

2024-01-02

Why Unix's lseek() has that name instead of 'seek()'

Over on the Fediverse Matthew Garrett said something which sparked a question from Nicolás Alvarez:

@mjg59: This has been bothering me for literally decades, but: why is the naming for fstat/lstat not consistent with fseek/lseek

@nicolas17: why is it even called lseek instead of seek?

The most comprehensive answer to both questions came from Zack Weinberg's post, with a posting by наб and also some things from me adding additional historical information about lseek(). So today I'm going to summarize the situation with some additional information that's not completely obvious.

The first version of Unix (V1) had a 'seek()' system call. Although C did not yet exist, this system call took three of what would be ints as arguments. Since Unix was being written on the Digital PDP-11, a '16-bit' computer, these future ints were the natural register size of the PDP-11, which is to say they were 16 bits. Even at the time this was recognized as a problem; the OCR'd V1 seek() manual page says (transformed from hard formatting, and cf):

BUGS: A file can conceptually be as large as 2**20 bytes. Clearly only 2**16 bytes can be addressed by seek. The problem is most acute on the tape files and RK and RF. Something is going to be done about this.

V1 also had a closely related tell() system call that gave you information about the current file offset. The V1 seek() was system call 19, and tell() was system call 20. The tell() system call seems to have disappeared rapidly, but its system call number remained reserved for some time. In the V4 sysent.c it's 'no system call', and then in the V5 sysent.c system call 20 is getpid().

In V4 Unix, seek() still uses what are now C ints, but seek()'s manual page documents a very special hack to extend its range. If the third parameter is 3, 4, or 5 instead of 0, 1, or 2, the seek offset is multiplied by 512. At this point, C apparently didn't yet have a long type that could be used to get 32-bit integers on the PDP-11, so the actual kernel implementation of seek() used an array of two ints (in ken/sys2.c), an implementation that stays more or less the same through V6's kernel seek() (still in ken/sys2.c).

(The V6 C compiler appears to have implemented support for a new 'long' C type modifier, but it doesn't seem to have been documented in the C manual or used in, eg, the kernel's seek() implementation. Interested parties can play around with it in places like this online V6 emulator.)

Then finally in V7, we have C longs and along with them a (renamed) version of the seek() system call that fixes the limited range issue by using longs instead of ints for the relevant arguments (the off_t type would be many years in the future). However, the V7 lseek() system call thriftily reuses seek()'s system call number 19 (cf libc/sys/lseek.s, and you can compare this against the V5 lseek.s). It seems probable that this is why V7 renamed the system call from seek() to lseek(), in order to force any old code using seek() to fail to link. Since V7 C did not have function prototypes (they too were years in the future), old code that called seek() with int arguments would almost certainly have malfunctioned, passing random things from the stack to the kernel as part of the system call arguments.

(Old V6 binaries were on their own, but presumably this wasn't seen as a problem in the early days of Unix.)
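
As a reminder of what the V7 change bought us, here is a small modern use of lseek(); the offset argument and the return value are off_t (plain long on V7 itself), not a 16-bit int.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: fsize file\n");
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
            perror(argv[1]);
            return 1;
        }
        /* lseek(fd, offset, whence): V1's seek() took 16-bit ints here, V4 added
           the 'multiply the offset by 512' hack, and V7's lseek() used longs. */
        off_t end = lseek(fd, 0L, SEEK_END);
        printf("%s is %lld bytes long\n", argv[1], (long long)end);
        close(fd);
        return 0;
    }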

So the reason Unix uses 'lseek()' instead of 'seek()' is that it once had a 'seek()' system call that took ints as arguments instead of longs, and when this system call changed to take longs it was renamed to have an l in front to mark this, becoming 'lseek()'. The 'l' here is for 'long'. However, as covered by Zack Weinberg, this is an odd use of 'l' in Unix system call names. In the stat() versus lstat() case, the 'l' is for special treatment of symbolic links, and both versions of the system call still exist.

LseekWhyNamedThat written at 23:03:15

2023-12-03

A bit more trivia on the Unix V6 shell and its control flow

Over on the Fediverse, I posted about how the V6 'goto' and 'exit' worked, and got a good question in response, namely how did 'goto' and 'exit' get hold of the file descriptor for the script that the V6 shell was executing. The answer turns out to be that the V6 shell always read from standard input (fd 0). If it was running a script, it arranged to open the script with file descriptor 0 (standard input), which it passed on to all children as usual.

(In my Fediverse answer I said it used 'dup()', but the V6 sh.c says I'm wrong. V6 sh.c simply closed fd 0 just before it open()'d the script, ensuring that the open() would give it fd 0. There was also special code to 'read' from the -c command line option instead of standard input.)

Obviously this has certain limitations, like you'd better not write shell scripts using programs that read their standard input as a side effect of other operations (at least not without a '</dev/null' or the like). But as Gaelan Steele noted, it's a very early Unix way to handle the whole thing.

Also, Norman Wilson noted that this is where we get the ':' command to do nothing, which would later be one of the ways of writing comments in shell scripts. However, V6 isn't where either ':' or 'goto' first appeared; a version of 'goto' can be found as far back as V2's goto.c, and just as in V6 it searches through standard input for a line with ': ' at the start. I believe that ':' was always built into the Unix shell, making it one of the oldest builtins along with 'chdir' (what eventually became 'cd'), although it's not mentioned in the V3 sh manpage.

(Unfortunately we don't seem to have the source for the V3 shell, and neither source nor manual page for the V2 shell. But I can't imagine early Unix not making ':' a built-in command like 'chdir', and we know you could put ':' in shell scripts due to 'goto'.)

Update: To explain the connection between 'goto' and ':' a bit more, how 'goto' worked is that your script did 'goto label' and goto searched through the script for a line that said ': label' (for an arbitrary word as the label) and positioned execution there. In order to make this line valid in a shell script, there was a ':' shell builtin that did nothing.
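
If you want to see the mechanism in (modern) code, here is a simplified sketch of a 'goto' in this style, written against today's C library rather than being the actual V2 or V6 source, and matching only an exact ': label' line. The important trick is that it operates on file descriptor 0, which is the very script the invoking shell is reading, and it reads one byte at a time so the kernel's file offset ends up exactly at the line after the label.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char line[512], want[520];
        int n;
        char c;

        if (argc != 2) {
            fprintf(stderr, "usage: goto label\n");
            return 1;
        }
        snprintf(want, sizeof(want), ": %s", argv[1]);

        /* fd 0 is the shell script itself; rewind it and scan forward. */
        if (lseek(0, 0L, SEEK_SET) == (off_t)-1) {
            fprintf(stderr, "goto: standard input is not seekable\n");
            return 1;
        }
        n = 0;
        while (read(0, &c, 1) == 1) {   /* a byte at a time keeps fd 0's offset exact */
            if (c != '\n') {
                if (n < (int)sizeof(line) - 1)
                    line[n++] = c;
                continue;
            }
            line[n] = '\0';
            n = 0;
            if (strcmp(line, want) == 0)
                return 0;   /* fd 0 now points just past the label; the shell resumes here */
        }
        fprintf(stderr, "goto: label %s not found\n", argv[1]);
        return 1;
    }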

V6ShellControlFlowII written at 21:38:17

2023-12-02

Why Unix kernels have grown caches for directory entries ('name caches')

An interesting feature of modern Unix kernels is that they generally know the names of things like current directories and open files. Traditionally the only thing Unix knew about open files, current directories, active memory mapped files, and so on was their inode (as an in-kernel data structure, including pointers to the inode's mount point and so on). However, some time back various Unixes added in kernel caches of directory entry names and associated data (in Linux these are dentries and the dcache; in FreeBSD there is the name cache). Once a Unix kernel had such a general cache, it could pin all of the entries for active file and directory objects and so generally be able to supply their names, either for system monitoring purposes (such as Linux's /proc/<pid>/fd subdirectory) or so it could support a system call to return the name of the current directory if it had one.

The reason that several Unixes all added these name caches is straightforward; running Unix systems generally do a lot of directory name lookups. The steady addition of shared libraries (which may live in a number of different places), data files for locales and timezones, lots of $PATH entries, and so on didn't improve the situation. Before name caches, each of these lookups had to call into the specific filesystem, which would generally check through whatever the on-disk data structure for directories was; hopefully the actual disk blocks for these directories would already be in the kernel's disk cache, so they didn't have to be read in.

A kernel name cache provides a fast path for all of these lookups. This cache is especially useful for looking up things that are almost certainly already in active use, such as /bin/sh, the core shared library loader, or the C shared library. These are almost always in memory already, so with the right efficient in-memory data structures for name caches, the kernel can go from "/bin/sh" to an inode quite efficiently (and directly, without having to do a bunch of indirection through things like its Virtual Filesystem Switch).

An explicit kernel name cache also has the additional benefit that it can store negative entries (in Linux, negative dentries), which say that a particular name isn't present. There are a fair number of situations on modern Unixes where programs will attempt to find a file in a succession of directories; with negative entries, those checks of all of the directories that the file isn't in can still be pretty efficient. Without some sort of support for 'this name is definitely not here' in the name cache, the kernel would have no choice but to ask the filesystem to search the on-disk directory for the name.
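
To make the idea concrete, here is a toy sketch of such a cache in C (nothing like the real Linux dcache or the BSD name caches, which have locking, reference counting, LRU management, and much more): lookups are keyed by the parent directory and a single name component, and an entry is either positive (here's the inode) or negative (this name is definitely not there).

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NHASH 1024

    struct ncentry {
        struct ncentry *next;
        uint64_t parent_ino;        /* the directory the name was looked up in */
        char name[64];              /* one path component, not a whole path */
        uint64_t ino;               /* valid only if !negative */
        int negative;               /* 1: we know this name does not exist */
    };

    static struct ncentry *nchash[NHASH];

    static unsigned nc_hash(uint64_t parent, const char *name)
    {
        unsigned h = (unsigned)(parent * 2654435761u);
        while (*name != '\0')
            h = h * 31 + (unsigned char)*name++;
        return h % NHASH;
    }

    static void nc_add(uint64_t parent, const char *name, uint64_t ino, int negative)
    {
        struct ncentry *e = calloc(1, sizeof(*e));
        e->parent_ino = parent;
        snprintf(e->name, sizeof(e->name), "%s", name);
        e->ino = ino;
        e->negative = negative;
        unsigned h = nc_hash(parent, name);
        e->next = nchash[h];
        nchash[h] = e;
    }

    /* Returns 1 and fills in *ino on a positive hit, -1 on a negative hit
       (definitely not there), and 0 on a miss (go ask the filesystem). */
    static int nc_lookup(uint64_t parent, const char *name, uint64_t *ino)
    {
        for (struct ncentry *e = nchash[nc_hash(parent, name)]; e != NULL; e = e->next) {
            if (e->parent_ino == parent && strcmp(e->name, name) == 0) {
                if (e->negative)
                    return -1;
                *ino = e->ino;
                return 1;
            }
        }
        return 0;
    }

    int main(void)
    {
        uint64_t ino;
        nc_add(2, "bin", 27, 0);    /* pretend /bin is inode 27 (made up) */
        nc_add(27, "shh", 0, 1);    /* we looked up /bin/shh and it wasn't there */
        printf("lookup bin in /: %d\n", nc_lookup(2, "bin", &ino));
        printf("lookup shh in /bin: %d\n", nc_lookup(27, "shh", &ino));
        printf("lookup sh in /bin: %d (a miss, ask the filesystem)\n",
               nc_lookup(27, "sh", &ino));
        return 0;
    }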

I don't know if there are performance studies for current name caches in current Unix kernels, but I'm sure that they make a real difference (both in lookup speed and in reducing kernel CPU usage). Even in the late 1980s, name lookups were a quite common thing and they relied very heavily on high hit rates in the kernel block cache (I was once involved in studying this in a BSD-derived kernel, and I remember hit rates in the high 90%s).

(An interesting read on kernel name translation overhead and optimizing it is the relevant sections in the 4.4 BSD Lite "System Performance" paper.)

PS: Since I looked it up, all of Linux, FreeBSD, OpenBSD, and NetBSD have some form of kernel name caches. I don't know about Illumos or the few surviving commercial Unixes.

Sidebar: The (potential) names of filesystem objects

A directory in a conventional Unix filesystem has either one name or no name (if it's been removed). Because of this, the kernel's name cache can always know the directory's current name if it has one. If it wants to, the name cache can go further and provide the last name that the directory was known by before it was deleted, along with a mark that it was deleted.

A file can have no name (if it's been removed since it was opened or mmap()'d), it can have one name, or it can have several names because there are several hardlinks to it. Because of this the kernel name cache may not necessarily know the current name of an open file. If it started out having multiple hardlinks, was opened through one hardlink, and then that hardlink was removed, the name cache may not know the name of the other remaining hardlink(s).

Even if the name cache does know other names for the file, it's a policy decision whether the name cache should provide them or whether it should return the original name the file was opened under, along with an indication that the name was removed. In at least some implementations of /proc/<pid>/fd or the equivalent, you can still read the data of now-deleted files, so you don't need a current name to do this, and knowing the original now-deleted name the program used may be more useful than knowing a current alternate name.

KernelNameCachesWhy written at 22:51:48

2023-12-01

The Unix V6 shell and how control flow worked in it

On Unix, 'test' and '[' are two names for (almost) the same program and shell builtin. Although today people mostly use it under its '[' name, when it was introduced in V7 alongside the Bourne shell it was only called 'test'; the '[' name only came along years later. I don't know for sure why it was called 'test', but there are interesting hints about its potential genesis in the shell used in V6 Research Unix, the predecessor to V7, and the control flow constructs that shell used.

(This shell is sometimes called the Mashey shell, but the version you can find described in Wikipedia as the 'PWB shell' is rather more elaborate than the V6 sh manual page describes or the V6 sh.c seems to implement.)

The V6 shell didn't have control flow constructs as such; instead it outsourced them to external commands, such as goto(1), exit(1) and if(1). The goto and exit commands worked by changing the seek offset in the shell script, which was possible because the V6 shell didn't use buffered input; it read its input line by line (or actually character by character). When they changed the seek offset for a shell script, they caused the V6 shell to read the next line from the new offset, creating either a goto (if the new position wasn't at the end of the file) or an exit (if it was).

The V6 'if' command is even more interesting for us, because it actually isn't a control flow construct as such (although you could use it for control flow in combination with 'goto'). What it does is conditionally execute another command. To quote the manual page's usage section a bit:

if expr command [ arg ... ]

The 'expr' isn't a single argument; it's an expression, as we can see in usr/bin/chk, and the syntax of the expression is quite similar to what the V7 'test' accepts. It's easy to see how this could be translated into the almost equivalent V7 Bourne shell form of 'test expr && command [ arg ... ]', and if you're translating existing scripts that way, a command with a regular name like 'test' makes sense. If the V7 'test' accepted all of the V6 'if' expressions, you could mostly do the translation with some straightforward 'ed' commands.
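
To illustrate the shape of the thing, here is a much-simplified modern sketch of an 'if'-style command. This is not the real V6 if.c, and the real one accepted a much richer expression syntax; only a single '-r file' ("file is readable") primary is handled here. The structure is the interesting part: evaluate the expression and, if it's true, exec the rest of the arguments as the command.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Usage: if -r file command [ arg ... ] */
    int main(int argc, char **argv)
    {
        if (argc < 4 || strcmp(argv[1], "-r") != 0) {
            fprintf(stderr, "usage: if -r file command [ arg ... ]\n");
            return 255;
        }
        if (access(argv[2], R_OK) != 0)
            return 0;               /* expression is false: quietly do nothing */
        execvp(argv[3], &argv[3]);  /* expression is true: become the command */
        perror(argv[3]);
        return 255;
    }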

Although I've talked about V6, it turns out that an 'if' command in Research Unix goes back very far. Unix V2 has an if.c that does more or less the same thing as the V6 'if', although I haven't checked the expressions it accepts. The 'if' manual page in V6 cross references 'find' (presumably for a similar way of constructing expressions), but it looks like 'find' appeared later; the earliest I can find is the V5 find.c. So the basic idea of 'run a command if an expression is true' is a very old Unix idea. The V7 Bourne shell 'test expr && command' is merely the culminating form.

(I'm sure Research Unix didn't invent the general idea of executing a command only conditionally. I haven't dug deeply into things like Multics, one obvious source of ideas that Research Unix was drawing on, but I did spot its 'exec_com'. I don't know if earlier operating systems used a standalone 'if' command equivalent or included it as part of a more complicated execution control system.)

V6ShellControlFlow written at 23:11:14
