Wandering Thoughts

2023-01-19

My twitch about adding a shim in front of a (shell script) interpreter

In a comment on my entry on finding people's use of /usr/bin/python, including in '#!' lines in scripts, Alex Shpilkin asked a good question:

As far as I can see, your solution cannot look backwards in time, but in that case, is there any reason not to replace the /usr/bin/python symlink by a small program that logs whatever details you want then execs /usr/bin/python2? [...]

One answer is that while I'm generally reasonably comfortable putting such a shim in front of ordinary programs, I'm relatively twitchy about doing that to something that's the target of '#!' lines in scripts. I don't know if things would go wrong in practice, but I can imagine a number of things that could go badly in theory.

To start with, you'd want to be very careful that the shim program didn't change the environment the script and its interpreter will be running in. You don't want to add, remove, or change environment variables, or accidentally leak file descriptors from your logging to the interpreter, or really anything else that might be observable. Hopefully you can declare it out of scope to run your shim with environment variables that change the behavior of the early runtime environment (eg various 'LD_' environment variables), because making that not affect your shim but pass through to the real interpreter is going to be fun. And obviously you'd want to make it so that no matter what went wrong in your attempt to do logging, your shim went on to exec the real interpreter.
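As a rough illustration of those constraints, here is a minimal Python sketch of such a shim. The log path, the logged fields, and the function names are all hypothetical, and a real shim would more likely be a small compiled program (not least because not every Unix will accept a script as a '#!' interpreter, and starting Python is itself affected by environment variables). The points it tries to show are that the logging swallows every error, that the log file descriptor is closed before the exec, and that the arguments and environment are passed through untouched.

```python
import os
import sys
import time

REAL_INTERPRETER = "/usr/bin/python2"   # the real interpreter from the question
LOG_PATH = "/var/log/python-shim.log"   # hypothetical log location


def log_invocation(argv, log_path=LOG_PATH):
    """Append one line describing this invocation. Never raises."""
    try:
        # O_CLOEXEC is belt and braces; the descriptor is also closed
        # explicitly below so it cannot leak into the real interpreter.
        fd = os.open(
            log_path,
            os.O_WRONLY | os.O_APPEND | os.O_CREAT | os.O_CLOEXEC,
            0o644,
        )
        try:
            line = "%d uid=%d ppid=%d args=%r\n" % (
                int(time.time()), os.getuid(), os.getppid(), argv[1:])
            os.write(fd, line.encode("utf-8", "replace"))
        finally:
            os.close(fd)
        return True
    except Exception:
        # Whatever went wrong with logging, the caller still execs.
        return False


def main():
    log_invocation(sys.argv)
    # Keep argv exactly as we received it (including argv[0]) and let
    # execv() pass the current environment through unmodified.
    os.execv(REAL_INTERPRETER, sys.argv)

# A real shim would finish by calling main().
```

Even this small sketch can't make the exec of the real interpreter identical to a direct one; it only avoids the obvious ways of disturbing it.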

A subtle issue is that from the perspective of the program running the script, its exec succeeded the moment your shim got loaded, even if your shim then can't exec the real interpreter (for example, because trying to load it exceeds some resource limit). If the program would have done something different when the exec of the script failed, well, it's too late. This is probably not too likely, though; an exec usually doesn't fail if the program is there.

More generally, I don't believe that Unixes (ie, kernels) guarantee that the kernel's internal exec of a '#!' interpreter is exactly the same as a user level exec() with the same command line and environment. I can imagine a security sensitive Unix applying special marking to such (direct) interpreter processes, and I can also imagine a kernel passing additional information to a (direct) interpreter process through mechanisms such as Linux's auxiliary vector.

(In fact in a quick check, on Linux such a shim program causes AT_EXECFN to change. In the normal case it's the filename of the '#!' script, but in the shim case it's the filename of the real interpreter, such as /usr/bin/python2. Whether or not your interpreter cares about this may depend.)
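For anyone who wants to repeat this sort of check, the following is a sketch of reading AT_EXECFN from inside a Python process using ctypes and the C library's getauxval(). It assumes Linux with a C library that provides getauxval(); the function name exec_filename is made up for illustration, and on other systems it simply returns None.

```python
import ctypes

AT_EXECFN = 31  # value of AT_EXECFN in Linux's <elf.h>


def exec_filename():
    """Return the AT_EXECFN string as bytes, or None if unavailable."""
    try:
        libc = ctypes.CDLL(None)
        getauxval = libc.getauxval
        getauxval.restype = ctypes.c_ulong
        getauxval.argtypes = [ctypes.c_ulong]
        address = getauxval(AT_EXECFN)
    except Exception:
        # Not Linux, or no getauxval() in this C library.
        return None
    if not address:
        return None
    # The auxiliary vector entry is a pointer to a NUL-terminated string.
    return ctypes.string_at(address)
```

Printing exec_filename() from a script that is run directly through its '#!' line, and again with a shim in the way, should show the difference described above.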

Given all of this, tracing execs from outside is safer and in theory easier (assuming that whatever method of tracing you use can be made to give you this information). It's exactly and precisely a real exec of a '#!' script's interpreter; you're merely arranging to log additional information extracted from the system through a side channel.

InterpreterShimTwitch written at 23:30:16

2023-01-05

The different sorts of 'iconification' of windows in X

In X, application windows can be in a variety of states. They can be on the screen, they can not have been 'mapped' yet, they can be mapped but located off the currently visible area of the screen (many of my windows spend a lot of time in other pages of my virtual desktop), and pretty much since the beginning they can be what was originally called 'iconified' but which these days is often called 'minimized' in documentation that ordinary people read.

(Things like the Inter-Client Communication Conventions Manual (ICCCM) and Extended Window Manager Hints tend to use 'iconify' and similar terminology, but very few people read the ICCCM. Desktop environments and programs generally describe the state and the action as 'minimize'.)

In the usual X manner, what happens when an application is iconified varies a lot between window managers, and there are two general versions. The original window manager approach, still in use by things like fvwm, is that an iconified window is represented as an icon on the root window (the 'desktop'), using an icon that can come from a variety of places, and generally has a little title below it. You can see a number of these iconified windows in this tour of my (2011) desktop, including a use of the default 'X' icon for two iconified windows.

Because everything in X is a window, these icons are windows (as are the title bars); they're protocol level window entities that are managed internally by the window manager. When a regular window is iconified, the window manager creates or re-maps an appropriate window entity for the icon and for the title and puts it in the right spot. If you have such a window manager around, you can see these protocol level window entities with tools like xwininfo. The icon windows are generally using the Shape extension so they don't have to be rectangles.

However, this 'icons on the desktop' approach has been out of fashion since people started building X desktop environments that used the same general ideas as macOS and Windows. Both of those have always used icons on the desktop for passive objects that had yet to be opened or activated, not active applications; active application windows were collected together in a taskbar. In X window managers that follow this general approach, 'iconifying' an application window doesn't involve materializing a second X window entity; it simply updates the visual appearance of some existing X window.

I believe that basically all modern X desktop environments operate this way with some sort of a taskbar; this definitely includes Cinnamon and modern GNOME. Even the modern default configurations of old school window managers like fvwm often come set up to imitate this (the Fedora default fvwm configuration is set up this way, for example).

(Fvwm itself has several ways to do a taskbar equivalent for some or all windows if you want that, and I use one of them for my terminal windows. This shouldn't be a surprise, because the general taskbar approach is a compact and organized way of keeping track of application windows.)

The difference matters to the X server and at the level of the X protocol, because the 'icons on a desktop' approach creates, destroys, and manipulates a lot more top level X window entities than the taskbar approach does, and in some circumstances it may be doing this in a burst (you can have an application start up by creating two iconified top level windows, for example; the window manager will create all four or so icon windows for them basically at once). The natural consequence of this is you can have an X server bug that only shows up with one of these two approaches to iconification.

(At least the condensed reproduction is small enough to fit into a Mastodon post.)

As a side note, I don't know enough about tiling window managers to know how they (or people using them) typically handle iconified (aka minimized) windows. They're not likely to be using the 'icons on root window' approach for the obvious reason that tiling window managers generally don't have any empty space to show the root window (and icons on it). Since iconifying an application window is something the window manager does, the window manager can also just not do it. Applications can ask to be minimized, but the window manager can always ignore them (or treat it as a request for something else, for example to be moved to a secondary off-screen area).

XIconificationManyWays written at 23:11:54

2023-01-01

Research Unix V2 already had a lot of what we think of as 'Unix'

When I looked into how far back Unix's special way of marking login shells goes, I wound up looking at the V2 source of login.s, which is a .s file instead of a .c file because C (the language) was barely starting to be a thing in mid-1972. One of the things that struck me when I looked at the V2 login.s was how much of what we consider standard Unix features were already there in some form in V2.

In no particular order, the default shell is /bin/sh, there is an /etc/motd, /etc/passwd seems to have the idea that your shell might not be /bin/sh but instead something else, and there's both utmp and wtmp (although both are in /tmp, not their later locations). There is even some sort of mail system, using a 'mailbox' file located in people's home directories.

(There is also /dev/tty<X>, but everything being in the filesystem is a big Unix feature and so it doesn't surprise me that it was there very early.)

Another interesting thing is that there are some familiar sounding standard library subroutine names in use, such as 'fopen' and 'getc'. The V2 library source we have reveals some other familiar names, like a 'printf' (implemented in both C and assembly). While we don't have the V2 library source for these functions that I could spot, we do have the V1 manuals, which reveal that fopen did apparently use buffering, unlike the direct system calls.

Other source code we have reveals more familiar things. Init.s says there was a /dev/tty and an /etc/getty, for example. And getty.s says V2's login was in /bin, as you'd expect. Acct.s reveals a 'qsort' function already present in the standard library.

Because I was curious, I looked at the V2 ls.s, and I think the flags it supports are -l, -t, -a, -s, and -d. Based on A Brief History of the 'ls' command, this is the same set of flags that V1 Unix had.

At one level this isn't surprising. I'm pretty certain that we've known from the early writings about Unix that a lot of the core ideas of Unix were there from the start. At another level, it's interesting to me how much was there how early, in a form a lot like its later one. I wouldn't necessarily have guessed that utmp and wtmp were in V2, for example.

UnixV2HadALot written at 22:44:57

2022-12-23

Handling numbers in Vim when they have a dash in front of them

Over on the Fediverse, I mentioned a Vim situation I'd run into:

Here's a situation Vim's Ctrl-A feature for handily incrementing numbers doesn't readily handle: if you have machines called eg 'something-1' and want to make that 'something-2', 'something-3', and so on. Vim will interpret the '-1' bit as the number -1 and auto-increment it to 'something0'. There is probably a clever Vim way around this, or you can use another option.

(Then people in the replies gave me a number of clever Vim solutions.)

The situation where this came up was that I was writing some new DNS entries for a series of boring names, starting with:

machine-1   IN A 10.x.y.z

I wanted to copy and duplicate in order to make 'machine-2', 'machine-3', and so on entries, so I copied that first line and hit Ctrl-A to increment the machine number, which didn't work because Vim, by default, saw the '-1' as a negative number and duly incremented it to '0'. Visually selecting the number before using Ctrl-A isn't really a great solution for this particular case, because I want to do it repeatedly to create different numbers; at best I'd be repeatedly selecting shorter and shorter columns and incrementing them by one.

Jeff Forcier offered the best general solution for people like me, which is to tell Vim that I don't work with negative numbers by adding 'unsigned' to the 'nrformats' setting. I understand why this isn't the default, but I sort of wish it was or at least that Vim was more sophisticated about what it considered a negative number (so that it required a free-standing '-', not one that has text before it).

The obvious quick fix if you remember is switching to Ctrl-X to decrement the number. Related to this is that I could start at the end of the sequence instead of the start, so I'd start with, say, 'machine-5', duplicate it a bunch, and then use Ctrl-A to create machine-4, machine-3, and so on. Both of these rely on realizing that Vim is going to interpret the '-1' as a negative number, which is something that doesn't come up very often for me.

Having written this entry, I will hopefully remember the whole issue. I'm not sure if I want to add 'set nrformats+=unsigned' to any .vimrc, though, because that's a subtle trap in its own right. If I put it in, I'll probably forget it, then someday I may actually be working with negative numbers and get another surprise. The current surprise is at least readily understandable.

VimHandlingDashedNumbers written at 22:12:31

2022-12-10

Unix's special way of marking login shells goes back to V2 Unix (at least)

Many Unix shells have some command line argument that tells them that they are a login shell; for example, Bash has '-l' (also '--login'), and I think that '-l' has become the de facto standard. However, this argument is not how Unix programs like sshd and login actually tell your shell that it's a login shell; instead, for a long time, a login shell has a '-' as the first character of its program name, or to put it another way, argv[0][0] is '-'.
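To make the mechanism concrete, here is a small Python sketch of both sides of it. The function names are invented for illustration, and building argv[0] from the last component of the shell's path is an assumption about what a login-like program typically does, not something taken from any particular login source.

```python
import os


def login_argv0(shell_path):
    """Build a login-shell argv[0]: '-' plus the shell's name."""
    return "-" + os.path.basename(shell_path)


def looks_like_login_shell(argv0):
    """The check a shell makes on startup: is argv[0][0] a '-'?"""
    return argv0.startswith("-")


def exec_login_shell(shell_path):
    """Replace this process with shell_path, marked as a login shell."""
    # The file that gets executed and the argv[0] the new program sees
    # are separate arguments to exec, which is what makes this possible.
    os.execv(shell_path, [login_argv0(shell_path)])
```

With these, login_argv0("/bin/bash") gives "-bash", which is the sort of name you will often see for login shells in ps output.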

Recently I wound up wondering how far back this approach goes. The answer turns out to be more interesting than I expected, and what exactly is done has changed over time. First, as you might expect, this behavior is in V7 Unix, where login.c sticks a '-' on the front of your shell's name (and is confident that your /etc/passwd shell entry will never be more than 14 characters long). In V6 and V5, we have the C source for login (V6, V5), and both versions simply set argv[0] to '-'. I don't think we have user level source for V4 or V3, but it turns out that we do have assembly source for V2's login.s, and to my surprise, it is still following the same approach of having argv[0] be '-' (visible in the definitions of 'mshell' and 'shellp').

So this approach to marking login shells goes back to pretty much the dawn of Unix, and might even be in V1 (again, I don't think we have user level source for V1). Unix has been doing it this way since at least mid 1972, or fifty years now, and for incremental historical compatibility, your system is probably still doing it today.

(Even OpenBSD, the modern Unix I would most expect to have modernized how login shells are marked, has left this intact in their login source code.)

PS: To my surprise, as late as 4.3 BSD, login.c is still assuming that your /etc/passwd shell is 14 characters or less (or otherwise it will overflow a buffer). This finally gets fixed in 4.4 BSD. This is also not fixed in the System III login.c. I haven't looked at System V, since source for that is less readily accessible.

LoginShellMarkerHistory written at 21:53:14

2022-11-25

A revealing Vim addressing mistake that I made today

Today I wound up typing and trying to use the following Vim command (more or less) with a straight face:

:.,$/Match.*Header.*Fields/d

Vim warned and prompted me by asking 'Backwards range given, OK to swap (y/n)?'. In an insufficiency of something (maybe coffee), I told it yes, and was then confused by the results.

What I had was a file that intermixed a bunch of copies of a header line (although each instance had a prefix that differed) and a bunch of data lines. I wanted to get rid of all but the first instance of the header line and keep all of the data lines, so I moved the cursor down a few lines below the first header line and typed the above.

What I was thinking was 'from here to the end' ('.,$'), 'match the header lines' (the /<regexp>/), 'delete the matching lines'. What had slipped out of my mind is that Vim doesn't match multiple lines when you use a bare regular expression this way; instead it goes to the first line that matches the regular expression. And instead of '.,$' and the regular expression being two separate steps, what I'd created was a compound single address, that went from '.' (current position) to '$/<regexp>/', the first line matching the regular expression (starting at the end of the file, which I believe defaults to rolling around to the start of the file). Since the first instance of the regular expression was at the start of the file, this meant that the range was backward, hence Vim's prompt.

What I actually was thinking of was the :g[lobal] command, and after I realized what I'd done wrong, that's what I used. I've used ':g' before, but this may have been the first time I've used it with a range (at least recently), and evidently my mind remembered the idea but forgot the important bit of the actual 'g'. So the real version I used, after I realized it, was only different by the addition of that one crucial character that made all of the difference:

:.,$g/Match.*Header.*Fields/d

(I'd blame my sam reflexes but even in sam this would have required a filter operation. It also would have been somewhat more annoying because sam isn't line based so I'd have had to make sure that the regexp covered the entire line or something of that order.)
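For people who think more readily in code than in Vim addresses, this is a rough Python model of what the ':.,$g/<regexp>/d' form does. It is only a sketch; the function name is made up, line numbers are 1-based as in Vim, and it uses Python's regular expression syntax, which differs from Vim's in general (although this particular pattern means the same thing in both).

```python
import re


def global_delete(lines, start, pattern):
    """Model of ':{start},$g/{pattern}/d' on a list of lines."""
    matcher = re.compile(pattern)
    # Lines before the range are left alone.
    head = lines[:start - 1]
    # Within the range, every matching line is deleted, not just the first.
    tail = [line for line in lines[start - 1:] if not matcher.search(line)]
    return head + tail
```

Starting the range just below the first header line is what preserves that first header while removing all of the later copies.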

VimAddressingMistake written at 22:01:54

2022-11-24

Unix's (technical) history is mostly old now

Yesterday I wrote about how Unix swap configuration used to be simple and brute force, covering a number of cases from V7 Unix through Linux 0.96c. As I wrote that entry, it became increasingly striking to me that the most recent time I mentioned was 1992. This isn't something unique to swap handling, or new in my entries about much of the (technical) origins and evolution of Unix. Instead, it's because a lot of Unix's technical history is at least thirty years old now.

It's not quite the case that nothing has happened in Unix history since the early 1990s. Very obviously, quite a lot of important social things happened around 'Unix', such that by the end of the 1990s what Unixes people used had changed significantly (and then in the 00s the change became drastic). Less obviously, a bunch of internal kernel technology changed over that time, so that today every remaining common Unix has good SMP support and is in a far better place for performance.

To some degree, technical evolution has also continued in filesystems. The problem is that this evolution is very unevenly distributed, with the most advanced filesystems the least widely used. Unix has made valuable strides in commonly used filesystems, but they aren't drastic ones. And the filesystem related features visible to people using Unix haven't really changed since the early 1990s, especially in common use (there has been no large move to adopt ACLs or file attributes, for example, although file capabilities have snuck into common use on Linux systems).

Some things that were known in the early 1990s but not widely adopted have become pervasive, like having a /proc or interacting with your kernel for status information and tuning through a structured API instead of ad-hoc reading (and sometimes writing) kernel memory. However, these changes at least don't feel as big as previous evolutions. It's better that ps operates by reading /proc, but it's still ps.

I think that if you took a Unix user from the early 1990s and dropped them into a 2022 Unix system via SSH, they wouldn't find much that was majorly different in the experience. Admittedly, a system administrator would have a different experience; practices and tools have shifted drastically (for the better).

(It's possible that my perspective leaves me blinded to important things in Unix's technical history and evolution in the 2010s, 2000s, and 1990s.)

UnixHistoryMostlyOldNow written at 22:49:37

2022-11-23

Unix swap configuration used to be rather simple and brute force

Modern Unixes generally support rather elaborate configuration of what swap space is available. FreeBSD supports multiple swap devices and can enable and disable them at runtime (cf swapon(8)), including paging things back in in order to let you disable a swap device that's in use. Linux can go even further, allowing you to swap to files as well as devices (which takes a bunch of work inside the kernel). It will probably not surprise you to hear that early Unixes were not so sophisticated and featureful, and in fact were rather simple and brute force about things.

(It seems that under the right conditions, FreeBSD will also swap to files, cf the handbook section on creating a swap file.)

In V7, there was a single swap device and that device was hard-coded into the kernel at kernel compilation time, as swapdev in sys/conf/c.c (see sys/conf/mkconf.c). V7 similarly hard-coded the root filesystem device. It doesn't look like V7 had anything to control turning swap on and off; it was on all the time.

In the BSD line, 4.2 BSD had swapon(2) to (selectively) enable swapping, but had no way of turning swapping off on a device once you'd turned it on. It did now support swapping on multiple devices, but the (potential) swap devices were hard coded when you built the kernel, somewhat like V7 (cf eg GENERIC/swaphkvmunix.c and the various configuration files in sys/conf). As in V7, the root filesystem was also hard-coded. This relatively fixed set of swapping options continued through at least 4.3 Tahoe (based on reading manual pages).

Interestingly, swapping to a file goes a long way back in Linux; it's supported in 0.96c (from 1992), according to tuhs.org's copy of mm/swap.c. However, Linux 0.96c only supported a single swap area (whether it was a file or a device), and it doesn't look like you could turn swapping off once you turned it on.

I'm not sure how swap configuration worked in System V, especially before System V Release 4. It turns out that archive.org has System V source code available, and in SVR4 the kernel source code suggests that you can add and delete multiple swap devices and swap files, without any need to configure them into the kernel in advance. This may have been what inspired Linux to support swapping to a file so early on in its life, since System V Release 4 dates from the late 1980s.

(Writing all of this down has gotten me to realize just how long ago all of it was. Unix has had pretty capable swap support for more than 30 years now, if you start from System V Release 4 or Linux.)

SwapSetupWasSimple written at 23:12:25

2022-10-12

We are stuck with egrep and fgrep (unless you like beating people)

Over on Twitter I had a reaction:

I see that a bunch of system administrators (or distribution packagers) are going to set GNU Grep 3.8 on fire with the power of their minds. Spoiler: new warnings are an API change.

What's special about GNU Grep 3.8 is that its versions of egrep and fgrep now print an extra message when you run them. Specifically, these messages 'warn' (ie nag) you to stop using them and use 'grep -E' and 'grep -F' instead.

(I assume that these messages are printed to standard error, because not even GNU Grep would be so hostile as to put them in standard output.)

There are two problems with this. The first issue is what's going to cause system administrators and distribution packagers to set GNU Grep 3.8 on fire, which is that in practice adding warnings and other new messages is a breaking API change. There are many places where adding new messages will cause breakage and pain (for example, in scripts run from cron where unexpected output will result in the sysadmins getting mailed, perhaps a lot).

The second issue is that this is unpleasant to actual people. There are a lot of people who are used to using (and typing) 'fgrep' and to a lesser extent 'egrep' when they want certain sorts of results from grep. These people don't care that the people behind GNU Grep (and POSIX) don't like these commands; they are used to them, and now they're being nagged (and threatened with actual removal). There's also a lot of writing about Unix out there in the world that uses 'fgrep' and 'egrep'; GNU Grep wants to make all of this perfectly good writing less useful. It doesn't matter to GNU Grep that fgrep and egrep have been present in Unix since V7. Out they go, because POSIX says so.

As a side note, I believe that one reason that people continue using fgrep and egrep is simply that they're easier and faster to type. Typing 'fgrep' is one continuous run of lower case letters; typing 'grep -F' is two more characters, one of them shifted. People are lazy, and GNU Grep wants them to work harder for no particularly good reason.

(Or it wants everyone in the world to create their own 'fgrep' and 'egrep' cover scripts, which is a terrible amount of redundant work.)

If Unix didn't already have fgrep and egrep, people probably wouldn't add them. But this isn't the world we have; this is a world where fgrep and egrep have been around for over 40 years. In that world, Unix is stuck with them unless you want to beat up people's scripts, documentation, writing, habits, and reflexes. Doing that is not progress or being friendly to people.

(I'm definitely not in favor of fossilizing Unix, but there's a difference between avoiding fossilization and the kind of minimal, mathematical purity that we see GNU Grep trying to impose here. Unix has long since passed the point where it had that sort of minimalism in the standard commands. Modern Unix has all sorts of duplications and flourishes that aren't strictly necessary, and for good reasons. One of them is that it's nicer for the actual people using Unix. You don't strictly speaking need 'sort -h', but it's quite convenient.)

EgrepFgrepStuckWith written at 21:42:24

2022-10-09

Research Unix V7's (comparatively) long time gap from V6

Today for reasons outside the scope of this entry I found myself looking at the release dates for the various editions (versions) of Research Unix up through the pivotal V7 release. This made something about the timeline jump out at me.

Starting from the 1st Edition onward through V6, the Bell Labs CSRC set a blistering pace. V1 is dated November 3rd 1971, and was followed by V2 dated June 12, 1972, V3 dated February 1973, V4 dated November 1973 (where the kernel was written in C for the first time), V5 dated June 1974, and then V6 in May of 1975, the longest time gap up until that point but still less than a year from V5. Then the CSRC didn't pause to put together a formal 'edition' that they released until the 7th Edition (V7) in January 1979, more than three years later. This doesn't mean that the Bell Labs CSRC was idle during that time; V7 has an impressive list of developments, so they were clearly busy evolving Research Unix. They just didn't pause to make and publish their work as an edition, for whatever reasons.
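The size of the jump is easier to see as numbers. This is a small Python sketch that computes the gap in whole months between each edition and its predecessor, using only the release months given above (the day of the month is ignored, and the names in the code are just for illustration).

```python
# (edition, year, month) as listed in the paragraph above.
RELEASES = [
    ("V1", 1971, 11),
    ("V2", 1972, 6),
    ("V3", 1973, 2),
    ("V4", 1973, 11),
    ("V5", 1974, 6),
    ("V6", 1975, 5),
    ("V7", 1979, 1),
]


def months_between(earlier, later):
    """Whole calendar months from one (name, year, month) to another."""
    return (later[1] - earlier[1]) * 12 + (later[2] - earlier[2])


def release_gaps(releases=RELEASES):
    """Return [(edition, months since the previous edition), ...]."""
    return [
        (later[0], months_between(earlier, later))
        for earlier, later in zip(releases, releases[1:])
    ]
```

By this count the gaps up through V6 are 7, 8, 9, 7, and 11 months, and then V7 arrives 44 months after V6.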

(However, I believe that CSRC published about what would later be released as V7 in 1978, in what became a famous issue of the Bell System Technical Journal, Vol. 57, No. 6, Pt. 2 Jul/Aug 1978 (in scanned PDF form).)

Learning this helps illuminate to me why a relatively large number of early Unixes were partially based on V6, such as PWB/UNIX and CB UNIX (an ingredient into System III). Not only was V6 what you had to work with between 1975 and 1979, but that was enough time that people could accumulate all sorts of changes and pass them on to others. And V6 was the current (and only) externally available Unix release for long enough that people could produce things like A Commentary on the UNIX Operating System (which was first assembled a year after V6's release and then printed as a book another year later).

Although I could speculate about reasons why there was such a gap between the publication of V6 and V7, it would be just that, speculation, so I'm going to skip it. If there's an authoritative account about why, I couldn't find it with some casual Internet searches. Although some of the famous Bell Labs people are sadly deceased (cf), enough remain alive that they could be asked, if they haven't already written about it (which I suspect they have, somewhere).

Sidebar: A brief history of Research Unix licensing

According to Wikipedia, V5, V6, and V7 were all licensed to educational institutions (for little to no cost as I remember it). V6 and I believe V7 were licensed to commercial users as well. I believe that V5 was the first version to be talked about much outside of Bell Labs, and certainly it's said to be the first version released to outside people (only in educational institutions, though). Anyone who started with V5 might well have replaced it shortly afterward when V6 came out, assuming that hearing about it, getting together interested people at your institution, finding or funding a machine, and so on didn't take long enough that V6 was out by the time you were ready.

V7LongTimeGap written at 23:15:19
