Wandering Thoughts


How Unix shells used to be used as an access control mechanism

Once upon a time, one of the ways that system administrators controlled who could log in to what server was by assigning special administrative shells to logins, either on a particular system or across an entire server fleet. Today, special shells (mostly) aren't an effective mechanism for this any more, so modern Unix people may not have much exposure to the idea. However, vestiges of it live on in typical Unix configurations, in the form of /sbin/nologin (sometimes in /usr/sbin) and the many system accounts that have it set as their shell in /etc/passwd.

The normal thing for /sbin/nologin to do when run is to print something like 'This account is currently not available.' and exit with status 1 (in a surprising bit of cross-Unix agreement, the Linux, FreeBSD, and OpenBSD versions of nologin all appear to print exactly the same message). If you make this the shell of some account, anything that executes the account's shell as part of accessing it will fail, so login (locally or over SSH) and a normal su will both fail. Typical versions of su have special features to keep you from overriding this by supplying your own shell (often involving /etc/shells, or deliberately running the /etc/passwd shell for the user). Beyond that, though, nothing prevents processes from running under the login's UID, and in fact it's extremely common for such system accounts to be running various processes.

Unix system administrators have long used this basic idea for their own purposes, creating their own fleet of administrative shells to, for example, tell you that a machine was only accessible by staff. You would then arrange for all non-staff logins on the machine to have that shell as their login shell (there might be such logins if, for example, the machine is an NFS fileserver). Taking the idea one step further, you might suspend accounts before deleting them by changing the account's shell to an administrative shell that printed out 'your account is suspended and will be deleted soon, contact <X> if you think this is a terrible mistake' and then exited. In an era when everyone accessed your services by logging in to your machines through SSH (or earlier, rlogin and telnet), this was an effective way of getting someone's attention and a reasonably effective way of denying them access (although even back then, the details could be complex).

(Our process for disabling accounts gives such accounts a special shell, but it's mostly as a marker for us for reasons covered in that entry.)
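As a concrete illustration, here is a minimal sketch of the sort of 'suspended account' administrative shell described above. The message wording is my own made-up example, and the sketch writes the shell to a temporary file rather than installing it in /etc/passwd:

```shell
#!/bin/sh
# Sketch of a local administrative shell; in real use this script would
# be installed somewhere like /usr/local/sbin and set as the login shell
# of the suspended account in /etc/passwd.
shellscript=$(mktemp)
cat > "$shellscript" <<'EOF'
#!/bin/sh
echo 'Your account is suspended and will be deleted soon.'
echo 'Contact <staff> if you think this is a terrible mistake.'
exit 1
EOF
chmod +x "$shellscript"

# login, sshd, or su would run this in place of a real shell, so the
# person sees the message and the login attempt then fails:
"$shellscript"
status=$?
echo "administrative shell exited with status $status"
rm -f "$shellscript"
```

Because the script exits non-zero without ever starting an interactive shell, anything that insists on successfully running the login shell denies access, exactly as with /sbin/nologin.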

You could also use administrative shells to enforce special actions when people logged in. For example, newly created logins might be given a special shell that would make them agree to your usage policies, force them to change their password, and then through magic change their shell to a regular shell. Some of this could be done through existing system features (sometimes there was a way to mark a passwd entry so that it forced an immediate password change), but generally not all of it. Again, this worked well when you could count on people starting using your systems by logging in at the Unix level (which generally is no longer true).

Sensible system administrators didn't try to use administrative shells to restrict what people could do on a machine, because historically such 'restricted shells' had not been very successful at being restrictive. Either you let someone have access or you didn't, and any 'restriction' was generally temporary (such as forcing people to do one time actions on their first login). Used this way, administrative shells worked well enough that many old Unix environments accumulated a bunch of local ones, customized to print various different messages for various purposes.

PS: One trick you could do with some sorts of administrative shells was make them trigger alarms when run. If some people were not really supposed to even try to log in to some machine, you might want to know if someone tried. One reason this is potentially an interesting signal is that anyone who gets as far as running a login shell definitely knows the account's password (or otherwise can pass your local Unix authentication).

(These days I believe this would be considered a form of 'canary token'.)

ShellsAsAccessControl written at 22:28:59


The roots of an obscure Bourne shell error message

Suppose that you're writing Bourne shell code that involves using some commands in a subshell to capture some information into a shell variable, 'AVAR=$(....)', but you accidentally write it with a space after the '='. Then you will get something like this:

$ AVAR= $(... | wc -l)
sh: 107: command not found

So, why is this an error at all, and why do we get this weird and obscure error message? In the traditional Unix and Bourne shell way, this arises from a series of decisions that were each sensible in isolation.

To start with, we can set shell variables and their grown-up friends, environment variables, with 'AVAR=value' (note the lack of spaces). You can erase the value of a shell variable (but not unset it) by leaving the value out, 'AVAR='. Let's illustrate:

$ export FRED=value
$ printenv | fgrep FRED
FRED=value
$ FRED=
$ printenv | fgrep FRED
FRED=
$ unset FRED
$ printenv | fgrep FRED
$ # ie, no output from printenv

Long ago, the Bourne shell recognized that you might want to only temporarily set the value of an environment variable for a single command. It was decided that this was a common enough thing that there should be a special syntax for it:

$ PATH=/special/bin:$PATH FRED=value acommand

This runs 'acommand' with $PATH changed and $FRED set to a value, without changing (or setting) either of them for anything else. We have now armed one side of our obscure error, because if we write 'AVAR= ....' (with the space), the Bourne shell will assume that we're temporarily erasing the value of $AVAR (or setting it to a blank value) for a single command.
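This temporary-assignment behaviour is easy to see directly. A small sketch, where FRED is an arbitrary example variable name:

```shell
#!/bin/sh
# Temporary environment assignment: FRED exists only for the one command.
unset FRED
inside=$(FRED=value sh -c 'echo "$FRED"')   # FRED is set for this sh
outside=${FRED-unset}                       # but not for us afterwards
echo "inside the command: $inside"
echo "after the command: $outside"
```

The inner sh sees FRED as 'value', while the outer shell never has FRED set at all.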

The second part is that the Bourne shell allows commands to be run to be named through indirection, instead of having to be written out directly and literally. In Bourne shell, you can do this:

$ cmd=echo; $cmd hello world
hello world
$ cmd="echo hi there"; $cmd
hi there

The Bourne shell doesn't restrict this indirection to direct expansion of environment variables; any and all expansion operations can be used to generate the command to be run and some or all of its arguments. This includes subshell expansion, which is written either as $(...) in the modern way or as `...` in the old way (those are backticks, which may be hard to see in some fonts). Doing this even for '$(...)' is reasonably sensible, probably sometimes useful, and definitely avoids making $(...) a special case here.
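A tiny demonstration of a subshell expansion being used to name the command itself (here the inner command just prints 'echo', so the outer command line becomes 'echo hello world'):

```shell
#!/bin/sh
# The $(...) expands to 'echo', which the shell then runs as the command.
result=$($(echo echo) hello world)
echo "$result"
```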

So now we have our perfect storm. If you write 'AVAR= $(....)', the Bourne shell first sees 'AVAR= ' (with the space) and interprets it as you running some command with $AVAR set to a blank value. Then it takes the '$(...)' and uses it to generate the command to run (and its command line). When your subshell prints out its results, for example the number of lines reported by 'wc -l', the Bourne shell will try to use that as a command and fail, resulting in our weird and obscure error message. What you've accidentally written is similar to:

$ cmd=$(... | wc -l)
$ AVAR= $cmd

(Assuming that the $(...) subshell doesn't do anything different based on $AVAR, which it probably doesn't.)

It's hard to see any simple change in the Bourne shell that could avoid this error, because each of the individual parts is sensible in isolation. It's only when they combine together like this that a simple mistake compounds into a weird error message.

(The good news is that shellcheck warns about both parts of this, in SC1007 and SC2091.)

BourneShellObscureErrorRoots written at 22:12:44


(Unix) Directory traversal and symbolic links

If and when you set out to traverse through a Unix directory hierarchy, whether to inventory it or to find something, you have a decision to make. I can put this decision in technical terms, about whether you use stat() or lstat() when identifying subdirectories in your current directory, or put it non-technically, about whether or not you follow symbolic links that happen to point to directories. As you might guess, there are two possible answers here and neither is unambiguously wrong (or right). Which answer programs choose depends on their objectives and their assumptions about their environment.

The safer decision is to not follow symbolic links that point to directories, which is to say to use lstat() to find out what is and isn't a directory. In practice, a Unix directory hierarchy without symbolic links is a finite (although possibly large) tree without loops, so traversing it is going to eventually end and not have you trying to do infinite amounts of work. Partly due to this safety property, most standard language and library functions to walk (traverse) filesystem trees default to this approach, and some may not even provide for following symbolic links to directories. Examples are Python's os.walk(), which defaults to not following symbolic links, and Go's filepath.WalkDir(), which doesn't even provide an option to follow symbolic links.

(In theory you can construct both concrete and virtual filesystems that either have loops or, for virtual filesystems, are simply endless. In practice it is a social contract that filesystems don't do this, and if you break the social contract in your filesystem, it's considered your fault when people's programs and common Unix tools all start exploding.)

If a program follows symbolic links while walking directory trees, it can be for two reasons. One of them is that the program wrote its own directory traversal code and blindly used stat() instead of lstat(). The other is that it deliberately decided to follow symbolic links for flexibility. Following symbolic links is potentially dangerous, since they can create loops, but it also allows people to assemble a 'virtual' directory tree where the component parts of it are in different filesystems or different areas of the same filesystem. These days you can do some of this with various sorts of 'bind' or 'loopback' mounts, but they generally have more limitations than symbolic links do and often require unusual privileges to set up. Anyone can make symbolic links to anything, which is both their power and their danger.

(Except that sometimes Linux and other Unixes restrict symbolic links in some situations, for security reasons. These days the relevant Linux sysctl is fs.protected_symlinks, which restricts when symlinks in world-writable sticky directories such as /tmp will be followed, and your Linux probably has it turned on.)
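The two traversal choices can be seen directly with find, which uses lstat() by default and stat() with '-L'. This sketch builds a throwaway tree under mktemp with a symlinked directory:

```shell
#!/bin/sh
# Compare the two traversal decisions using find on a tiny example tree.
top=$(mktemp -d)
mkdir "$top/real"
touch "$top/real/afile"
ln -s "$top/real" "$top/link"

# Default find uses lstat(): the symlink is seen but not descended into.
nofollow=$(find "$top" -name afile | grep -c afile)
# 'find -L' uses stat(): the symlink is followed, so afile shows up twice.
follow=$(find -L "$top" -name afile | grep -c afile)
echo "not following symlinks: $nofollow match"
echo "following symlinks: $follow matches"
rm -rf "$top"
```

With a loop of symlinks instead of this simple tree, the '-L' version is the one that would get into trouble, which is the safety argument for the lstat() default.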

Programs that follow symbolic links during directory traversal aren't wrong, but they are making a more dangerous choice and one hopes they did it deliberately. Ideally such a program might have some safeguards, even optional ones, such as aborting if the traversal gets too deep or appears to be generating too many results.

PS: You may find the OpenBSD symlink(7) manual page interesting reading on the general topic of following or not following symbolic links.

DirectoryTraversalAndSymlinks written at 23:23:39


The technical merits of Wayland are mostly irrelevant

Today I read Wayland breaks your bad software (via), which is in large part an inventory of how Wayland is technically superior to X. I don't particularly disagree with Wayland's general technical merits and improvements, but at this point I think that they are mostly irrelevant. As such, I don't think that talking about them will do much to shift more people to Wayland.

(Of course, people have other reasons to talk about Wayland's technical merits. But a certain amount of this sort of writing seems to be aimed at persuading people to switch.)

I say that the technical merits are irrelevant because I don't believe that they're a major factor any more in most people moving or not moving to Wayland. At this point in time (and from my vantage point), there are roughly four groups of people still in the X camp:

  • People on Unix environments that don't have Wayland support. They have to use X or not have a graphical experience.

    (Suggesting that these people change to a Linux environment with Wayland support is a non-starter; they are presumably using their current environment for good reasons.)

  • People using mainstream desktop environments that already support Wayland, primarily GNOME and KDE, in relatively stock ways. Switching to Wayland is generally transparent for these people and happens when their Linux distribution decides to change the default for their hardware. If their Linux distribution has not switched the default, there is often good reason for it.

    Most of these people will switch over time as their distribution changes defaults, and they're unlikely to switch before then.

  • People using desktop environments or custom X setups that don't (currently) support Wayland. Switching to Wayland is extremely non-transparent for these people because they will have to change their desktop environment (so far, to GNOME or KDE) or reconstruct a Wayland version of it. Back in 2021, this included XFCE and Cinnamon, and based on modest Internet searches I believe it still does.

    One can hope that some of these desktop environments will get Wayland support over time, moving people using them up into the previous category (and probably moving them to Wayland users). However the primary bottleneck for this is probably time and attention from developers (who by now probably have heard lots about why people think they should add support for Wayland and its technical merits).

  • People who could theoretically switch to Wayland and who might gain benefits from doing so, but who have found good reasons (often related to hardware support) that X works better for them (cf some of the replies to my Fediverse post).

(There are other smaller groups not included here, such as people who have a critical reliance on X features not yet well supported in Wayland.)

With only a slight amount of generalization, none of these people will be moved by Wayland's technical merits. The energetic people who could be persuaded by technical merits to go through switching desktop environments or in some cases replacing hardware (or accepting limited features) have mostly moved to Wayland already. The people who remain on X are there either because they don't want to rebuild their desktop environment, they don't want to do without features and performance they currently have, or their Linux distribution doesn't think their desktop should switch to Wayland yet.

There are still some people who would get enough benefit from what Wayland improves over X that it would be worth their time and effort to switch, even at the cost of rebuilding their desktop environment (and possibly losing some features, because there are things that X does better than Wayland today). But I don't think there are very many of them left by now, and if they're out there, they're hard to reach, since the Wayland people have been banging this drum for quite a while now.

(My distant personal view of Wayland hasn't changed since 2021, since Cinnamon still hasn't become Wayland enabled as far as I know.)

PS: The other sense that Wayland's technical merits are mostly irrelevant is that everyone agrees that Wayland is the future of Unix graphics and development of the X server is dead. Unless and until people show up to revive X server development, Wayland is the only game in town, and when you have a monopoly, your technical merits don't really matter.

WaylandTechnicalMeritsIrrelevant written at 23:04:05


Unix is both a technology and an idea

The thing we call Unix is both a technology (really a set of them) and an idea (really a set of them). Of course the two sides aren't unrelated to each other; in Research Unix through V7, the two clearly reinforced each other, with the technology being developed to support the ideas and also, I think, with the technology enabling some further ideas. However, even historically the two were not glued together. People took the Unix ideas and implemented them on other systems (that was part of what made Unix so powerful as an agent of change and a source of ideas), and even in the 1970s and 1980s other people took the technology and used it in contexts where the ideas were not really expressed or exposed.

The ideas of Unix are not just about how things work; they're also about how people should interact with the system. For example, consider Rob Pike's thread about the Unix file model over on the Fediverse, which is at its core all about how people interact with 'files', rather than how the technology works. The ideas of Unix admit a wide variety of ways to interact with Unix, not necessarily through a command line, but the ideas tend to shine through all of them; see, for example, Russ Cox's "A Tour of Acme" (YT). I suspect you can readily see various Unix ideas in GUI form in Acme, although it's a GUI program.

When people talk about 'Unix', they generally mean the technology plus some varying amount of the ideas; how much of the ideas depends on who exactly is talking. When some people talk about Unix, they care very much about all of the ideas, especially including the ideas of how people use and interact with Unix systems (this is Rob Pike's thread again). However sometimes people talking about Unix mean only the technology, or even only really the basic kernel and low level API.

(If we mean just the ideas without the technology, I feel that this is usually phrased in terms of 'inspired by Unix' or the like; 'Unix' as such requires the technology, for all that a system can theoretically be POSIX compliant without being very much Unix.)

Getting confused about which of these two we mean by 'Unix' is one way to have big discussions about whether Android, iOS, or macOS are 'Unix'. In terms of technology, these are all clearly significantly or entirely Unix. In terms of ideas, especially the ideas of how you interact with them, they are often not particularly 'Unix'. Two people can easily talk past each other about whether, eg, Android is Unix; the person meaning Unix in a technology sense says yes, while the person meaning Unix in the ideas plus technology sense says no, and both are right.

(You can also have an argument about how much of the technology is required for something to be Unix in technology. Is using a Unix kernel with a completely custom userland sufficient? The result will have processes and many other Unix technologies, which pushes it to have certain features like an 'init' PID 1 equivalent. How does your view change if there are a bunch of strange new system calls grafted into the kernel and the userland does much of its work with these syscalls? Bear in mind that eg Solaris grafted in doors and significantly used them in its userland.)

UnixTechnologyAndIdea written at 21:19:26


How the rc shell handles whitespace in various contexts

I recently read Mark Jason Dominus's The shell and its crappy handling of whitespace, which is about the Bourne shell and its many issues with whitespace in various places. I'm a long time user of (a version of) Tom Duff's rc shell (here's why I switched), which was written for Research Unix Version 10 and then Plan 9 to (in part) fix various issues with the Bourne shell. You might ask if rc solves these whitespace issues; my answer is that it about half solves them, and the remaining half is a hard to deal with problem area (although the version of rc I use doesn't support some things that would help).

(There's also the Plan 9 from User Space version of rc, which I believe is basically the original Plan 9 rc. The rc reimplementation that I use is mildly augmented and slightly incompatible with the Plan 9 rc.)

As covered in the manual page for the rc I use, shell variables in rc are fundamentally lists made up of one or more items (making the value of a shell variable be a zero length list effectively erases it). Rc draws a distinction between a variable that doesn't exist and a zero-length (empty) list:

; null='' empty=() echo $#null $#empty
1 0

('$#<var>' is how you get how many elements are in a list.)

When shell variables are expanded, they're replaced by their list of items, each of which becomes a separate argument. Given a hypothetical program that reports how many arguments it's been invoked with:

; l=(a b 'space separated')
; numargs $l
3
; v='some space separated thing'
; numargs $v
1

This means that rc needs no special handling for '$*', the shell variable of arguments to your shell script (or function within the script). It's a variable that is a list of all of the arguments, and if a particular argument has internal whitespace, that won't be expanded into multiple arguments when it's used. So you can write the following in complete confidence:

for (i in $*) {
  step1 $i
  step2 $i
}

(Rc provides a way to flatten a list into a space separated single value, if you want to do that, but mostly you don't.)

Similarly, you can safely use '$*' as a whole, as in the 'yell' example from the article:

printf 'I am about to run ''%s'' now!!!\n' $^*
exec $*

(Here we see a rare use for flattening a list to one element, right after I said you mostly don't need it.)

When rc expands filename wildcards, the result is a list where each element is a single filename, even if the filename has whitespace in it. You can assign this to a variable or use it directly in a loop, and either works correctly with no whitespace problems:

for (i in *.jpg) {cp $i /tmp}

But this is where the good news ends, because of good old fashioned Unix conventions for how programs produce (or report) multiple results. Consider the example that Dominus gave of changing the suffix of a bunch of files. In rc, the starting version of this is:

for (i in *.jpeg) {
  mv $i `{suf $i}^.jpg
}

However this has the same issue as the Bourne shell version. Rc's backquote substitution generates a list from the command's output, and normally it breaks the output into list elements based on whitespace. So if the 'suf' command prints out a result that has whitespace in it, rc will error out in the same way. We can see this in action with:

; l=`{echo 'one two three'}
; echo $#l
3

To step around this we need to use a special version of rc backquote substitution that specifies the word separator (this is a feature not in the Plan 9 rc, which requires you to change '$ifs'). But there is another trap here with the simple version, which is:

; l=`` () {echo one two three}
; echo $#l $l
1 one two three


We got an extra newline because rc took us at our word; when we said there was no separator, it didn't strip off the final newline that echo added. So to do what we want, we need to have a '$nl' variable with a newline in it and then write:

for (i in *.jpeg) {
  mv $i `` $nl {suf $i}^.jpg
}

Unfortunately, this won't work if any of the filenames have newlines in them. Fixing that is theoretically possible but much more complex (you need an auxiliary function to reassemble the output of 'suf' into a single variable with newline separation of the components).

Incidentally, this means that rc is worse than the Bourne shell for Dominus's 'lastdl' example, where that program reports the most recent downloaded file and is ideally used as 'something $(lastdl)'. In the Bourne shell you can at least force the right interpretation with a simple 'something "$(lastdl)"'. Since rc doesn't have a simple syntax for this forced single result case, you have to do something more verbose. If you have a '$nl' normally defined in your shell environment, you can write:

something `` $nl {lastdl}

which works but is far from aesthetic or pleasant. Frankly, I'd use Dominus's workaround of making lastdl rename to a safe name, which in the version of rc I use would let me write:

something `lastdl
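For comparison, the Bourne shell behaviour mentioned above, where quoting a command substitution forces its whole output to be a single argument, can be sketched like this (using echo as a stand-in for something like lastdl):

```shell
#!/bin/sh
# Unquoted command substitution undergoes word splitting;
# a quoted one is always exactly one argument.
set -- $(echo 'two words')
unquoted=$#
set -- "$(echo 'two words')"
quoted=$#
echo "unquoted: $unquoted arguments, quoted: $quoted argument"
```

It's this one-character forced-single-result syntax that rc has no short equivalent of.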

It might be possible to add some features to rc to make this case a bit easier. For instance, maybe a `^{...} operation could flatten the backquote substitution list down to a space separated single argument (by analogy to rc's '$^<var>' that flattens a shell variable to a single element). Or there could be a special backquote version that takes everything literally but strips the trailing newline.

Another change to the version of rc that I use that could help in industrial strength scripts is the ability to define and use a shell variable that contains the null (zero) byte. In a hypothetical version of rc where this worked, you could then write:

z=`{printf '\0'}
names=`` $z {find whatever ... -print0}

Separation with zero bytes is the de facto Unix standard for safely passing around completely arbitrary filenames (well, file paths), since the zero byte is the one thing that can't appear in them.

(Bash doesn't do any better here, but it at least reports that it's ignoring the null byte when it handles the backquote substitution.)

In the modern Unix world, one practical workaround for many of these issues might be to remove space from your default interactive $IFS (although in a Bourne shell this has consequences for what "$*" expands to). A lot of the time these days space is effectively not a word splitting separator you normally want, because filenames and so on have spaces in them on a regular basis. In a new Unix shell, I would be quite tempted to make the default backquote substitution not split on spaces and have a longer form one that did, although maybe the whole area of backquote substitutions needs some deep thought.

(In a new Unix shell, wildcard filename expansion should definitely not perform word splitting on the result.)
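Here is a sketch of that $IFS workaround in a Bourne-style shell; the two filenames are made-up examples:

```shell
#!/bin/sh
# Remove space (and tab) from IFS so word splitting only happens on
# newlines; space-containing filenames then survive substitution intact.
IFS='
'
set -- $(printf '%s\n' 'one file' 'another file')
count=$#
first=$1
echo "got $count results; first is '$first'"
```

With the default $IFS the same substitution would have produced four words instead of two.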

PS: Drew DeVault is working on a shell that is called 'rc', and this shell may solve some of these whitespace problems, based on DeVault's post. However, DeVault's rc doesn't have syntax compatible with the original Duff rc and its reimplementations. I admit that I don't understand how DeVault's rc passes all of these cases based on its current manual page, because it says it splits the result of `{...} backquote substitution on (its) '$ifs', which is said to include all of the usual whitespace.

RcShellWhitespaceHandling written at 23:05:57


Command hashing in Unix shells is probably no longer worth it

Recently, I had an experience with Bash (and not for the first time with this sort of thing):

Dear bash: if you have a path for a command cached and trying to execute that file gets ENOENT, perhaps you should, you know, *check $PATH again*?

Then I endorsed getting rid of shell command hashing.

Once upon a time, Unix computers were slow, system calls were slow, Unix kernels only cached simple things, RAM was small relative to the size of directories, $PATH often contained many directories, and your filesystems were on slow spinning rust. In that environment, 4.2 BSD's csh(1) introduced a feature where it would maintain an internal map (a 'hash') from command names like 'make' to the full path it had found for them in your $PATH, rather than searching through $PATH each time. This csh feature is command hashing, and it was copied into (some) later shells, including Bash, so that they too cached the file paths of commands.

Any time you have a cache, you have to think about cache invalidation. The csh answer to cache invalidation was that it didn't, or more exactly it gave you a builtin (called 'rehash') to flush the cache and it was up to you to do so when you felt like it, or when you hit an error caused by this. This behavior was then generally copied by later shells, although Bash has a non-default option to sometimes automatically flush a bit of the cache (in the 'checkhash' shopt).
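The stale-cache behaviour is easy to reproduce if you have bash handy. This sketch uses a throwaway 'greet' command and directories from mktemp; the exact failure behaviour can vary between shells and versions, which is why the middle step is hedged:

```shell
#!/bin/sh
# Run a command once so bash caches its path, then move the file to
# another $PATH directory and try again.
top=$(mktemp -d)
mkdir "$top/a" "$top/b"
printf '#!/bin/sh\necho hello\n' > "$top/a/greet"
chmod +x "$top/a/greet"

output=$(bash -c '
  PATH="$1/a:$1/b:$PATH"
  greet                             # found in a/, path now cached
  mv "$1/a/greet" "$1/b/greet"
  greet 2>/dev/null || echo "stale hash entry"  # may still try cached path
  hash -r                           # the csh-style fix: flush the cache
  greet                             # found again via a real $PATH search
' bash "$top")
echo "$output"
rm -rf "$top"
```

The 'hash -r' builtin here is bash's equivalent of csh's 'rehash', putting the burden of cache invalidation on the person at the keyboard.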

Generally, none of those original early 1980s things are the case any more on modern Unix machines. Machines are fast, system calls are fast, $PATH often contains only a few directories, memory is large compared to the size of things like /usr/bin's on-disk directory (even with the growth in /usr/bin's contents), filesystems are often on fast or very fast disks, and kernels got more sophisticated. Specifically, people realized that name lookups happen so often in Unix that it was worth building in kernel cache structures specifically for them, generally including also negative entries ('this name is not in this directory'); this is Linux's dentry cache (also, and negative dentries) and FreeBSD's kernel name cache. Under typical circumstances, name lookup in frequently used directories is now a very fast operation.

As we know from Amdahl's law, optimizing an operation that's already very fast doesn't provide you with much time savings. In the mean time, shells haven't gotten much better about their command hash cache invalidation, which means that every so often they get things actively wrong, either not running a command when they could or running the wrong command given your $PATH's current state.

DropShellCommandHashing written at 21:49:37


A bit of Unix history on 'su -'

These days, modern versions of su interpret 'su -' as 'su -l' (via). Although they have different implementations of su, this is true on Linux, FreeBSD, and OpenBSD. However, it turns out that this feature wasn't always in Unix's su.

The V7 su(1) is quite minimal, and has no equivalent of '-' or '-l'. In fact, V7 su takes no options at all; it treats its first argument as the username to su to, ignores all other arguments, and simply runs the shell (passing it the nominal program name of 'su', cf). Su became rather more complex in System III, where it gained both 'su -' and the ability to use additional arguments, which were passed to /bin/sh as command line arguments (su also got other changes, such as logging). Based on reading the System III code for su.c, I believe that you could use both 'su -' and 'su login args' together, with the meaning that 'su - login args' would reset $HOME, chdir to the target login's home directory, setuid to them, and then run '/bin/sh args'. Unfortunately I can't find System III manual pages, so I can't see what it documented for su usage.

Su in 4.2 BSD has a more complicated story. The 4.2 BSD su(1) manual page claims the same minimal usage as V7 su. However, the 4.2 BSD su.c code shows that it actually had a much more elaborate usage; su supported both '-' and '-f', as well as command line arguments for the invoked shell. The '-f' option passed '-f' as the first argument to the shell; according to the 4.2 BSD csh(1) manual page, this caused it to skip reading .cshrc. Unlike the System III su, the 4.2 BSD su always used the target login's shell, even when passing arguments to it (this matters when the target has a restricted shell). As with System III, you could combine '-' and additional arguments, 'su - login arg ...'. In 4.3 BSD, the su(1) manual page actually documents all of this. The BSD su didn't change again until 1990's 4.3 BSD Reno where it added a Kerberos focus and '-l' and '-m' options, based on the su(1) manual page we have.

Based on the NetBSD su(1) manual page, the NetBSD su may be the closest to this ancestral *BSD version of su, since NetBSD has retained the -K argument related to Kerberos.

I suspect that the various Linux implementations of su (which have come from at least the shadow package and util-linux) have had a '-l' option for a long time, but I lack the energy to trace the various packages back through history. The Linux su(s) have had additional options for a long time, although in some sense 2006 is probably 'recent' as far as Linux su features are concerned.

SuDashHistory written at 22:44:30


The evolving Unix attitudes on handling signals in your code

Once upon a time, back in V7 Unix or so, Unix signal handling in programs was nominally very simple. You'd set a signal handler with signal(2), and then when it was invoked it would do things, possibly including using longjmp(3) to pop back to the top level of your program. Among other examples, the Bourne shell famously used SIGSEGV as a memory allocation method. Even today, a lot of programs behave as if this is the signal handling model in effect; you can interrupt your shell, or your pager, or various other things with a Ctrl-C and they'll act like this, popping back to a top level or cleanly stopping their current action while still operating in general (instead of just exiting the way simpler programs do).

In actual reality, even in V7 signal handling could be chancy. The problem with handling signals in the V7 way is that they're interrupts, which means that they can happen at arbitrary points in program execution. If a signal arrives at the wrong point, it will interrupt the program halfway through doing something that wasn't designed to be interrupted, for example halfway through a malloc() or free(), and then various havoc can ensue. In V7 I think there weren't all that many critical points like this (in the C library or in programs), and in general the possibility was mostly ignored outside of a few programs that took care to block signals around their critical operations. If something went wrong, the person using the program would deal with it.

(This was in general the V7 way; it was a simple operating system so it had simple implementations that often punted on harder problems.)

To simplify the story, as Unix grew both programs and the C library became more complex, with more complex internal operations going on, and people became less tolerant of flaky programs than they might have been in a simple research operating system. Eventually people began to think about threads, and also about standardizing what signal handlers could legally do as part of POSIX. This resulted in the current situation where POSIX signal handlers are very constrained in what they can legally do, especially in threaded programs. To simplify things, you can call some C library functions (primarily to interact with the operating system), or set a flag, and that's about it. A particular Unix may go beyond the POSIX requirements to make other things safe in signal handlers, and programs may break these requirements and still get away with it most of the time, but today there isn't much you can safely do in a signal handler within the C API.

(A non-C language on Unix may or may not have to restrict itself to the C API behavior in its signal handling, depending on how much it relies on the C library.)

With effort it's still possible to write reliable Unix programs that handle signals and behave as people expect them to. But it's not trivial, and in particular it's not trivial to present an API to programs so that they can handle signals as if they were on V7, with their 'signal handlers' free to do pretty much anything and make broad transfers of control without restriction. For a start, if you offer this API to programs, their signal handlers can't be real signal handlers and by extension you need a runtime to catch the actual signals, set status flags, and then invoke the 'signal handlers' outside of the actual Unix signal delivery.

(This is how (C)Python handles signals, for example. I believe that Go on Linux handles signals outside of the C API, and as part of that manages handling locking and coordination on its own.)

PS: The POSIX signal handler requirements are also only a promise about C (POSIX) API functions, not about what functions in your own program may or may not be safe to call from your signal handlers. If you manipulate data structures or have internal locking in your program, or in libraries that you call, interacting with things safely from within a signal handler is your own responsibility. POSIX makes no promises.

PPS: I'm not sure if restrictions on what signal handlers should do were ever written down before POSIX. The 4.3 BSD sigvec(2) and signal(3) manual pages don't contain any cautions, for example.

Sidebar: threads and signals

Once you introduce threads, many operations in the C library may start requiring locks. Once you have operations taking locks, it becomes quite dangerous to call back into anything related to those locks from a signal handler. If you get a signal at the wrong time, some thread will attempt to recursively obtain a lock and then probably deadlock. Introducing threads to your C library model forces you to think about locks, deadlocks, and preventing them, and now you can't hand-wave signal safety any more.

(Not that you ever could, but threads make it basically impossible to think you can get away with it, because the failure modes are so obvious.)

PS: I don't know how this interacts with POSIX sigsetjmp() and siglongjmp(), since siglongjmp() is listed as one of the POSIX functions that's safe to call in a signal handler.

SignalHandlingOverTime written at 23:17:45; Add Comment


Where your program's configuration files ('dotfiles') should go today

Over on the Fediverse, Ben Zanin had a question:

Is there a general consensus about the preference order of dotfiles and config directories?

Let's say we have a tool that reads an optional config file & populates/uses a small state file.

1. There is the traditional ~/.example/ dir that would contain `config` and `db`
2. There is the XDG ~/.config/example/config and ~/.local/example/db
3. There is the truly ancient ~/.examplerc / ~/.example.db convention

What is your preferred ordering of these options when checking all, most to least?
[poll elided]

Of course I chimed in with my views:

I voted 2, 1, 3, but really you should only support 2; support 1 and 3 only as a backward compatibility measure if you ever originally supported them. If you're starting from scratch these days with a new program, IMHO you should only support the XDG approach. $HOME/.config is just less clutter.

In my view this isn't about adhering to the XDG standard, it's about getting things out of $HOME. Unix dotfiles were always a (somewhat accidental) hack, and over the years we've accumulated entirely too many of them in our $HOMEs. The XDG option isn't particularly perfect, but it's at least a standard approach and it achieves the goal of getting dotfiles out of $HOME. As a side effect the XDG approach makes things more legible if you look in ~/.config.

(You should look for an explicitly set $XDG_CONFIG_HOME, though, rather than hard-coding $HOME/.config. It's not that much extra work; in shell scripts it's almost free.)

You can raise assorted objections to the XDG standard, but it has two benefits. First, it's better than what we had before, when everyone put things in $HOME. Second, it's something (ie, a single place) that people have broadly agreed on, which matters because that's what keeps us from yet more clutter (cue the famous xkcd comic on standards). In this situation, one imperfect standard is better than a bunch of different groups making their own perfect standards, each of which would give us some new top level directories in your $HOME.

(The XDG Base Directory Specification definitely originated on Linux, but it's applicable to any Unix and I don't think it conflicts with what any of the other free Unixes are doing (or not doing) here. Plus, there's a growing number of programs that are going to use it because they have to pick somewhere to put their files, and the XDG way beats the alternative.)

DotfilesWhereToday written at 22:35:20; Add Comment
