Wandering Thoughts archives

2023-11-23

Unix's 'test' program and the V7 Bourne shell

Recently I read Julio Merino's test, [, and [[ (via), which is in part about there being a real '[' binary and a 'test' binary to go along with it, and as part of that, Merino wonders why the name 'test' exists at all. I don't have any specific insight into this, but I can talk a bit about the history, which turns out to be more tangled and peculiar than I thought.

The existence of 'test' goes back to V7 Unix, which is also where the Bourne shell was introduced. In V7, the manual page for the program is test(1), which makes no mention of the '[' alternate name, and the source is cmd/test.c, which has a comment at the start about the '[' usage and code to support it. While 'test' is a much easier name to deal with in Unix than '[', there seems to be more to this than just convenience. There are a number of shell scripts, Makefiles, and so on in V7 Unix, and as far as I can tell all of them use 'test' and none of them use '['.

(For example, bin/nohup, bin/calendar, bin/lookbib, and usr/src/cmd/learn/makefile.)

Another source of information is S. R. Bourne's An Introduction to the Unix Shell (also PDF version and the V7 troff sources). In section 2.5, Bourne introduces the 'test' command under that name, and then goes on to use it with 'while' (section 2.6) and 'if' (section 2.7). As far as I can see there's no mention of the '[' alternate name.

In trawling through various sources of information, I can't actually find any clear sign that V7 ever had a '[' hard link for 'test'. The test source code is definitely ready for this, but such a hard link doesn't exist. 4BSD has a src/cmd/DESTINATIONS file that suggests that /usr/bin/[ existed at this point (alongside /usr/bin/test), but that's the earliest trace I could find. In 4.1c BSD we finally have clear evidence of /usr/bin/[ in the form of src/bin/Makefile, which explicitly creates it as a hard link to /usr/bin/test.

However, there's something rather interesting in the V7 Bourne shell source code, in the form of vestigial, disabled support for a '[' builtin. In msg.c, there is a commented out section toward the bottom:

[...]
SYSTAB  commands {
      {"cd",          SYSCD},
      {"read",        SYSREAD},
/*
      {"[",           SYSTST},
*/
      {"set",         SYSSET},
[...]

Then in xec.c there's commented out code that would have handled SYSTST in the execute() function:

[...]
    case SYSREAD:
        exitval=readvar(&com[1]);
        break;

/*
    case SYSTST:
        exitval=testcmd(com);
        break;
*/
[...]

There's no actual 'testcmd()' function in the V7 Bourne shell source code, but we can guess what it might have done.

Given this disabled code and the fact that the V7 'test' itself supported being used as '[', it seems possible that the '[' syntax was Bourne's preference. Perhaps the builtin '[' was implemented and then removed in favor of '[' being a hard link to 'test', and then for whatever reason other people at Bell Labs didn't use it and V7 wasn't distributed with such a hard link set up (although individual installs could create one themselves, and it appears that the result would have worked). However, it may have been the other way around, per this HN comment, with Bourne preferring the 'test' form over the '[' form.

As it happens, I don't think the 'test' command (and its syntax) appeared from nowhere in V7; instead I believe we can trace it to antecedents in V6 Unix. But that's going to take another entry to discuss, since this one is already long enough.

V7TestAndBourneShell written at 23:24:29

2023-11-07

The Vim features that make me a Vim user instead of a Vi user

Over on the Fediverse there was a little Vim versus Vi discussion, and in response to seeing it I posted something:

I used to be a minimal vi user. Over the years I've drifted to being a not so minimal vim user, and I think the vim features that I'm now addicted to are:

  • infinite undo and redo (and a tree view of undo)
  • unlimited backspacing in insert mode (true vi only lets you backspace so far)
  • vim windows, which let me have multiple files on screen at once (this used to be vi's big limit versus emacs)
  • recently, visual mode, both line and character.

(I use other vim things but these matter to me.)

(For example, vim settings for YAML and incrementing and decrementing numbers.)

Back in 2020 I wrote about realizing that I was now a Vim user, citing Vim's powerful undo and Vim windows; the other things I mentioned are new in my awareness since then. Unlimited backspacing in insert mode is one of those Vim features that are so instinctively right that I didn't realize (or remember) that classical Vi is rather more restricted that way, much like unlimited undo.

(OpenBSD vi only lets you backspace in insert mode within the current insertion, and my vague memory is that classical Vi may not have let you back up to previous lines even within a single insertion.)

Vim's visual mode is more specialized and limited, but for the kind of editing that I do it's turned out to be quite convenient, enough so that I use it regularly and would miss it if I had to do without it.

OpenBSD's vi is probably the closest I come today to a pure old fashioned Vi experience. I can definitely edit files in it without problems (I do every so often), and I often don't notice any difference from Vim if I'm editing a single file for straightforward changes (where I only need to undo simple mistakes immediately), which is the typical case for what I do on our OpenBSD machines. However, if I only had Vi and not Vim I probably wouldn't use vi(m) as much as I do today; I'd be much more likely to reach for other editors for multi-level undo and split screen editing of multiple files (with the ability to move text from one file to the other).

(I'd probably still use vi a lot, because the forces pushing me to it are fairly strong and I was using 'vim as vi' well before I started using 'vim as vim'.)

PS: I know that there are people who like the appeal of the simple (and BSD-pure) original Vi, but I'm not a Unix purist these days. I'm lazy; unlimited (and sophisticated) undo, backspacing as much as I want, multiple windows, and so on are all quite convenient (with very little effort on my part, and they work in all the many environments I use vim in). I use Vim instead of Vi for much the same reason that I now have file and command completion in my shell.

(I might feel differently about this if I'd been a heavy Vi user and was very used to its specific quirks, but I only started seriously using vi(m) in the Vim era.)

VimFeaturesThatHookedMe written at 23:19:23

2023-10-24

dup()'s shared file IO offset is a necessary part of Unix

In a recent entry I noted dup()'s somewhat weird-seeming behavior: the new file descriptor you get from dup() (and also from dup2(), its sometimes better sibling) shares the file's IO offset with the original file descriptor. This behavior is different from open()'ing the same file again, where you get a file descriptor with an independent file IO offset (sometimes called the seek offset or the seek position). In discussing this on the Fediverse, I wondered if this was only a convenient implementation choice. The answer, which I should have realized even at the time, is that dup()'s shared IO offset is a necessary part of Unix pipelines (especially in the context of older Unixes, such as V7 Unix).

Consider the following illustrative shell pipeline:

$ (cmd1 | cmd2 | cmd3) 2>/tmp/errors-log

Here we want to redirect any errors from these commands (and any sub-things they run) into /tmp/errors-log. We want all of the errors, and we want them to appear in errors-log in the order they were printed by the various commands (which is not necessarily pipeline order; cmd3 could write some complaints before cmd2 did, for example).

If the shell opens /tmp/errors-log once and dup()'s the resulting file descriptor to standard error for cmd1, cmd2, and cmd3, this is exactly what you get, and it's because of that shared file IO offset. Every time any of the commands writes to standard error, it advances the offset of the next write() for all of the commands at once. Today you could get the same effect for writes with O_APPEND, but that wasn't in V7 Unix.
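
As an illustration of the mechanism, here is a minimal sketch of how a shell could set this up; it isn't any real shell's code, the command names are made up, and the pipes between the commands are left out for brevity. The log file is open()'d once and every forked child dup2()s that single descriptor onto its standard error before exec()ing its command:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char *cmds[][2] = { {"cmd1", 0}, {"cmd2", 0}, {"cmd3", 0} };
    int errfd = open("/tmp/errors-log", O_WRONLY|O_CREAT|O_TRUNC, 0666);

    if (errfd < 0)
        exit(1);
    for (int i = 0; i < 3; i++) {
        if (fork() == 0) {
            dup2(errfd, 2);    /* the child's stderr now shares errfd's IO offset */
            close(errfd);
            execvp(cmds[i][0], cmds[i]);
            _exit(127);
        }
    }
    close(errfd);
    while (wait(NULL) > 0)
        ;                      /* wait for all three children */
    return 0;
}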

The shared offset also makes setting up standard input easier in some shell situations. Consider this:

$ (cmd1; cmd2; cmd3) <input-file

Implementing this without dup()'s shared IO offset would require that the parent shell set up standard input once, before it started forking children, so that it could pass the same file descriptor to all of them. With dup(), the parent can merely open input-file and then leave it to each child to dup() it onto standard input at an appropriate time.

There's a closely related idiom that also requires these dup() semantics even in a single process. Consider:

$ command >/tmp/out 2>&1

You want both standard output and standard error in the same file, interleaved in the order they were written, but in the child process these are necessarily two different file descriptors. You need them to share the IO offset anyway, which is achieved by dup()'ing one to the other (and in a specific order).
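
(For the single-process case, a minimal sketch of the child-side setup, again not any particular shell's actual code, looks like the following. The order matters: fd 1 has to already refer to the file when fd 2 is made a copy of it, or fd 2 would still point at the old standard output.)

#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: run in the child, before exec(), to set up
   'command >/tmp/out 2>&1'. */
void setup_redirections(void)
{
    int fd = open("/tmp/out", O_WRONLY|O_CREAT|O_TRUNC, 0666);

    dup2(fd, 1);    /* '>/tmp/out': fd 1 now refers to the file */
    dup2(1, 2);     /* '2>&1': fd 2 shares fd 1's open file description */
    if (fd > 2)
        close(fd);  /* the original descriptor is no longer needed */
}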

Even without these dup() semantics, sharing the file IO offset of the same (inherited) file descriptor between processes is basically essential. Consider:

$ make >/tmp/output

Make will write to standard output and it will pass its own standard output file descriptor on to children (ie, all of the commands that get run from your Makefile) unchanged. All of the writes that the various processes make through their own file descriptor 1 have to share an IO offset, or they'd repeatedly write over each other at the start of the file.

(You can create similar but more contrived examples with standard input coming from a file.)

Before I started writing this entry, I don't think I appreciated how important Unix's separation of the file IO offset from file descriptors is, or how deep it goes.

DupSharedOffsetNecessary written at 23:25:41

2023-10-22

Unix /dev/fd and dup(2)

I recently read Amber Screen's A tale of /dev/fd (via), which notes an odd behavior of Linux's /dev/fd (and in fact of FreeBSD's /dev/fd as well): if you try to open a /dev/fd/N name for a file descriptor, it does permissions checks and may refuse to let a process re-open a file descriptor that it already has open. Screen writes:

To the dispassionate hacker (or a reader of Stevens), it's pretty clear that a syscall like open("/dev/fd/5", O_RDONLY) should be similar to dup(5). [...]

'Similar' is an extremely important word here, because you probably don't actually want this to act as if you called dup() on the original file descriptor. But we'll get to that.

A narrow Linux-specific reason that opening /dev/fd/N forces permissions re-checks is probably partly because /dev/fd is a symbolic link to /proc/self/fd, which is a self-referential instance of a general Linux /proc facility that allows you to open what any process's file descriptor points to, provided that you have the necessary permissions. This includes file descriptors for things that are now deleted (I've used this to rescue a file that I accidentally removed but a process still had open). You obviously need to do permissions checks on this in general, so it would take extra code to special case re-opening your own file descriptors.

Somewhat more broadly, you also need some permissions checks no matter what, because the re-open of /dev/fd/N (aka /proc/self/fd/N) could be for a different mode than the file descriptor is currently open for. If your process currently has a file descriptor opened read only and you try to re-open it for write through /dev/fd/N (or /proc/self/fd/N), that had better check if you can actually write to the file. At the same time you probably do want to allow mode changes like this for /dev/fd/N provided that the underlying file permissions (and other situation) allow it, partly because programs may require it for some files they open.

(This isn't just for re-opening files with write permissions when you started with read permission. You might also have file descriptors opened only for write that you now want to read.)

The reason that you don't want /dev/fd/N to do a dup() is that dup()'d file descriptors share more things with each other than separately opened file descriptors on the same file. For one prominent example (noted in both Linux dup(2) and FreeBSD dup(2)), the file offset is shared between all dup()'d file descriptors. If one such descriptor reads, writes, or lseeks, all file descriptors now act on the new file offset. This is fine if a program expects this behavior (or reasonably should), because it obtained the file descriptor through a dup() or dup2(). It's quite possibly not fine if the program gets gifted this behavior simply because it open()'d what it sees as a separate file name; in fact, this behavior would be more or less in contradiction to open()'s normal specification (which promises you an independent file object). So you don't want to implement /dev/fd/N by directly doing a dup() on the given file descriptor if the modes match; you need to do something more complicated.
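
(Here is a small demonstration of the difference, assuming a Linux-style /dev/fd where re-opening a name gives you a new open file description; the file opened is arbitrary and just needs to be readable.)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char path[64];
    int fd = open("/etc/passwd", O_RDONLY);
    int duped = dup(fd);
    int reopened;

    snprintf(path, sizeof(path), "/dev/fd/%d", fd);
    reopened = open(path, O_RDONLY);

    lseek(fd, 100, SEEK_SET);
    /* The dup()'d descriptor shares the offset and reports 100; the
       re-opened one has its own offset and still reports 0. */
    printf("dup: %ld, re-open: %ld\n",
           (long)lseek(duped, 0, SEEK_CUR), (long)lseek(reopened, 0, SEEK_CUR));
    return 0;
}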

(To reiterate Amber Screen's own quote, they aren't advocating for /dev/fd/N literally being a dup() of the file descriptor. As I read it, they are advocating for similar behavior, with permissions checks only when they're required because you're changing the mode. In other respects this /dev/fd/N re-open would make a new and separate copy of what POSIX calls the 'open file description' (see Linux open(2)'s discussion of this).)

DevFdAndDup written at 21:56:48

2023-10-13

OpenBSD PF-based firewalls suffer differently from denial of service attacks

Suppose, hypothetically, that you have some DNS servers that are exposed to the Internet behind an OpenBSD PF-based firewall. Since you're a sensible person, you have various rate limits set in your DNS servers to prevent or at least mitigate various forms of denial of service attacks. One day, your DNS servers become extremely popular for whatever reason, your rate limits kick in, and your firewall abruptly stops allowing new connections in or out. What on earth happened?

The answer is that you ran out of room in the PF state table. OpenBSD PF mostly works through state table entries, and when a rule that normally would create a new state table entry is unable to do so, the packet is dropped. This is somewhat documented in places like the max option for stateful rules:

Limits the number of concurrent states the rule may create. When this limit is reached, further packets that would create state are dropped until existing states time out.

(That this is more or less explicitly documented is better than it once was.)

One of the reasons that you can run out of state table entries despite your DNS servers dutifully rate-limiting their responses is that DNS is primarily UDP based and so PF doesn't really know if a given UDP 'connection' is 'closed' and so should have its state table entries cleaned up more aggressively. Instead, all PF does for UDP is guess timeouts based on packet counts, and those packet counts are for each unique set of source IP, source port, destination IP, and destination port. If your DNS query sources vary their source port for each query, this can add up fast.

(As we've seen, even TCP connections can linger in the state table for some time after they're closed.)

The current OpenBSD 7.3 manual page for pf.conf says that the default maximum size of the state table is only 100,000 entries, which is often effectively 50,000 'connections' (it's not uncommon for each connection to create two state table entries). It doesn't take a huge amount of bandwidth or a huge packets per second rate to exhaust that many state table entries, and it mostly doesn't matter whether or not your DNS servers actually respond to the queries.

That may sound odd so let's cover it explicitly. PF has three states for UDP traffic: 'first' if the source has only sent one packet, 'multiple' if both ends have sent packets, ie your DNS server responded, and 'single' if the source has sent multiple packets (with the same source port) without a response, ie your DNS server is dropping their queries and they're retrying. The first two states default to 60 second timeouts and the third defaults to a 30 second timeout, and that's after packets stop flowing. A DNS query source that keeps re-sending its query every fifteen seconds (with the same source port) will keep even a 'single' state entry alive forever.

As far as I can see, the only really good way to limit states created by UDP traffic is to set a max option on the rules involved. Often this will cover only half of the states created by this traffic (for reasons covered in my entry on state table entries). You can try to limit the number of source IPs and states per IP that can be created (and do so across relevant rules), but it's hard to come up with sensible numbers for both that won't block legitimate traffic while also not letting people blow out your state table.

(I assume without checking that you can set all of max, max-src-nodes, and max-src-states, and then have the total number of state entries limited by max instead of the product of the latter two. This could be useful if you want some per-IP firewall limits in addition to the total state limit, perhaps to ensure that one or a few IPs can't eat up all of the total allowed states.)
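
(For concreteness, here is an illustrative pf.conf sketch of this sort of rule. The addresses, interface group, and numbers are all made up and would need tuning for any real environment.)

dns_servers = "{ 192.0.2.10, 192.0.2.11 }"
# raise the global state table ceiling from the default if necessary
set limit states 200000
# cap the states this one rule can create, plus some per-source limits
pass in on egress proto udp to $dns_servers port domain \
        keep state (max 20000, max-src-nodes 2000, max-src-states 50)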

All of this is surprising if you're thinking of rate limiting and denial of service issues from the normal perspective of services on your hosts (such as DNS servers, or even web servers). In the host services world, if you reject or drop traffic through rate limiting, you're done with the traffic and you don't need to worry further (okay, yes, SYN cookies for TCP connection attempt traffic floods, but most things do that automatically today). But your OpenBSD PF firewall is still keeping state for that traffic your host rate-limited or dropped, and that state can (and will) add up, especially for UDP traffic.

OpenBSDPfStatesAndDoS written at 23:02:37

2023-10-05

X's two ways to send events to X clients (more or less)

Once upon a time, the X11 protocol was shiny and new. One small part of the protocol (and the client library) was a way for one client program to send X events, such as key or mouse presses (or more routinely expected things), to another program. In Xlib, this is XSendEvent() (also). When the target X client receives this event, it will have a special flag set in the X event structure to signal that it comes from a SendEvent request from another client. Such events are normally called synthetic events, because they were created synthetically by another X client, instead of naturally by the X server.

X11 wasn't (and isn't) the most secure windowing system (in an access control sense), with it being pretty easy for people to connect X clients to your X server session. Partly because of this, X programs like xterm either started out being able to ignore such synthetic events (for key and mouse events, at least) or soon added this feature. As covered in the xterm manual page, this is allowSendEvents and is described this way:

Specifies whether or not synthetic key and button events (generated using the X protocol SendEvent request) should be interpreted or discarded. The default is "false" meaning they are discarded. Note that allowing such events would create a very large security hole, therefore enabling this resource forcefully disables the allowXXXOps resources. The default is "false".

If you hang around people who automate things in their X session, you may have heard of xdotool. If you've tried it, you may have noticed that xdotool seems pretty successful in manipulating the windows of X programs, despite the general feelings about SendEvents, and so you might wonder what's going on here. The answer is that xdotool (and other automation programs) use a second mechanism to inject synthetic events, the XTEST extension (protocol). The original purpose of this extension is, to quote its documentation:

This extension is a minimal set of client and server extensions required to completely test the X11 server with no user intervention.

Events injected through XTEST don't carry the 'SendEvents mark of shame', and so programs like xterm won't automatically reject them. However, due to its origins (and also probably security concerns), XTEST has certain limitations, and so in some circumstances xdotool has to fall back to (X)SendEvent(s) and suffer the mark of shame and perhaps having things not work.

A nice description of the situation is in the xdotool manual page's SENDEVENT NOTES section:

If you are trying to send key input to a specific window, and it does not appear to be working, then it's likely your application is ignoring the events xdotool is generating. This is fairly common.

Sending keystrokes to a specific window uses a different API than simply typing to the active window. If you specify 'xdotool type --window 12345 hello' xdotool will generate key events and send them directly to window 12345. However, X11 servers will set a special flag on all events generated in this way (see XEvent.xany.send_event in X11's manual). Many programs observe this flag and reject these events.

It is important to note that for key and mouse events, we only use XSendEvent when a specific window is targeted. Otherwise, we use XTEST.

If you don't target a specific window and use XTEST, it's like you typed the keys at your keyboard. The 'typed' keys go to whatever window has keyboard focus at the time xdotool runs, or into the void if no window has keyboard focus at the time. With SendEvent you can type the keys to a specific identified window no matter what else is going on, but interesting programs will probably ignore you.

(Even if programs don't ignore you because of the SendEvent mark, they may ignore you for other reasons. For example, Gnome Terminal appears to accept SendEvent keyboard input, but only if it currently has keyboard focus.)

(This entry is a variation on part of something I wrote recently on the fvwm mailing list.)

XTwoWaysToSendEvents written at 22:53:47

2023-09-29

Understanding the NMH repl command's '-cc me' and '-nocc me' options

Suppose, not hypothetically, that you use NMH as your mail client and that you would like to cc: yourself on all of the mail you send; this is what I do. It's relatively easy to set this up for the NMH comp command, which creates new messages. There are a number of approaches and it's easy to understand all except the most complex ones, and since you have to create the complex ones yourself, presumably anyone who can set it up knows what they're doing. However, understanding what you can do with the NMH repl command for replying to mail, and how it all works, is not so straightforward, and for years I didn't really understand what I was doing with it.

(Conventionally NMH people who want to keep a copy of all of their email use 'Fcc:' to automatically file a copy in an NMH folder, but this has various issues similar to the IMAP Sent folder situation.)

When repl replies to a message, it has a set of options to control what additional addresses are included in the message; these are the '-cc all/to/cc/me' options and their -nocc versions. What '-cc me' does seems conceptually simple (and it sounds like what I want), but the actual reality of both what it does and how it works is not. If you look at the repl 'replcomps' or 'replgroupcomps' file (the default one is normally in /etc/nmh), you will see a complex mh-format tangle. What this tangle does for the cc: header is it unconditionally takes a bunch of different (potential) sets of addresses and puts them into a bucket, de-duplicating addresses as it goes:

%(formataddr{to}) %(formataddr{cc}) %(formataddr(localmbox))

In effect, what repl does to implement -cc and -nocc is that it filters the resulting collection of addresses (I don't know if this is the literal implementation). An address will be in the actual reply's cc: only if it was in To:, cc: or your local address and the relevant command line switch was given (an option not selected defaults to off, so a bare '-cc to' is normally equivalent to '-cc to -nocc cc -nocc me'). So the repl components file is always trying to cc you (that's the 'localmbox' bit), but whether or not it succeeds depends on if '-cc me' is in effect.

If you specify '-cc me', then your local address (your primary address) is guaranteed to be included in the cc: list. Normally it will be included only once, even if it already appears in To:, cc:, or both; the addresses will be de-duplicated as they're added to the collection, including the addition at the end of 'localmbox' (your local address). If '-nocc me' is in effect, both your local address and any Alternate-Mailboxes addresses from your mh-profile will be filtered out of cc: (and in fact To: too).

Unfortunately, repl provides no way to either remove only your alternate mailboxes (while still cc'ing your primary one) or not cc: your primary mailbox if any of your alternate mailboxes are already cc'd. You can either cc: yourself and any alternate mailboxes that are already present, or remove everything; alternate mailboxes effectively only get used for '-nocc me', not for '-cc me'. If you almost never want to use '-nocc me', this means that it may not be too useful to list your alternate mailboxes in your .mh_profile. There's also no way to leave any existing cc's of yourself intact while not adding a new one if there wasn't one already; either '-nocc me' will remove any existing ones, or '-cc me' will add one if it's not already there.

(If you try to strip your alternate mailboxes with '-nocc me -cc me', the '-nocc me' has no effect; it turns off '-cc me' and then you turn it back on.)

Because 'repl -group' implies '-cc all' as the default, which includes '-cc me', normally repl will explicitly add a cc: to you in all such group replies, in addition to whoever the original message was explicitly to or cc'd to. As mentioned earlier, there's no real neutral option even without this default '-cc all', since anything not explicitly mentioned as a -cc is (explicitly) filtered out. Repl just doesn't want to leave this alone.

(If you want repl to leave this alone and never want to cc yourself, you need a custom components file that removes the '%(formataddr(localmbox))' bit, and then always supply '-cc me' either implicitly or explicitly to keep repl from removing your addresses from the to and cc address lists.)

If you just want to get an exact copy of the same email everyone else got but don't care about your address appearing in the headers, you can probably use the under-documented NMH 'Dcc:' header, which does what normal mail agents use 'Bcc:' for. NMH has a 'Bcc:', but that doesn't get you a verbatim copy of the message because NMH has opinions there (see send and especially post). You'll need custom 'replcomps' and 'replgroupcomps' files that add 'Dcc: ...' headers. If you do this you probably want to use '-nocc me' all the time, to strip your addresses from the explicit headers.

(I recently did a bunch of experimentation to understand this as part of trying to improve my MH-E environment (cf), so I want to write it down while I remember it. In the end I was able to hack together some elisp magic for MH-E to mostly do what I want, although the result is imperfect.)

NMHReplAndCcMe written at 22:51:36

2023-09-19

How Unix shells used to be used as an access control mechanism

Once upon a time, one of the ways that system administrators controlled who could log in to what server was by assigning special administrative shells to logins, either on a particular system or across your entire server fleet. Today, special shells (mostly) aren't an effective mechanism for this any more, so modern Unix people may not have much exposure to this idea. However, vestiges of this live on in typical Unix configurations, in the form of /sbin/nologin (sometimes in /usr/sbin) and how many system accounts have this set as their shell in /etc/passwd.

The normal thing for /sbin/nologin to do when run is to print something like 'This account is currently not available.' and exit with status 1 (in a surprising bit of cross-Unix agreement, all of Linux, FreeBSD, and OpenBSD nologin appear to print exactly the same message). By making this the shell of some account, anything that executes an account's shell as part of accessing it will fail, so login (locally or over SSH) and a normal su will both fail. Typical versions of su usually have special features to keep you from overriding this by supplying your own shell (often involving /etc/shells, or deliberately running the /etc/passwd shell for the user). Otherwise, there is nothing that prevents processes from running under the login's UID, and in fact it's extremely common for such system accounts to be running various processes.

Unix system administrators have long used this basic idea for their own purposes, creating their own fleet of administrative shells to, for example, tell you that a machine was only accessible by staff. You would then arrange for all non-staff logins on the machine to have that shell as their login shell (there might be such logins if, for example, the machine is a NFS fileserver). Taking the idea one step further, you might suspend accounts before deleting them by changing the account's shell to an administrative shell that printed out 'your account is suspended and will be deleted soon, contact <X> if you think this is a terrible mistake' and then exited. In an era when everyone accessed your services by logging in to your machines through SSH (or earlier, rlogin and telnet), this was an effective way of getting someone's attention and a reasonably effective way of denying them access (although even back then, the details could be complex).

(Our process for disabling accounts gives such accounts a special shell, but it's mostly as a marker for us for reasons covered in that entry.)

You could also use administrative shells to enforce special actions when people logged in. For example, newly created logins might be given a special shell that would make them agree to your usage policies, force them to change their password, and then through magic change their shell to a regular shell. Some of this could be done through existing system features (sometimes there was a way to mark a passwd entry so that it forced an immediate password change), but generally not all of it. Again, this worked well when you could count on people starting using your systems by logging in at the Unix level (which generally is no longer true).

Sensible system administrators didn't try to use administrative shells to restrict what people could do on a machine, because historically such 'restricted shells' had not been very successful at being restrictive. Either you let someone have access or you didn't, and any 'restriction' was generally temporary (such as forcing people to do one time actions on their first login). Used this way, administrative shells worked well enough that many old Unix environments accumulated a bunch of local ones, customized to print various different messages for various purposes.

PS: One trick you could do with some sorts of administrative shells was make them trigger alarms when run. If some people were not really supposed to even try to log in to some machine, you might want to know if someone tried. One reason this is potentially an interesting signal is that anyone who gets as far as running a login shell definitely knows the account's password (or otherwise can pass your local Unix authentication).

(These days I believe this would be considered a form of 'canary token'.)

ShellsAsAccessControl written at 22:28:59

2023-09-10

The roots of an obscure Bourne shell error message

Suppose that you're writing Bourne shell code that involves using some commands in a subshell to capture some information into a shell variable, 'AVAR=$(....)', but you accidentally write it with a space after the '='. Then you will get something like this:

$ AVAR= $(... | wc -l)
sh: 107: command not found

So, why is this an error at all, and why do we get this weird and obscure error message? In the traditional Unix and Bourne shell way, this arises from a series of decisions that were each sensible in isolation.

To start with, we can set shell variables and their grown-up friends, environment variables, with 'AVAR=value' (note the lack of spaces). You can erase the value of a shell variable (but not unset it) by leaving the value out, 'AVAR='. Let's illustrate:

$ export FRED=value
$ printenv | fgrep FRED
FRED=value
$ FRED=
$ printenv | fgrep FRED
FRED=
$ unset FRED
$ printenv | fgrep FRED
$ # ie, no output from printenv

Long ago, the Bourne shell recognized that you might want to only temporarily set the value of an environment variable for a single command. It was decided that this was a common enough thing that there should be a special syntax for it:

$ PATH=/special/bin:$PATH FRED=value acommand

This runs 'acommand' with $PATH changed and $FRED set to a value, without changing (or setting) either of them for anything else. We have now armed one side of our obscure error, because if we write 'AVAR= ....' (with the space), the Bourne shell will assume that we're temporarily erasing the value of $AVAR (or setting it to a blank value) for a single command.

The second part is that the Bourne shell allows the command to be run to be named through indirection, instead of having to be written out directly and literally. In Bourne shell, you can do this:

$ cmd=echo; $cmd hello world
hello world
$ cmd="echo hi there"; $cmd
hi there

The Bourne shell doesn't restrict this indirection to direct expansion of environment variables; any and all expansion operations can be used to generate the command to be run and some or all of its arguments. This includes subshell expansion, which is written either as $(...) in the modern way or as `...` in the old way (those are backticks, which may be hard to see in some fonts). Doing this even for '$(...)' is reasonably sensible, probably sometimes useful, and definitely avoids making $(...) a special case here.

So now we have our perfect storm. If you write 'AVAR= $(....)', the Bourne shell first sees 'AVAR= ' (with the space) and interprets it as you running some command with $AVAR set to a blank value. Then it takes the '$(...)' and uses it to generate the command to run (and its command line). When your subshell prints out its results, for example the number of lines reported by 'wc -l', the Bourne shell will try to use that as a command and fail, resulting in our weird and obscure error message. What you've accidentally written is similar to:

$ cmd=$(... | wc -l)
$ AVAR= $cmd

(Assuming that the $(...) subshell doesn't do anything different based on $AVAR, which it probably doesn't.)

It's hard to see any simple change in the Bourne shell that could avoid this error, because each of the individual parts is sensible in isolation. It's only when they combine together like this that a simple mistake compounds into a weird error message.

(The good news is that shellcheck warns about both parts of this, in SC1007 and SC2091.)

BourneShellObscureErrorRoots written at 22:12:44

2023-09-07

(Unix) Directory traversal and symbolic links

If and when you set out to traverse through a Unix directory hierarchy, whether to inventory it or to find something, you have a decision to make. I can put this decision in technical terms, about whether you use stat() or lstat() when identifying subdirectories in your current directory, or put it non-technically, about whether or not you follow symbolic links that happen to point to directories. As you might guess, there are two possible answers here and neither is unambiguously wrong (or right). Which answer programs choose depends on their objectives and their assumptions about their environment.

The safer decision is to not follow symbolic links that point to directories, which is to say to use lstat() to find out what is and isn't a directory. In practice, a Unix directory hierarchy without symbolic links is a finite (although possibly large) tree without loops, so traversing it is going to eventually end and not have you trying to do infinite amounts of work. Partly due to this safety property, most standard language and library functions to walk (traverse) filesystem trees default to this approach, and some may not even provide for following symbolic links to directories. Examples are Python's os.walk(), which defaults to not following symbolic links, and Go's filepath.WalkDir(), which doesn't even provide an option to follow symbolic links.
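
(As an illustration of the underlying decision, here is a minimal C sketch of a tree walk that uses lstat(); the one marked line is all that would change to make it follow symbolic links to directories. Error handling and very long paths are glossed over.)

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static void walk(const char *dir)
{
    char path[4096];
    struct stat st;
    struct dirent *de;
    DIR *d = opendir(dir);

    if (d == NULL)
        return;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
            walk(path);     /* with stat() here, symlinks to directories would be followed */
        else
            printf("%s\n", path);
    }
    closedir(d);
}

int main(void)
{
    walk(".");
    return 0;
}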

(In theory you can construct both concrete and virtual filesystems that either have loops or, for virtual filesystems, are simply endless. In practice it is a social contract that filesystems don't do this, and if you break the social contract in your filesystem, it's considered your fault when people's programs and common Unix tools all start exploding.)

If a program follows symbolic links while walking directory trees, it can be for two reasons. One of them is that the program wrote its own directory traversal code and blindly used stat() instead of lstat(). The other is that it deliberately decided to follow symbolic links for flexibility. Following symbolic links is potentially dangerous, since they can create loops, but it also allows people to assemble a 'virtual' directory tree where the component parts of it are in different filesystems or different areas of the same filesystem. These days you can do some of this with various sorts of 'bind' or 'loopback' mounts, but they generally have more limitations than symbolic links do and often require unusual privileges to set up. Anyone can make symbolic links to anything, which is both their power and their danger.

(Except that sometimes Linux and other Unixes turn off your ability to make symlinks in some situations, for security reasons. These days the Linux sysctl is fs.protected_symlinks, and your Linux probably has it turned on.)

Programs that follow symbolic links during directory traversal aren't wrong, but they are making a more dangerous choice and one hopes they did it deliberately. Ideally such a program might have some safeguards, even optional ones, such as aborting if the traversal gets too deep or appears to be generating too many results.

PS: You may find the OpenBSD symlink(7) manual page interesting reading on the general topic of following or not following symbolic links.

DirectoryTraversalAndSymlinks written at 23:23:39

2023-08-31

The technical merits of Wayland are mostly irrelevant

Today I read Wayland breaks your bad software (via), which is in large part an inventory of how Wayland is technically superior to X. I don't particularly disagree with Wayland's general technical merits and improvements, but at this point I think that they are mostly irrelevant. As such, I don't think that talking about them will do much to shift more people to Wayland.

(Of course, people have other reasons to talk about Wayland's technical merits. But a certain amount of this sort of writing seems to be aimed at persuading people to switch.)

I say that the technical merits are irrelevant because I don't believe that they're a major factor any more in most people moving or not moving to Wayland. At this point in time (and from my vantage point), there are roughly four groups of people still in the X camp:

  • People on Unix environments that don't have Wayland support. They have to use X or not have a graphical experience.

    (Suggesting that these people change to a Linux environment with Wayland support is a non-starter; they are presumably using their current environment for good reasons.)

  • People using mainstream desktop environments that already support Wayland, primarily GNOME and KDE, in relatively stock ways. Switching to Wayland is generally transparent for these people and happens when their Linux distribution decides to change the default for their hardware. If their Linux distribution has not switched the default, there is often good reason for it.

    Most of these people will switch over time as their distribution changes defaults, and they're unlikely to switch before then.

  • People using desktop environments or custom X setups that don't (currently) support Wayland. Switching to Wayland is extremely non-transparent for these people because they will have to change their desktop environment (so far, to GNOME or KDE) or reconstruct a Wayland version of it. Back in 2021, this included XFCE and Cinnamon, and based on modest Internet searches I believe it still does.

    One can hope that some of these desktop environments will get Wayland support over time, moving people using them up into the previous category (and probably moving them to Wayland users). However the primary bottleneck for this is probably time and attention from developers (who by now probably have heard lots about why people think they should add support for Wayland and its technical merits).

  • People who could theoretically switch to Wayland and who might gain benefits from doing so, but who have found good reasons (often related to hardware support) that X works better for them (cf some of the replies to my Fediverse post).

(There are other smaller groups not included here, such as people who have a critical reliance on X features not yet well supported in Wayland.)

With only a slight amount of generalization, none of these people will be moved by Wayland's technical merits. The energetic people who could be persuaded by technical merits to go through switching desktop environments or in some cases replacing hardware (or accepting limited features) have mostly moved to Wayland already. The people who remain on X are there either because they don't want to rebuild their desktop environment, they don't want to do without features and performance they currently have, or their Linux distribution doesn't think their desktop should switch to Wayland yet.

There are still some people who would get enough benefit from what Wayland improves over X that it would be worth their time and effort to switch, even at the cost of rebuilding their desktop environment (and possibly losing some features, because there are things that X does better than Wayland today). But I don't think there are very many of them left by now, and if they're out there, they're hard to reach, since the Wayland people have been banging this drum for quite a while now.

(My distant personal view of Wayland hasn't changed since 2021, since Cinnamon still hasn't become Wayland enabled as far as I know.)

PS: The other sense that Wayland's technical merits are mostly irrelevant is that everyone agrees that Wayland is the future of Unix graphics and development of the X server is dead. Unless and until people show up to revive X server development, Wayland is the only game in town, and when you have a monopoly, your technical merits don't really matter.

WaylandTechnicalMeritsIrrelevant written at 23:04:05

