Wandering Thoughts


Some things that make languages easy (or not) to embed in Unix shell scripts

Part of Unix shell scripting is that Unix has a number of little languages (and interpreters for them) that are commonly embedded in shell scripts to do various things. Shell scripts aren't just written in the Bourne shell; they're effectively written in the Bourne shell plus things like sed and awk, and later more things like Perl (the little language used by jq may in time become routine). However, not all languages become used on Unix this way, even if they're interpreted and otherwise used for shell script like things. Recently it occurred to me that one factor in this is how embeddable the language is in a shell script.

If you're putting together a shell script, your life is a lot easier if the shell script is self-contained and doesn't need any additional files distributed with it (files that it will probably have to know where to find). If you're going to use an additional little language in your shell script, you really want to be able to provide the program in the little language as part of the shell script. Interpreters and languages can make this more or less easy, in two ways.

First and obviously, the interpreter mostly needs to accept a program as a command line argument, not require it to be in a file that the interpreter reads (and most especially not require the file to have a specific extension). There is a way to embed file contents in shell scripts but it will make your shell script's life harder. For many people this will probably push them to shipping the program in a separate file, which in turn will probably push them to using a more shell script embedding friendly language.

It's convenient but not essential if the interpreter accepts multiple snippets of program as separate command line arguments. The poster child for this is sed, where you can supply multiple lines of program with multiple -e arguments. Lack of this isn't fatal, as shown by awk, especially if even snippets of the overall program are probably going to be multiple lines in themselves.
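As a small sketch of the difference (the sample input and variable names are made up for illustration), here is sed taking its program as two separate -e snippets, next to awk taking one multi-line program as a single quoted argument:

```shell
# Illustrative input only: three "name count" lines.
input='alpha 1
beta 2
gamma 3'

# sed: two separate snippets of program, one per -e argument.
sed_out=$(printf '%s\n' "$input" | sed -e 's/alpha/ALPHA/' -e '/beta/d')

# awk: one program, supplied as a single multi-line quoted argument.
awk_out=$(printf '%s\n' "$input" | awk '
    { total += $2 }
    END { print "total", total }
')

printf '%s\n' "$sed_out" "$awk_out"
```

Both programs live entirely inside the shell script, with no extra files to ship.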

Generally, the only practical way to quote a long, multi-line command line argument in the Bourne shell is with single quotes (' .... '); quoting with double quotes ("...") can be done, but you will have heartburn with all sorts of characters. This makes it quite important that a language to be embedded use single quotes as little as possible. If you can't naturally write a program without using single quotes, you'll have problems providing the program as an embedded command line argument in the shell script. If your language wants you to use all of single quotes, double quotes, backslashes, and dollar signs ('$'), you're really going to have heartburn.

(It also helps if your language isn't picky about formatting and indentation, and lets you squeeze a bunch of statements onto a single physical line.)
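To make the quoting issue concrete, here is a minimal sketch (the awk programs and input are invented for the example). The first program uses double quotes and dollar signs freely and needs no special treatment. The second wants one literal single quote and immediately needs the '\'' dance:

```shell
# The embedded awk program uses double quotes, $1, and $2 freely; inside
# single quotes the shell leaves all of them alone.
greeting=$(printf 'world 42\n' | awk '{ print "hello, " $1 " (" $2 ")" }')

# Needing a literal single quote means closing the quoting, adding an
# escaped quote, and reopening the quoting: the '\'' sequence.
quoted=$(printf 'world\n' | awk '{ print "it'\''s " $1 }')

printf '%s\n' "$greeting" "$quoted"
```

One '\'' is tolerable. A language that wants single quotes on every other line is not.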

There is a way to deal with languages that aren't friendly to shell quoting; you can use a here document to create a shell variable and then supply that variable as the program when you invoke the interpreter. For example:

pyprog="$(cat <<'EOF'
[....]
EOF
)"
python -c "$pyprog" ...

However, this is more awkward than doing the equivalent in awk. This awkwardness acts as friction that pushes people away from using such awkward languages in shell scripts. If they do use them, it's more natural to put the program in a separate file and ship the shell script and the separate file (which will go into some known location, and so on).

ShellScriptLanguageEmbedding written at 21:39:23


Getting a Bourne shell "here document" into a shell variable

Suppose, for reasons to be discussed in a later entry, you would like to write a shell script that turns an embedded here document into a shell variable (ie, an unexported environment variable). As a preview, one reason to want to do this is that here documents allow almost arbitrary contents, while other forms of getting things into environment variables or command line arguments may block using certain characters or require awkward quoting.

It turns out that in a modern Bourne shell, this is straightforward with command substitution (normally using a quoted here document delimiter, because otherwise various things will get expanded on you):

yourvar="$(cat <<'EOF'
All of your content goes here.
Any characters and sequences are
fair game, like $, ", ', and
no expansion of eg $HOME or
stuff like $(hostname).
EOF
)"

This is more straightforward than I was expecting, and as far as I can tell there are no surprises and traps here. The resulting variable has the contents you expect and can be used normally as you need it.
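Here is a small self-contained check of this, as a sketch (the variable name and the sample text are made up). The one detail worth keeping in mind is that command substitution drops trailing newlines, so the blank line at the end of the here document doesn't make it into the variable:

```shell
# Capture a quoted here document into a variable; nothing is expanded.
sample="$(cat <<'EOF'
price: $5, "quoted", it's, $HOME, $(hostname)

EOF
)"

# The variable now holds the text exactly, except that command
# substitution has dropped the trailing newlines.
printf '%s\n' "$sample"
```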

(I haven't tried this with the old style command substitution using `...` instead of $(...), but I'd expect it to fail in various ways because the backtick style command substitution historically had lots of problems with escaping things. Also, there's not much reason to use the backtick style today.)

In Bash (or any Bourne shell that supports process substitution (Wikipedia)), you can also provide a program a command line argument that's a file that comes from a here document embedded into the shell script:

awk -f <(cat <<'EOF'
BEGIN { .... }
EOF
)

If you're going this far, extra arguments to the program go after the process substitution:

awk -f <(cat <<'EOF'
BEGIN { .... }
EOF
) /tmp/file1 /tmp/file2

I suggest trying hard to not need to do this in order to keep your sanity; this is a situation like the rules for combining things with here documents where keeping it simple is much better (cf the sensible way to use here documents in pipelines). Because this is limited and somewhat tangled, I would only use it in a situation where I absolutely had to provide a program with a file (with the desired contents) instead of being able to provide things as a command line argument.

(The resulting script would be a Bash script instead of a general Bourne shell script, but sometimes this is fine.)

BourneShellHereDocToVariable written at 21:46:11


How to talk to a local IPMI under OpenBSD

Much like Linux, modern versions of OpenBSD are theoretically able to talk to a suitable local IPMI using the standard ipmi(4) kernel driver. This is imprecise although widely understood terminology; in more precise terms, OpenBSD can talk to a machine's BMC (Baseboard Management Controller) that implements the IPMI specification using one of a number of standard interfaces, as covered in the "System Interfaces" section of ipmi(4). However, OpenBSD throws us a curve ball in that the ipmi(4) driver is normally present in the default OpenBSD kernel but not enabled.

If the ipmi driver is present but not enabled and your machine has an IPMI that OpenBSD can talk to, the kernel boot messages will report something like:

ipmi at mainbus0 not configured

If you don't see any mention of 'ipmi' in the boot messages and you're using a normal kernel, your machine almost certainly doesn't have a recognized IPMI and you can stop here. If you do see this 'not configured' message, you most likely have an IPMI that OpenBSD can talk to and you now need to enable the IPMI driver.

If you're using OpenBSD 7.0 or later, you enable the driver by creating or editing the file /etc/bsd.re-config (see bsd.re-config(5)) to contain:

enable ipmi

(This will often be the only line in bsd.re-config, partly because the file format doesn't allow comments.)

After you've set up bsd.re-config, you need to reboot at least once and perhaps twice. After this the kernel will recognize your IPMI with messages that look something like this:

ipmi0 at acpi0: version 2.0 interface KCS iobase 0xca8/8 spacing 4
ipmi at mainbus0 not configured 
iic0: skipping sensors to avoid ipmi0 interactions

(You may not see the iic0 message.)

In OpenBSD 6.9 and previous versions there is no bsd.re-config, so you need to manually create a new kernel image with config(8) that has the ipmi driver specifically enabled. A typical usage would be (with 'ukc>' being the prompts from config):

# config -e -o /bsd.new /bsd
ukc> enable ipmi
[some messages about it]
ukc> quit
# mv /bsd /bsd.last && mv /bsd.new /bsd
# reboot

(Then you'll see the same sort of kernel messages as in OpenBSD 7.0.)

Unfortunately using config(8) this way conflicts with OpenBSD's KARL kernel relinking. Enabling the ipmi driver this way will survive reboots (or it has so far for me), but it will apparently be lost if you use syspatch to apply at least kernel patches and perhaps any patch.

Once your IPMI is configured under any OpenBSD version, you can do at least two new things. The first is that you can see IPMI sensors in 'sysctl hw.sensors', usually under hw.sensors.ipmi0. OpenBSD seems to be able to read IPMI sensors quite readily and without delays, which is a nice change from the usual Linux situation. The output of this on one of our machines looks like:

hw.sensors.ipmi0.temp0=26.00 degC (CPU Temp), OK
hw.sensors.ipmi0.temp1=32.00 degC (PCH Temp), OK
hw.sensors.ipmi0.fan0=9800 RPM (FAN1), OK
hw.sensors.ipmi0.volt0=12.29 VDC (12V), OK
hw.sensors.ipmi0.volt1=5.12 VDC (5VCC), OK
hw.sensors.ipmi0.indicator0=Off (Chassis Intru), OK

(Unfortunately, the Prometheus host agent currently doesn't read and report any of the hw.sensors sysctls. As always, the sensors you get will vary between server models and not all of them may make sense or be valid.)
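If you want these readings in a script, the sysctl output is regular enough to pick apart with awk. This is only a sketch that assumes the line format shown above and the usual ipmi0 name; the ipmi_temps function name is made up:

```shell
# Print "label value unit" for each IPMI temperature sensor found in
# 'sysctl hw.sensors' style output read from standard input.
ipmi_temps() {
    awk -F'[(),=]' '
        /^hw\.sensors\.ipmi0\.temp/ {
            split($2, reading, " ")   # eg "26.00 degC " -> 26.00, degC
            print $3, reading[1], reading[2]
        }
    '
}

# Demonstration using two of the sample lines from above; on a real
# machine this would be: sysctl hw.sensors | ipmi_temps
printf '%s\n' \
    'hw.sensors.ipmi0.temp0=26.00 degC (CPU Temp), OK' \
    'hw.sensors.ipmi0.fan0=9800 RPM (FAN1), OK' | ipmi_temps
```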

The second thing is that you can install and use ipmitool, with it working the same as on Linux (and probably other *BSDs). Ipmitool comes from OpenBSD's ports collection and can be added with pkg_add. Once installed it will automatically use the /dev/ipmi0 device that OpenBSD has set up and everything just works. This can let you take an OpenBSD machine's IPMI from an unconfigured state to being up and on your management network without having to take the machine down into BIOS (although you do have to reboot at least once).

(In theory, you can also do things like control what will happen to the machine if power goes out and then comes back on. Your mileage may vary as to whether your BMC really supports this portion of IPMI and it works right.)

OpenBSDLocalIPMI written at 23:22:32


A reason why Unix programs sometimes support '-?' for help

I recently read Clayton Craft's -h --help -help help --? -? ???? (via). In part of it, Craft mentions '--? / -?' as help options, and says about them:

???? I have no idea where these came from, but my guess is that they are migrants from the wild west Windows-land, where I assume the shell won't try to expand ? into anything. [...]

I think there's a different and far more Unixy explanation, and it's our friend getopt(3), and the Bourne shell getopts equivalent. Both getopt(3) and getopts return errors, such as unrecognized options, through what we could call in-band signalling, instead of using an additional return value (neither C nor the normal Bourne shell handles multiple return values very easily). Classical getopt(3) normally returns the latest option character for you to parse; when it hits an option character you don't accept, it instead returns a special marker character. This marker character is '?'. Shell getopts follows the same approach (although in the shell case, you might match on any otherwise un-handled option character).

(GNU getopt_long() has its own conventions for this return value.)

As a result, Unix programs using getopt (or shell scripts using getopts) can't have '-?' as a valid command line option for anything meaningful, because there would be no way to tell a real '-?' apart from an error. As a consequence of this, it's almost always safe to run a program as 'program -?'; no matter how large and weird its collection of command line option letters is (ls is famous for using almost all of them, including '-h'), it won't be using '-?' and so running it that way is a generally safe way to get some sort of usage message (and an error).

Once people start running 'program -?' to get a usage message, programs themselves have an incentive to make '-?' print a longer help message, and perhaps to list it in getopt() or getopts as a valid option, so that people no longer get "invalid option -- '?'" messages or the like when they're doing it deliberately.

(Since getopt() itself generates the 'invalid option' message, people will still get this for genuinely invalid command line options; listing '?' as a valid option only affects whether you get the message for 'program -?'.)
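The shell version of this is easy to see in action. In this sketch (parse_args and its -v and -o options are hypothetical), an unknown option and a literal '-?' both end up in the same '?' branch of the case statement:

```shell
# Hypothetical option parser: -v and -o FILE are the only real options.
parse_args() {
    OPTIND=1                       # restart parsing on every call
    while getopts "vo:" opt; do
        case "$opt" in
            v)  echo "verbose" ;;
            o)  echo "output=$OPTARG" ;;
            \?) echo "usage: prog [-v] [-o file]"
                return 2 ;;
        esac
    done
}

parse_args -v -o out.txt                    # two valid options
parse_args -x   || echo "exit status $?"    # unknown option: opt is '?'
parse_args '-?' || echo "exit status $?"    # a literal -? takes the same branch
```

In both of the last two cases getopts itself also prints its own complaint about the option to standard error before the usage message appears.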

This gives you the situation where some programs accept '-?' for help (and probably then accept '--?' because why not), and some sources of advice suggest running programs as 'program -?' to at least get a basic usage message to remind you and perhaps some help too.

PS: In normal Unix shells, there's no problem using '-?' as a command line argument even though '?' is a filename wildcard character. Normally, non-matching wildcard characters are passed through intact as argument characters. Tcsh behaves differently, but if you use tcsh as your shell that's up to you.

GetoptQuestionOptionForHelp written at 21:20:42


What goes into an X resource and its name

Most people who deal with X resources, me included, generally deal with them at a relatively superficial level. At this level, you can say that X resources are a text based key/value database, with the name (key) of every resource being a composite name that specifies both its program and some program specific name (although there are conventions for the name portion). But if you start to look at the actual names for X resources, things start looking a little more odd.

For example, all of the following are X resource names (and values), from this entry, this entry, and this entry:

XTerm*VT100.scrollbar.width: 24
URxvt*thickness: 24
Xft.dpi: 163
XTerm*VT100.Translations: <... elided ...>

! This isn't from an entry (yet):
XTerm*SimpleMenu*menuLabel.font: <... elided XLFD name ...>
XTerm*SimpleMenu*font: <...>

What is really going on is that the 'name' portion of an X resource is not a name as such but what I will call a selector, by analogy to CSS selectors. Every resource you can set has a fully qualified name, written as 'a.b.c.d' (for some number of components), with most of the components of the name being determined by the inner structure of the specific program involved. Rather than forcing you to find and write out the fully qualified name, the X resource system lets you shorten things. A '.' separates components, a '*' covers any number of components, and the rarely seen '?' means a single, arbitrarily named component.

(For gory details, including precedence rules, see XrmGetResource(3), Resource File Syntax (also), and the RESOURCES section of X(7).)

However, even component names are complicated because in X resources, programs and components have both a general class and a specific (resource) name (which sort of defaults to the class name under many situations). You can use either of them when writing resource names; conventionally, class names are capitalized ('XTerm') and (resource) names are lower case ('xterm'). This is the mechanism used to let you have multiple options for the same program (as I mentioned was theoretically possible); you set a general version using the class of XTerm and then more specific versions under special application names (in xterm and many other programs, you change the application name with a '-name whatever' command line argument). For example:

XTerm*foreground:    black
rootterm*foreground: DarkRed

This is why a lot of program documentation about X resources often tells you both the name and the class of a resource; for example, see the table of X resources for GNU Emacs. The class names for programs are not necessarily predictable from their actual names, although there are conventions. A good X program will tell you in its manual page; a bad one will leave you to ask your window manager for it (or pry it out with xprop).

At this point you may be wondering where the 'within the program' component names come from. The unfortunate answer is that normally they come from the program's widget structure. In X, 'widgets' mostly means X toolkit intrinsics (Xt) and Athena widgets. The idea is that most X programs would be composed of a nested series of widgets (some standard and some written for the program), and the full name of resources would be determined by tracing through that widget structure to the eventual end widget to be configured. If you wanted to configure some setting for a lot of the program's widgets at once, for example if you wanted to set the general background or the font, you used a wildcard to match everything.

(How normal Unix users are supposed to discover a program's widget hierarchy is a question that X never really answered. The closest we got is that good programs more or less documented it in their manual pages, if you read them carefully and understood enough about X resources.)

So, you might ask, how does xterm know that when you write the resource 'XTerm*font: <something>' that you only want to set the main terminal font, not the font used for its popup menus? After all, in theory the wildcard should make it match every font in every widget in xterm. The short answer is 'magic'. You can also ask how you, the person writing X resource settings for xterm, know that this won't set every font in every nook and cranny of xterm. The practical answer is that you don't know; you try setting the resource, and if things blow up you go and revert it.

(Sometimes this doesn't work. See the caution in GNU Emacs' X resources about not writing 'emacs*geometry: ...' to set the initial GNU Emacs size.)

There are a number of problems with this naming model. One of them is that the names people are using to configure portions of your program with resources may well be tied to the current internal structure of your program. If xterm someday wants to move away from using 'SimpleMenu' widgets for its popup menus, everyone who is currently configuring their menu fonts is going to lose (unless xterm goes out of its way to implement some backward compatibility thing). Another of them is that all of this hierarchy and special resource handling only really works well if you use Xt-based widgets for your program. Also, a whole bunch of command line argument handling related to resources only works well if you hand over initial processing of your program's command line arguments to the Xt library. X programs written in other languages often have limited handling of X resources and 'standard' X command line arguments, since you more or less have to work in C or C++ in order to really use Xt.

(See the OPTIONS section of X(7) for all of the things covered by standard Xt argument processing.)

Related to this is the fun issue that resource names can change depending on what options people run your program with, because some options can change the widget structure your program uses, which changes resource names. The xterm manual page has a great example of this in its section on VT100 Widget Resources. It's no wonder that a great many people superstitiously use '*' as the component separator whether or not they need it; it saves them from having to read through the program's manual page to try to figure out the widget structure, and from changes to that structure.

(However, nothing can save you if the program changes the class name or resource name it uses for resources. If 'Monster' gets renamed to 'OwlBrowser', either OwlBrowser perpetually uses the wrong class name for its resources or all of those 'Monster*...' X resources that people have stop working.)

Because of the X resource system's deep entanglement with Xt, programs that don't use Xt often make very minimal use of resources and when they do use resources, they tend to have little or no internal hierarchy (which results in resource names like 'Program.thing'). Such programs are much closer to using X resources as a simple 'program plus option name' key/value store, rather than the vague Xt idea that it would be a hierarchical configuration system.

Sidebar: xterm's SimpleMenu, a lesson in X resource naming

Xterm has popup menus, which are implemented as Athena SimpleMenu widgets, with the class name 'SimpleMenu'. So the following sounds like it should set the font used for the menus:

XTerm*SimpleMenu.font: <...>

Except it doesn't work. A SimpleMenu doesn't directly have a font resource, because it doesn't directly display menu entries. Menu entries are displayed in SmeBSB objects, which do have a font resource. But there are a bunch of SmeBSB entries in every xterm SimpleMenu popup menu, so you need a wild card match to cover all of them. Hence:

XTerm*SimpleMenu*font: <...>

To understand all of this, I had to read fairly deep into the Athena documentation.

To actually control my xterm popup menu fonts, I did not do this. I did an Internet search for how to do it and copied what the web page said to do without even trying to understand which wildcards were necessary, or even what was going on at all. In practice, X resource names are black boxes that are copied by superstition from working examples.

XResourcesNaming written at 23:27:02


The failure of the idea of X resources

Since the beginning (or very close to it), X has had a resource system (also) to allow central specification of various parameters for programs (at multiple levels). X resources can be used to set options and to customize programs, and one of the things they usually allow you to control is the program's choice of fonts to use. In theory you would think that a central place like your X resources would be a great answer to needing to change all sorts of fonts for programs when you move to HiDPI displays. In practice it's never really worked out this way, and I have some thoughts on that.

One of the problems with X resources is that they're arcane and hard to manage. Even for experienced X people, putting many program options into scripts and window manager configurations has often been the easier way. X resources don't have good answers if you want the same program set up in a few different ways (several setups of xterm, for example), it's generally a complex dance to actually update them in your running session, and it's up to you to keep them organized, among other things. I probably use X resources more than most people who still use them at all, and my usage has steadily been cut back over the years because they're just not very nice.

(X resources can sort of be used to provide different options for different invocations of the same program, but how to do it isn't necessarily obvious, it requires additional command line options to the program, and even programs that use X resources don't always support it.)

A bigger issue with X resources is that by themselves they only solve one level of the issue. They may give me a central place to put configuration information, but (in practice) they require me to configure each program separately. There is no indirection in X resources, not natively, where I could set a 'default monospace font' and a 'default non-monospace font' and have this propagate through to everything. In the Unix way this was "solved" with an extra layer of indirection; the usual process of loading X resources into your session can run the source files through the C preprocessor, so you can use #define to create global settings. But you still have to specify that each program you want to control should use whatever global setting for, say, its font. If you miss a program (or a setting), things fall through to defaults.

(X resources also got badly affected by the migration from XLFD to XFT fonts. In the XLFD era, everything specified fonts in the same way, and often using the same resource name. With XFT fonts, things got much more varied. Some programs specify the XFT font name and size together in one resource, eg 'Monospace-12', while others split it into two resources, font name and font size, and some require you to tell them you want an XFT font, eg 'xft:Monospace-12'. The resource names aren't necessarily consistent either. This makes any sort of global setting through #defines rather more annoying.)

Modern Unix settings systems do tend to have such global defaults, although part of that is that they often see themselves as part of a coherent system of programs (Gnome programs, KDE programs, etc), and a coherent system naturally has a system-wide default.

Another issue is that X resources are a product of their time (to be charitable), and that time is long past. Modern systems tend to integrate some degree of documentation for the settings, more or less live changes, some degree of discoverability and hierarchy, and expose a GUI for changing things, for good reasons. X resources assume that the manual pages and a text editor are good enough (I know, real mavens read through /usr/share/X11/app-defaults too, never mind that many of those files have settings that aren't intended to be user serviceable).

In some ways I do like my X resources; at least they're in some text files that I can look at, they let me set xterm's behavior across all of the machines I use it on, and I can pull out the live settings from the X server if I want to. But in other ways I think I would be better off if everything used gsettings or gconf or something, with defaults and the ability to navigate through the keys and interact with values.

XResourcesFailure written at 23:38:18


Where cut comes into Unix (and a bit on the history of awk)

The cut command is in some ways one of those little Unix oddities, because in many ways (although not all of them) it duplicates the functionality of awk. Both commands have been part of my Unix landscape for long enough that I don't think about where they come from, but today I wound up curious about cut's history.

Awk famously comes from V7 Unix, and is one of the signature Unix programs introduced there (see the Wikipedia entry for more). By contrast, cut comes from System III and may have at least partially reached the rest of the world through being part of POSIX (as per the Wikipedia entry). In the BSD line, cut seems to have taken until 4.3BSD-Reno to show up, around 1990, although I think that commercial Unix vendors who used BSD Unix, such as Sun, might have added it earlier.

The motivations for adding cut to System III aren't clear, but System III itself is a mix of various other early Unixes, some of which predate V7 (PWB/Unix started out based on V6, for example). It's possible that cut was written for one of these early internal AT&T Unixes that were based on something before V7 and so didn't have awk. Alternately, some of the 'line of business' work that System III and other early Unixes were used for needed to deal with files that had fixed character positions but not useful awk-style whitespace separated field divisions.

(For what it's worth, the System III cut manpage specifically mentions 'character positions as on a punched card' as an example for fields.)

PS: paste(1) also seems to have first appeared in System III, unsurprisingly.

CutCommandHistory written at 23:12:09


The problem of keeping track of hardlinks as you traverse a directory tree

One of the reactions to my discovery about rsync not handling hardlinks by default was surprise that memory was an issue of concern when doing this. After all, you might ask, how hard is the problem and how much memory are we talking about? That depends on what sort of program you are and how thorough a job you want to do.

A program like du has the simplest case. GNU du's normal behavior is to ignore hardlinks after the first time it sees them, so all it needs to do is keep track of enough information to identify a file with more than one link if it sees it again. Normally this is the file's inode number and device. Modern Unixes use 64-bit inode numbers and generally 64-bit device numbers, so you're looking at 16 bytes per hardlinked file. Since you don't want to do a linear search of a potentially very large array every time you encounter a hardlinked file, you're also going to need additional space for some sort of random access data structure. It's probably not worth keeping track of the link count, especially since it's also a 64-bit integer on at least one popular platform (Linux); you're probably losing more space than you gain back by being able to drop entries when you've seen all their links. However, all of this space usage is relatively trivial by modern standards.

(You can reduce the memory needs in most situations by having a two-level data structure, where you give each separate device number its own inode lookup structure. This reduces the per-file information down to one 64-bit number from two, since you're storing the device number only once instead of once per hardlinked file. Usually you'll only be dealing with a few device numbers so this is a real win. It does complicate the code a bit. I don't know if GNU du uses this approach.)
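The du style of tracking can be sketched in a few lines of awk embedded in a shell function. This is only an illustration, not how GNU du does it: first_links is a made-up name, the 'device inode nlink path' input format is an assumption (GNU find can produce it with -printf '%D %i %n %p\n'), paths are assumed to have no whitespace, and a flat combined key is used instead of the two-level structure:

```shell
# Read "device inode nlink path" records on standard input and print each
# path the first time its (device, inode) pair is seen, the way a du-style
# program would count a hardlinked file only once.
first_links() {
    awk '
        $3 > 1 {
            key = $1 SUBSEP $2         # device plus inode identifies the file
            if (key in seen) next      # a later link to a file already seen
            seen[key] = 1
        }
        { print $4 }
    '
}

# Made-up records: ./a and ./a-link are the same file; ./other-fs has the
# same inode number but is on a different device, so it is a different file.
printf '%s\n' \
    '2049 100 2 ./a' \
    '2049 101 1 ./b' \
    '2049 100 2 ./a-link' \
    '2050 100 2 ./other-fs' | first_links
```

Files with a link count of one never enter the table at all, which is what keeps the normal case cheap.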

A program like rsync or tar has a harder problem, because it also needs to keep track of the full file name for some instance of the hardlinked file (normally it will keep track of the first one). Full file names can be long, especially if you have something annoying like a deeply nested directory structure, and given the additional memory usage you now probably do want to also store the link count so you can delete entries once you no longer need them.

(The good news is that you don't need to look things up by file name, so you can continue to only have a fast random access data structure for the inode number plus the device. And you can still optimize the handling of device numbers, although this is no longer so much of a memory saving on a percentage basis.)
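The rsync or tar style of tracking looks like the following sketch, under the same assumptions as before (a made-up link_targets function, 'device inode nlink path' records, no whitespace in paths). It remembers the first name of each hardlinked file, reports later names as links to it, and uses the link count to drop entries once every link has been seen:

```shell
# Read "device inode nlink path" records and print "path -> first-name" for
# every later link to an already seen file. Entries are deleted once all
# of a file's links have gone by.
link_targets() {
    awk '
        $3 > 1 {
            key = $1 SUBSEP $2
            if (key in first) {
                print $4 " -> " first[key]
                if (--left[key] == 0) { delete first[key]; delete left[key] }
            } else {
                first[key] = $4
                left[key] = $3 - 1     # links still to be seen
            }
        }
        END { n = 0; for (k in first) n++; print "still tracked: " n }
    '
}

# Made-up records: ./b has three links but only two are inside the tree,
# so its entry can never be dropped.
printf '%s\n' \
    '2049 100 2 ./a' \
    '2049 101 3 ./b' \
    '2049 100 2 ./a-link' \
    '2049 101 3 ./b-link' | link_targets
```

The 'still tracked' count at the end is the memory that stays pinned for files whose remaining links were never encountered.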

The worst case for both du style and rsync style programs is when you are working with a directory tree where all files have hardlinks that go outside the tree. In this case, both programs wind up permanently keeping track of all files in the tree in RAM; even if you track link count, you'll never see all of the links for any file so you can never drop any entries. Such directory trees can happen in a variety of situations, including space-saving backups that hardlink unchanged files between backup trees.

For scale, a large root filesystem on one of our compute servers has around 2.5 million files with the text of their full file names amounting to just over 200 Mbytes. On our NFS fileservers, the filesystem with the most files has around 7 million of them and a total full filename text size on the order of 665 Mbytes. On modern machines with multiple gigabytes of RAM that are often mostly unused, programs like du and rsync should be able to handle even the very unlikely worst case situation without blowing up. In more constrained environments, they might start to have issues.
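As rough arithmetic on these figures (treating 'Mbytes' as 1,000,000 bytes, which is an assumption, and using the rounded numbers above), the average full file name length and the resulting per-entry size with 16 bytes of inode and device number work out as:

```shell
# Average file name length from the rounded figures above, in bytes.
compute_avg=$(( 200 * 1000000 / 2500000 ))
nfs_avg=$(( 665 * 1000000 / 7000000 ))

# Add 16 bytes per entry for the 64-bit inode and device numbers.
echo "compute server: about $compute_avg bytes per name, $(( compute_avg + 16 )) per entry"
echo "NFS fileserver: about $nfs_avg bytes per name, $(( nfs_avg + 16 )) per entry"
```

This ignores the overhead of whatever lookup structure is used, so real memory usage would be somewhat higher.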

(This was sparked by reddit comments on my rsync entry.)

HardlinksTrackingProblem written at 22:30:16


Unix environment variables (and the environment) are a fuzzy thing

Unix environment variables generally look like a straightforward thing initially, but I have recently been reminded that they are actually somewhat more fuzzy and under-defined in practice than you might think.

Generally speaking, the kernel API's only requirement is that the environment be an array of null-terminated strings, generally of a limited total size. Further interpretation of the contents of these strings is left up to user level programs. Almost everything interprets these strings as the names and values of 'environment variables', with the name and value separated by an '='. Although the kernel API allows for strings of 'STRING' or 'STRING=', I think most Unix programs will either ignore them or give you odd results if you ask about them.

Given the 'name=value' format expected of the environment, in theory the only character you can't put in the name is an '=' (and a null, which can't be put in environment variables at all). In practice most Unix shells limit what characters they will accept in the names of environment variables down to a relatively small set. POSIX probably sets a minimum requirement on this but I haven't looked it up (okay, now I have, it's here). Other programs that manipulate the environment (or create it from scratch) may be more liberal about what characters they allow. Unix shells (and other programs) may or may not pass through such oddly named environment variables, but not counting on it is probably your wisest course.

(There's no way of quoting environment variable names or special characters in them, although there could be. Probably no one's ever seen the need.)
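The gap between what shells accept and what other programs will create can be sketched in Python. The regular expression below is my assumption of the usual shell rule (letters, digits, and underscores, not starting with a digit), and `is_shell_friendly` and `child_sees` are hypothetical helper names for this example. The second helper starts a child Python process with an extra environment variable and reports what the child reads back.

```python
import os
import re
import subprocess
import sys

# Assumption: the common shell rule for variable names.
SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_shell_friendly(name):
    """Report whether name looks like something a typical shell will accept."""
    return SHELL_NAME.fullmatch(name) is not None


def child_sees(name, value):
    """Run a child Python with name=value in its environment and return
    what the child finds under that name."""
    env = dict(os.environ)
    env[name] = value
    code = "import os, sys; print(os.environ.get(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

A name like 'odd.name-here' fails the shell check, but Python will put it in a child's environment without complaint and the child can read it back. Whether it survives a trip through a shell in between is a separate question.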

By convention, the names of environment variables are in upper case. This is only a convention; pretty much every Unix shell is happy to deal with lower case environment variables. It's a social expectation among Unix people that pretty much all officially documented environment variables are in upper case (which is to say, environment variables that are part of the API of your system). I suspect that people think of lower case environment variables as being for local, internal use only, at best.

Many programs will interact with the environment (to the extent that they do) through the C library getenv() function, and will inherit any quirks, limitations, or peculiarities that it has. Some, like Python, can have additional restrictions like character set encoding issues (see os.environ). Others, like Go, have their own implementation that's independent of the C library one.
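As one small example of the Python side of this, on Unix os.environ presents names and values as text, decoded with the filesystem encoding and the 'surrogateescape' error handler, while os.environb holds the raw bytes. The sketch below uses os.fsdecode() and os.fsencode(), which apply the same conversion on Unix, to show that a value which isn't valid in the current encoding still round-trips. The helper name `environ_text_form` is hypothetical.

```python
import os


def environ_text_form(raw_value):
    """Return the str form that Python on Unix would use for a raw byte value."""
    return os.fsdecode(raw_value)


raw = b"caf\xe9"  # Latin-1 bytes; not valid UTF-8
text = environ_text_form(raw)
# Encoding the text form again recovers the original bytes.
recovered = os.fsencode(text)
```

In a UTF-8 locale the text form contains a lone surrogate standing in for the undecodable byte, which is harmless until you try to print it or encode it as strict UTF-8.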

Dynamically linked programs almost always use the standard C library runtime loader (even if they're written in other languages), and on most Unixes that will check environment variables through the C library getenv() and similar functions. In programs that are setuid or otherwise executing in what the runtime loader and the C library think is a special situation, this may result in the C library sanitizing the environment in various ways.

EnvironmentVariablesAreFuzzy written at 23:24:01


The history (sort of) of service management in Unix

It's common for sophisticated Unix init systems to also be some degree of service management systems; the most obvious example is Linux's systemd. However, many people have observed that it doesn't have to be this way and have created separate systems for this, such as D. J. Bernstein's daemontools. Since service management (or lack of it) has become one of the important areas of Unix init systems, you might wonder why they've come to have this responsibility. A significant part of the reason is history, although there are also pragmatic reasons.

(I also think that it's what people want. System administrators mostly don't want to have to deal with an init system and then a separate service supervision system; they want to deal with one thing.)

Specifically, for a long time Unix didn't have any sort of service management as such, beyond init restarting getty processes. All services were simply started as part of the boot process in what started as a very simple script and grew only somewhat from there in BSD Unixes. If you needed to check the status of services, you ran ps; if you needed to restart a service, you terminated it with kill and started the new version by hand. The System V init system moved this forward somewhat by creating scripts that encapsulated the knowledge of how to start, stop, and sometimes check the status of each service, but it did nothing to manage the services as such; it still merely booted (and shut down) the system. Noticing that a service's daemons had died and starting them again was up to you.

(In System V init you could theoretically use /etc/inittab for restarting daemons, but the overall init system environment didn't support doing it this way.)

Historically, starting services was considered intertwined with the process of booting Unix. Starting from when Sun introduced "diskless" NFS based workstations and other people copied them, some daemons needed to be started and running before /usr could be mounted. You couldn't defer starting all services until the system was 'up', but at the same time you couldn't just start all services in a bunch and be done with it, because many of them required filesystem mounts and the like. This entanglement of starting daemons and booting the system made putting everything in boot scripts the natural way forward from the mid 1980s onward. A daring Unix vendor could have introduced a separate services system (Sun eventually sort of did in SMF), but it still would have been deeply entwined in the boot process and thus the init system if it was going to handle all daemons and services on the system.

(Third party systems such as djb's daemontools generally had a simpler job because they weren't envisioned as handling all daemons and services; they were just going to handle some of them, for example djb's other programs like qmail and tinydns.)

In practice, Unix vendors in the 1990s were not daring. Instead, they were busy fighting with each other (see OSF/1 versus System V release 4) and getting run over by the march of the cheap. The free Unixes did no better; the free BSDs were busy being faithful to the purity of UCB BSD 4.x, and Linux was hard at work building a Unix from scratch (and perhaps was not inclined to depart from various versions of what was 'Unix' at the time as a result of the 'Linux is not Unix' controversies of the time).

(This is a somewhat grumpy summary of the situation, since the free BSDs did make major changes in their init setups in practice. But for whatever reason, none of them changed drastically into a separate services manager setup, although daemontools and other implementations show that the idea was definitely around in the open source Unix community. Possibly one problem is that Solaris SMF wasn't a good system.)

PS: I wrote up a somewhat different version of this history some years ago in How init wound up as Unix's daemon manager. Rereading that, I see that in writing this entry I forgot how the addition of networking in BSD Unix complicated system boot and daemon startup, because now you needed the network configured before some daemons got started.

ServiceManagementHistory written at 22:50:20

