Wandering Thoughts archives


The origin of POSIX as I learned the story (or mythology)

I recently wound up rereading Jeremy Allison's A Tale of Two Standards (via Everything you never wanted to know about file locking), which tells an origin story for the POSIX standard for Unix where it was driven by ISVs wanting a common 'Unix' API that they could write their products to so they'd be portable across all the various Unix versions. It's quite likely that this origin story is accurate, and certainly the divergence in Unixes irritated ISVs (and everyone else) at the time. However, it is not the origin mythology for POSIX that I learned during my early years with Unix, so here is the version I learned.

During the mid and late 1980s, the US government had a procurement problem; it wanted to buy Unix systems, but perfectly sensible procurement rules made this rather hard. If it tried to simply issue a procurement request to buy from, say, Sun, companies like SGI and DEC and so on would naturally object and demand answers for how and why the government had decided that their systems wouldn't do. If the government expanded the procurement request to include other Unix vendors so they could also bid on it (saying 'any workstation with these hardware specifications' or the like), people like IBM or DEC would demand answers for why their non-Unix systems wouldn't do. And if the government said 'fine, we want Unix systems', it was faced with the problem of actually describing what Unix was in the procurement request (ideally in a form that was vendor neutral, since procurement rules frown on supposedly open requests that clearly favour one vendor or a small group of them).

This government procurement problem is not unique to Unix, and the usual solution to it is a standard. Once the government has a standard, either of its own devising or specified by someone else, it can simply issue a procurement request saying 'we need something conforming to standard X', and in theory everyone with a qualifying product can bid and people who don't have such a product have no grounds for complaint (or at least they have less grounds for complaint; they have to try to claim you picked the wrong standard or an unnecessary standard).

Hence, straightforwardly, POSIX, and also why Unix vendors cared about POSIX as much as they did at the time. It wasn't just to make the life of ISVs easier; it was also because the government was going to be specifying POSIX in procurement bids, and most of the Unix vendors didn't want to be left out. In the process, POSIX painstakingly nailed down a great deal of what the 'Unix' API is (not just at the C level but also for things like the shell and command environment), invented some genuinely useful things, and pushed towards creating and standardizing some new ideas (POSIX threading, for example, was mostly not standardizing existing practice).

PS: You might wonder why the government didn't just say 'must conform to the System V Interface Definition version N' in procurement requests. My understanding is that procurement rules frown on single-vendor standards, and that was what the SVID was; it was created by and for AT&T. Also, at the time requiring the SVID would have left out Sun and various other people that the government probably wanted to be able to buy Unixes from.

(See also the Wikipedia entry on the Unix wars, which has some useful chronology.)

POSIXOriginStory written at 20:51:12; Add Comment


Shell builtin versions of standard commands have drawbacks

I'll start with a specific illustration of the general problem:

bash# kill -SIGRTMIN+22 1
bash: kill: SIGRTMIN+22: invalid signal specification
bash# /bin/kill -SIGRTMIN+22 1

The first thing is that yes, this is Linux being a bit unusual. Linux has significantly extended the usual range of Unix signal numbers to include POSIX.1-2001 realtime signals, and then can vary what SIGRTMIN is depending on how a system is set up. Once Linux had these extra signals (and defined in the way they are), people sensibly added support for them to versions of kill. All of this is perfectly in accord with the broad Unix philosophy; of course if you add a new facility to the system you want to expose it to shell scripts when that's possible.

Then along came Bash. Bash is cross-Unix, and it has a builtin kill command, and for whatever reason the Bash people didn't modify Bash so that on Linux it would support the SIGRTMIN+<n> syntax (some possible reasons for that are contained in this sentence). The results of that are a divergence between the behavior of Bash's kill builtin and the real kill program that have become increasingly relevant now that programs like systemd are taking advantage of the extra signals to allow you to control more of their operations by sending them more signals.

Of course, this is a generic problem with shell builtins that shadow real programs in any (and all) shells; it's not particularly specific to Bash (zsh also has this issue on Linux, for example). There are advantages to having builtins, including builtins of things like kill, but there are also drawbacks. How best to fix or work around them isn't clear.

(kill is often a builtin in shells with job control, Bash included, so that you can do 'kill %<n>' and the like. Things like test are often made builtins for shell script speed, although Unixes can take that too far.)

PS: certainly one answer is 'have Bash implement the union of all special kill, test, and so on features from all Unixes it runs on', but I'm not sure that's going to work in practice. And Bash is just one of several popular shells, all of whom would need to keep up with things (or at least people probably want them to do so).

BashKillBuiltinDrawback written at 21:40:28; Add Comment


A clever way of killing groups of processes

While reading parts of the systemd source code that handle late stage shutdown, I ran across an oddity in the code that's used to kill all remaining processes. A simplified version of the code looks like this:

void broadcast_signal(int sig, [...]) {
   kill(-1, SIGSTOP);

   killall(sig, pids, send_sighup);

   kill(-1, SIGCONT);

(I've removed error checking and some other things; you can see the original here.)

This is called to send signals like SIGTERM and SIGKILL to everything. At first the use of SIGSTOP and SIGCONT puzzled me, and I wondered if there was some special behavior in Linux if you SIGTERM'd a SIGSTOP'd process. Then the penny dropped; by SIGSTOPing processes first, we're avoiding any thundering herd problems when processes start dying.

Even if you use kill(-1, <signal>), the kernel doesn't necessarily guarantee that all processes will receive the signal at once before any of them are scheduled. So imagine you have a shell pipeline that's remained intact all the way into late-stage shutdown, and all of the processes involved in it are blocked:

proc1 | proc2 | proc3 | proc4 | proc5

It's perfectly valid for the kernel to deliver a SIGTERM to proc1, immediately kill the process because it has no signal handler, close proc1's standard output pipe as part of process termination, and then wake up proc2 because now its standard input has hit end-of-file, even though either you or the kernel will very soon send proc2 its own SIGTERM signal that will cause it to die in turn. This and similar cases, such as a parent waiting for children to exit, can easily lead to highly unproductive system thrashing as processes are woken up unnecessarily. And if a process has a SIGTERM signal handler, the kernel will of course schedule it to wake up and may start it running immediately, especially on a multi-core system.

Sending everyone a SIGSTOP before the real signal completely avoids this. With all processes suspended, all of them will get your signal before any of them can wake up from other causes. If they're going to die from the signal, they'll die on the spot; they're not going to die (because you're starting with SIGTERM or SIGHUP and they block or handle it), they'll only get woken up at the end, after most of the dust has settled. It's a great solution to a subtle issue.

(If you're sending SIGKILL to everyone, most or all of them will never wake up; they'll all be terminated unless something terrible has gone wrong. This means this SIGSTOP trick avoids ever having any of the processes run; you freeze them all and then they die quietly. This is exactly what you want to happen at the end of system shutdown.)

ProcessKillingTrick written at 02:42:54; Add Comment


System shutdown is complicated and involves policy decisions

I've been a little harsh lately on how systemd has been (not) shutting down our systems, and certainly it has some issues and it could be better. But I want to note that in general and in practice, shutting down a Unix system is a complicated thing that involves tradeoffs and policy decisions; in fact I maintain that it's harder than booting the system. Further, the more full-featured you attempt to make system shutdown, the more policy decisions and tradeoffs you need to make.

(The only way to make system shutdown simple is to have a very minimal view of it and to essentially crash the running system, as original BSD did. This is a valid choice and certainly systems should be able to deal with abrupt crashes, since they do happen, but it isn't necessarily a great one. Your database can recover after a crash-stop, but it will probably be happier if you let it shut down neatly and it may well start faster that way.)

One of the problems that makes shutdown complicated is that on the one hand, stopping things can fail, and on the other hand, when you shut down the system you want and often need for it to actually go down, so overall system shutdown can't fail. Reconciling these conflicting facts requires policy decisions, because there is no clear universal technical answer for what you do if a service shutdown fails (ie the service process or processes remain running), or a filesystem can't be unmounted, or some piece of hardware says 'no, I am not detaching and shutting down'. Do you continue on with the rest of the shutdown process and try again later? Do you start killing processes that might be holding things busy? What do you do about your normal shutdown ordering requirements, for example do you block further services and so on from shutting down just yet, or do you continue on (and perhaps let them make their own decisions about whether they can shut down)?

There are no one size fits all answers to these questions and issues, especially if the init system is essentially blind to the specific nature of the services involved and treats them as generic 'services' with generic 'shutdown' actions. Even in an init system where the answers to these questions can be configured on a per-service or per-item basis, someone has to do that configuration and get it right (which may be complicated by an init system that doesn't distinguish between the different contexts of stopping a specific service, which means that you get to pick your poison).

While it's not trivial, it's not particularly difficult for an init system to reliably shut down machines if and when all of the individual service and item shutdowns go fine and all of the dependencies are fully expressed (and correct), so that everything is stopped in the right order. But this is the easy case. The hard case for all init systems is when something goes wrong, and many init systems have historically had issues here.

(Many implementations of System V init would simply stall the entire system shutdown if an '/etc/init.d/<whatever> stop' operation hung, for example.)

PS: One obvious pragmatic question and problem is how and when you give up on an orderly shutdown of a service and (perhaps) switch over to things like killing processes. Services may legitimately take some time to shut down, in order to flush out data, close databases properly, and so on, but they can also hang during shutdown for all sorts of reasons. This is especially relevant in any init system that shuts down multiple services in parallel, because each service being shut down could suddenly want a bunch of resources.

(One of the fun cases is where you have heavyweight daemons that are all inactive and paged out of RAM, and you ask them to do an orderly shutdown, which suddenly causes everything to try to page back in to your limited RAM. I've been there in a similar situation.)

ShutdownComplicated written at 01:47:11; Add Comment


The different contexts of stopping a Unix daemon or service

Most Unix init systems have a single way of stopping a daemon or a service, and on the surface this feels correct. And mostly it is, and mostly it works. However, I've recently come around to believing that this is a mistake and an over-generalization. I now believe that there are three different contexts and you may well want to stop things somewhat differently in each, depending on the daemon or service. This is especially the case if the daemon spawns multiple and somewhat independent processes as part of its operation, but it can happen in other situations as well, such as the daemon handling relatively long-running requests. To make this concrete I'm going to use the case of cron and long-running cron jobs, as well as Apache (or the web server of your choice).

The first context of stopping a daemon is a service restart, for example if the package management system is installing an updated version. Here you often don't want to abruptly stop everything the daemon is running. In the case of cron, you probably don't want a daemon restart to kill and perhaps restart all currently running cron jobs; for Apache, you probably want to let current requests complete, although this depends on what you're doing with Apache and how you have it configured.

The second context is taking down the service with no intention to restart it in the near future. You're stopping Apache for a while, or perhaps shutting down cron during a piece of delicate system maintenance, or even turning off the SSH daemon. Here you're much more likely to want running cron jobs, web requests, and even SSH logins to shut down, although you may want the init system to give them some grace time. This may actually be two contexts, one where you want a relatively graceful stop versus one where you really want an emergency shutdown with everything screeching to an immediate halt.

The third context is stopping the service during system shutdown. Here you unambiguously want everything involved with the daemon to stop, because everything on the system has to stop sooner or later. You almost always want everything associated with the daemon to stop as a group, more or less at the same time; among other reasons this keeps shutdown ordering sensible. If you need Apache to shut down before some backend service, you likely don't want lingering Apache sub-processes hanging around just because their request is taking a while to finish. Or at a minimum you don't want Apache to be considered 'down' for shutdown ordering until the last little bits die off.

As we see here, the first and the third context can easily conflict with each other; what you want for service restart can be the complete opposite of what you want during system shutdown. And an emergency service stop might mean you want an even more abrupt halt than you do during system shutdown. In hindsight, trying to treat all of these different contexts the same is over-generalization. The only time when they're all the same is when you have a simple single-process daemon, at which point there's only ever one version of shutting down the daemon; if the daemon process isn't running, that's it.

(As you might suspect, these thoughts are fallout from our Ubuntu shutdown problems.)

PS: While not all init systems are supervisory, almost all of them include some broad idea of how services are stopped as well as how they're started. System V init is an example of a passive init system that still has a distinct and well defined process for shutting down services. The one exception that I know of is original BSD, where there was no real concept of 'shutting down the system' as a process; instead reboot simply terminated all processes on the spot.

ThreeTypesOfServiceStop written at 01:12:41; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.