Wandering Thoughts

2017-11-13

X11 PseudoColor displays could have multiple hardware colormaps

When I talked about PseudoColor displays and window managers, I described things assuming that there was only a single hardware colormap. However, if you read the X11 documentation you'll run across tantalizing things like:

Most workstations have only one hardware look-up table for colors, so only one application colormap can be installed at a given time.

(Emphasis mine.)

'Most' is not all, and indeed there were Unix workstations with PseudoColor displays that had multiple hardware colormaps. As it happens I once used such a machine, my SGI R5K Indy. Since it was a sysadmin machine, we bought the version with SGI's entry-level 8-bit XL graphics, but that was still advanced enough to have multiple hardware colormaps instead of the single colormap that I was used to from my earlier machines.

When I was using the Indy I didn't really notice the multiple hardware colormaps, which is not too surprising (people rapidly stop noticing things that don't happen, like your display flashing as colormaps have to be swapped around), but in retrospect I think they enabled some things that I didn't think twice about at the time. I believe my Indy was the first time I used pictures as desktop backgrounds, and looking at the 1996 desktop picture in the appendix of this entry, that picture is full colour and not too badly dithered.

(As it happens I still have the source image for this desktop background and it's a JPEG with a reasonably large color range. Some of the dithering is in the original, probably as an artifact of it being scanned from an artbook in the early 1990s.)

In general, I think that having multiple hardware colormaps basically worked the way you'd expect. Any one program (well, window) still couldn't have more than 256 colours, so JPEG images and so on had to be approximated, but having a bunch of programs on the screen at once was no problem (even with the window manager's colors thrown in). I used that Indy through the era when websites started getting excessively colourful, so its multiple hardware colormaps likely got a good workout from Netscape windows.

(In 1996, Mozilla was well in the future.)

At the time and for years afterward, I didn't really think about how this was implemented in the hardware. Today, it makes me wonder, because X is normally what I'll call a software compositing display system, where the X server assembles all pixels from all windows into a single area of RAM and has the graphics hardware display that area (instead of telling the graphics hardware to composite together multiple separate bits and pieces). This makes perfect sense for a PseudoColor display when there's only one hardware colormap, but when you have multiple hardware colormaps, how does the display hardware know which pixel is associated with which hardware colormap? Perhaps there was a separate additional mapping buffer with two or three bits per pixel that specified the hardware colormap to use.

(Such a mapping buffer would be mostly static, as it only needs to change if a window with its own colormap is moved, added, or removed, and it wouldn't take up too much memory by 1996 standards.)

X11MultipleHWColormaps written at 00:20:28

2017-11-12

The fun of X11 PseudoColor displays and window managers

Yesterday, I described how X11's PseudoColor uses an indirect colormap, where the 'colors' you assign to pixels are actually indexes into a colormap that gives the real RGB colour values. In the common implementation (an 8-bit 'colour' index into a 24-bit colormap), you could choose from 16 million colours, but you could only have 256 different ones in a colormap. This limitation creates an obvious question: on a Unix system with a bunch of different programs running, how do you decide which 256 colours you get? What happens when two programs want different sets of them (perhaps you have two different image display programs trying to display two different images at the same time)?

Since X's nominal motto is 'mechanism, not policy', the X server and protocol do not have an answer for you. In fact they aggressively provide a non-answer, because the X protocol allows for every PseudoColor window to have its own colormap that the program behind the window populates with whatever colours it wants. Programs can inherit colormaps, including from the display (technically the root window, but that's close enough because the root window is centrally managed), so you can build some sort of outside mechanism so everyone uses the same colormap and coordinates it, but programs are also free to go their own way.

(For example, I believe that X desktops like Motif/CDE had standard colormaps that all of their normal applications were expected to share.)
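
To make the 'every window can have its own colormap' part concrete, here is a rough sketch of what that looks like at the Xlib level. This is my own illustration, not code from any particular program; the function name is made up and error handling is omitted.

#include <X11/Xlib.h>

/* Give 'win' its own PseudoColor colormap containing one colour of our
   choice. A real program would allocate many colours and check errors. */
void use_private_colormap(Display *dpy, Window win)
{
    int screen = DefaultScreen(dpy);
    Visual *visual = DefaultVisual(dpy, screen);   /* assumed to be PseudoColor */

    /* Create our own colormap instead of sharing the default one. */
    Colormap cmap = XCreateColormap(dpy, win, visual, AllocNone);

    /* Populate it with whatever colours we want; here, pure red. */
    XColor red;
    red.red = 65535; red.green = 0; red.blue = 0;
    XAllocColor(dpy, cmap, &red);       /* red.pixel is now an index into cmap */

    /* Tell the X server that this window should use this colormap. */
    XSetWindowColormap(dpy, win, cmap);
    XFlush(dpy);
}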

Whenever you have a distributed problem in X that needs some sort of central coordination, the normal answer is 'the window manager handles it'. PseudoColor colormaps are no exception, and so there is an entire program-to-window-manager communication protocol about colormap handling as part of the ICCCM; the basic idea is that programs tell the window manager 'this window needs this colormap', and then the window manager switches the X server to that particular colormap whenever it feels like it. Usually this is whenever the window is the active window, because normally the user wants the active window to be the one with correct colors.

(In X terminology, this is called 'installing' the colormap.)
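
Here's a hedged sketch of both sides of that conversation in Xlib terms; the function names are mine, and a real window manager tracks focus and colormap state far more carefully than this.

#include <X11/Xlib.h>
#include <X11/Xutil.h>

/* Client side: set WM_COLORMAP_WINDOWS so the window manager knows which
   of our windows need their own colormaps installed. */
void announce_colormap_windows(Display *dpy, Window toplevel, Window image_win)
{
    Window needs_cmap[] = { image_win, toplevel };
    XSetWMColormapWindows(dpy, toplevel, needs_cmap, 2);
}

/* Window manager side: when a window becomes active, install its colormap
   so that it is the one the hardware actually displays. */
void on_focus_change(Display *dpy, Colormap active_windows_cmap)
{
    XInstallColormap(dpy, active_windows_cmap);
}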

The visual result of the window manager switching the colormap to one with completely different colors is that other windows go technicolour and get displayed with false and bizarre colors. The resulting flashing as you moved back and forth between programs, changed images in an image display program, or started and then quit colour-intensive programs was quite distinctive and memorable. There's nothing like it in a modern X environment, where things are far more visually stable.

The window manager generally had its own colormap (usually associated with the root window) because the window manager generally needed some colours for window borders and decorations, its menus, and so on. This colormap was basically guaranteed to always have black and white color values, so programs that only needed them could just inherit this colormap. In fact there was also a whole protocol for creating and managing standard (shared) colormaps, with a number of standard colormaps defined; you could use one of these standard colormaps if your program just needed some colors and wasn't picky about the exact shades. A minimal case of this was if your program only used black and white; as it happens, this describes many programs in a normal X system (especially in the days of PseudoColor displays), such as xterm, xclock, Emacs and other GUI text editors, and so on. All of these programs could use the normal default colormap, which was important to avoid colours changing all of the time as you switched windows.

(For much of X's life, monochrome X displays were still very much a thing, so programs tended to only use colour if they really needed to. Today color displays are pervasive so even programs that only really have a foreground and a background colour will let you set those to any colour you want, instead of locking you to black and white.)
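
To make the 'just use shared colours' cases concrete, here is a small sketch of my own (with illustrative names) showing a program that only needs black and white from the default colormap, and one that looks up the shared RGB_DEFAULT_MAP standard colormap instead of allocating colours of its own.

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <X11/Xatom.h>

void shared_colour_examples(Display *dpy)
{
    int screen = DefaultScreen(dpy);
    Window root = RootWindow(dpy, screen);

    /* Case 1: only black and white, straight out of the default colormap. */
    unsigned long fg = BlackPixel(dpy, screen);
    unsigned long bg = WhitePixel(dpy, screen);

    /* Case 2: 'some colours, not picky about the shades', via a standard
       (shared) colormap if one has been set up on the root window. */
    XStandardColormap *maps;
    int nmaps;
    if (XGetRGBColormaps(dpy, root, &maps, &nmaps, XA_RGB_DEFAULT_MAP) && nmaps > 0) {
        /* Pixels in a standard colormap are computed, not allocated:
           base_pixel + r*red_mult + g*green_mult + b*blue_mult. */
        unsigned long half_red = maps[0].base_pixel
                                 + (maps[0].red_max / 2) * maps[0].red_mult;
        XFree(maps);
        (void)half_red;
    }
    (void)fg; (void)bg;
}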

One of the consequences of PseudoColor displays for window managers was that (colour) gradients were generally considered a bad idea, because they could easily eat up a lot of colormap entries. Window managers in the PseudoColor era were biased towards simple and minimal colour schemes, ideally using and reusing only a handful of colours. When TrueColor displays became the dominant thing in X, there was an explosion of window managers using and switching over to colour gradients in things like window title bars and decorations; not necessarily because it made sense, but because they now could. I think that has fortunately now died down and people are back to simpler colour schemes.

X11PseudocolorAndWMs written at 02:21:13

2017-11-11

What X11's TrueColor means (with some history)

If you've been around X11 long enough and peered under the hood a bit, you may have run across mentions of 'truecolor'. If you've also read through the manual pages for window managers with a sufficiently long history, such as fvwm, you may also have run across mentions of 'colormaps'. Perhaps you're wondering what the background of these oddities is.

Today, pixels are represented with one byte (8 bits) for each RGB color component, and perhaps another byte for the transparency level ('alpha'), partly because that makes each pixel 32 bits (4 bytes) and computers like 32-bit things much better than they like 24-bit (3-byte) things. However, this takes up a certain amount of memory. For instance, a simple 1024 by 768 display with 24 bits per pixel takes up just over 2 megabytes of RAM. Today 2 MB of RAM is hardly worth thinking about, but in the late 1980s and early 1990s it was a different matter entirely. Back then an entire workstation might have only 16 MB of RAM, and that RAM wasn't cheap; adding another 2 MB for the framebuffer would drive the price up even more. At the same time, people wanted color displays instead of black and white and were certainly willing to pay a certain amount extra for Unix workstations that had them.

If three bytes per pixel is too much RAM, there are at least two straightforward options. The first is to shrink how many bits you give to each color component; instead of 8-bit colour, you might do 5-bit color, packing a pixel into two bytes. The problem is that the more memory you save, the fewer colors and especially shades of gray you have. At 5-bit colour you're down to 32 shades of gray and only 32,768 different possible colors, and you've only saved a third of your framebuffer memory. The second is to do the traditional computer science thing by adding a layer of indirection. Instead of each pixel directly specifying its colour, it specifies an index into a colormap, which maps to the actual RGB color. The most common choice here is to use a byte for each pixel and thus to have a 256-entry colormap, with '24 bit' colour (ie, 8-bit RGB color components). The colormap itself requires less than a kilobyte of RAM, your 1024 by 768 screen only needs a more tolerable (and affordable) 768 KB of RAM, and you can still have your choice out of 16 million colors; it's just that you can only have 256 different colors at once.

(Well, sort of, but that's another entry.)
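
If you want to check the arithmetic here, a quick back-of-the-envelope program (mine, nothing X-specific) works out the same numbers:

#include <stdio.h>

int main(void)
{
    long pixels = 1024L * 768L;

    printf("24 bpp framebuffer:  %ld bytes\n", pixels * 3);  /* just over 2 MB */
    printf("16 bpp (5-bit RGB):  %ld bytes\n", pixels * 2);  /* about 1.5 MB   */
    printf(" 8 bpp indexed:      %ld bytes\n", pixels * 1);  /* 768 KB         */
    printf("256-entry colormap:  %d bytes\n", 256 * 3);      /* under 1 KB     */
    return 0;
}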

This 256-color indirect color mode is what was used for all affordable colour Unix workstations in the 1980s and most of the 1990s. In X11 terminology it's called a PseudoColor display, presumably because the pixel 'colour' values were not actually colors but instead were indexes into the colormap, which had to be maintained and managed separately. However, if you had a lot of money, you could buy a Unix workstation with a high(er) end graphics system that had the better type of color framebuffer, where every pixel directly specified its RGB color. In X11 terminology, this direct mapping from pixels to their colors is a TrueColor display (presumably because the pixel values are their true color).

(My memory is that truecolor systems were often called 24-bit color and pseudocolor systems were called 8-bit color. Depending on your perspective this isn't technically correct, but in practice everyone reading descriptions of Unix workstations at the time understood what both meant.)
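
The distinction is still visible to programs today. Here's a small sketch of mine that asks the X server whether it can provide a TrueColor or a PseudoColor visual; the depths are just the classic 24-bit and 8-bit cases, and error handling is minimal.

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdio.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy)
        return 1;

    int screen = DefaultScreen(dpy);
    XVisualInfo vinfo;

    if (XMatchVisualInfo(dpy, screen, 24, TrueColor, &vinfo))
        printf("24-bit TrueColor visual available\n");
    if (XMatchVisualInfo(dpy, screen, 8, PseudoColor, &vinfo))
        printf("8-bit PseudoColor visual available\n");

    XCloseDisplay(dpy);
    return 0;
}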

Directly mapped 'truecolor' graphics supplanted indirect pseudocolor graphics sometime in the late 1990s, with the growth of PCs (and the steady drop in RAM prices, which made two extra bytes per pixel increasingly affordable). It's probably been at least 15 years since you could find a pseudocolor graphics system on (then) decent current hardware; these days, 'truecolor' is basically the only colour model. Still, the terminology lingers on in X11, ultimately because X11 is at its heart a very old system and is still backward compatible with those days (at least in theory).

(I suspect that Wayland does away with all of the various options X11 has here and only supports the directly mapped truecolor model (probably with at least RGB and YUV). That would certainly be the sane approach.)

PS: It's true that in the late 1990s, you could still find Sun and perhaps SGI selling workstations with pseudocolor displays. This wasn't a good thing and contributed to the downfall of dedicated Unix workstations. At that point, decent PCs were definitely using truecolor 24-bit displays, which was part of what made PCs more attractive and most of the dedicated Unix workstations so embarrassing.

(Yes, I'm still grumpy at Sun about its pathetic 1999-era 'workstations'.)

X11TruecolorHistory written at 01:05:45

2017-10-26

The 'standard set' of Unix programs is something that evolves over time

I've recently been writing about how OmniOS's minimal set of additional programs has made it annoying to deal with. In the process of this I've casually talked about 'the standard set' of programs that I expect a Unix system to have. In a comment on yesterday's entry, David Magda provided a perfect lead-in for something that I was going to write about anyway:

Define "standard set". :) Isn't that one of the reasons POSIX was created?

There are a number of problems with leaning on POSIX here, but the one I want to focus on today is that it's outdated. At the time POSIX was created, its set of utilities was probably a decent selection (and in some ways it was forward-looking). But that was a long time ago, and the Unix world has not stood still since then (even though it may sometimes feel that way).

There hasn't been any actual standardization or any updates to the standards, of course, because that is not really how the Unix world works for things like that. Instead, to copy from the IETF, it is more 'rough consensus and sufficiently popular code'. If enough Unix environments provide something and that something is popular, it evolves into a de facto standard. Pragmatically, you know when something has reached this point because if you leave it out, people are surprised and annoyed. For instance, here in 2017, shipping a non-bare-bones Unix system with networking but without SSH would get you plenty of grumpy reactions, especially if you included a telnet daemon.

(A lot of people would specifically expect OpenSSH; providing another SSH implementation that wasn't usage-compatible with it would be annoying.)

One consequence of this evolution in the 'standard set' of Unix programs is that any Unix that freezes its set of programs is going to increasingly disappoint and irritate people over time. More and more people are going to show up, try to use the system, and say 'what, it's missing obvious program <X>? This is bogus; how old and outdated is this thing?' This is inconvenient for people building and maintaining Unixes, but that's life. Unix is not a static entity and never has been; Unix has always evolved.

(Your view of what should be included in this standard set is affected both by what you do and what Unixes you do it on. As a sysadmin, I have my own biases and they include programs like top, tcpdump, and sudo. But my set also includes less and more ordinary programs like rsync and gzip, and these are probably on many people's lists.)

PS: Just as people expect new programs to be in this 'standard set' on your Unix, they also expect your versions of existing standard programs to evolve and keep up with the times. One obvious sign of this is that the GNU versions of tools (or close imitations of them) are probably now expected by many people.

UnixStandardProgramsEvolve written at 00:25:53

2017-10-21

Multi-Unix environments are less and less common now

For a long time, the Unix environments that I existed in had a lot of diversity. There was a diversity of versions of Unix and, with them, a diversity of architectures (and sometimes a single vendor had multiple architectures). This was most pronounced in a number of places here that used NFS heavily, where your $HOME could be shared between several different Unixes and architectures, but even with an unshared $HOME I did things like try to keep common dotfiles. And that era left its mark on Unix itself, for example in what is now the more or less standard split between /usr/share and /usr/lib and friends. Distinguishing between 'shared between architectures' and 'specific to a single architecture' only makes sense when you might have more than one architecture in the same large-scale environment, and this is what /usr/share is about.

As you may have noticed, such Unix environments are increasingly uncommon now, for a number of reasons. For a start, the number of interesting computer architectures for Unix has shrunk dramatically; almost no one cares about anything other than 64-bit x86 now (although ARM is still waiting in the wings). This spills through to Unix versions, since generally all 64-bit x86 hardware will run your choice of Unix. The days when you might have bought a fire-breathing MIPS SMP server for compute work and got SGI Irix with it are long over.

(Buying either the cheapest Unix servers or the fastest affordable ones was one of the ways that multiple Unixes tended to show up around here, at least, because which Unix vendor was on top in either category tended to keep changing over the years.)

With no hardware to force you to pick some specific Unix, there's a strong motivation to standardize on one Unix that runs on all of your general-usage hardware, whatever that is. Even if you have an NFS-mounted $HOME, this means you only deal with one set of personal binaries and so on in a homogeneous environment. Different versions of the same Unix count as a 'big difference' these days.

Beyond that, the fact is that Unixes are pretty similar from a user perspective these days. There once was a day when Unixes were very different, which meant that you might need to do a lot of work to deal with those differences. These days most Unixes feel more or less the same once you have your $PATH set up, partly because in many cases they're using the same shells and other programs (Bash, for example, as a user shell). The exceptions tend to make people grumpy and often cause heartburn (and people avoid heartburn). The result may technically be a multi-Unix environment, but it doesn't feel like it and you might not really notice it.

(With all of this said, I'm sure that there are still multi-Unix environments out there, and some of them are probably still big. There's also the somewhat tricky issue of people who work with Macs as their developer machines and deploy to non-MacOS Unix servers. My impression as a distant bystander is that MacOS takes a fair amount of work to get set up with a productive and modern set of Unix tools, and you have to resort to some third party setup to do it; the result is inevitably a different feel than you get on a non-MacOS server.)

MultiUnixEnvNowUncommon written at 01:20:40

2017-09-30

The origin of POSIX as I learned the story (or mythology)

I recently wound up rereading Jeremy Allison's A Tale of Two Standards (via Everything you never wanted to know about file locking), which tells an origin story for the POSIX standard for Unix where it was driven by ISVs wanting a common 'Unix' API that they could write their products to so they'd be portable across all the various Unix versions. It's quite likely that this origin story is accurate, and certainly the divergence in Unixes irritated ISVs (and everyone else) at the time. However, it is not the origin mythology for POSIX that I learned during my early years with Unix, so here is the version I learned.

During the mid and late 1980s, the US government had a procurement problem; it wanted to buy Unix systems, but perfectly sensible procurement rules made this rather hard. If it tried to simply issue a procurement request to buy from, say, Sun, companies like SGI and DEC and so on would naturally object and demand answers for how and why the government had decided that their systems wouldn't do. If the government expanded the procurement request to include other Unix vendors so they could also bid on it (saying 'any workstation with these hardware specifications' or the like), people like IBM or DEC would demand answers for why their non-Unix systems wouldn't do. And if the government said 'fine, we want Unix systems', it was faced with the problem of actually describing what Unix was in the procurement request (ideally in a form that was vendor neutral, since procurement rules frown on supposedly open requests that clearly favour one vendor or a small group of them).

This government procurement problem is not unique to Unix, and the usual solution to it is a standard. Once the government has a standard, either of its own devising or specified by someone else, it can simply issue a procurement request saying 'we need something conforming to standard X', and in theory everyone with a qualifying product can bid and people who don't have such a product have no grounds for complaint (or at least they have less grounds for complaint; they have to try to claim you picked the wrong standard or an unnecessary standard).

Hence, straightforwardly, POSIX, and also why Unix vendors cared about POSIX as much as they did at the time. It wasn't just to make the life of ISVs easier; it was also because the government was going to be specifying POSIX in procurement bids, and most of the Unix vendors didn't want to be left out. In the process, POSIX painstakingly nailed down a great deal of what the 'Unix' API is (not just at the C level but also for things like the shell and command environment), invented some genuinely useful things, and pushed towards creating and standardizing some new ideas (POSIX threading, for example, was mostly not standardizing existing practice).

PS: You might wonder why the government didn't just say 'must conform to the System V Interface Definition version N' in procurement requests. My understanding is that procurement rules frown on single-vendor standards, and that was what the SVID was; it was created by and for AT&T. Also, at the time requiring the SVID would have left out Sun and various other people that the government probably wanted to be able to buy Unixes from.

(See also the Wikipedia entry on the Unix wars, which has some useful chronology.)

POSIXOriginStory written at 20:51:12

2017-09-29

Shell builtin versions of standard commands have drawbacks

I'll start with a specific illustration of the general problem:

bash# kill -SIGRTMIN+22 1
bash: kill: SIGRTMIN+22: invalid signal specification
bash# /bin/kill -SIGRTMIN+22 1
bash#

The first thing to note is that yes, this is Linux being a bit unusual. Linux has significantly extended the usual range of Unix signal numbers to include POSIX.1-2001 realtime signals, and the value of SIGRTMIN can vary depending on how a system is set up. Once Linux had these extra signals (defined the way they are), people sensibly added support for them to Linux versions of the kill program. All of this is perfectly in accord with the broad Unix philosophy; of course if you add a new facility to the system, you want to expose it to shell scripts when that's possible.
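
As an illustration of the underlying facility (my own sketch; the target pid of 1 and the +22 offset are simply taken from the transcript above), sending one of these realtime signals from C is straightforward:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 1;
    int sig = SIGRTMIN + 22;    /* SIGRTMIN is not a compile-time constant on Linux */

    if (sig > SIGRTMAX) {       /* stay inside the valid realtime signal range */
        fprintf(stderr, "signal out of range\n");
        return 1;
    }
    if (kill(pid, sig) != 0) {
        perror("kill");
        return 1;
    }
    return 0;
}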

Then along came Bash. Bash is cross-Unix, and it has a builtin kill command, and for whatever reason the Bash people didn't modify Bash so that on Linux it would support the SIGRTMIN+<n> syntax (some possible reasons for that are contained in this sentence). The result is a divergence between the behavior of Bash's kill builtin and the real kill program, one that has become increasingly relevant now that programs like systemd are taking advantage of the extra signals to let you control more of their operations by sending them signals.

Of course, this is a generic problem with shell builtins that shadow real programs in any (and all) shells; it's not particularly specific to Bash (zsh also has this issue on Linux, for example). There are advantages to having builtins, including builtins of things like kill, but there are also drawbacks. How best to fix or work around them isn't clear.

(kill is often a builtin in shells with job control, Bash included, so that you can do 'kill %<n>' and the like. Things like test are often made builtins for shell script speed, although Unixes can take that too far.)

PS: Certainly one answer is 'have Bash implement the union of all the special kill, test, and so on features from all the Unixes it runs on', but I'm not sure that's going to work in practice. And Bash is just one of several popular shells, all of which would need to keep up with things (or at least people would probably want them to).

BashKillBuiltinDrawback written at 21:40:28

2017-09-23

A clever way of killing groups of processes

While reading parts of the systemd source code that handle late stage shutdown, I ran across an oddity in the code that's used to kill all remaining processes. A simplified version of the code looks like this:

void broadcast_signal(int sig, [...]) {
   [...]
   kill(-1, SIGSTOP);               /* first, freeze every process */

   killall(sig, pids, send_sighup); /* then deliver the real signal */

   kill(-1, SIGCONT);               /* finally, let any survivors run again */
   [...]
}

(I've removed error checking and some other things; you can see the original here.)

This is called to send signals like SIGTERM and SIGKILL to everything. At first the use of SIGSTOP and SIGCONT puzzled me, and I wondered if there was some special behavior in Linux if you SIGTERM'd a SIGSTOP'd process. Then the penny dropped; by SIGSTOPing processes first, we're avoiding any thundering herd problems when processes start dying.

Even if you use kill(-1, <signal>), the kernel doesn't necessarily guarantee that all processes will receive the signal at once before any of them are scheduled. So imagine you have a shell pipeline that's remained intact all the way into late-stage shutdown, and all of the processes involved in it are blocked:

proc1 | proc2 | proc3 | proc4 | proc5

It's perfectly valid for the kernel to deliver a SIGTERM to proc1, immediately kill the process because it has no signal handler, close proc1's standard output pipe as part of process termination, and then wake up proc2 because now its standard input has hit end-of-file, even though either you or the kernel will very soon send proc2 its own SIGTERM signal that will cause it to die in turn. This and similar cases, such as a parent waiting for children to exit, can easily lead to highly unproductive system thrashing as processes are woken up unnecessarily. And if a process has a SIGTERM signal handler, the kernel will of course schedule it to wake up and may start it running immediately, especially on a multi-core system.

Sending everyone a SIGSTOP before the real signal completely avoids this. With all processes suspended, all of them will get your signal before any of them can wake up from other causes. If they're going to die from the signal, they'll die on the spot; if they're not going to die (because you're starting with SIGTERM or SIGHUP and they block or handle it), they'll only get woken up at the end, after most of the dust has settled. It's a great solution to a subtle issue.

(If you're sending SIGKILL to everyone, most or all of them will never wake up; they'll all be terminated unless something terrible has gone wrong. This means this SIGSTOP trick avoids ever having any of the processes run; you freeze them all and then they die quietly. This is exactly what you want to happen at the end of system shutdown.)

ProcessKillingTrick written at 02:42:54

2017-09-13

System shutdown is complicated and involves policy decisions

I've been a little harsh lately on how systemd has been (not) shutting down our systems, and certainly it has some issues and it could be better. But I want to note that in general and in practice, shutting down a Unix system is a complicated thing that involves tradeoffs and policy decisions; in fact I maintain that it's harder than booting the system. Further, the more full-featured you attempt to make system shutdown, the more policy decisions and tradeoffs you need to make.

(The only way to make system shutdown simple is to have a very minimal view of it and to essentially crash the running system, as original BSD did. This is a valid choice and certainly systems should be able to deal with abrupt crashes, since they do happen, but it isn't necessarily a great one. Your database can recover after a crash-stop, but it will probably be happier if you let it shut down neatly and it may well start faster that way.)

One of the problems that makes shutdown complicated is that on the one hand, stopping things can fail, and on the other hand, when you shut down the system you want (and often need) it to actually go down, so overall system shutdown can't fail. Reconciling these conflicting facts requires policy decisions, because there is no clear universal technical answer for what you do if a service shutdown fails (ie the service process or processes remain running), or a filesystem can't be unmounted, or some piece of hardware says 'no, I am not detaching and shutting down'. Do you continue on with the rest of the shutdown process and try again later? Do you start killing processes that might be holding things busy? What do you do about your normal shutdown ordering requirements; for example, do you block further services and so on from shutting down just yet, or do you continue on (and perhaps let them make their own decisions about whether they can shut down)?

There are no one-size-fits-all answers to these questions and issues, especially if the init system is essentially blind to the specific nature of the services involved and treats them as generic 'services' with generic 'shutdown' actions. Even in an init system where the answers to these questions can be configured on a per-service or per-item basis, someone has to do that configuration and get it right (which may be complicated by an init system that doesn't distinguish between the different contexts of stopping a specific service, which means that you get to pick your poison).

While it's not trivial, it's not particularly difficult for an init system to reliably shut down machines if and when all of the individual service and item shutdowns go fine and all of the dependencies are fully expressed (and correct), so that everything is stopped in the right order. But this is the easy case. The hard case for all init systems is when something goes wrong, and many init systems have historically had issues here.

(Many implementations of System V init would simply stall the entire system shutdown if an '/etc/init.d/<whatever> stop' operation hung, for example.)

PS: One obvious pragmatic question and problem is how and when you give up on an orderly shutdown of a service and (perhaps) switch over to things like killing processes. Services may legitimately take some time to shut down, in order to flush out data, close databases properly, and so on, but they can also hang during shutdown for all sorts of reasons. This is especially relevant in any init system that shuts down multiple services in parallel, because each service being shut down could suddenly want a bunch of resources.

(One of the fun cases is where you have heavyweight daemons that are all inactive and paged out of RAM, and you ask them to do an orderly shutdown, which suddenly causes everything to try to page back in to your limited RAM. I've been there in a similar situation.)

ShutdownComplicated written at 01:47:11

2017-09-12

The different contexts of stopping a Unix daemon or service

Most Unix init systems have a single way of stopping a daemon or a service, and on the surface this feels correct. And mostly it is, and mostly it works. However, I've recently come around to believing that this is a mistake and an over-generalization. I now believe that there are three different contexts and you may well want to stop things somewhat differently in each, depending on the daemon or service. This is especially the case if the daemon spawns multiple and somewhat independent processes as part of its operation, but it can happen in other situations as well, such as the daemon handling relatively long-running requests. To make this concrete I'm going to use the case of cron and long-running cron jobs, as well as Apache (or the web server of your choice).

The first context of stopping a daemon is a service restart, for example if the package management system is installing an updated version. Here you often don't want to abruptly stop everything the daemon is running. In the case of cron, you probably don't want a daemon restart to kill and perhaps restart all currently running cron jobs; for Apache, you probably want to let current requests complete, although this depends on what you're doing with Apache and how you have it configured.

The second context is taking down the service with no intention to restart it in the near future. You're stopping Apache for a while, or perhaps shutting down cron during a piece of delicate system maintenance, or even turning off the SSH daemon. Here you're much more likely to want running cron jobs, web requests, and even SSH logins to shut down, although you may want the init system to give them some grace time. This may actually be two contexts, one where you want a relatively graceful stop versus one where you really want an emergency shutdown with everything screeching to an immediate halt.

The third context is stopping the service during system shutdown. Here you unambiguously want everything involved with the daemon to stop, because everything on the system has to stop sooner or later. You almost always want everything associated with the daemon to stop as a group, more or less at the same time; among other reasons this keeps shutdown ordering sensible. If you need Apache to shut down before some backend service, you likely don't want lingering Apache sub-processes hanging around just because their request is taking a while to finish. Or at a minimum you don't want Apache to be considered 'down' for shutdown ordering until the last little bits die off.

As we see here, the first and the third context can easily conflict with each other; what you want for service restart can be the complete opposite of what you want during system shutdown. And an emergency service stop might mean you want an even more abrupt halt than you do during system shutdown. In hindsight, trying to treat all of these different contexts the same is over-generalization. The only time when they're all the same is when you have a simple single-process daemon, at which point there's only ever one version of shutting down the daemon; if the daemon process isn't running, that's it.

(As you might suspect, these thoughts are fallout from our Ubuntu shutdown problems.)

PS: While not all init systems are supervisory, almost all of them include some broad idea of how services are stopped as well as how they're started. System V init is an example of a passive init system that still has a distinct and well defined process for shutting down services. The one exception that I know of is original BSD, where there was no real concept of 'shutting down the system' as a process; instead reboot simply terminated all processes on the spot.

ThreeTypesOfServiceStop written at 01:12:41
