Wandering Thoughts

2014-09-18

Ubuntu's packaging failure with mcelog in 14.04

For vague historical reasons we've had the mcelog package in our standard package set. When we went to build our new 14.04 install setup, this blew up on us; on installation, some of our machines would report more or less the following:

Setting up mcelog (100-1fakesync1) ...
Starting Machine Check Exceptions decoder: CPU is unsupported
invoke-rc.d: initscript mcelog, action "start" failed.
dpkg: error processing package mcelog (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 mcelog
E: Sub-process /usr/bin/dpkg returned an error code (1)

Here we see a case where a collection of noble intentions have had terrible results.

The first noble intention is a desire to warn people that mcelog doesn't work on all systems. Rather than silently run uselessly or silently exit successfully, mcelog instead reports an error and exits with a failure status.

The second noble intention is the standard Debian noble intention (inherited by Ubuntu) of automatically starting most daemons on installation. You can argue that this is a bad idea for things like database servers, but for basic system monitoring tools like mcelog and SMART monitoring I think most people actually want this; certainly I'd be a bit put out if installing smartd didn't actually enable it for me.

(A small noble intention is that the init script passes mcelog's failure status up, exiting with a failure itself.)

The third noble intention is the standard Debian behavior that if an init script fails when it's started from the package's postinstall script, the postinstall script itself exits with an error (this is part of the standard dh_installinit stanza). When the package postinstall script errors out, dpkg itself flags this as a problem (as well it should) and boom, your entire package install step is reporting an error and your auto-install scripts fall down. Or at least ours do.

The really bad thing about this is that server images can change hardware. You can transplant disks from one machine to another for various reasons; you can upgrade the hardware of a machine but preserve the system disks; you can move virtual images around; you can (as we do) have standard machine building procedures that want to install a constant set of packages without having to worry about the exact hardware you're installing on. This mcelog package behavior damages this hardware portability in that you can't safely install mcelog in anything that may change hardware. Even if the initial install succeeds or is forced, any future update to mcelog will likely cause you problems on some of your machines (since a package update will likely fail just like a package install).

(This is a packaging failure, not an mcelog failure; given that mcelog can not work on some machines it's installed on, the init script failure should not cause a fatal postinstall script failure. Of course the people who packaged mcelog may well not have known that it had this failure mode on some machines.)

I'm sort of gratified to report that Debian has a bug for this, although the progress of the bug does not fill me with great optimism and of course a fix is probably not important enough to ever make it into Ubuntu 14.04 (although there's also an Ubuntu bug).

PS: since mcelog has never done anything particularly useful for us, we have not been particularly upset over dropping it from our list of standard packages. Running into this issue was a bit irritating, but then mcelog seems to be historically good at irritation.

PPS: the actual problem mcelog has is even more stupid than 'I don't support this CPU'; in our case it turns out to be 'I need a special kernel module loaded for this machine but I won't do it for you'. It also syslogs (but does not usefully print) a message that says:

mcelog: AMD Processor family 16: Please load edac_mce_amd module.#012: Success

See eg this Fedora bug and this Debian bug. Note that the message really means 'family 16 and above', not 'family 16 only'.

linux/McelogUbuntuFailure written at 01:57:09; Add Comment

2014-09-17

In praise of Solaris's pfiles command

I'm sure that at one point I was introduced to pfiles through a description that called it the Solaris version of lsof for a single process. This is true as far as it goes and I'm certain that I used pfiles as nothing more than this for a long time, but it understates what pfiles can do for you. This is because pfiles will give you a fair amount more information than lsof will, and much of that information is useful stuff to know.

Like lsof, pfiles will generally report what a file descriptor maps to (file, device, network connection, and Solaris IPC 'doors', often with information about what process is on the other end of the door). Unlike on some systems, the pfiles information is good enough to let you track down who is on the other end of Unix domain sockets and pipes. Socket endpoints are usually reported directly; pipe information generally takes cross-correlating with other processes to see who else has an S_IFIFO with the specific ino open.

(You would think that getting information on the destination of Unix domain sockets would be basic information, but on some systems it can take terrible hacks.)

Pfiles will also report some state information for sockets, like the socket flags and the send and receive buffers. Personally I don't find this deeply useful and I wish that pfiles also showed things like the TCP window and ACK state. Fortunately you can get this protocol information with 'netstat -f inet -P tcp' or 'netstat -v -f inet -P tcp' (if you want lots of details).

Going beyond this lsof-like information, pfiles will also report various fcntl() and open() flags for the file descriptor. This will give you basic information like the FD's read/write status, but it goes beyond this; for example, you can immediately see whether or not a process has its sockets open in non-blocking mode (which can be important). This is often stuff that is not reported by other tools and having it handy can save you from needing deep dives with DTrace, a debugger, or the program source code.

(I'm sensitive to several of these issues because my recent Amanda troubleshooting left me needing to chart out the flow of pipes and to know whether some sockets were nonblocking or not. I could also have done with information on TCP window sizes at the time, but I didn't find the netstat stuff until just now. That's how it goes sometimes.)

solaris/PfilesPraise written at 02:03:57; Add Comment

2014-09-15

My collection of spam and the spread of SMTP TLS

One of the new things my sinkhole SMTP server does on my workstation is support TLS, unlike my old real mail server there (which dates from a very, very long time ago). This has given me the chance to see how much of my incoming spam is delivered with TLS, which in turn has sparked some thoughts about the spread of SMTP TLS.

The starting point is that a surprising amount of my incoming spam is actually delivered with TLS; right now about 30% of the successful deliveries have used TLS. This is somewhat more striking than it sounds for two reasons: first, the Go TLS code I'm relying on is incomplete (and thus not all TLS-capable sending MTAs can actually do TLS with it), and second, a certain number of the TLS connection attempts fail because the sending MTA is offering an invalid client certificate.

(I also see a fair number of rejected delivery attempts in my SMTP command log that did negotiate TLS, but the stats there are somewhat tangled and I'm not going to try to summarize them.)

While there are some persistent spammers, most of the incoming email is your typical advance fee fraud and phish spam that's sent through various sorts of compromised places. Much of the TLS email I get is this boring sort of spam, somewhat to my surprise. My prejudice is that a fair amount of this spam comes from old and neglected machines, which are exactly the machines that I would expect are least likely to do TLS.

(Some amount of such spam comes from compromised accounts at places like universities, which can and do happen to even modern and well run MTAs. I'm not surprised when they use TLS.)

What this says to me is that support for initiating TLS is fairly widespread in MTAs, even relatively old MTAs, and fairly well used. This is good news (it's now clear that pervasive encryption of traffic on the Internet is a good thing, even casual opportunistic encryption). I suspect that it's happened because common MTAs have enabled client TLS by default and the reason they've been able to do that is that it basically takes no configuration and almost always works.

(It's clear that at least some client MTAs take note when STARTTLS fails and don't try it again even if the server MTA offers it to them, because I see exactly this pattern in my SMTP logs from some clients.)

PS: you might wonder if persistent spammers use TLS when delivering their spam. I haven't done a systematic measurement for various reasons but on anecdotal spot checks it appears that my collection of them basically doesn't use TLS. This is probably unsurprising since TLS does take some extra work and CPU. I suspect that spammers may start switching if TLS becomes something that spam filtering systems use as a trust signal, just as some of them have started advertising DKIM signatures.

spam/SpamAndTLSSpread written at 23:25:16; Add Comment

I want my signed email to work a lot like SSH does

PGP and similar technologies have been in the news lately, and as a result of this I added the Enigmail extension to my testing Thunderbird instance. Dealing with PGP through Enigmail reminded me of why I'm not fond of PGP. I'm aware that people have all sorts of good reasons and that PGP itself has decent reasons for working the way it does, but for me the real strain point is not the interface but fundamentally how PGP wants me to work. Today I want to talk just about signed email, or rather about how I want to deal with signed email.

To put it simply, I want people's keys for signed email to mostly work like SSH host keys. For most people the core of using SSH is not about specifically extending trust to specific, carefully validated host keys but instead about noticing if things change. In practical use you accept a host's SSH key the first time you're offered it and then SSH will scream loudly and violently if it ever changes. This is weaker than full verification but is far easier to use, and it complicates the job of an active attacker (especially one that wants to get away with it undetected). Similarly, in casual use of signed email I'm not going to bother carefully verifying keys; I'm instead going to trust that the key I fetched the first time for the Ubuntu or Red Hat or whatever security team is in fact their key. If I suddenly start getting alerts about a key mismatch, then I'm going to worry and start digging. A similar thing applies to personal correspondents; for the most part I'm going to passively acquire their keys from keyservers or other methods and, well, that's it.

(I'd also like this to extend to things like DKIM signatures of email, because frankly it would be really great if my email client noticed that this email is not DKIM-signed when all previous email from a given address had been.)

On the other hand, I don't know how much sense it makes to even think about general MUA interfaces for casual, opportunistic signed email. There is a part of me that thinks signed email is a sexy and easy application (which is why people keep doing it) that actually doesn't have much point most of the time. Humans do terribly at checking authentication, which is why we mostly delegate that to computers, yet casual signed email in MUAs is almost entirely human checked. Quick, are you going to notice that the email announcement of a new update from your vendor's security team is not signed? Are you going to even care if the update system itself insists on signed updates downloaded from secure mirrors?

(My answers are probably not and no, respectively.)

For all that it's nice to think about the problem (and to grumble about the annoyances of PGP), a part of me thinks that opportunistic signed email is not so much the wrong problem as an uninteresting problem that protects almost nothing that will ever be attacked.

(This also ties into the problem of false positives in security. The reality is that for casual message signatures, almost all missing or failed signatures are likely to have entirely innocent explanations. Or at least I think that this is the likely explanation today; perhaps mail gets attacked more often than I think on today's Internet.)

tech/MySignedMailDesire written at 01:42:10; Add Comment

2014-09-14

My current hassles with Firefox, Flash, and (HTML5) video

When I've written before about my extensions, I've said that I didn't bother with any sort of Flash blocking because NoScript handled that for me. The reality turns out to be that I was sort of living a charmed life, one that has recently stopped working the way I want and has forced me into a series of attempted workarounds.

How I want Flash and video to work is that no Flash or video content activates automatically (autoplay is evil, among other things) but that I can activate any particular piece of content if and when I want to. Ideally this activation (by default) only lasts while the window is open; if I discard the window and revisit the URL, I don't get an autoplaying video or the like. In particular I want things to work this way on YouTube, which is my single most common source of videos (and also my most common need for JavaScript).

For a long time, things worked this way with just NoScript. Then at some point recently this broke down; if I relied only on NoScript, YouTube videos either would never play or would autoplay the moment the page loaded. If I turned on Firefox's 'ask to activate' for Flash, Firefox enabled and disabled things on a site-wide basis (so the second YouTube video I'd visit would autoplay). I wound up having to add two extensions to stop this:

  • Flashblock is the classic Flash blocker. Unlike Firefox's native 'ask to activate', it acts by default on a per-item basis, so activating one YouTube video I watch doesn't auto-play all future ones I look at. To make Flashblock work well I have disabled NoScript's blocking of Flash content so that I rely entirely on Flashblock; this has had the useful side effect of allowing me to turn on Flash elements on various things besides YouTube.

    But recently YouTube added a special trick to their JavaScript arsenal; if Flash doesn't seem to work but HTML5 video does, they auto-start their HTML5 player instead. For me this includes the case where Flashblock is blocking their Flash, so I had to find some way to deal with that.

  • StopTube stops YouTube from autoplaying HTML5 videos. With both Flashblock and StopTube active, YouTube winds up using Flash (which is blocked and enabled by Flashblock). I don't consider this ideal as I'd rather use HTML5, but YouTube is what it is. As the name of this addon sort of suggests, StopTube has the drawback that it only stops HTML5 video on YouTube itself. HTML5 videos elsewhere are not blocked by it, including YouTube videos embedded on other people's pages. So far those embedded videos aren't autoplaying for me, but they may in the future. That might force me to stop whitelisting YouTube for JavaScript (at which point I might almost as well turn JavaScript off entirely in my browser).

    (An energetic person might be able to make such an addon starting from StopTube's source code.)

Some experimentation suggests that I might get back to what I want with just NoScript if I turn on NoScript's 'Apply these restrictions to whitelisted sites too' option for embedded content it blocks. But for now I like Flashblock's interface better (and I haven't been forced into this by being unable to block autoplaying HTML5 video).

There are still unfortunate aspects to this setup. One of them is that Firefox doesn't appear to have an 'ask to activate' (or more accurately 'ask to play') option for its HTML5 video support; this forces me to keep NoScript blocking that content instead of being able to use a nicer interface for enabling it if I want to. It honestly surprises me that Firefox doesn't already do this; it's an obvious feature and is only going to be more and more asked for as more people start using auto-playing HTML5 video for ads.

(See also this superuser.com question and its answers.)

web/FirefoxFlashVideoHassles written at 01:15:58; Add Comment

2014-09-13

What can go wrong with polling for writability on blocking sockets

Yesterday I wrote about how our performance problem with amandad was caused by amandad doing IO multiplexing wrong, by only polling for whether it could read from its input file descriptors and assuming it could always write to its network sockets. But let's ask a question: suppose that amandad was also polling for writability on those network sockets. Would it work fine?

The answer is no, not without even more code changes, because amandad's network sockets aren't set to be non-blocking. The problem here is what it really means when poll() reports that something is ready for write (or for that matter, for read). Let me put it this way:

That poll() says a file descriptor is ready for writes doesn't mean that you can write an arbitrary amount of data to it without blocking.

When I put it this way, of course you can't. Can I write a gigabyte to a network socket or a pipe without blocking? Pretty much any kernel is going to say 'hell no'. Network sockets and pipes can never instantly absorb arbitrary amounts of data; there's always a limit somewhere. What poll()'s readiness indicator more or less means is that you can now write some data without blocking. How much data is uncertain.

The importance of non-blocking sockets is due to an API decision that Unix has made. Given that you can't write an arbitrary amount of data to a socket or a pipe without blocking, Unix has decided that by default when you write 'too much' you get blocked instead of getting a short write return (where you try to write N bytes and get told you wrote less than that). In order to not get blocked if you try too large a write, you must explicitly set your file descriptor to non-blocking mode; at this point you will either get a short write or just an error such as EAGAIN (if you're trying to write and there is no room at all).

(This is a sensible API decision for reasons beyond the scope of this entry. And yes, it's not symmetric with reading from sockets and pipes.)
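To make this concrete, here's a small C sketch of my own (not code from amandad) showing both halves of this: putting a descriptor into non-blocking mode and then coping with what write() can do to you afterwards, namely short writes and EAGAIN. The function names are just illustrative.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Try to write buf[done..len) and return the new 'done' offset.
   On a non-blocking descriptor this may be a short write, or no
   write at all (EAGAIN/EWOULDBLOCK) if there is no room right now. */
ssize_t write_some(int fd, const char *buf, size_t done, size_t len)
{
	ssize_t n = write(fd, buf + done, len - done);
	if (n == -1) {
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return done;	/* no room at all; poll() and retry */
		return -1;		/* a real error */
	}
	return done + n;	/* may still be well short of len */
}

The important part is in the caller: when write_some() reports that it couldn't write everything, you hold on to the unsent data and go back to poll(), asking for writability on that descriptor, instead of sitting in write() until the kernel finds room.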

So if amandad just polled for writability but changed nothing else in its behavior, it would almost certainly still wind up blocking on writes to network sockets as it tried to stuff too much down them. At most it would wind up blocked somewhat less often because it would at least send some data immediately every time it tried to write to the network.

(The pernicious side of this particular bug is that whether it bites you in any visible way depends on how much network IO you try to do and how fast. If you send to the network (or to pipes) at a sufficiently slow rate, perhaps because your source of data is slow, you won't stall visibly on writes because there's always capacity for the data you're sending. Only when your send rates start overwhelming the receiver will you actively block in writes.)

Sidebar: The value of serendipity (even if I was wrong)

Yesterday I mentioned that my realization about the core cause of our amandad problem was sparked by remembering an apparently unrelated thing. As it happens, it was my memory of reading Rusty Russell's POLLOUT doesn't mean write(2) won't block: Part II that started me on this whole chain. A few rusty neurons woke up and said 'wait, poll() and then long write() waits? I was reading about that...' and off I went, even if my initial idea turned out to be wrong about the real cause. Had I not been reading Rusty Russell's blog I probably would have missed noticing the anomaly and as a result wasted a bunch of time at some point trying to figure out what the core problem was.

The write() issue is clearly in the air because Ewen McNeill also pointed it out in a comment on yesterday's entry. This is a good thing; the odd write behavior deserves to be better known so that it doesn't bite people.

programming/PollBlockingWritesBad written at 00:47:49; Add Comment

2014-09-12

How not to do IO multiplexing, as illustrated by Amanda

Every so often I have a belated slow-motion realization about what's probably wrong with an otherwise mysterious problem. Sometimes this is even sparked by remembering an apparently unrelated thing I read in passing. As it happens, that happened the other day.

Let's rewind to this entry, where I wrote about what I'd discovered while looking into our slow Amanda backups. Specifically this was a dump of amandad handling multiple streams of backups at once, which we determined is the source of our slowness. In that entry I wrote in passing:

[Amandad is] also spending much more of its IO wait time writing the data rather than waiting for there to be more input, although the picture here is misleading because it's also making pollsys() calls and I wasn't tracking the time spent waiting in those [...]

This should have set off big alarm bells. If amandad is using poll(), why is it spending any appreciable amount of time waiting for writes to complete? After all the whole purpose of poll() et al is to only be woken when you can do work, so you should spend minimal time blocked in the actual IO functions. The unfortunate answer is that Amanda is doing IO multiplexing wrong, in that I believe it's only using poll() to check for readability on its input FDs, not writability on its output FDs. Instead it tacitly assumes that whenever it has data to read it can immediately write all of this data out with no fuss, muss, or delays.

(Just checking for writability on the network connections wouldn't be quite enough, of course, but that's another entry.)

The problem is that this doesn't necessarily work. You can easily have situations where one TCP stream will accept much more data than another one, or where all (or just most) TCP streams will accept only a modest amount of data at a time; this is especially likely when the TCP streams are connected to different processes on the remote end. If one remote process stalls, its TCP stream can stop accepting much more data, at which point a large write to the stream may stall in turn, which stalls all amandad activity (even activity that could otherwise proceed), which stalls upstream activity that is trying to send data to amandad. What amandad's handling of multiplexing does is to put all of the data streams flowing through it at the mercy of whatever is the slowest write stream at any given point in time. If anything blocks, everything can block, and sooner or later we seem to wind up in a situation where anything can and does block. The result is a stuttering stop-start process full of stalls that reduces the overall data flow significantly.

In short, what you don't want in good IO multiplexing is a situation where IO on stream A has to wait because you're stalled doing IO on stream B, and that is just what amandad has arranged here.

Correct multiplexing is complex (even in the case of a single flow) but the core of it is never overcommitting yourself. What amandad should be doing is only writing as much output at a time as any particular connection can take, buffering some data internally, and passing any back pressure up to the things feeding it data (by stopping reading its inputs when it cannot write output and it has too much buffered). This would ensure that the multiplexing does no harm in that it can always proceed if any of the streams can proceed.

(Splitting apart the multiplexing into separate processes (or threads) does the same thing; because each one is only handling a single stream, that stream blocking doesn't block any other stream.)

PS: good IO multiplexing also needs to be fair, ie if multiple streams or file descriptors can all do IO at the current time then all of them should get to do some IO, instead of a constant stream of ready IO on one stream starving IO for the other streams. Generally the easiest way to do this is to have your code always process all ready file descriptors before returning to poll(). This is also the more CPU-efficient way to do it.
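To show the shape of what I mean, here's a skeletal C sketch of a poll() based loop for a single input feeding a single output (amandad juggles several of each). It is illustrative only, not Amanda's actual code, and the buffer size and structure are made up. It buffers a bounded amount of data, applies back pressure by not reading when the buffer is full, only asks about writability when it has something to write, and handles every ready descriptor before polling again.

#include <poll.h>
#include <string.h>
#include <unistd.h>

#define BUFMAX 65536		/* stop reading once this much is buffered */

struct stream {
	int infd;
	int outfd;		/* should also be set non-blocking */
	char buf[BUFMAX];
	size_t used;		/* bytes buffered but not yet written */
};

void pump(struct stream *s)
{
	struct pollfd pfd[2];

	for (;;) {
		pfd[0].fd = s->infd;
		/* back pressure: only read more when there is room to buffer it */
		pfd[0].events = (s->used < BUFMAX) ? POLLIN : 0;
		pfd[1].fd = s->outfd;
		/* only ask about writability when we have something to write */
		pfd[1].events = (s->used > 0) ? POLLOUT : 0;

		if (poll(pfd, 2, -1) == -1)
			break;

		/* handle every ready descriptor before polling again */
		if (pfd[1].revents & POLLOUT) {
			ssize_t n = write(s->outfd, s->buf, s->used);
			if (n > 0) {
				memmove(s->buf, s->buf + n, s->used - n);
				s->used -= n;
			}
		}
		if (pfd[0].revents & POLLIN) {
			ssize_t n = read(s->infd, s->buf + s->used,
					 BUFMAX - s->used);
			if (n <= 0)
				break;	/* EOF or error; real code would drain first */
			s->used += n;
		}
	}
}

The property this buys you is that a stalled output only ever stops its own reads; with one of these per stream (or one process per stream, as with 'auth bsdtcp'), no stream can block another.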

programming/IOMultiplexingDoneWrong written at 01:10:43; Add Comment

2014-09-10

The cause of our slow Amanda backups and our workaround

A while back I wrote about the challenges in diagnosing slow (Amanda) backups. It's time for a followup entry on that, because we found what I can call 'the problem' and along with it a workaround. To start with, I need to talk about how we had configured our Amanda clients.

In order to back up our fileservers in a sensible amount of time, we run multiple backups on each of them at once. We don't really try to do anything sophisticated to balance the load across multiple disks, both because this is hard in our environment (especially given limited Amanda features) and because we've never seen much evidence that reducing overlaps was useful in speeding things up; instead we just have Amanda run three backups at once on each fileserver ('maxdumps 3' in the Amanda configuration). For historical reasons we were also using Amanda's 'auth bsd' style of authentication and communication.

As I kind of mentioned in passing in my entry on Amanda data flows, 'auth bsd' communication causes all concurrent backup activity to flow through a single master amandad process. It turned out that this was our bottleneck. When we had a single amandad process handling sending all backups back to the Amanda server and it was running more than one filesystem backup at a time, things slowed down drastically and we experienced our problem. When an amandad process was only handling a single backup, things went fine.

We tested and demonstrated this in two ways. The first was that we dropped one fileserver down to one dump at a time, after which it ran fine. The more convincing test was to use SIGSTOP and SIGCONT to pause and then resume backups on the fly on a server running multiple backups at once. This demonstrated that network bandwidth usage jumped drastically when we paused two out of the three backups and tanked almost immediately when we allowed more than one to run at once. It was very dramatic. Further work with a DTrace script provided convincing evidence that it was the amandad process itself that was the locus of the problem and not that, eg, tar reads slowed down drastically if more than one tar was running at once.

Our workaround was to switch to Amanda's 'auth bsdtcp' style of communication. Although I initially misunderstood what it does, it turns out that this causes each concurrent backup to use a separate amandad process and this made everything work fine for us; performance is now up to the level where we're saturating the backup server disks instead of the network. Well, mostly. It turns out that our first-generation ZFS fileservers probably also have the slow backup problem. Unfortunately they're running a much older Amanda version and I'm not sure we'll try to switch them to 'auth bsdtcp' since they're on the way out anyways.
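For anyone curious what the switch actually looks like, it's roughly the following (from memory and heavily abbreviated, so treat it as a sketch and check your Amanda version's documentation; the dumptype name here is made up). The server-side dumptype gets the new auth setting:

define dumptype example-gnutar {
	program "GNUTAR"
	maxdumps 3		# up to three dumps at once per client
	auth "bsdtcp"		# was "bsd"; each stream gets its own amandad
	# ... plus all your usual dumptype options ...
}

The client side has to agree; as I remember it, the inetd/xinetd service that runs amandad on each client needs to be a TCP service started with '-auth=bsdtcp' (instead of the old UDP-based setup) or the two ends will not negotiate.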

I call this a workaround instead of a solution because in theory a single central amandad process handling all backup streams shouldn't be a problem. It clearly is in our environment for some reason, so it sort of would be better to understand why and if it can be fixed.

(As it happens I have a theory for why this is happening, but it's long enough and technical enough that it needs another entry. The short version is that I think the amandad code is doing something wrong with its socket handling.)

sysadmin/SlowBackupsCause written at 23:14:34; Add Comment

Does init actually need to do daemon supervision?

Sure, init has historically done some sort of daemon supervision (or at least started and stopped daemons) and I listed it as one of init's jobs. But does it actually need to do this? This is really two questions, with two answers.

Init itself, PID 1, clearly does not have to be the process that does daemon supervision. We have a clear proof of this in Solaris, where SMF moves daemon supervision to a separate set of processes. SMF is not a good init system but its failures are failures of execution, not of its fundamental design; it does work, it's just annoying.

Whether the init system as a whole needs to do daemon supervision is a much more philosophical question and thus harder to answer. However I believe that on the whole the init system is the right place for this. The pragmatics of why are simple: the init system is responsible for booting and shutting down the system, and doing this almost always needs at least some daemons to be started or stopped in addition to more scripted steps like filesystem checks. This means that at least part of daemon supervision, the part covering what I called infrastructure daemons when I talked about init's jobs, is quite tightly entwined with booting. And since your init system must handle infrastructure daemons it might as well handle all daemons.

(In theory you could define an API for communication between the init system and a separate daemon supervision system in order to handle this. In practice, until this API is generally adopted your init system is tightly coupled with whatever starts and stops infrastructure daemons for it, ie you won't be able to swap one infrastructure daemon supervision system for another and whichever one your init system needs might as well be considered part of the init system itself.)

I feel that the pragmatic argument is also the core of a more philosophical one. There is no clear break between infrastructure daemons and service daemons (and in fact what category a daemon falls into can vary from system to system), which makes it artificial to have two separate daemon supervision systems. If you want to split the job of an init system apart at all, the 'right' split is between the minimal job of PID 1 and the twin jobs of booting the system and supervising daemons.

(This whole thing was inspired by an earlier entry being linked to by this slashdot comment, and then a reply to said comment arguing that the role of init is separate from a daemon manager. As you can see, I don't believe that it is on Unix in practice.)

Sidebar: PID 1 and booting the system

This deserves its own entry to follow all of the threads, but the simple version for now: in a Unix system with (only) standard APIs, the only way to guarantee that a process winds up as PID 1 is for the kernel to start it as such. The easiest way to arrange for this is for said process to be the first process started so that PID 1 is the first unused PID. This naturally leads into PID 1 being responsible for booting the system, because if it wasn't the kernel would have to also start another process to do this (and there would have to be a decision about what the process is called and so on).

This story is increasingly false in modern Unix environments which do various amounts of magic setup before starting the final real init, but there you have it.

unix/InitDaemonSupervision written at 01:58:15; Add Comment

2014-09-08

What an init system needs to do in the abstract

I've talked before about what init does historically, but that's not the same thing as what an init system actually needs to do, considered abstractly and divorced from the historical paths that got us here and still influence how we think about init systems. So, what does a modern init system in a modern Unix need to do?

At the abstract level, I think a modern init system has three jobs:

  1. Being the central process on the system. This is both the modest job of being PID 1 (inheriting parentless processes and reaping them when they die) and the larger, more important job of supervising and (re)starting any other components of the init system.

  2. Starting and stopping the system, and also transitioning it between system states like single user and multiuser. The latter part of this job has diminished in importance over the years; in practice most systems today almost never transition between runlevels or the equivalent except to boot or reboot.

    (At one point people tried to draw a runlevel distinction between 'multiuser without networking' and 'multiuser with networking' and maybe 'console text logins' and 'graphical logins with X running' but today those distinctions are mostly created by stopping and starting daemons, perhaps abstracted through high level labels for collections of daemons.)

  3. Supervising (daemon) processes to start, stop, and restart them on demand or need or whatever. This was once a sideline but has become the major practical activity of an init system and the reason people spend most of their time interacting with it. Today this encompasses both regular getty processes (which die and restart regularly) and a whole collection of daemons (which are often not expected to die and may not be restarted automatically if they do).

    You can split this job into two sorts of daemons, infrastructure processes that must be started in order for the core system to operate (and for other daemons to run sensibly) and service processes that ultimately just provide services to people using the machine. Service processes are often simpler to start, restart, and manage than infrastructure processes.

In practice modern Unixes often add a fourth job, that of managing the appearance and disappearance of devices. This job is not strictly part of init but it is inextricably intertwined with at least booting the system (and sometimes shutting it down) and in a dependency-based init system it will often strongly influence what jobs/processes can be started or must be stopped at any given time (eg you start network configuration when the network device appears, you start filesystem mounts when devices appear, and so on).

The first job mostly or entirely requires being PID 1; at a minimum your PID 1 has to inherit and reap orphans. Since stopping and starting daemons and processes in general is a large part of booting and rebooting, the second and third jobs are closely intertwined in practice although you could in theory split them apart and that might simplify each side. The fourth job is historically managed by separate tools but often talks with the init system as a whole because it's a core dependency of the second and third jobs.

(Booting and rebooting often consist of two conceptually separate steps, in that first you check filesystems and do other initial system setup and then you start a whole bunch of daemons (and in shutdown you stop a bunch of daemons and then tear down core OS bits). If you do this split, you might want to transfer responsibility for infrastructure daemons to the second job.)

The Unix world has multiple existence proofs that all of these roles do not have to be embedded in a single PID 1 process and program. In particular there is a long history of (better) daemon supervision tools that people can and do use as replacements for their native init system's tools for this (often just for service daemons), and as I've mentioned Solaris's SMF splits the second and third roles out into a cascade of additional programs.

unix/InitTheoreticalJobs written at 21:54:24; Add Comment
