Ubuntu's packaging failure with mcelog in 14.04
For vague historical reasons we've had the
mcelog package in our
standard package set. When we went to build our new 14.04 install
setup, this blew up on us; on installation, some of our machines
would report more or less the following:
Setting up mcelog (100-1fakesync1) ...
Starting Machine Check Exceptions decoder: CPU is unsupported
invoke-rc.d: initscript mcelog, action "start" failed.
dpkg: error processing package mcelog (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 mcelog
E: Sub-process /usr/bin/dpkg returned an error code (1)
Here we see a case where a collection of noble intentions have had terrible results.
The first noble intention is a desire to warn people that mcelog
doesn't work on all systems. Rather than silently run uselessly or
silently exit successfully,
mcelog instead reports an error and
exits with a failure status.
The second noble intention is the standard Debian noble intention
(inherited by Ubuntu) of automatically starting most daemons on
installation. You can argue that this is a bad idea for things like
database servers, but for basic system monitoring tools like mcelog
and SMART monitoring I think most people actually want this; certainly
I'd be a bit put out if installing
smartd didn't actually enable
it for me.
(A small noble intention is that the init script passes mcelog's
failure status up, exiting with a failure itself.)
The third noble intention is that it is standard Debian behavior
for an init script that fails when it is started in the package's
postinstall script to cause the postinstall script itself to exit
out with errors (it's in a standard piece of packaging boilerplate).
When the package postinstall script errors out,
dpkg itself flags
this as a problem (as well it should) and boom, your entire package
install step is reporting an error and your auto-install scripts fall
down. Or at least ours do.
The really bad thing about this is that server images can change
hardware. You can transplant disks from one machine to another for
various reasons; you can upgrade the hardware of a machine but
preserve the system disks; you can move virtual images around; you
can (as we do) have standard machine building procedures that want
to install a constant set of packages without having to worry about
the exact hardware you're installing on. This
behavior damages this hardware portability in that you can't safely
install mcelog on anything that may change hardware. Even if the
initial install succeeds or is forced, any future update to mcelog
will likely cause you problems on some of your machines (since a
package update will likely fail just like a package install).
(This is a packaging failure, not an
mcelog failure; given that
mcelog cannot work on some machines it's installed on, the init
script failure should not cause a fatal postinstall script failure.
Of course the people who packaged
mcelog may well not have known
that it had this failure mode on some machines.)
I'm sort of gratified to report that Debian has a bug for this, although the progress of the bug does not fill me with great optimism and of course it's probably not important enough to ever make it into Ubuntu 14.04 (although there's also an Ubuntu bug).
PS: since mcelog has never done anything particularly useful for
us, we have not been particularly upset over dropping it from our
list of standard packages. Running into the issue was a bit irritating,
but mcelog seems to be historically good at irritation.
PPS: the actual problem
mcelog has is even more stupid than 'I
don't support this CPU'; in our case it turns out to be 'I need a
special kernel module loaded for this machine but I won't do it for
you'. It also syslogs (but does not usefully print) a message that says:
mcelog: AMD Processor family 16: Please load edac_mce_amd module.#012: Success
In praise of Solaris's pfiles
I'm sure that at one point I was introduced to
pfiles through a
description that called it the Solaris version of
lsof for a
single process. This is true as far as it goes and I'm certain that
I thought of pfiles as nothing more than this for a long time, but
it undersells what pfiles can do for you. This is because pfiles
will give you a fair amount more information than
lsof will, and
much of that information is useful stuff to know.
pfiles will generally report what a file descriptor
maps to (file, device, network connection, and Solaris IPC 'doors',
often with information about what process is on the other end of
the door). Unlike on some systems, the
pfiles information is good
enough to let you track down who is on the other end of Unix domain
sockets and pipes. Socket endpoints are usually reported directly;
pipe information generally takes cross-correlating with other
processes to see who else has an S_IFIFO with the same inode number.
(You would think that getting information on the destination of Unix domain sockets would be basic information, but on some systems it can take terrible hacks.)
Pfiles will also report some state information for sockets,
like the socket flags and the send and receive buffers. Personally
I don't find this deeply useful and I wish that
pfiles also showed
things like the TCP window and ACK state. Fortunately you can get
this protocol information with '
netstat -f inet -P tcp' or 'netstat
-v -f inet -P tcp' (if you want lots of details).
Going beyond this,
pfiles will also report the
open() flags for each file descriptor. This
will give you basic information like the FD's read/write status,
but it goes beyond this; for example, you can immediately see whether
or not a process has its sockets open in non-blocking mode (which
can be important). This is
often stuff that is not reported by other tools and having it handy
can save you from needing deep dives with DTrace, a debugger, or
the program source code.
(I'm sensitive to several of these issues because my recent Amanda
troubleshooting left me needing to chart out the flow of pipes and
to know whether some sockets were nonblocking or not. I could also
have done with information on TCP window sizes at the time, but I
didn't find the
netstat stuff until just now. That's how it goes sometimes.)
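Incidentally, a process can read the same open() flag information for its own file descriptors with fcntl(); this is a small Python sketch of checking for O_NONBLOCK that way (it only works on your own descriptors, unlike pfiles, which inspects other processes):

```python
import fcntl
import os
import socket

# Fetch a descriptor's status flags with F_GETFL; this is the same
# open() flag information that pfiles displays, though only for
# your own process's file descriptors.
def is_nonblocking(fd):
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

s = socket.socket()
assert not is_nonblocking(s.fileno())   # sockets start out blocking
s.setblocking(False)                    # sets O_NONBLOCK on the fd
assert is_nonblocking(s.fileno())
```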
My collection of spam and the spread of SMTP TLS
One of the things that my sinkhole SMTP server does that's new on my workstation is that it supports TLS, unlike my old real mail server there (which dates from a very, very long time ago). This has given me the chance to see how much of my incoming spam is delivered with TLS, which in turn has sparked some thoughts about the spread of SMTP TLS.
The starting point is that a surprising amount of my incoming spam is actually delivered with TLS; right now about 30% of the successful deliveries have used TLS. This is somewhat more striking than it sounds for two reasons; first, the Go TLS code I'm relying on for TLS is incomplete (and thus not all TLS-capable sending MTAs can actually do TLS with it), and second a certain amount of the TLS connection attempts fail because the sending MTA is offering an invalid client certificate.
(I also see a fair number of rejected delivery attempts in my SMTP command log that did negotiate TLS, but the stats there are somewhat tangled and I'm not going to try to summarize them.)
While there are some persistent spammers, most of the incoming email is your typical advance fee fraud and phish spam that's sent through various sorts of compromised places. Much of the TLS email I get is this boring sort of spam, somewhat to my surprise. My prejudice is that a fair amount of this spam comes from old and neglected machines, which are exactly the machines that I would expect are least likely to do TLS.
(Some amount of such spam comes from compromised accounts at places like universities, which can and do happen to even modern and well run MTAs. I'm not surprised when they use TLS.)
What this says to me is that support for initiating TLS is fairly widespread in MTAs, even relatively old MTAs, and fairly well used. This is good news (it's now clear that pervasive encryption of traffic on the Internet is a good thing, even casual opportunistic encryption). I suspect that it's happened because common MTAs have enabled client TLS by default and the reason they've been able to do that is that it basically takes no configuration and almost always works.
(It's clear that at least some client MTAs take note when TLS
fails and don't try it again even if the server MTA offers it to
them, because I see exactly this pattern in my SMTP logs from some
senders.)
PS: you might wonder if persistent spammers use TLS when delivering their spam. I haven't done a systematic measurement for various reasons but on anecdotal spot checks it appears that my collection of them basically doesn't use TLS. This is probably unsurprising since TLS does take some extra work and CPU. I suspect that spammers may start switching if TLS becomes something that spam filtering systems use as a trust signal, just as some of them have started advertising DKIM signatures.
I want my signed email to work a lot like SSH does
PGP and similar technologies have been in the news lately, and as a result of this I added the Enigmail extension to my testing Thunderbird instance. Dealing with PGP through Enigmail reminded me of why I'm not fond of PGP. I'm aware that people have all sorts of good reasons and that PGP itself has decent reasons for working the way it does, but for me the real strain point is not the interface but fundamentally how PGP wants me to work. Today I want to talk just about signed email, or rather how I want to deal with signed email.
To put it simply, I want people's keys for signed email to mostly work like SSH host keys. For most people the core of using SSH is not about specifically extending trust to specific, carefully validated host keys but instead about noticing if things change. In practical use you accept a host's SSH key the first time you're offered it and then SSH will scream loudly and violently if it ever changes. This is weaker than full verification but is far easier to use, and it complicates the job of an active attacker (especially one that wants to get away with it undetected). Similarly, in casual use of signed email I'm not going to bother carefully verifying keys; I'm instead going to trust that the key I fetched the first time for the Ubuntu or Red Hat or whatever security team is in fact their key. If I suddenly start getting alerts about a key mismatch, then I'm going to worry and start digging. A similar thing applies to personal correspondents; for the most part I'm going to passively acquire their keys from keyservers or other methods and, well, that's it.
(I'd also like this to extend to things like DKIM signatures of email, because frankly it would be really great if my email client noticed that this email is not DKIM-signed when all previous email from a given address had been.)
On the other hand, I don't know how much sense it makes to even think about general MUA interfaces for casual, opportunistic signed email. There is a part of me that thinks signed email is a sexy and easy application (which is why people keep doing it) that actually doesn't have much point most of the time. Humans do terribly at checking authentication, which is why we mostly delegate that to computers, yet casual signed email in MUAs is almost entirely human checked. Quick, are you going to notice that the email announcement of a new update from your vendor's security team is not signed? Are you going to even care if the update system itself insists on signed updates downloaded from secure mirrors?
(My answers are probably not and no, respectively.)
For all that it's nice to think about the problem (and to grumble about the annoyances of PGP), a part of me thinks that opportunistic signed email is not so much the wrong problem as an uninteresting problem that protects almost nothing that will ever be attacked.
(This also ties into the problem of false positives in security. The reality is that for casual message signatures, almost all missing or failed signatures are likely to have entirely innocent explanations. Or at least I think that this is the likely explanation today; perhaps mail gets attacked more often than I think on today's Internet.)
My current hassles with Firefox, Flash, and (HTML5) video
When I've written before about my extensions, I've said that I didn't bother with any sort of Flash blocking because NoScript handled that for me. The reality turns out to be that I was sort of living a charmed life, one that has recently stopped working the way I want and has forced me into a series of attempts at workarounds.
For a long time, things worked this way with just NoScript. Then at some point recently this broke down; if I relied only on NoScript, YouTube videos either would never play or would autoplay the moment the page loaded. If I turned on Firefox's 'ask to activate' for Flash, Firefox enabled and disabled things on a site-wide basis (so the second YouTube video I'd visit would autoplay). I wound up having to add two extensions to stop this:
Flashblock is the classic Flash blocker. Unlike Firefox's native 'ask to
activate', it acts by default on a per-item basis, so activating
one YouTube video I watch doesn't auto-play all future ones I
look at. To make Flashblock work well I have disabled NoScript's
blocking of Flash content so that I rely entirely on Flashblock;
this has had the useful side effect of allowing me to turn on
Flash elements on various things besides YouTube.
StopTube stops YouTube autoplaying HTML5 videos. With both Flashblock and
StopTube active, YouTube winds up using Flash (which is blocked
and enabled by StopTube). I don't consider this ideal as I'd rather
use HTML5, but YouTube is what it is.
As the name of this addon sort of suggests, StopTube has the
drawback that it only stops HTML5 video on YouTube itself. HTML5
video elsewhere is not blocked by it, including YouTube videos
embedded on other people's pages. So far those embedded videos
aren't autoplaying for me, but they may in the future. That might
push me toward wanting an addon that blocks HTML5 video in general.
(An energetic person might be able to make such an addon starting from StopTube's source code.)
Some experimentation suggests that I might get back to what I want with just NoScript if I turn on NoScript's 'Apply these restrictions to whitelisted sites too' option for embedded content it blocks. But for now I like Flashblock's interface better (and I haven't been forced into this by being unable to block autoplaying HTML5 video).
There are still unfortunate aspects to this setup. One of them is that Firefox doesn't appear to have an 'ask to activate' (or more accurately 'ask to play') option for its HTML5 video support; this forces me to keep NoScript blocking that content instead of being able to use a nicer interface for enabling it if I want to. It honestly surprises me that Firefox doesn't already do this; it's an obvious feature and is only going to be more and more asked for as more people start using auto-playing HTML5 video for ads.
(See also this superuser.com question and its answers.)
What can go wrong with polling for writability on blocking sockets
Yesterday I wrote about how our performance problem with
amandad was caused by
amandad doing IO
multiplexing wrong by only polling for
whether it could read from its input file descriptors and assuming
it could always write to its network sockets. But let's ask a question:
suppose that amandad was also polling for writability on those network
sockets. Would it work fine?
The answer is no, not without even more code changes, because
amandad's network sockets aren't set to be non-blocking. The
problem here is what it really means when
poll() reports that
something is ready for write (or for that matter, for read).
Let me put it this way:
just because poll() says a file descriptor is ready for writes doesn't mean that you can write an arbitrary amount of data to it without blocking.
When I put it this way, of course it can't. Can I write a gigabyte
to a network socket or a pipe without blocking? Pretty much any
kernel is going to say 'hell no'. Network sockets and pipes can
never instantly absorb arbitrary amounts of data; there's always a
limit somewhere. What
poll()'s readiness indicator more or less
means is that you can now write some data without blocking. How
much data is uncertain.
The importance of non-blocking sockets is due to an API decision that Unix has made. Given that you can't write an arbitrary amount of data to a socket or a pipe without blocking, Unix has decided that by default when you write 'too much' you get blocked instead of getting a short write return (where you try to write N bytes and get told you wrote less than that). In order to not get blocked if you try a too large write you must explicitly set your file descriptor to non-blocking mode; at this point you will either get a short write or just an error (if you're trying to write and there is no room at all).
(This is a sensible API decision for reasons beyond the scope of this entry. And yes, it's not symmetric with reading from sockets and pipes.)
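This behavior is easy to see directly. Here's a small Python sketch of my own (using a Unix socketpair as a stand-in for a network connection): with the socket set non-blocking, a too-large write turns into short writes and then an error instead of blocking.

```python
import socket

# Fill a non-blocking socket's send buffer to show what being 'ready
# for write' really means: the kernel will absorb some data, but not
# an arbitrary amount.
a, b = socket.socketpair()
a.setblocking(False)            # opt in to short writes / errors

chunk = b"x" * 65536
total = 0
while True:
    try:
        total += a.send(chunk)  # may send less than len(chunk)
    except BlockingIOError:
        # No room at all right now. A blocking socket would have
        # stalled inside send() here instead of returning an error.
        break

assert total > 0                # some data was absorbed before we hit the limit
```

How much the kernel absorbs before you get BlockingIOError depends on the socket buffer sizes, which is exactly the 'how much data is uncertain' point above.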
If amandad just polled for writability but changed nothing
else in its behavior, it would almost certainly still wind up
blocking on writes to network sockets as it tried to stuff too
much down them. At most it would wind up blocked somewhat less
often because it would at least send some data immediately every
time it tried to write to the network.
(The pernicious side of this particular bug is that whether it bites you in any visible way depends on how much network IO you try to do and how fast. If you send to the network (or to pipes) at a sufficiently slow rate, perhaps because your source of data is slow, you won't stall visibly on writes because there's always enough spare capacity for the amount of data you're sending. Only when your send rates start overwhelming the receiver will you actively block in writes.)
Sidebar: The value of serendipity (even if I was wrong)
Yesterday I mentioned that my realization about the core cause of
our amandad problem was sparked by remembering an apparently
unrelated thing. As it happens, it was my memory of reading Rusty
Russell's POLLOUT doesn't mean write(2) won't block: Part II that started me on this whole
chain. A few rusty neurons woke up and said 'wait,
write() waits? I was reading about that...' and off I
went, even if my initial idea turned out to be wrong about the
details.
Had I not been reading Rusty Russell's blog I probably would have
missed noticing the anomaly and as a result wasted a bunch of time
at some point trying to figure out what the core problem was.
The write() issue is clearly in the air because Ewen McNeill also
pointed it out in a comment on yesterday's entry. This is a good thing; the odd write
behavior deserves to be better known so that it doesn't bite people.
How not to do IO multiplexing, as illustrated by Amanda
Every so often I have a belated slow-motion realization about what's probably wrong with an otherwise mysterious problem. Sometimes this is even sparked by remembering an apparently unrelated thing I read in passing. As it happens, that happened the other day.
Let's rewind to this entry, where
I wrote about what I'd discovered while looking into our slow
Amanda backups. Specifically
this was a dump of
amandad handling multiple streams of backups
at once, which we determined is the source of our slowness. In that entry I wrote in passing:
[Amandad is] also spending much more of its IO wait time writing the data rather than waiting for there to be more input, although the picture here is misleading because it's also making
pollsys() calls and I wasn't tracking the time spent waiting in those [...]
This should have set off big alarm bells. If
amandad is using
poll(), why is it spending any appreciable amount of time waiting
for writes to complete? After all, the whole purpose of poll()
et al is to only be woken when you can do work, so you should spend
minimal time blocked in the actual IO functions. The unfortunate
answer is that Amanda is doing IO multiplexing wrong, in that I
believe it's only using
poll() to check for readability on its
input FDs, not writability on its output FDs. Instead it tacitly
assumes that whenever it has data to read it can immediately write
all of this data out with no fuss, muss, or delays.
(Just checking for writability on the network connections wouldn't be quite enough, of course, but that's another entry.)
The problem is that this doesn't necessarily work. You can easily have situations where one TCP stream will accept much more data than another one, or where all (or just most) TCP streams will accept only a modest amount of data at a time; this is especially likely when the TCP streams are connected to different processes on the remote end. If one remote process stalls its TCP stream can stop accepting much more data, at which point a large write to the stream may stall in turn, which stalls all amandad activity even if it could succeed, which stalls upstream activity that is trying to send data to amandad. What amandad's handling of multiplexing does is to put all of the data streams flowing through it at the mercy of whatever is the slowest write stream at any given point in time. If anything blocks everything can block, and sooner or later we seem to wind up in a situation where anything can and does block. The result is a stuttering stop-start process full of stalls that reduces the overall data flow significantly.
In short, what you don't want in good IO multiplexing is a situation
where IO on stream A has to wait because you're stalled doing IO on
stream B, and that is just what
amandad has arranged here.
Correct multiplexing is complex (even in the case of a single flow) but the core of it is never overcommitting yourself. What amandad should be doing is only writing as much output at a time as any particular connection can take, buffering some data internally, and passing any back pressure up to the things feeding it data (by stopping reading its inputs when it cannot write output and it has too much buffered). This would ensure that the multiplexing does no harm in that it can always proceed if any of the streams can proceed.
(Splitting apart the multiplexing into separate processes (or threads) does the same thing; because each one is only handling a single stream, that stream blocking doesn't block any other stream.)
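As a concrete illustration, here is a minimal Python sketch of my own (not Amanda's actual code) of backpressure-aware forwarding for one stream; a real multiplexer would keep one such buffer per stream. It only asks poll() about writability while it has buffered data, and it stops reading its input once the buffer is too large, which pushes the back pressure upstream. The HIGH_WATER value and the helper names are made up for this sketch.

```python
import os
import select
import socket
import threading

HIGH_WATER = 4096   # arbitrary buffer cap for this sketch

def forward(src_fd, dst):
    """Forward src_fd to the socket dst without ever overcommitting."""
    dst.setblocking(False)
    poller = select.poll()
    poller.register(src_fd, select.POLLIN)
    buf = b""
    eof = False
    while not eof or buf:
        events = dict(poller.poll())
        if src_fd in events:
            data = os.read(src_fd, 4096)
            if data:
                buf += data
                poller.register(dst.fileno(), select.POLLOUT)
                if len(buf) >= HIGH_WATER:
                    poller.unregister(src_fd)   # back pressure: stop reading
            else:
                eof = True
                poller.unregister(src_fd)
        if dst.fileno() in events and buf:
            try:
                n = dst.send(buf)               # a short write is fine
            except BlockingIOError:
                n = 0
            buf = buf[n:]
            if not buf:
                poller.unregister(dst.fileno())
            if not eof and len(buf) < HIGH_WATER:
                poller.register(src_fd, select.POLLIN)

# Demonstration: push 200 KB through a pipe and out one side of a
# socketpair, with threads playing the data source and the receiver.
payload = os.urandom(200_000)
r, w = os.pipe()
a, b = socket.socketpair()
received = bytearray()

def feed():
    view = memoryview(payload)
    while view:
        view = view[os.write(w, view[:4096]):]
    os.close(w)

def drain():
    while True:
        data = b.recv(4096)
        if not data:
            break
        received.extend(data)

threads = [threading.Thread(target=feed), threading.Thread(target=drain)]
for t in threads:
    t.start()
forward(r, a)
a.close()       # signals EOF to the receiver
for t in threads:
    t.join()
assert bytes(received) == payload
```

The key property is that forward() never sits in a blocking read or write: it can always make progress on whichever side is ready, which is exactly what amandad's one-sided polling gives up.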
PS: good IO multiplexing also needs to be fair, ie if multiple
streams or file descriptors can all do IO at the current time then all
of them do some IO, instead of a constant stream of ready IO on one
stream starving IO for other streams. Generally the easiest way to do
this is to have your code always process all ready file descriptors
before returning to
poll(). This is also the more CPU-efficient way to operate.
The cause of our slow Amanda backups and our workaround
A while back I wrote about the challenges in diagnosing slow (Amanda) backups. It's time for a followup entry on that, because we found what I can call 'the problem' and along with it a workaround. To start with, I need to talk about how we had configured our Amanda clients.
In order to back up our fileservers
in a sensible amount of time, we run multiple backups on each of them
at once. We don't really try to do anything sophisticated to balance
the load across multiple disks both because this is hard in our
environment (especially given limited Amanda features) and because
we've never seen much evidence that reducing overlaps was useful
in speeding things up; instead we just have Amanda run three backups
at once on each fileserver ('
maxdumps 3' in Amanda configuration).
For historical reasons we were also using Amanda's '
auth bsd' style
of authentication and communication.
As I kind of mentioned in passing in my entry on Amanda data flows, '
auth bsd' communication causes all
concurrent backup activity to flow through a single master
process. It turned out that this was our bottleneck. When a single
amandad process was handling sending all backups back to the
Amanda server and it was running more than one filesystem backup
at a time, things slowed down drastically and we experienced our
problem. When an
amandad process was only handling a single backup,
things went fine.
We tested and demonstrated this in two ways. The first was we dropped
one fileserver down to one dump at a time and then it ran fine. The
more convincing test was to use
SIGSTOP and SIGCONT to pause and then resume backups
on the fly on a server running multiple backups at once. This
demonstrated that network bandwidth usage jumped drastically when
we paused two out of the three backups and tanked almost immediately
when we allowed more than one to run at once. It was very dramatic.
Further work with a DTrace script
provided convincing evidence that it was the amandad process
itself that was the locus of the problem and it wasn't that, eg,
tar reads slowed down drastically if more than one tar was
running at once.
Our workaround was to switch to Amanda's '
auth bsdtcp' style of
communication. Although I initially misunderstood what it does, it
turns out that this causes each concurrent backup to use a separate
amandad process and this made everything work fine for us;
performance is now up to the level where we're saturating the
backup server disks instead of the network.
Well, mostly. It turns out that our first-generation ZFS fileservers probably also have the slow backup
problem. Unfortunately they're running a much older Amanda version
and I'm not sure we'll try to switch them to '
auth bsdtcp' since
they're on the way out anyways.
I call this a workaround instead of a solution because in theory a
single amandad process handling all backup streams shouldn't
be a problem. It clearly is in our environment for some reason, so it
would be better to understand why and whether it can be fixed.
(As it happens I have a theory for why this is happening, but it's
long enough and technical enough that it needs another entry. The short version is that
I think the
amandad code is doing something wrong with its socket writes.)
Does init actually need to do daemon supervision?
Sure, init has historically done some sort of daemon supervision (or at least starting and stopping them) and I listed it as one of init's jobs. But does it actually need to do this? This is really two questions and thus two answers.
Init itself, PID 1, clearly does not have to be the process that does daemon supervision. We have a clear proof of this in Solaris, where SMF moves daemon supervision to a separate set of processes. SMF is not a good init system but its failures are failures of execution, not of its fundamental design; it does work, it's just annoying.
Whether the init system as a whole needs to do daemon supervision is a much more philosophical question and thus harder to answer. However I believe that on the whole the init system is the right place for this. The pragmatics of why are simple: the init system is responsible for booting and shutting down the system and doing this almost always needs at least some daemons to be started or stopped in addition to more scripted steps like filesystem checks. This means that part of daemon supervision is at least quite tightly entwined with booting, what I called infrastructure daemons when I talked about init's jobs. And since your init system must handle infrastructure daemons it might as well handle all daemons.
(In theory you could define an API for communication between the init system and a separate daemon supervision system in order to handle this. In practice, until this API is generally adopted your init system is tightly coupled with whatever starts and stops infrastructure daemons for it, ie you won't be able to swap one infrastructure daemon supervision system for another and whichever one your init system needs might as well be considered part of the init system itself.)
I feel that the pragmatic argument is also the core of a more philosophical one. There is no clear break between infrastructure daemons and service daemons (and in fact what category a daemon falls into can vary from system to system), which makes it artificial to have two separate daemon supervision systems. If you want to split the job of an init system apart at all, the 'right' split is between the minimal job of PID 1 and the twin jobs of booting the system and supervising daemons.
(This whole thing was inspired by an earlier entry being linked to by this slashdot comment, and then a reply to said comment arguing that the role of init is separate from a daemon manager. As you can see, I don't believe that it is on Unix in practice.)
Sidebar: PID 1 and booting the system
This deserves its own entry to follow all of the threads, but the simple version for now: in a Unix system with (only) standard APIs, the only way to guarantee that a process winds up as PID 1 is for the kernel to start it as such. The easiest way to arrange for this is for said process to be the first process started so that PID 1 is the first unused PID. This naturally leads into PID 1 being responsible for booting the system, because if it wasn't the kernel would have to also start another process to do this (and there would have to be a decision about what the process is called and so on).
This story is increasingly false in modern Unix environments which do various amounts of magic setup before starting the final real init, but there you have it.
What an init system needs to do in the abstract
I've talked before about what
init does historically, but that's not the same thing as what an
init system actually needs to do, considered abstractly and divorced
from the historical paths that got us here and still influence how
we think about init systems. So, what does a modern init system in
a modern Unix need to do?
At the abstract level, I think a modern init system has three jobs:
- Being the central process on the system. This is both the modest
job of being PID 1 (inheriting parentless processes and reaping
them when they die) and the larger, more important job of supervising
and (re)starting any other components of the init system.
- Starting and stopping the system, and also transitioning it
between system states like single user and multiuser. The second
job has diminished in importance over the years; in practice most
systems today almost never transition between runlevels or the
equivalent except to boot or reboot.
(At one point people tried to draw a runlevel distinction between 'multiuser without networking' and 'multiuser with networking' and maybe 'console text logins' and 'graphical logins with X running' but today those distinctions are mostly created by stopping and starting daemons, perhaps abstracted through high level labels for collections of daemons.)
- Supervising (daemon) processes to start, stop, and restart them on
demand or need or whatever. This was once a sideline but has
become the major practical activity of an init system and why
people spend most of the time interacting with it. Today this
encompasses both regular
getty processes (which die and restart regularly) and a whole collection of daemons (which are often not expected to die and may not be restarted automatically if they do).
You can split this job into two sorts of daemons, infrastructure processes that must be started in order for the core system to operate (and for other daemons to run sensibly) and service processes that ultimately just provide services to people using the machine. Service processes are often simpler to start, restart, and manage than infrastructure processes.
In practice modern Unixes often add a fourth job, that of managing the appearance and disappearance of devices. This job is not strictly part of init but it is inextricably intertwined with at least booting the system (and sometimes shutting it down) and in a dependency-based init system it will often strongly influence what jobs/processes can be started or must be stopped at any given time (eg you start network configuration when the network device appears, you start filesystem mounts when devices appear, and so on).
The first job mostly or entirely requires being PID 1; at a minimum your PID 1 has to inherit and reap orphans. Since stopping and starting daemons and processes in general is a large part of booting and rebooting, the second and third jobs are closely intertwined in practice although you could in theory split them apart and that might simplify each side. The fourth job is historically managed by separate tools but often talks with the init system as a whole because it's a core dependency of the second and third jobs.
(Booting and rebooting is often two conceptually separate steps in that first you check filesystems and do other initial system setup then you start a whole bunch of daemons (and in shutdown you stop a bunch of daemons and then tear down core OS bits). If you do this split, you might want to transfer responsibility for infrastructure daemons to the second job.)
The Unix world has multiple existence proofs that all of these roles do not have to be embedded in a single PID 1 process and program. In particular there is a long history of (better) daemon supervision tools that people can and do use as replacements for their native init system's tools for this (often just for service daemons), and as I've mentioned Solaris's SMF splits the second and third role out into a cascade of additional programs.