2014-09-19
What I mean by passive versus active init systems
I have in the past talked about passive versus active init systems without quite defining what I meant by that, except sort of through context. Since this is a significant division between init systems that dictates a lot of other things, I've decided to fix that today.
Put simply, an active init system is one that actively tracks the status of services as part of its intrinsic features; a passive init system is one that does not. The minimum behavior of an active init system is that it knows what services have been activated and not later deactivated. Better active init systems know whether services are theoretically still active or if they've failed on their own.
(Systemd, upstart, and Solaris's SMF are all active init systems.
In general any 'event-based' init system that starts services in
response to events will need to be active, because it needs to know
which services have already been started and which ones haven't and
thus are candidates for starting now. System V init's /etc/init.d
scripts are a passive init system, although /etc/inittab is an
active one. Most modern daemon supervision systems are active
systems.)
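To make this concrete, the heart of an active init system is a table of what it currently believes about each service, updated on every transition. Here's a minimal sketch of that state tracking in C; all of the names are mine for illustration, not from any real init system:

    #include <sys/types.h>

    /* A sketch of the minimum state an active init system tracks.
       Everything else (dependencies, restarts, status reporting)
       builds on knowing this much per service. */
    enum svc_state { SVC_INACTIVE, SVC_ACTIVE, SVC_FAILED };

    struct service {
        const char *name;      /* eg "sshd" */
        enum svc_state state;  /* what init believes right now */
        pid_t pid;             /* main process, if SVC_ACTIVE */
    };

    /* The point of an active init system is that these transitions
       happen inside init itself, so the table can't silently drift
       out of date. */
    void svc_mark_started(struct service *svc, pid_t pid) {
        svc->state = SVC_ACTIVE;
        svc->pid = pid;
    }

    void svc_mark_failed(struct service *svc) {
        svc->state = SVC_FAILED;
        svc->pid = -1;
    }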
One direct consequence is that an active init system essentially
has to do all service starting and stopping itself, because this
is what lets it maintain an accurate record of what services are
active. You may run commands to do this, but they have to talk to
the init system itself. By contrast, in a passive init system the
commands you run to start and stop services can be and often are
just shell scripts; this is the archetype of System V init.d
scripts. You can even legitimately start and stop services
entirely outside of the scripts, although things may get a bit confused.
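To illustrate the contrast, in an active init system even your 'start this service' command is really just a messenger; the actual work and bookkeeping happen inside init. Here's a minimal sketch of such a command in C, assuming an entirely hypothetical init that listens on a Unix domain socket at /run/init.sock and takes one-line text commands (both the socket path and the protocol are made up for illustration):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    /* Ask init to start a service; we only deliver the request. */
    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s service\n", argv[0]);
            return 1;
        }

        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        struct sockaddr_un sun;
        memset(&sun, 0, sizeof(sun));
        sun.sun_family = AF_UNIX;
        strncpy(sun.sun_path, "/run/init.sock", sizeof(sun.sun_path) - 1);
        if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
            perror("connect");
            return 1;
        }

        char buf[256];
        snprintf(buf, sizeof(buf), "start %s\n", argv[1]);
        write(fd, buf, strlen(buf));
        close(fd);
        return 0;
    }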
(In the *BSDs things can be even simpler in that you may not have scripts at all and may just run the daemons directly. I know that OpenBSD tends to work this way but I'm not sure if FreeBSD restarts stuff quite that directly.)
An active init system is also usually more communicative with the
outside world. Since it knows the state of services it's common for
the init system to have a way to report this status to people who
ask, and of course it has to have some way of being told either to
start and stop services or at least that particular services have
started and stopped. Passive init systems are much less talkative;
System V init basically has 'change runlevel' and 'reread /etc/inittab'
and that's about it as far as its communication goes (and it doesn't
even directly tell you what the runlevel is; that's written to a
file that you read).
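For illustration, here's roughly how you dig the current runlevel out of that file on a Linux system with sysvinit-style utmp records; this is more or less what runlevel(8) and 'who -r' do. Per utmp(5), the runlevel character is kept in the low byte of ut_pid in the RUN_LVL record:

    #include <stdio.h>
    #include <utmp.h>

    /* Print the current runlevel by scanning utmp for the
       RUN_LVL record, the same way 'who -r' finds it. */
    int main(void) {
        struct utmp *ut;

        setutent();
        while ((ut = getutent()) != NULL) {
            if (ut->ut_type == RUN_LVL) {
                printf("runlevel %c\n", ut->ut_pid & 0xff);
                break;
            }
        }
        endutent();
        return 0;
    }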
Once you start down the road to an active init system, in practice you wind up wanting some way to track daemon processes so you can know if a service has died. Without this an active init system is basically flying blind; it knows what theoretically started okay, but it doesn't necessarily know what's still running. This can be done by requiring cooperative processes that don't do things like detach themselves from their parents, or with various system-specific Unix extensions (Linux's cgroups, for example) that track groups of processes even if they try to wander off on their own.
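Here's a minimal sketch of the cooperative version in C: the daemon stays a direct child of its supervisor instead of double-forking away, so a plain waitpid() is enough to notice its death ('sleep' stands in for a real daemon here):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Supervise one cooperative (non-detaching) daemon. */
    int main(void) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* The 'daemon': anything that doesn't detach itself. */
            execlp("sleep", "sleep", "30", (char *)NULL);
            _exit(127);
        }

        int status;
        waitpid(pid, &status, 0);  /* blocks until the daemon dies */
        /* At this point an active init system knows the service is
           gone and can mark it failed (or restart it; see the PS
           below). */
        printf("daemon %d exited\n", (int)pid);
        return 0;
    }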
As we can see from this, active init systems are more complicated than passive ones. Generally the more useful features they offer and the more general they are the more complicated they will be. A passive init system can be done with shell scripts; an attractive active one requires some reasonably sophisticated C programming.
PS: An active init system that notices when services die can offer a feature where it will restart them for you. In practice most active init systems aren't set up to do this for most services for various reasons (that may or may not be good ones).
(This entry was partly sparked by reading parts of this mail
thread that
showed up in my Referer logs because it linked to some of my other
entries.)
2014-09-10
Does init actually need to do daemon supervision?
Sure, init has historically done some sort of daemon supervision (or at least starting and stopping them) and I listed it as one of init's jobs. But does it actually need to do this? This is really two questions and thus two answers.
Init itself, PID 1, clearly does not have to be the process that does daemon supervision. We have a clear proof of this in Solaris, where SMF moves daemon supervision to a separate set of processes. SMF is not a good init system but its failures are failures of execution, not of its fundamental design; it does work, it's just annoying.
Whether the init system as a whole needs to do daemon supervision is a much more philosophical question and thus harder to answer. However I believe that on the whole the init system is the right place for this. The pragmatics of why are simple: the init system is responsible for booting and shutting down the system and doing this almost always needs at least some daemons to be started or stopped in addition to more scripted steps like filesystem checks. This means that part of daemon supervision is at least quite tightly entwined with booting, what I called infrastructure daemons when I talked about init's jobs. And since your init system must handle infrastructure daemons it might as well handle all daemons.
(In theory you could define an API for communication between the init system and a separate daemon supervision system in order to handle this. In practice, until this API is generally adopted your init system is tightly coupled with whatever starts and stops infrastructure daemons for it, ie you won't be able to swap one infrastructure daemon supervision system for another and whichever one your init system needs might as well be considered part of the init system itself.)
I feel that the pragmatic argument is also the core of a more philosophical one. There is no clear break between infrastructure daemons and service daemons (and in fact what category a daemon falls into can vary from system to system), which makes it artificial to have two separate daemon supervision systems. If you want to split the job of an init system apart at all, the 'right' split is between the minimal job of PID 1 and the twin jobs of booting the system and supervising daemons.
(This whole thing was inspired by an earlier entry being linked to by this slashdot comment, and then a reply to said comment arguing that the role of init is separate from a daemon manager. As you can see, I don't believe that it is on Unix in practice.)
Sidebar: PID 1 and booting the system
This deserves its own entry to follow all of the threads, but the simple version for now: in a Unix system with (only) standard APIs, the only way to guarantee that a process winds up as PID 1 is for the kernel to start it as such. The easiest way to arrange this is to make said process the very first process started, since PID 1 is then the first unused PID. This naturally leads to PID 1 being responsible for booting the system, because if it wasn't, the kernel would have to also start another process to do that (and there would have to be a decision about what that process is called and so on).
This story is increasingly false in modern Unix environments which do various amounts of magic setup before starting the final real init, but there you have it.
2014-09-08
What an init system needs to do in the abstract
I've talked before about what init does historically, but that's not the same thing as what an
init system actually needs to do, considered abstractly and divorced
from the historical paths that got us here and still influence how
we think about init systems. So, what does a modern init system in
a modern Unix need to do?
At the abstract level, I think a modern init system has three jobs:
- Being the central process on the system. This is both the modest
job of being PID 1 (inheriting parentless processes and reaping
them when they die) and the larger, more important job of supervising
and (re)starting any other components of the init system.
- Starting and stopping the system, and also transitioning it
between system states like single user and multiuser. The second
job has diminished in importance over the years; in practice most
systems today almost never transition between runlevels or the
equivalent except to boot or reboot.
(At one point people tried to draw a runlevel distinction between 'multiuser without networking' and 'multiuser with networking' and maybe 'console text logins' and 'graphical logins with X running' but today those distinctions are mostly created by stopping and starting daemons, perhaps abstracted through high level labels for collections of daemons.)
- Supervising (daemon) processes to start, stop, and restart them on
  demand or need or whatever. This was once a sideline but has
  become the major practical activity of an init system and the
  reason people spend most of their time interacting with it. Today
  this encompasses both regular getty processes (which die and
  restart regularly) and a whole collection of daemons (which are
  often not expected to die and may not be restarted automatically
  if they do). You can split this job into two sorts of daemons:
  infrastructure processes that must be started in order for the
  core system to operate (and for other daemons to run sensibly),
  and service processes that ultimately just provide services to
  people using the machine. Service processes are often simpler to
  start, restart, and manage than infrastructure processes.
In practice modern Unixes often add a fourth job, that of managing the appearance and disappearance of devices. This job is not strictly part of init but it is inextricably intertwined with at least booting the system (and sometimes shutting it down) and in a dependency-based init system it will often strongly influence what jobs/processes can be started or must be stopped at any given time (eg you start network configuration when the network device appears, you start filesystem mounts when devices appear, and so on).
The first job mostly or entirely requires being PID 1; at a minimum your PID 1 has to inherit and reap orphans. Since stopping and starting daemons and processes in general is a large part of booting and rebooting, the second and third jobs are closely intertwined in practice although you could in theory split them apart and that might simplify each side. The fourth job is historically managed by separate tools but often talks with the init system as a whole because it's a core dependency of the second and third jobs.
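The reaping part of the first job is small enough to show in full. Here's a minimal sketch of the irreducible core loop of a PID 1 in C (real init systems do this plus everything else; this is essentially the whole of what tiny 'just be PID 1' inits do):

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* The core duty of PID 1: the kernel reparents orphans to us,
       and we must reap every child that dies. */
    int main(void) {
        sigset_t set;
        int sig;

        sigfillset(&set);
        sigprocmask(SIG_BLOCK, &set, NULL);  /* take signals via sigwait */

        /* A real init would fork its boot process here. */

        for (;;) {
            sigwait(&set, &sig);
            if (sig == SIGCHLD)
                while (waitpid(-1, NULL, WNOHANG) > 0)
                    ;  /* reap everything that has died */
        }
    }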
(Booting and rebooting often consist of two conceptually separate steps, in that first you check filesystems and do other initial system setup and then you start a whole bunch of daemons (in shutdown you stop a bunch of daemons and then tear down core OS bits). If you make this split, you might want to transfer responsibility for infrastructure daemons to the second step.)
The Unix world has multiple existence proofs that all of these roles do not have to be embedded in a single PID 1 process and program. In particular there is a long history of (better) daemon supervision tools that people can and do use as replacements for their native init system's tools for this (often just for service daemons), and as I've mentioned Solaris's SMF splits the second and third role out into a cascade of additional programs.
2014-09-05
Some uses for SIGSTOP and some cautions
If you ask, many people will tell you that Unix doesn't have a
general mechanism for suspending processes and later resuming them.
These people are correct in general, but sometimes you can cheat
and get away with a good enough substitute. That substitute is
SIGSTOP, which is at the core of job control.
Although processes can catch and react to other job control signals,
SIGSTOP cannot be caught, blocked, or ignored, just like SIGKILL
(aka 'kill -9'). When a process is sent SIGSTOP, the kernel stops
it on the spot and suspends it until it gets a SIGCONT (more or
less). You can thus pause processes and continue them by manually
sending them SIGSTOP and SIGCONT as appropriate and desired.
(Since it's a regular signal, you can use a number of standard
mechanisms to send SIGSTOP to an entire process group or all of
a user's processes at once.)
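In code this is about as simple as signal handling gets. Here's a sketch that freezes a process for ten seconds and then lets it continue; both the pid argument and the ten seconds are arbitrary, and negating the pid would hit a whole process group instead:

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Pause a process, wait a bit, resume it. Neither SIGSTOP nor
       SIGKILL can be caught or blocked by the target. */
    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s pid\n", argv[0]);
            return 1;
        }
        pid_t pid = atoi(argv[1]);  /* negate for the process group */

        if (kill(pid, SIGSTOP) < 0) { perror("SIGSTOP"); return 1; }
        sleep(10);                  /* the target is frozen here */
        if (kill(pid, SIGCONT) < 0) { perror("SIGCONT"); return 1; }
        return 0;
    }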
There are any number of uses for this. Do you have too many processes banging away on the disk (or just think you might)? You can stop some of them for a while. Is a process saturating your limited network bandwidth? Pause it while you get a word in edgewise. And so on. Basically this is more or less job control for relatively arbitrary user processes, as you might expect.
Unfortunately there are some cautions and limitations attached to
use of SIGSTOP on arbitrary processes. The first one is
straightforward: if you SIGSTOP something that is talking to the
network or to other processes, its connections may break if you
leave it stopped too long. The other processes don't magically know
that the first process has been suspended and should be left alone,
and many of them will have limits on how much data they'll
queue up or how long they'll wait for responses and the like. Hit
the limits and they'll assume something has gone wrong and cut your
suspended process off.
(The good news is that it will be application processes that do
this, and only if they go out of their way to have timeouts and
other limits. The kernel is perfectly happy to leave things be for
however long you want to wait before a SIGCONT.)
The other issue is that some processes will detect and react to one
of their children being hit with a SIGSTOP. They may SIGCONT
the child or they may kill the process outright; in either case
it's probably not what you wanted to happen. Generally you're safest
when the parent of the process you want to pause is something simple,
like a shell script. In particular, init (PID 1) is historically
somewhat touchy about SIGSTOP'd processes and may often either
SIGCONT them or kill them rather than leave them be. This is
especially likely if init inherits a SIGSTOP'd process because
its original parent process died.
(This is actually relatively sensible behavior to avoid init
having a slowly growing flock of orphaned SIGSTOP'd processes
hanging around.)
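For illustration, here's how a parent can notice (and here, immediately undo) a SIGSTOP of its child, using waitpid()'s WUNTRACED and WCONTINUED flags. This is the sort of parental reaction you have to hope you don't run into; 'sleep' stands in for the real child:

    #include <stdio.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* A parent that SIGCONTs its child whenever someone stops it. */
    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {
            execlp("sleep", "sleep", "60", (char *)NULL);
            _exit(127);
        }

        int status;
        for (;;) {
            if (waitpid(pid, &status, WUNTRACED | WCONTINUED) < 0)
                break;
            if (WIFSTOPPED(status)) {
                printf("child stopped; continuing it\n");
                kill(pid, SIGCONT);  /* undo the outside SIGSTOP */
            } else if (WIFEXITED(status) || WIFSIGNALED(status)) {
                break;               /* the child actually died */
            }
        }
        return 0;
    }

Run this, send the sleep process a SIGSTOP from another terminal, and you'll see it immediately continued.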
These issues, especially the second, are why I say that SIGSTOP
is not a general mechanism for suspending processes. It's a mechanism
and on one level it always works, but the problem is the potential
side effects and aftereffects. You can't just SIGSTOP an arbitrary
process and be confident that it will still be there to be continued
ten minutes later (much less over longer time intervals). Sometimes
or often you'll get away with it but every so often you won't.