Wandering Thoughts archives

2014-02-14

'Broken by design: systemd' is itself kind of broken

Recently, an article by Rich Felker called Broken by design: systemd has been making the rounds. While I am sympathetic with complaints about systemd, the problem is that this article is both more or less deliberately misleading and factually wrong in various of its sections. Normally I would pass over this (per the lesson of the famous xkcd strip), but not today for various reasons. I'll be quoting from the article to comment on specific issues I have with it.

(To hopefully avoid possible misunderstandings, I've written up my overall views of systemd and put them in a sidebar at the bottom of this entry.)

Felker more or less opens with:

My view is that this idea is wrong: systemd is broken by design, and despite offering highly enticing improvements over legacy init systems, it also brings major regressions in terms of many of the areas Linux is expected to excel: security, stability, and not having to reboot to upgrade your system.

To start with, when Felker talks about 'broken by design' and 'major regressions' he means both in a theoretical or philosophical sense; in other words he objects to how systemd is designed and feels that it is a bad idea. He does not point out anything that systemd fails at, can't do, or does wrong today in actual use. In practice systems running systemd have not been less secure or less stable and do not have to reboot to upgrade any more (or less) than non-systemd Linux systems do.

(Desktop Linux systems have increasingly been wanting to reboot after upgrades but this is driven by factors independent from systemd.)

On a hardened system without systemd, you have at most one root-privileged process with any exposed surface: sshd. Everything else is either running as unprivileged users or does not have any channel for providing it input except local input from root. Using systemd then more than doubles the attack surface.

Unfortunately this is false on a modern Linux system unless part of Felker's hardening involves disabling DBus and then fixing everything that stops working as a result of that. Any Linux system using DBus has a DBus daemon running as root, whether it is part of systemd or not, and that is a significant and user-accessible exposed surface (although only to local users). It may also expose DBus APIs for other root processes such as udev stuff.

(My understanding is that DBus has become essentially mandatory because udev wants to talk to it to broadcast hotplug events. Udev itself is deeply entwined in the modern Linux boot process to the point where removing it is less 'hardening your system' and more 'creating a new Linux distribution'.)

Update: I'm less and less confident of my understanding of how udev and DBus are linked to each other and how DBus runs. I may be wrong here about how necessary DBus is for udev and the security implications of DBus; this would mean that I'm wrong here and systemd offering DBus services is a real new exposure.

This increased and unreasonable risk is not inherent to systemd's goal of fixing legacy init. However it is inherent to the systemd design philosophy of putting everything into the init process.

I disagree with this view because I feel that a great deal of the increased attack surface systemd exposes is inherent in a number of core design decisions. Systemd is an active supervising init, so you must be able to somehow tell it to manipulate services (and load information about new ones). It holds service state in memory instead of trying to write status files on disk and keep them in sync; this implies you need a way of querying that service state. Systemd has further decided that unprivileged users can query that state, which means that unprivileged users can talk to it in general.

While systemd uses DBus for most or all of this I think that there is a serious argument that it is better to use a general core facility that a lot of people are paying a lot of attention to rather than reinvent the wheel on your own. A lot of people are worrying about the security and integrity of DBus and DBus libraries, many more than would be worrying about a systemd-specific protocol and set of message encoding and decoding code.

Unfortunately, by moving large amounts of functionality that's likely to need to be upgraded into PID 1, systemd makes it impossible to upgrade without rebooting. [...]

As Felker later admits, this is somewhere between 'factually incorrect' and 'aggressively misleading'. Systemd can and does serialize its state and re-exec itself during upgrades, and in practice this works reliably. My machines have upgraded systemd repeatedly without any kernel reboots involved (and this includes upgrades as drastic as Fedora version upgrades, eg from Fedora 19 to Fedora 20; yes I rebooted afterwards, but systemd was upgraded before then).
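(The general 'serialize your state and re-exec yourself' pattern can be sketched in a few lines. What follows is a Python illustration of the idea only, not systemd's actual code or state format; the JSON layout, the function names, and the `--restore-state` flag are all assumptions made up for this example.)

```python
import json
import os
import sys
import tempfile


def serialize_state(state):
    """Write the in-memory state to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="init-state-", suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    return path


def restore_state(path):
    """Load state written by serialize_state() and remove the file."""
    with open(path) as f:
        state = json.load(f)
    os.unlink(path)
    return state


def reexec(state, program=None):
    """Replace the running program with a fresh copy of itself.

    os.execv() keeps the same PID, which is the whole point: the new
    binary takes over as the same process and reloads what the old
    one knew. This function does not return on success.
    """
    path = serialize_state(state)
    program = program or sys.argv[0]
    # '--restore-state' is a hypothetical flag for this sketch.
    os.execv(sys.executable, [sys.executable, program, "--restore-state", path])
```

The new copy would look for the flag at startup and call `restore_state()` before doing anything else. The hard part in real life is not this mechanism but keeping the serialized format compatible between versions.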

Yes, there are theoretical failure modes of this (as Felker agonizes about). I have a number of views on this but the simple version is that this problem exists in any other init system (most of which have been re-execing themselves on upgrades for years) and for any number of important system daemons as well as init. For example, if sshd fails to restart during an upgrade many servers are just as screwed as if init dies.

Felker also raises the issue of compatibility problems with the serialized state between an old and a new version. If it happened, this would be a distribution bug; when a distribution ships any upgrade it's that distribution's responsibility to make sure that the upgrade is compatible and won't make an upgraded system explode. Distributions have failed at this without systemd, but this is not a failure of what they are packaging, it is a failure of the distribution and their processes.

  • Many of the selling-point features of systemd are server-oriented. State-of-the-art transaction-style handling of daemon starting and stopping is not a feature that's useful on desktop systems. The intended audience for that sort of thing is clearly servers.

If you read the systemd design documents, this is clearly incorrect. One of systemd's explicit goals is to not start daemons on desktop systems until they're needed, especially heavyweight daemons like CUPS. If anything this is a drawback on servers, where people like me want to know right away on reboot if something is not going to work a day from now when someone tries to use it for the first time.

(Systemd's fast boot time due to starting services in parallel and various other tricks is also primarily a desktop advantage in my opinion, with perhaps a sideline in cloud virtual instances. Physical servers reboot infrequently and their boot is often drastically slowed down by the firmware's burning need to lovingly fondle every bit of hardware in sight. Not that I'm grumpy about it or anything.)

  • The desktop is quickly becoming irrelevant. The future platform is going to be mobile and is going to be dealing with the reality of running untrusted applications. While the desktop made the unix distinction of local user accounts largely irrelevant, the coming of mobile app ecosystems full of potentially-malicious apps makes "local security" more important than ever.

The systemd developers disagree about the future irrelevance of the desktop, as do I. Beyond that, systemd has a significant amount of support for running services and other things in confined environments via use of Linux cgroups, something that is highly useful on both servers (for running daemons in lesser-privileged environments or with strong resource limits) and on desktops and other user machines for exactly this sort of untrusted application.

None of the things systemd "does right" are at all revolutionary. They've been done many times before. DJB's daemontools, runit, and Supervisor, among others, have solved the "legacy init is broken" problem over and over again (though each with some of their own flaws). Their failure to displace legacy sysvinit in major distributions had nothing to do with whether they solved the problem, and everything to do with marketing. [...]

I disagree with this at sufficient length that I wrote an entire entry on why systemd is winning the init wars and other things aren't. The short version is that only Upstart has even been trying to do so.

If none of [the alternate init systems] are ready for prime time, then the folks eager to replace legacy init in their favorite distributions need to step up and either polish one of the existing solutions or write a better implementation based on the same principles. Either of these options would be a lot less work than fixing what's wrong with systemd.

The final sentence is demonstrably false. Systemd works today on a great number of machines and the alternate init systems do not. Making the alternative init systems work would be a significant amount of effort, especially if you do as Felker advocates and completely replace the current init code to shove most of what init historically has done off to new programs. What might take 'a lot less work' for alternate init systems than systemd is changing them to fit Felker's vision of how init should work, a vision that is not how things work today even in System V init.

Felker does not make it clear if he thinks that legacy init even needs to be replaced (and there is certainly a contingent of people who feel that it doesn't need to be). I feel that System V init has a number of significant issues, issues that really do make a difference when managing systems. Other people seem to share this view given that major Linux distributions have moved to adopt other init systems (first with Upstart in Ubuntu, Fedora, and RHEL, and now with a move to systemd). And going outside of Linux, Solaris's SMF is the granddaddy of drastic modern init overhauls. Clearly this is an idea that has resonated with a lot of technical people over time.

(And as Felker forthrightly says, systemd offers 'highly enticing improvements over legacy init systems'.)

Sidebar: Smaller issues in Felker's article

Among the reasons systemd wants/needs to run as PID 1 is getting parenthood of badly-behaved daemons that orphan themselves, preventing their immediate parent from knowing their PID to signal or wait on them.

This is not the case. Systemd runs parts of itself as PID 1 because that is what an init system does. Systemd actually handles badly behaved daemon processes not through noticing when they are reparented to PID 1 but through Linux cgroups, which provide accurate tracking of what service a process belongs to.

In general inheriting the parentage of badly behaved daemon processes is useless for an init system because in standard Unix the init system has no way of figuring out what (abstract) service a random process it has just inherited is associated with or otherwise where it came from. In short, inits inherit random daemon processes only because they inherit all random processes.
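(On Linux, a process's cgroup membership is visible in /proc/<pid>/cgroup, one 'hierarchy-ID:controller-list:cgroup-path' line per hierarchy. As a rough sketch of how cgroups let you map an arbitrary process back to a service, here is a small Python parser. The sample cgroup paths in the usage note and the rule of 'take the last path component that ends in .service' are my assumptions for illustration, not a description of what systemd does internally.)

```python
def service_from_cgroup(text):
    """Return the service name found in /proc/<pid>/cgroup content, or None.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    Only the 'name=systemd' hierarchy and the unified hierarchy (which
    has an empty controller list) are examined.
    """
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _, controllers, path = parts
        if controllers not in ("name=systemd", ""):
            continue
        # Walk from the innermost cgroup outwards.
        for component in reversed(path.split("/")):
            if component.endswith(".service"):
                return component
    return None


def service_of_pid(pid):
    """Look up the service for a live process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return service_from_cgroup(f.read())
```

Given a line such as '1:name=systemd:/system.slice/sshd.service', this returns 'sshd.service' no matter how many times the process has forked or who its current parent is, which is exactly the information that reparenting to PID 1 does not give you.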

(Why does PID 1 inherit orphan processes as opposed to something else happening to them? The ultimate answer is 'because that's how Unix works'.)

[...] While legacy init systems basically deal with no inputs except SIGCHLD from orphaned processes exiting and manual runlevel changes performed by the administrator, [...]

This is the case much of the time on modern servers but is not historically the case. One of init's major roles over time has been handling getty processes for the console and for serial connections, a role which involves a fair amount of complexity (for instance, most inits have had rate-limiting so that a broken getty or line wouldn't eat the system). And runlevel changes are actually a subset of the more general init-managed facilities exposed in /etc/inittab in System V init.
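(To give a feel for the sort of logic involved in respawn rate-limiting, here is a minimal Python sketch. The class name and the default thresholds are illustrative assumptions; they are not taken from the source of any particular init.)

```python
import time
from collections import deque


class RespawnLimiter:
    """Refuse to respawn a process that keeps dying too quickly."""

    def __init__(self, max_spawns=10, window=120.0, penalty=300.0,
                 clock=time.monotonic):
        self.max_spawns = max_spawns  # spawns allowed per window
        self.window = window          # seconds over which spawns are counted
        self.penalty = penalty        # seconds to stay disabled once tripped
        self.clock = clock
        self.recent = deque()         # timestamps of recent spawns
        self.disabled_until = None

    def may_spawn(self):
        """Return True (and record a spawn) if a respawn is allowed now."""
        now = self.clock()
        if self.disabled_until is not None:
            if now < self.disabled_until:
                return False
            # Penalty has expired: start counting from scratch.
            self.disabled_until = None
            self.recent.clear()
        # Forget spawns that have aged out of the window.
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()
        if len(self.recent) >= self.max_spawns:
            self.disabled_until = now + self.penalty
            return False
        self.recent.append(now)
        return True
```

An init's main loop would call `may_spawn()` each time a getty exits and only start a new one when it returns True.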

With that said, it's completely true that systemd deals with a lot more input sources than traditional System V init. Some of this is intrinsic in being an active supervision-based init system instead of a passive one like System V init, as an active init system must have some way of being told to manipulate services.

Sidebar: My overall views of systemd

I want to summarize my view of systemd to avoid misunderstandings. First, I feel that systemd is currently the best Linux init system from a sysadmin's perspective for reasons that I mostly covered in an earlier entry on things that systemd gets right. Second, I don't think that systemd is the ultimate init system (especially the ultimate Unixy init system). Instead I see it as part of Unix's necessary experimentation and growth. System V init is not flawless and systemd is one of a number of attempts to move the state of the art in init systems forward. We'll collectively learn from this over time and either improve systemd or come up with better solutions and replace it.

linux/SystemdAndBrokenByDesign written at 22:10:55

The good and bad of the System V init system

The good of System V init is that it gave us several big improvements over what came before in V7 and BSD Unix. First and largest, it modularized the boot process; instead of a monolithic shell script (or two, if you counted /etc/rc.local) you had a collection of little ones, one for each separate service. This alone is a massive win and enabled all sorts of things that we take for granted today (for example, casually stopping or starting a service).

The other big change is that System V init turned the entire work of init from a collection of hacks into a systematic and generalized thing. It formally defined runlevels and runlevel transitions and created in /etc/inittab a general mechanism for specifying all of the work init did, from booting to running gettys on serial lines (or running anything) to how to reboot the system. System V init removed the magic and hardcoding in favour of transparency. Things like reboot stopped killing processes and making special system calls and turned into 'tell init to go into runlevel ...', and then /etc/inittab and runlevel transitions said what to do so that this actually rebooted the machine. In the process it added a way to specify how services shut down.

(Simply defining runlevels formally meant that other systems could now tell what state the system was in and behave differently between eg single user mode and multiuser mode.)
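Each /etc/inittab entry has four colon-separated fields, 'id:runlevels:action:process', and that uniform format is what makes it a general mechanism. As a small illustration, here is a Python sketch that parses entries and picks out the ones that apply to a given runlevel. The sample lines are invented in the traditional style and are not copied from any specific system.

```python
from collections import namedtuple

InittabEntry = namedtuple("InittabEntry", "id runlevels action process")

# Invented sample in the traditional style, for illustration only.
SAMPLE = """\
# default runlevel
id:3:initdefault:
l3:3:wait:/etc/rc.d/rc 3
l6:6:wait:/etc/rc.d/rc 6
1:2345:respawn:/sbin/mingetty tty1
"""


def parse_inittab(text):
    """Parse inittab text into a list of InittabEntry tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at most three times so the process field may contain colons.
        fields = line.split(":", 3)
        if len(fields) != 4:
            continue
        entries.append(InittabEntry(*fields))
    return entries


def entries_for_runlevel(entries, runlevel):
    """Return the entries whose runlevels field includes this runlevel."""
    return [e for e in entries if runlevel in e.runlevels]
```

With the sample above, asking for runlevel '6' yields just the 'l6' entry, which is the line that ultimately causes the machine to reboot by running the runlevel 6 scripts.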

The very general and high level view of the bad of the System V init system is that fundamentally all it does is blindly run shell scripts (and that only when the runlevel changes). This creates all sorts of lower-level consequences:

  • SysV init doesn't know what services are even theoretically running right now, much less which ones of them might have failed since they were started.

  • It doesn't know what processes are associated with what services. Even individual init scripts don't know this reliably, especially for modern multi-process services.

  • Even init scripts themselves can't be certain what the state of their service is. They must resort to ad hoc approaches like PID files, flag files for 'did someone run <script> start at some time this boot', checking process listings, and so on. These can misfire.

  • Services are restarted in a different environment than how they are started on boot. Often contamination leaks in to a restarted service (in the form of stray environment variables and other things).

  • Output from services being started is not logged or captured in any systematic way. Many init scripts simply throw it away and there's certainly no official proper place to put it.

  • The ordering of service starts is entirely linear, by explicit specification and guarantee. System V init explicitly says 'I start things in the following order'. There is no parallelism.

  • Services are only started and stopped when the runlevel changes. There is no support for starting services on demand, on events, or when their prerequisites become ready (or stopping them when a prerequisite is being shut down).

  • System V init has no idea of dependencies and thus no way for services to declare 'if X is restarted I need to be restarted too' or 'don't start me until X declares itself ready'.

  • There is no provision for restarting services on failure. Technically you can give your service a direct /etc/inittab entry (if it doesn't background itself) but then you move it outside of what people consider 'the init system' and lose everything associated with a regular init script.

  • Since init scripts are shell scripts, they're essentially impossible for programs to analyse to determine various things about them.

  • It's both hard and system-dependent to write a completely correct init script (and many init scripts are mostly boilerplate). As a result it's common for init scripts to not be completely correct.

  • Init scripts are not lightweight things in general, either in reading them to understand them or in executing them to do things.
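To make the PID file point concrete, here is a minimal Python sketch of the usual 'is my service running?' check. The function name and the status strings are my own inventions for the example. Notice that the best answer it can ever give is 'some process has this PID'; it cannot tell whether that process is actually the service or an unrelated process that happened to reuse the PID.

```python
import os


def pidfile_status(path):
    """Classify a PID file the way an ad hoc init script check might."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "no pidfile"
    except ValueError:
        return "invalid pidfile"
    if pid <= 0:
        return "invalid pidfile"
    try:
        # Signal 0 delivers nothing; it only checks existence and permission.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale pidfile"
    except PermissionError:
        # The process exists but belongs to another user.
        pass
    # This is all we know. It may or may not be our service.
    return "some process has this PID"
```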

In theory you can try to fix many of these issues by adding workarounds in your standard init script functionality. Your 'standard' init script utilities would capture all daemon output in a documented place and way, start everything in cgroups (on Linux) or containers to track processes reliably, have support for restarting services on failure, carefully scrub every last bit of the environment on restarts, monitor things even after start, et cetera et cetera, and then you would insist that absolutely every init script use your utilities and only your utilities. In practice nothing like this has ever worked (people always show up with init scripts that have bugs, take shortcuts, or do not even try to use your complex 'standard' init utilities) and the result would not particularly be a 'System V init system' except in a fairly loose sense.

(It would also make init scripts do even more work and run even more slowly than they do now.)
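For a sense of what even a small piece of such a 'standard' utility involves, here is a hedged Python sketch that covers only three of the items: a scrubbed environment, output captured to a log file, and restarts on failure. The function name, the contents of the clean environment, and the restart policy are all assumptions for illustration, and it does nothing about reliable process tracking.

```python
import os
import subprocess
import time

# A deliberately minimal environment; the exact contents are an assumption.
CLEAN_ENV = {"PATH": "/usr/sbin:/usr/bin:/sbin:/bin", "LANG": "C"}


def run_service(argv, log_path, max_restarts=3, delay=0.0):
    """Run argv in the foreground with a scrubbed environment.

    All output is appended to log_path. If the command exits with a
    non-zero status it is restarted, up to max_restarts times. Returns
    the list of exit codes, one per attempt.
    """
    codes = []
    with open(log_path, "ab") as log:
        for attempt in range(max_restarts + 1):
            proc = subprocess.run(
                argv,
                env=CLEAN_ENV,             # nothing leaks in from the caller
                stdin=subprocess.DEVNULL,
                stdout=log,
                stderr=subprocess.STDOUT,  # stderr goes to the same log
            )
            codes.append(proc.returncode)
            if proc.returncode == 0:
                break
            if attempt < max_restarts:
                time.sleep(delay)
    return codes
```

Even this toy only works for programs that stay in the foreground; anything that daemonizes itself escapes it immediately, which is part of why the real problem is so hard to solve with helper utilities.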

unix/SystemVInitGoodBad written at 02:41:30

