The bad side of systemd: two recent systemd failures
In the past I've written a number of favorable entries about systemd. In the interests of balance, among other things, I now feel that I should rake it over the coals for the bad experiences I ran into today in the course of trying to do a yum upgrade of one system from Fedora 20 to Fedora 21, which did not go well.
The first and worst failure is that I've consistently had systemd's master process (ie, PID 1, the true init) segfault during the upgrade process on this particular machine. I can say it's a consistent thing because this is a virtual machine and I snapshotted the disk image before starting the upgrade; I've rolled it back and retried the upgrade with variations several times and it's always segfaulted. This issue is apparently Fedora bug #1167044 (and I know of at least one other person it's happened to). Needless to say this has put somewhat of a cramp in my plans to upgrade my office and home machines to Fedora 21.
(Note that this is a real segfault and not an assertion failure. In fact this looks like a fairly bad code bug somewhere, with some form of memory scrambling involved.)
The slightly good news is that PID 1 segfaulting does not reboot the machine on the spot. I'm not sure if PID 1 is completely stopped afterwards or if it's just badly damaged, but the bad news is that a remarkably large number of things stop working after this happens. Everything trying to talk to systemd fails and usually times out after a long wait, for example systemctl invocations from postinstall scripts. Attempts to log in or to su to root from an existing login either fail or hang. A plain reboot will try to talk to systemd and thus fails, although you can force a reboot in various ways (including 'reboot -f', which bypasses init entirely).
The merely bad experience is that as a result of this I had occasion to use journalctl (I normally don't). More specifically, I had occasion to use 'journalctl -l', because of course if you're going to make a bug report you want to give full messages. Unfortunately, journalctl -l does not actually show you the full message. Not if you just run it by itself. Oh, the full message is available, all right, but journalctl specifically and deliberately invokes the pager in a mode where you have to scroll sideways to see long lines. Under no circumstance is all of a long line visible on screen at once so that you may, for example, copy it into a bug report. This is not a useful decision. In fact it is a screamingly frustrating decision, one that is about the complete reverse of what I think most people would expect -l to do. In the grand systemd tradition, there is no option to control this; all you can do is force journalctl to not use a pager or work out how to change things inside the pager so that it doesn't do this.
journalctl goes out of its way to set up this behavior. Not by passing command line arguments to less, because that would be too obvious (you might spot it in a ps listing, for example); instead it sets $LESS so as to effectively add the '-S' option, among other things.
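The mechanism is easy to demonstrate in miniature (a hedged sketch; $LESS and systemd's documented SYSTEMD_LESS variable are the only real names here): options smuggled through the environment never show up in a ps listing, but the child process sees and obeys them anyway.

```python
import os
import subprocess
import sys

# Sketch of the invisible-option mechanism: nothing appears on the
# child's command line, but the pager-style child still receives the
# option string via the environment. 'FRSXMK' is the default option set
# that systemd documents for $SYSTEMD_LESS; the S in it is the
# chop-long-lines flag that forces sideways scrolling.
child_env = dict(os.environ)
child_env["LESS"] = "FRSXMK"

# Stand-in for less: a child that just reports what $LESS it was given.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LESS'])"],
    env=child_env, capture_output=True, text=True,
).stdout.strip()
print(out)  # FRSXMK -- invisible in ps, but in effect
```

The practical countermeasures follow directly: run journalctl with --no-pager, or pre-set SYSTEMD_LESS yourself with the S flag left out.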
While I'm here, let me mention that
journalctl's default behavior
of 'show all messages since the beginning of time in forward
chronological order' is about the most useless default I can imagine.
Doing it is robot logic, not human logic.
Unfortunately the systemd journal is unlikely to change its course
in any significant way so I expect we'll get to live
with this for years.
(I suppose what I need to do next is find out where core dumps from root processes wind up on this system so that I can poke around in the systemd core. Oh wait, I think it's in the systemd journal now. This is my unhappy face, especially since I am having to deal with a crash in systemd itself.)
What good kernel messages should be about and be like
Linux is unfortunately a haven of terrible kernel messages and terrible kernel message handling, as I have brought up before. In a spirit of shouting at the sea, today I feel like writing down my principles of good kernel messages.
The first and most important rule of kernel messages is that any kernel message that is emitted by default should be aimed at system administrators, not kernel developers. There are very few kernel developers and they do not look at very many systems, so it's pretty much guaranteed that most kernel messages are read by sysadmins. If a kernel message is for developers, it's useless for almost everyone reading it (and potentially confusing). Ergo it should not be generated by default settings; developers who need it for debugging can turn it on in various ways (including kernel command line parameters). This core rule guides basically all of the rest of my rules.
The direct consequence of this is that all messages should be clear, without in-jokes or cleverness that is only really comprehensible to kernel developers (especially only subsystem developers). In other words, no yama-style messages. If sysadmins looking at your message have no idea what it might refer to, no lead on what kernel subsystem it came from, and no clue where to look for further information, your message is bad.
Comprehensible messages are only half of the issue, though; the other half is only emitting useful messages. To be useful, my view is that a kernel message should be one of two things: it should either be what they call actionable or it should be necessary in order to reconstruct system state (one example is hardware appearing or disappearing, another is log messages that explain why memory allocations failed). An actionable message should cause sysadmins to do something and really it should mean that sysadmins need to do something.
It follows that generally other systems should not be able to cause the kernel to log messages by throwing outside traffic at it (these days that generally means network traffic), because outsiders should not be able to harm your kernel to the degree where you need to do anything; if this is the case, they are not actionable for the sysadmin of the local machine. And yes, I bang on this particular drum a fair bit; that's because it keeps happening.
Finally, almost all messages should be strongly ratelimited. Unfortunately I've come around to the view that this is essentially impossible to do at a purely textual level (at least with acceptable impact for kernel code), so it needs to be considered everywhere kernel code can generate a message. This very definitely includes things like messages about hardware coming and going, because sooner or later someone is going to have a flaky USB adapter or SATA HD that starts disappearing and then reappearing once or twice a second.
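To make the ratelimiting argument concrete, here is a minimal sketch in the spirit of the kernel's printk_ratelimited() (the class and its numbers are invented for illustration): allow a burst of messages per interval, suppress the rest, and record how many were dropped so the log still reflects what actually happened.

```python
import time

# Per-callsite ratelimiter sketch: a burst of messages is allowed per
# interval; extras are counted and summarized when the window rolls
# over, so a flaky device flapping twice a second can't flood the log.
class RateLimit:
    def __init__(self, interval=5.0, burst=10):
        self.interval, self.burst = interval, burst
        self.window_start, self.sent, self.suppressed = 0.0, 0, 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.window_start >= self.interval:
            if self.suppressed:
                print(f"ratelimit: {self.suppressed} messages suppressed")
            self.window_start, self.sent, self.suppressed = now, 0, 0
        if self.sent < self.burst:
            self.sent += 1
            return True
        self.suppressed += 1
        return False

# Simulated flaky-hardware spam: only the burst gets through, and the
# drop count surfaces once the next window opens.
rl = RateLimit(interval=5.0, burst=2)
for t in (0.0, 0.1, 0.2, 0.3, 6.0):
    if rl.allow(now=t):
        print(f"t={t}: usb device disconnected")
```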
To say this more compactly, everything in your kernel messages should be important to you. Kernel messages should not be a random swamp that you go wading in after problems happen in order to see if you can spot any clues amidst the mud; they should be something that you can watch live to see if there are problems emerging.
How to delay your fileserver replacement project by six months or so
This is not exactly an embarrassing confession, because I think we made the right decisions for the long term, but it is at least an illustration of how a project can get significantly delayed one little bit at a time. The story starts back in early January, where we had basically finalized the broad details of our new fileserver environment; we had the hardware picked out and we knew we'd run OmniOS on the fileservers and our current iSCSI target software on some distribution of Linux. But what Linux?
At first the obvious answer was CentOS 6, since that would get us a nice long support period and RHEL 5 had been trouble-free on our existing iSCSI backends. Then I really didn't like RHEL/CentOS 6 and didn't want to use it here for something we'd have to deal with for four or five years to come (especially since it was already long in the tooth). So we switched our plans to Ubuntu, since we already run it everywhere else, and in relatively short order I had a version of our iSCSI backend setup running on Ubuntu 12.04. This was probably complete some time in late February, based on circumstantial evidence.
Eliding some rationale, Ubuntu 12.04 was an awkward thing to settle on in March or so of this year because Ubuntu 14.04 was just around the corner. Given that we hadn't built and fully tested the production installation, we might actually have wound up in the position of deploying 12.04 iSCSI backends after 14.04 had actually come out. Since we didn't feel in a big rush at the time, we decided it was worthwhile to wait for 14.04 to be released and for us to spin up the 14.04 version of our local install system, which we expected to have done by not too long after the 14.04 release. As it happened it was June before I picked the new fileserver project up again and I turned out to dislike Ubuntu 14.04 too.
By the time we knew we didn't really want to use Ubuntu 14.04, RHEL 7 was out (it came out June 10th). While we couldn't use it directly for local reasons, we thought that CentOS 7 was probably going to be released soon and that we could at least wait a few weeks to see. CentOS 7 was released on July 7th and I immediately got to work, finally getting us back on track to where we probably could have been at the end of January if we'd stuck with CentOS 6.
(Part of the reason that we were willing to wait for CentOS 7 was that I actually built a RHEL 7 test install and everything worked. That not only proved that CentOS 7 was viable, it meant that we had an emergency fallback if CentOS 7 was delayed too long; we could go into at least initial production with RHEL 7 instead. I believe I did builds with CentOS 7 beta spins as well.)
Each of these decisions was locally sensible and delayed things only a moderate bit, but the cumulative effects delayed us by five or six months. I don't have any great lesson to point out here, but I do think I'm going to try to remember this in the future.
Why I do unit tests from inside my modules, not outside them
In reading about how to do unit testing, one of the divisions I've run into is between people who believe that you should unit test your code strictly through its external API boundaries and people who will unit test code 'inside' the module itself, taking advantage of internal features and so on. The usual arguments I've seen for doing unit tests from outside the module are that your API working is what people really care about and this avoids coupling your tests too closely to your implementation, so that you don't have the friction of needing to revise tests if you revise the internals. I don't follow this view; I write my unit tests inside my modules, although of course I test the public API as much as possible.
The primary reason why I want to test from the inside is that this gives me much richer and more direct access to the internal operation of my code. To me, a good set of unit tests involves strongly testing hypotheses about how the code behaves. It is not enough to show that it works for some cases and then call it a day; I want to also poke the dark corners and the error cases. The problem with going through the public API for this is that it is an indirect way of testing things down in the depths of my code. In order to reach down far enough, I must put together a carefully contrived scenario that I know reaches through the public API to reach the actual code I want to test (and in the specific way I want to test it). This is extra work, it's often hard and requires extremely artificial setups, and it still leaves my tests closely coupled to the actual implementation of my module code. Forcing myself to work through the API alone is basically testing theater.
(It's also somewhat dangerous because the coupling of my tests to the module's implementation is now far less obvious. If I change the module implementation without changing the tests, the tests may well still keep passing but they'll no longer be testing what I think they are. Oops.)
Testing from inside the module avoids all of this. I can directly test that internal components of my code work correctly without having to contrive peculiar and fragile scenarios that reach them through the public API. Direct testing of components also lets me immediately zero in on the problem if one of them fails a test, instead of forcing me to work backwards from a cascade of high level API test failures to find the common factor and realize that oh, yeah, a low level routine probably isn't working right. If I change the implementation and my tests break, that's okay; in a way I want them to break so that I can revise them to test what's important about the new implementation.
(I also believe that directly testing internal components is likely to lead to cleaner module code due to needing less magic testing interfaces exposed or semi-exposed in my APIs. If this leads to dirtier testing code, that's fine with me. I strongly believe that my module's public API should not have anything that is primarily there to let me test the code.)
Why I don't believe in generic TLS terminator programs
In some security circles it's popular to terminate TLS connections with standalone generic programs such as titus. The stated reason for this boils down to 'separation of concerns'; since TLS is an encrypted TCP session, we can split TLS termination from actually understanding the data streams that are being transported over it. A weakness in the TLS terminator doesn't compromise the actual application and vice versa. I've seen people harsh on protocols that entangle the two issues, such as SMTP with STARTTLS.
I'm afraid that I don't like (or believe in) generic TLS terminator programs, though. The problem is their pragmatic impact; in practice you are giving up some important things when you use them. In specific, what you're giving up is easy knowledge of the origin IP address of the connection. A generic TLS terminator turns a TLS stream into the underlying data stream but by definition doesn't understand anything about the structure of the data stream (that's what makes it generic). This lack of understanding means it has no way to pass the origin IP address along to whatever is handling the actual data stream; to do so would require it to modify or augment the data stream somehow, and it has no idea how to do that.
You can of course log enough information to be able to reconstruct this information after the fact, which means that in theory you can recover it during the live session with suitably modified backend software. But this requires customization in both the TLS terminator and the backend software, which means that your generic TLS terminator is no longer a drop in part.
(Projects such as titus can apparently get around this with what is presumably a variant of NAT. This adds a lot of complexity to the process and requires additional privileges.)
I consider losing the origin IP address for connections to be a significant issue. There are lots of situations where you really want to know this information, which means that a generic TLS terminator that strips it is not suitable for unqualified general use; before you tell someone 'here, use this to support TLS' you need to ask them about how they use IP origin information and so on. As a result I tend to consider generic TLS terminators as suitable mostly for casual use, because it's exactly in casual uses that you don't really care about IP origin information.
(You can make a general TLS terminator that inserts this information at the start of the data stream, but then it's no longer transparent to the application; the application has to recognize this new information before the normal start of protocol and so on.)
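A minimal sketch of that non-transparent approach (the format shown is modeled on HAProxy's PROXY protocol v1, the best-known real instance of the idea; the functions themselves are invented): the terminator prepends one metadata line, and the backend must now know to peel it off before parsing the actual protocol.

```python
# Sketch: a terminator that prepends origin metadata, PROXY-protocol-v1
# style, and a backend that must consume it before the real session.

def wrap_stream(src_ip, src_port, dst_ip, dst_port, payload: bytes) -> bytes:
    # One line of metadata ahead of the decrypted bytes; the application
    # protocol proper starts only after the CRLF.
    header = f"PROXY TCP4 {src_ip} {dst_ip} {src_port} {dst_port}\r\n"
    return header.encode("ascii") + payload

def backend_read(stream: bytes):
    # The backend is no longer protocol-transparent: it has to peel the
    # header off or it will misparse the first line of the session.
    header, _, rest = stream.partition(b"\r\n")
    fields = header.decode("ascii").split()
    origin = {"ip": fields[2], "port": int(fields[4])}
    return origin, rest

stream = wrap_stream("192.0.2.7", 51312, "198.51.100.1", 443, b"EHLO client\r\n")
origin, data = backend_read(stream)
print(origin["ip"], data)  # 192.0.2.7 b'EHLO client\r\n'
```

The origin IP survives, but only because both sides agreed to a stream format that is no longer the bare application protocol, which is exactly the loss of drop-in genericness described above.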
(These issues are of course closely related to why I don't like HTTP as a frontend to backend transport mechanism, as you have the same loss of connection information.)
How we install Ubuntu machines here
We have a standard install system for our Ubuntu machines (which are the majority of machines that we build). I wouldn't call it an automated install system (in the past I've used the term 'scripted'), but it is mostly automated with only a relatively modest amount of human intervention. The choice of partially automated installs may seem odd to people, but in our case it meets our needs and is easy to work with.
Our install process runs in three stages. First we have a customized Ubuntu server install image that is set up with a preseed file and a few other things on the CD image. The preseed file pre-selects a basic package set, answers a bunch of the installer questions for static things like time zones, sets a standard initial root password, and drops some files in /root on the installed system, most importantly the postinstall script used in the next stage.
After the system is installed and reboots, we log into it (generally over the network) and run the pre-dropped postinstall script. This grinds through a bunch of standard setup things (including making sure that the machine's time is synchronized to our NTP servers) but its most important job is bootstrapping the system far enough that it can do NFS mounts in our NFS mount authentication environment. Among other things this requires setting up the machine's canonical SSH host keys, which involves a manual authentication step to fetch them and thus demands a human there to type the access password. After getting the system to a state where it can do NFS mounts, it mounts our central administrative filesystem.
The third step is a general postinstall script that lives on this central administrative filesystem. This script asks a few questions about how the machine will be set up and what sort of package set it should have, then grinds through all the work of installing many packages (some of them from non-default repositories), setting up various local configuration files from the master versions, and applying various other customizations. After this process finishes, a standard login or compute server is basically done and in general the machine is fully enrolled in various automated management systems for things like password propagation and NFS mount management.
(Machines that aren't standard login or compute servers generally then need some additional steps following our documented build procedures.)
In general our approach here has been to standardize and script everything that is either easy to do, that's very tricky to do by hand, or that's something we do a lot. We haven't tried to go all the way to almost fully automated installs, partly because it seems too much work for the reward given the modest amount of (re)installs we do and partly because there's some steps in this process that intrinsically require human involvement. Our current system works very well; we can spin up standard new systems roughly as fast as the disks can unpack packages and with minimal human involvement, and the whole system is easy to develop and manage.
Also, let me be blunt about one reason I prefer the human in the loop approach here: unfortunately Debian and Ubuntu packages have a really annoying habit of pausing to give you quizzes every so often. These quizzes basically function as land mines if you're trying to do a fully automated install, because you can never be sure if you've pre-answered all of them and if force-ignoring one is going to blow up in your face. Having a human in the loop to say 'augh no I need to pick THIS answer' is a lot safer.
(I can't say that humans are better at picking up on problems if something comes up, because the Ubuntu package install process spews out so much text that it's basically impossible to read. In practice we all tune out while it flows by.)
Browser addons can effectively create a new browser
(Certainly this is the case for me with my extensions. Adding gestures to Firefox significantly changes the UI I experience, while NoScript and other screening tools give me a much different and more pleasant view of the web.)
The direct consequence of this is that in many cases, people's core addons are not optional. If your addons stop working, what you wind up with is effectively a different browser; its UI is different, its behavior is different. This means that from a user's perspective, breaking addons can be the same as breaking the browser. Regardless of the technical details about what happened, you wind up in a browser that doesn't work right, one that no longer behaves the way it used to.
(A corollary is that once your browser is broken, you may well have no particular reason to stay with the underlying base it was built on. Humans being humans, you are probably much more likely to feel angry that your browser has been broken and switch to a browser from some other people.)
This is of course not the case for all addons or all people. Some addons have too small an effect, and not everyone will do much with addons that can have major effects on the UI or the browsing experience. But even small addons may have big effects for some people; if you use an addon that tweaks a site that you use all the time in a way that strongly affects your use of the site, you've kind of got a different browser. Certainly losing the addon would significantly change your experience even though that one site is only a small part of the web. I'm sure there are people using extensions related to the big time-consuming social websites who fall into this category.
(If you install a gestures extension but rarely use gestures, or install NoScript but whitelist almost everything you run into, you're not really changing your browsing experience much from the baseline browser.)
How security sensitive is information about your network architecture?
One of the breathless things that I've seen said recently about the recent Sony Pictures intrusion is that having their network layout and infrastructure setup disclosed publicly is really terrible and will force Sony Pictures to change it. This doesn't entirely make sense to me; I'm hard pressed to see how network layout information and so on is terribly security sensitive in a sensibly run environment. Switch and router and database passwords, certainly; but just the network architecture?
(This information is clearly business sensitive, but that's a different thing.)
There is clearly one case where this is terrible for security, namely if you've left holes and back doors in your infrastructure. But that is badly designed infrastructure in the first place that you just tried to protect with security through obscurity (call this the ostrich approach: if people don't see it, it's still secure). It's not that disclosure has made your infrastructure insecure; the disclosure has just revealed that it already was.
Beyond that, having full information on your network architecture will certainly make an attacker's work easier. Rather than having to fumble around exploring the networks and risking discovery through mistakes, they can just go straight to whatever they're interested in. But making an attacker's job somewhat easier is a far cry from total disaster. If things are secure to start with this doesn't by itself enable the attacker to compromise systems or get credentials (although it'll probably make the job easier).
Or in short: if your network architecture isn't fundamentally insecure to start with, I don't see how disclosing it is a fatal weakness. I suppose there are situations where you're simply forced to run your systems in ways that are fundamentally insecure because the software and protocols you're using don't allow any better and you have to allow enough access to the systems that people could exploit them if they knew about it and wanted to.
(I may well be missing things here. I'm aware that I work in an unusually open environment, which is that way partly because this is the culture of academia and partly due to pragmatics. As I've put it before, part of our threat model has to be inside the building.)
(Also, probably I should be remembering my old comments on the trade press here.)
Log retention versus log analysis, or really logs versus log analysis
yea but. keeping logs longer is not particularly interesting if you have no heavy duty tools to chew through them. [...]
Unsurprisingly, I disagree with this.
Certainly in an ideal world we would have good log analysis tools
that we use to process raw logs into monitoring, metrics data, and
other ongoing uses of the raw pieces we're gathering and retaining.
However ongoing processing of logs is far from the only reason to
have them. Another important use is to go back through the data you
already have in order to answer (new) questions, and this can be
done without having to process the logs through heavy duty tools.
Many questions can be answered with basic Unix tools such as awk, and these post-facto ad-hoc queries can be very important.
A lack of good tools may limit the sophistication of the questions you can ask (at least with moderate effort) and the volume of questions you can deal with, but it doesn't make logs totally useless. Far from it, in fact. In addition, given that logs are the raw starting point, you can always keep logs now and build processing for them later, either on an 'as you have time' or an 'as you have the need' basis. As a result I feel that this is a 'the perfect is the enemy of the good' situation unless your log volume is so big that you can't just keep raw logs and do anything with them.
(And on modern machines you can get quite far with plain text, Unix tools, and some patience, even with quite large log files.)
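As a hedged illustration of the kind of ad-hoc question retained raw logs make cheap to answer (the sample lines and the question are invented, and an awk pipeline would do the same job): which hosts failed authentication most often?

```python
from collections import Counter

# A few sshd-style lines stand in for a retained raw logfile.
log_lines = [
    "sshd[101]: Failed password for root from 192.0.2.10 port 4022",
    "sshd[102]: Failed password for admin from 192.0.2.10 port 4031",
    "sshd[103]: Accepted publickey for cks from 203.0.113.5 port 5100",
    "sshd[104]: Failed password for root from 198.51.100.9 port 4050",
]

# Ad-hoc question, asked months after the fact: failures per source host.
failures = Counter(
    line.split(" from ")[1].split()[0]
    for line in log_lines
    if "Failed password" in line
)
for host, n in failures.most_common():
    print(host, n)
```

No heavy-duty analysis framework is involved; the only prerequisite is that the raw logs were kept at all.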
If you want the really short version: having information is almost always better than not having information, even if you're not doing anything with it right now.
Security capabilities and reading process memory
Today, reading Google's Project Zero explanation of an interesting IE sandbox escape taught me something that I probably should have known already but hadn't thought about and realized. So let's start at the beginning, with capability-based security. You can read about it in more detail in that entry, but the very short version is that capabilities are tokens that give you access rights and generally are subject to no further security checks. If you've got the token, you have access to whatever it is in whatever way the token authorizes. Modern Unix file descriptors are a form of capabilities; for example, a privileged program can open a file that a less privileged one doesn't have access to and pass the file descriptor to it to give access to that file. The less privileged program only has as much access to the file (read, write, both, etc) as is encoded in the file descriptor; if the file descriptor is read only, that's it.
When you design a capability system, you have to answer some questions. First, do programs hold capabilities themselves in their own memory space (for example, as unforgeable or unguessable blobs of cryptographic data) or do they merely hold handles that reference them and the kernel has the real capability? Unix file descriptors are an example of the second option, as the file descriptor number at user level is just a reference to the real kernel information. Second, are the capabilities or handles held by the process bound to the process or are they process independent? Unix file descriptors are bound to processes and so your FD 10 is not my FD 10.
My impression is that many strong capability based systems answer that user processes hold as much of the capability token as possible (up to 'all of it') and that the token is not bound to the process. This is an attractive idea from the system design perspective because it means that the kernel doesn't have to have much involvement in storing or passing around capability tokens. Again, contrast this with Unix, where the kernel has to do a whole pile of work when process A passes a file descriptor to process B. The sheer work involved might well bog down a Unix system that tried to make really heavy use of file descriptor passing to implement various sorts of security.
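Python's send_fds/recv_fds (available since 3.9 on Unix) make the kernel's role in this concrete; both ends live in one process below purely to keep the sketch self-contained.

```python
import os
import socket

# The kernel's work in Unix fd passing: the sender hands a descriptor to
# sendmsg() as SCM_RIGHTS ancillary data, and the kernel installs a
# brand-new descriptor in the receiver's table -- descriptor numbers are
# process-bound references, not portable tokens.
sender, receiver = socket.socketpair()
r, w = os.pipe()                       # the "capability" being transferred

socket.send_fds(sender, [b"fd"], [r])  # kernel-mediated handoff
msg, fds, flags, addr = socket.recv_fds(receiver, 16, 1)

os.write(w, b"hello")
print(os.read(fds[0], 5))              # the received descriptor works: b'hello'
print(fds[0] != r)                     # ...under a fresh local number: True
```

Every transfer goes through the kernel and lands in a per-process table, which is exactly the bookkeeping that a capability-as-opaque-blob design avoids.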
What I had not realized until I read the Google article is that in systems where processes hold capability tokens that are not bound to the particular process, being able to read a process's memory is a privilege escalation attack. If you can read the memory of a process with a capability token, you can read the token's value and then use it yourself. The system's kernel doesn't know any better, and in fact it not knowing is a significant point of the whole design.
This matters for two reasons. First, reading process memory is often an explicit feature of systems to allow for debugging (it's hard to do it otherwise). Second, in practice there have been all sorts of attacks that cause processes to leak some amount of memory to the attacker and often these disclosure leaks are considered relatively harmless (although not always; you may have heard about Heartbleed). In a capability based system, guarding against both of these issues clearly has to be a major security concern (and I'm not sure how you handle debugging).
(In many systems you probably don't need to use the capabilities in the process that read them out of someone's memory. If they really are process independent, you can read them out in one process, transfer them through the Internet, and pass them back in to or with a completely separate process that you establish in some other manner. All that matters is that the process can give the kernel the magic token.)
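Here is a toy model of the problem (the scheme and all names are invented for illustration, using an HMAC as the unforgeable blob): the "kernel" only checks that a token is valid, never who presents it, so reading one out of a process's memory is full escalation, while forging one without the key fails.

```python
import hashlib
import hmac

# Invented sketch: a process-independent capability token is
# (object, rights) signed with a key only the "kernel" knows.
KERNEL_KEY = b"kernel-secret"

def mint(obj: str, rights: str) -> bytes:
    body = f"{obj}:{rights}".encode()
    mac = hmac.new(KERNEL_KEY, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + mac

def access(token: bytes, obj: str, want: str) -> bool:
    body, _, mac = token.rpartition(b".")
    good = hmac.new(KERNEL_KEY, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(mac, good):
        return False                    # forged or mangled token
    tobj, _, rights = body.decode().partition(":")
    return tobj == obj and want in rights

# A privileged process holds this token in its own memory:
token = mint("/etc/shadow", "rw")

# An attacker who can merely *read* that memory replays it unchanged,
# and the "kernel" cannot tell the difference:
stolen = bytes(token)
print(access(stolen, "/etc/shadow", "r"))          # True: leak == escalation
print(access(b"forged.junk", "/etc/shadow", "r"))  # False: forging needs the key
```

Contrast this with Unix file descriptors, where the stolen number 10 means nothing in the attacker's own process; here the blob itself is the entire credential.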