2013-01-23
What I want to know about kernel security updates
This is kind of a rant. The issue is on my mind because we spent a chunk of this evening applying kernel updates to our Ubuntu machines and rebooting them, something we feel forced to do once every few months or so. One reason we don't do this more often, such as every time Ubuntu releases a kernel update, is that kernel updates are among the most disruptive updates there are; to make one take effect you must reboot the machine, which is completely disruptive to anyone using it (especially anyone logged in to it).
But another reason we don't apply Ubuntu kernel updates all that often is that Ubuntu's kernel updates are terrible at giving us useful information about how severe the issues are and how urgent updating is. Except in terribly obvious extreme situations (eg 'locally exploitable bug, gives root, an exploit is public') we wind up faced with a flurry of issues of extremely uncertain but generally low-seeming impact. Unsurprisingly we default to not doing major disruptions on a regular basis, and then periodically we decide that we should get up to date just in case.
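As an aside, the purely mechanical part of this is easy to check. Here is a minimal sketch (assuming a Debian/Ubuntu-style /boot layout; this is an illustration, not a tool we actually use) of finding out whether a machine is running its newest installed kernel:

    #!/usr/bin/env python
    # Rough sketch, not production code: report whether this machine is
    # running its newest installed kernel. Assumes a Debian/Ubuntu-style
    # layout where installed kernels appear as /boot/vmlinuz-<version>.
    import glob, os, platform

    def version_key(ver):
        # Crude sort key: split '3.2.0-35-generic' into pieces and
        # compare the numeric pieces as numbers. Good enough for a
        # sketch, not for careful version comparison.
        return [(0, int(p)) if p.isdigit() else (1, p)
                for p in ver.replace('-', '.').split('.')]

    kernels = [os.path.basename(p)[len('vmlinuz-'):]
               for p in glob.glob('/boot/vmlinuz-*')]
    running = platform.release()
    newest = max(kernels, key=version_key) if kernels else running
    if running != newest:
        print('reboot wanted: running %s, newest installed is %s' % (running, newest))
    elif os.path.exists('/var/run/reboot-required'):
        # Ubuntu's package tools touch this file when a reboot is called for.
        print('reboot wanted: /var/run/reboot-required is present')
    else:
        print('running the newest installed kernel: %s' % running)

The hard part is not finding out that you need a reboot; it's deciding whether the disruption is worth it, which is what the rest of this entry is about.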
While Ubuntu has its specific failings here, this is not just an Ubuntu problem. I think every Linux distribution I've seen a kernel security update from has failed to include the information we'd need to make meaningful decisions. All of them irritate me.
As a sysadmin, here is what I want to know about every issue fixed in a kernel security update:
- how severe is the consequence of the issue? Does the exploit give
you root, disclose some sort of information (and if so, what sort
and can it be leveraged to disclose things like passwords), or
just allow you to lock the machine up?
- is this remotely exploitable or does it require running your code
on the machine? If it's remotely exploitable, how remote is remote;
'on the same LAN' is a lot different than 'anyone on the Internet'.
('Exploitable from inside a VM' is another case.)
The most common sort of issue that I see bugfixes for is a locally exploitable denial of service issue. While it's nice to fix these bugs, they are fundamentally unimportant for many sysadmins since any local user generally already has plenty of ways to lock up or crash a Unix system. But you'd never know this from how distributions phrase things in kernel update notices.
- is this exploitable on a default configuration machine? Or does
it require some specific hardware to be present or some specific,
non-default configuration or protocol to have been set up?
You would not believe how many updates don't make this clear. This matters hugely to whether a particular issue is even relevant to us and it makes me angry every time a distribution or vendor forces me to research this myself.
- how currently exploitable is this issue? This ranges from
'a weaponized exploit has been made public' all the way through
'we think that someone might someday be able to figure out how
to exploit this'.
Yes, yes, I'm sure that distribution security teams hate having to say anything about this (unless it's the former), but trust me, this is the kind of thing that my manager asks me when I say 'this seems pretty urgent, I think we need to do an emergency reboot without our usual one-week advance notice (if there are no conference or paper deadlines)'.
- what is the primary source for this issue, or at least what is an
index page with links to the primary source information? Many
kernel security issues are reported, disclosed, or announced on
things like public mailing lists, generally with far more technical
detail than the distribution wants to put in their update notice.
I want to read this primary source material and I become angry
when a distribution (which had all of this information itself)
hides it and forces me to do web searches.
And everyone should link to the CVE page for CVE issues as well. There is nothing I like quite so much as doing web searches for information that a distribution's security team already had but decided not to give me. Really.
I suspect that most distributions would want to put together their own information page in some standardized format. This is fine, just as long as they put a link to their own info page in the announcement and their info page links to the primary source (and the CVE information and so on). This would also be a good place to put extended discussions of things like how to tell if your particular system is potentially vulnerable to the issue.
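To show how little work those CVE links are, here is a minimal sketch that pulls the CVE IDs out of a notice and prints MITRE's canonical CVE page for each one (the notice text and CVE numbers are invented for illustration):

    #!/usr/bin/env python
    # Minimal sketch: extract CVE IDs from an update notice and print
    # the canonical MITRE CVE page for each. The notice text and CVE
    # numbers below are invented for illustration.
    import re

    notice = """This update fixes several security issues in the kernel.
    See CVE-2013-0001 and CVE-2013-0002 for details."""

    for cve in sorted(set(re.findall(r'CVE-\d{4}-\d+', notice))):
        print('http://cve.mitre.org/cgi-bin/cvename.cgi?name=%s' % cve)

If this takes a few minutes to write, a distribution's announcement tooling can certainly add the links automatically.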
My excessively cynical side suspects that distribution security teams leave out a lot of this information in order to push people towards applying every kernel update as soon as possible. If so, I have news for those security teams: they have it exactly backwards. There are powerful forces pushing us (and anyone) against applying updates, especially disruptive updates like new kernels. Every doubt and quibble and uncertainty in a kernel update message feeds those forces and makes it less likely that the update will be applied. In order to get us to apply an important update on an urgent basis, it must be clear that it is urgent. If it is not clear, everyone loses.
Everything works much better when the security team is honest and clear about kernel updates. We'll still sit on the updates that are just yet more ways for local users to lock the machine up, but that's no different from what we're already doing. And when a distribution releases something that's genuinely dangerous, we'll be much more likely to notice, understand it, and update far earlier than we would otherwise.
(By the way, for everyone who is about to advise us that we should have dynamic load balancers and pools of machines where we can take some out of service on a rolling basis for kernel upgrades and so on: there is no such thing as a general dynamic load balancer for user login sessions, established sessions in general, or actual running user processes. Thanks.)
2013-01-08
Why we wound up using Linux for our iSCSI targets
I was recently asked in email why we chose Linux for our iSCSI targets and whether it had to do with Linux's wide hardware support. The answer is 'sort of', but there's a story here (which I'm going to simplify a bit).
To start with, our fileserver environment didn't appear out of nowhere; instead it's an evolution of an earlier fileserver environment, one built out of SPARC Solaris NFS servers talking to (mostly SATA) RAID appliances over (lower-end) FibreChannel. When we set out to modernize this but keep the same basic shape, our initial decision was to stay with an 'appliance' style solution for the backend disks rather than try to engineer our own backends. Doing our own backends might have wound up cheaper, if they worked and if you didn't count staff time, but it would have exposed us to a bunch of risks that didn't seem worth it for what looked like relatively modest savings (at our planned size). So we picked Solaris 10 on x86 as the NFS fileserver, iSCSI over 1Gb Ethernet as the SAN interconnect, and found a nice iSCSI appliance.
Then the iSCSI appliance failed badly. This sank plan A in a rather spectacular and grim fashion, and (along with some other things) left us feeling not really happy about appliance solutions in general. Suddenly it seemed worthwhile to try building our own test iSCSI backend to see how it would work (we had plenty of random spare servers, although I think we may have bought one for testing).
At the time we did this, in late 2007 and early 2008, our choices for the OS on a build-our-own iSCSI target boiled down to Linux and Solaris. There were two problems with Solaris. First, at the time the Solaris iSCSI target implementation simply wasn't mature. Second and less important, it looked very much like the cheap ways to connect a dozen or so SATA disks to a machine would involve hardware that had Linux drivers but no Solaris drivers that I could see. By contrast the Linux iSCSI target worked (and performed well), we were familiar with Linux already, and Linux supported all the hardware we needed. By the time the dust settled there wasn't much of a choice.
Linux was a risk in some ways because our iSCSI target software was third-party software, although it was open source and had active developers. It would have been more reassuring to go with an integrated, fully-supported iSCSI target implementation, but we didn't have that option. What we could get on Linux did work, so we went with it.
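For the curious, here is roughly what defining a target looked like in that sort of software. This is a sketch in the style of the iSCSI Enterprise Target's /etc/ietd.conf; IET is one plausible third-party target from that era (not necessarily what we ran), and the target name, disk, and password are invented:

    Target iqn.2008-01.com.example:san.disk01
        # Export one whole disk as LUN 0. blockio does direct I/O to
        # the device instead of going through the page cache.
        Lun 0 Path=/dev/sdb,Type=blockio
        # Optional CHAP authentication for initiators.
        IncomingUser sanuser notarealsecret

One attraction of this style of target is that there isn't much to it: a name, some LUNs backed by local disks, and whatever access control you want.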
(It's possible that we overlooked an iSCSI target implementation on FreeBSD at the time. I'm not sure that it would have made a difference unless the FreeBSD version walked on water, because we had no other experience with FreeBSD. When in doubt we pick the OS that we already run other instances of.)