What I want to know about kernel security updates
This is kind of a rant. The issue is on my mind because we spent a chunk of this evening applying kernel updates to our Ubuntu machines and rebooting them, something that we feel forced to do once every few months or so. One of the reasons that we don't do this more often, such as every time Ubuntu releases a kernel update, is that kernel updates are among the most disruptive updates that there are; in order to make them take effect you must reboot the machine, which is completely disruptive to anyone using the machine (especially if they're logged in to it).
But another reason we don't apply Ubuntu kernel updates all that often is that Ubuntu's kernel updates are terrible at giving us useful information about how severe the issues are and how urgent doing an update is. Except in terribly obvious extreme situations (eg 'locally exploitable bug, gives root, an exploit is public') we wind up faced with a flurry of issues of extremely uncertain but generally low-seeming impact. Unsurprisingly we wind up defaulting to not doing major disruptions on a regular basis, and then periodically we decide that we should get up to date just in case.
While Ubuntu has its specific failings here, this is not just an Ubuntu problem. I think every Linux distribution I've seen a kernel security update from has failed to include the information we'd need to make meaningful decisions. All of them irritate me.
As a sysadmin, here is what I want to know about every issue fixed in a kernel security update:
- how severe is the consequence of the issue? Does the exploit give
you root, disclose some sort of information (and if so, what sort
and can it be leveraged to disclose things like passwords), or
just allow you to lock the machine up?
- is this remotely exploitable or does it require running your code
on the machine? If it's remotely exploitable, how remote is remote;
'on the same LAN' is a lot different than 'anyone on the Internet'.
('Exploitable from inside a VM' is another case.)
The most common sort of issue that I see bugfixes for is a locally exploitable denial of service issue. While it's nice to fix these bugs, they are fundamentally unimportant for many sysadmins since any local user generally already has plenty of ways to lock up or crash a Unix system. But you'd never know this from how distributions phrase things in kernel update notices.
- is this exploitable on a default configuration machine? Or does
it require some specific hardware to be present or some specific,
non-default configuration or protocol to have been set up?
You would not believe how many updates don't make this clear. This matters hugely to whether a particular issue is even relevant to us and it makes me angry every time a distribution or vendor forces me to research this myself.
- how currently exploitable is this issue? This ranges from
'a weaponized exploit has been made public' all the way through
'we think that someone might someday be able to figure out how
to exploit this'.
Yes, yes, I'm sure that distribution security teams hate having to say anything about this (unless it's the former), but trust me, this is the kind of thing that my manager asks me when I say 'this seems pretty urgent, I think we need to do an emergency reboot without our usual one-week advance notice (if there are no conference or paper deadlines)'.
- what is the primary source for this issue, or at least what is an
index page with links to the primary source information? Many
kernel security issues are reported, disclosed, or announced on
things like public mailing lists, generally with far more technical
detail than the distribution wants to put in their update notice.
I want to read this primary source material and I become angry
when a distribution (which had all of this information itself)
hides it and forces me to do web searches.
And everyone should link to the CVE page for CVE issues as well. There is nothing I like quite so much as doing web searches for information that a distribution's security team already had but decided not to give me. Really.
I suspect that most distributions would want to put together their own information page in some standardized format. This is fine, just as long as they put a link to their own info page in the announcement and their info page links to the primary source (and the CVE information and so on). This would also be a good place to put extended discussions of things like how to tell if your particular system is potentially vulnerable to the issue.
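To make the wish list concrete, here is a minimal sketch of what one entry in such a standardized format could carry, plus the sort of triage a sysadmin might run against it. Everything here is hypothetical: the field names, the enum values, the `triage` rules, and the placeholder CVE IDs are my own illustration of the questions above, not any distribution's actual advisory format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Impact(Enum):
    ROOT = "gives root"
    INFO_DISCLOSURE = "discloses information"
    DENIAL_OF_SERVICE = "locks up or crashes the machine"


class Vector(Enum):
    LOCAL = "requires running code on the machine"
    VM_GUEST = "exploitable from inside a VM"
    SAME_LAN = "remote, same LAN"
    INTERNET = "remote, anyone on the Internet"


class ExploitStatus(Enum):
    THEORETICAL = "someone might someday figure out an exploit"
    PROOF_OF_CONCEPT = "a proof of concept exists"
    PUBLIC_EXPLOIT = "a weaponized exploit is public"


@dataclass
class KernelIssue:
    """One fixed issue in a kernel update (hypothetical record layout)."""
    cve_id: str
    impact: Impact
    vector: Vector
    default_config: bool  # exploitable on a default configuration machine?
    exploit_status: ExploitStatus
    primary_sources: List[str] = field(default_factory=list)  # links, not searches


def triage(issue: KernelIssue) -> str:
    """Illustrative policy only; real sites will weigh these differently."""
    if not issue.default_config:
        return "check whether our configuration is affected"
    if issue.impact is Impact.DENIAL_OF_SERVICE and issue.vector is Vector.LOCAL:
        # Local users already have plenty of ways to crash a Unix system.
        return "wait for the next scheduled reboot"
    if (issue.impact is Impact.ROOT
            and issue.exploit_status is ExploitStatus.PUBLIC_EXPLOIT):
        return "emergency reboot"
    return "judgment call"


# Placeholder IDs, not real CVEs.
local_dos = KernelIssue("CVE-0000-0001", Impact.DENIAL_OF_SERVICE,
                        Vector.LOCAL, True, ExploitStatus.THEORETICAL)
public_root = KernelIssue("CVE-0000-0002", Impact.ROOT,
                          Vector.LOCAL, True, ExploitStatus.PUBLIC_EXPLOIT)
print(triage(local_dos))
print(triage(public_root))
```

The point is not these particular rules. It is that none of them can be applied at all unless the announcement actually supplies the fields.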
My excessively cynical side suspects that distribution security teams leave out a lot of this information in order to push people towards applying every kernel update as soon as possible. If so, I have news for those security teams: they have it exactly backwards. There are powerful forces pushing us (and anyone) against applying updates, especially disruptive updates like new kernels. Every doubt and quibble and uncertainty in a kernel update message feeds those forces and makes it less likely that the update will be applied. In order to get us to apply an important update on an urgent basis, it must be clear that it is urgent. If it is not clear, everyone loses.
Everything works much better when the security team is honest and clear about kernel updates. We'll still sit on all of the updates that are just yet more ways for local users to lock the machine up, but that's no different than what we're already doing. But when you release something that's genuinely dangerous we'll be much more likely to notice, understand, and update much earlier than we would otherwise.
(By the way, for everyone who is about to advise us that we should have dynamic load balancers and pools of machines where we can take some out of service on a rolling basis for kernel upgrades and so on: there is no such thing as a general dynamic load balancer for user login sessions, established sessions in general, or actual running user processes. Thanks.)