2017-02-24
What an actual assessment of Ubuntu kernel security updates looks like
Ubuntu recently released some of their usual not particularly helpful kernel security update announcements and I tweeted:
Another day, another tedious grind through Ubuntu kernel security announcements to do the assessment that Ubuntu should be doing already.
I have written about the general sorts of things we want to know about kernel security updates, but there's nothing like a specific example (and @YoloPerdiem asked). So here is essentially the assessment email that I sent to my co-workers.
First, the background. We currently have Ubuntu 16.04 LTS, 14.04 LTS, and 12.04 LTS systems, so we care about security updates for the mainline kernels for all of those (we aren't using any of the special ones). The specific security notices I was assessing are USN-3206-1 (12.04), USN-3207-1 (14.04), and USN-3208-1 (16.04). I didn't bother looking at CVEs that require hardware or subsystems that we don't have or use, such as serial-to-USB hardware (CVE-2017-5549) or KVM (several CVEs here). We also don't update kernels just for pure denial of service issues (eg CVE-2016-9191, which turns out to require containers anyway), because our users already have plenty of ways to make our systems crash if they want to.
So here is a slightly edited and cleaned up version of my assessment email:
Subject: Linux kernel CVEs and my assessment of them
16.04 is only affected by CVE-2017-6074, which we've mitigated, and
CVE-2016-10088, which doesn't apply to us because we don't have
people who can access /dev/sg* devices.
12.04 and 14.04 are both affected by additional CVEs that are use-after-frees. They are not explicitly exploitable so far, but CVE-2017-6074 is also a use-after-free and is said to be exploitable with an exploit released soon, so I think they are probably equally dangerous.
[Local what-to-do discussion elided.]
Details:
CVE-2017-6074:
Andrey Konovalov discovered a use-after-free vulnerability in the DCCP implementation in the Linux kernel. A local attacker could use this to cause a denial of service (system crash) or possibly gain administrative privileges.
This is bad if not mitigated, with an exploit to be released soon (per here), but we should have totally mitigated it by blocking the DCCP modules. See my worklog on that.
CVE-2016-7911:
Dmitry Vyukov discovered a use-after-free vulnerability in the sys_ioprio_get() function in the Linux kernel. A local attacker could use this to cause a denial of service (system crash) or possibly gain administrative privileges.
The latter URL has a program that reproduces it, but it's not clear if this can be exploited to do more than crash. But CVE-2017-6074's use-after-free is apparently exploitable, so...
CVE-2016-7910:
It was discovered that a use-after-free vulnerability existed in the block device layer of the Linux kernel. A local attacker could use this to cause a denial of service (system crash) or possibly gain administrative privileges.
Link: 1
Oh look, another use-after-free issue. Ubuntu's own link for the issue says 'allows local users to gain privileges by leveraging the execution of [...]' although their official release text is less alarming.
CVE-2016-10088:
It was discovered that the generic SCSI block layer in the Linux kernel did not properly restrict write operations in certain situations. A local attacker could use this to cause a denial of service (system crash) or possibly gain administrative privileges.
Finally some good news! As far as I can tell from Ubuntu's actual
CVE-2016-10088 page, this is only exploitable if you have access to a
/dev/sg* device, and on our machines people don't.
(The actual email was plain text, so the various links were just URLs dumped into the text.)
As you can maybe see from this, doing a proper assessment requires
reading at least the detailed Ubuntu CVE information in order to
work out under what circumstances the issue can be triggered, for
instance to know that CVE-2016-10088 requires access to a /dev/sg*
device. Not infrequently you have to go chasing further; for
example, only Andrey Konovalov's initial notice mentions that he will
release an exploit in a few days. In this case we could mitigate
the issue anyways by blacklisting the DCCP modules, but in other
cases 'an exploit will soon be released' drastically raises the
importance of a security exposure (at least for us).
The online USN pages usually link to Ubuntu's pages on the CVEs they include, but the email announcements that Ubuntu sends out don't. Ubuntu's CVE pages usually have additional links, but not a full set; often I wind up finding Debian's page on a CVE because they generally have a full set of search links for elsewhere (eg Debian's CVE-2016-9191 page). I find that sometimes the Red Hat or SuSE bug pages will have the most technical detail and thus help me most in understanding the impact of a bug and how exposed we are.
The amount of text that I wind up writing in these emails is generally
way out of proportion to the amount of reading and searching I have
to do to figure out what to write. Everything here is a sentence
or two, but getting to the point where I could write those is the
slog. And with CVE-2017-6074, I had to jump in to set up and test
an entire mitigation of blacklisting all the DCCP modules via a new
/etc/modprobe.d file and then propagating that file around to all
of our Ubuntu machines.
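To give a rough idea of what such a mitigation file involves, here is a minimal sketch that renders a modprobe.d fragment. The module list, the file name mentioned in the comment, and the choice of 'install <module> /bin/false' lines are assumptions made for this sketch, not the contents of the actual file we deployed.

```python
# Sketch: render a modprobe.d fragment that stops DCCP modules from loading.
# Assumptions: the module names and the 'install ... /bin/false' approach
# are illustrative only; check what your own kernel actually ships.

DCCP_MODULES = ["dccp", "dccp_ipv4", "dccp_ipv6", "dccp_diag"]


def render_modprobe_conf(modules):
    """Return the text of a modprobe.d file that disables each module."""
    lines = ["# Block DCCP modules as a mitigation for CVE-2017-6074"]
    for name in modules:
        # 'install' makes modprobe run the given command instead of loading
        # the module, so modprobe-based load attempts (including automatic
        # ones triggered by the kernel) fail.
        lines.append(f"install {name} /bin/false")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical destination: /etc/modprobe.d/dccp-block.conf
    print(render_modprobe_conf(DCCP_MODULES), end="")
```

Generating the file is the trivial part; the real work is testing that the modules can no longer be loaded and then getting the file onto every machine.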
How ZFS bookmarks can work their magic with reasonable efficiency
My description of ZFS bookmarks covered what they're good for, but it didn't talk about what they are at a mechanical level. It's all very well to say 'bookmarks mark the point in time when [a] snapshot was created', but how does that actually work, and how does it allow you to use them for incremental ZFS send streams?
The succinct version is that a bookmark is basically a transaction group (txg) number. In ZFS, everything is created as part of a transaction group and gets tagged with the TXG of when it was created. Since things in ZFS are also immutable once written, we know that an object created in a given TXG can't have anything under it that was created in a more recent TXG (although it may well point to things created in older transaction groups). If you have an old directory with an old file and you change a block in the old file, the immutability of ZFS means that you need to write a new version of the data block, a new version of the file metadata that points to the new data block, a new version of the directory metadata that points to the new file metadata, and so on all the way up the tree, and all of those new versions will get a new birth TXG.
This means that given a TXG, it's reasonably efficient to walk down an entire ZFS filesystem (or snapshot) to find everything that was changed since that TXG. When you hit an object with a birth TXG before (or at) your target TXG, you know that you don't have to visit the object's children because they can't have been changed more recently than the object itself. If you bundle up all of the changed objects that you find in a suitable order, you have an incremental send stream. Many of the changed objects you're sending will contain references to older unchanged objects that you're not sending, but if your target has your starting TXG, you know it has all of those unchanged objects already.
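The pruning walk can be sketched in a few lines. This is a toy model, not ZFS code: the Block class, its fields, and the changed_since function are hypothetical names invented for illustration, and real block pointers carry far more than a birth TXG.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy stand-in for a ZFS block pointer; not the real on-disk layout."""
    name: str
    birth_txg: int
    children: list = field(default_factory=list)


def changed_since(block, from_txg):
    """Yield every block born after from_txg, pruning unchanged subtrees."""
    if block.birth_txg <= from_txg:
        # Nothing below an old block can be newer than it, so skip the subtree.
        return
    yield block
    for child in block.children:
        yield from changed_since(child, from_txg)


# An old file with one rewritten data block (txg 50); the rest date from txg 10.
old_data = Block("data0", 10)
new_data = Block("data1", 50)
untouched = Block("other-file", 10, [Block("other-data", 10)])
changed_file = Block("file", 50, [old_data, new_data])
root = Block("dir", 50, [changed_file, untouched])

names = [b.name for b in changed_since(root, from_txg=10)]
print(names)
```

The walk visits the directory, the changed file, and the new data block, and never descends into the untouched file at all. That pruning is where the efficiency comes from.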
To put it succinctly, I'll quote a code comment from libzfs_core.c (via):
If "from" is a bookmark, the indirect blocks in the destination snapshot are traversed, looking for blocks with a birth time since the creation TXG of the snapshot this bookmark was created from. This will result in significantly more I/O and be less efficient than a send space estimation on an equivalent snapshot.
(This is a comment about getting a space estimate for incremental sends, not about doing the send itself, but it's a good summary and it describes the actual process of generating the send as far as I can see.)
Yesterday I said that ZFS bookmarks could
in theory be used for an imprecise version of 'zfs diff'. What
makes this necessarily imprecise is that while scanning forward
from a TXG this way can tell you all of the new objects and it can
tell you what is the same, it can't explicitly tell you what has
disappeared. Suppose we delete a file. This will necessarily create
a new version of the directory the file was in and this new version
will have a recent TXG, so we'll find the new version of the directory
in our tree scan. But without the original version of the directory
to compare against we can't tell what changed, just that something
did.
(Similarly, we can't entirely tell the difference between 'a new file was added to this directory' and 'an existing file had all its contents changed or rewritten'. Both will create new file metadata that will have a new TXG. We can tell the case of a file being partially updated, because then some of the file's data blocks will have old TXGs.)
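A toy illustration of the difference, using plain dictionaries as hypothetical stand-ins for directory objects (none of this is real ZFS structure): with only a TXG we learn that the directory changed, while with the old version in hand we can say what changed.

```python
def dir_changes_without_original(new_dir, from_txg):
    """With only a TXG, all we learn is whether the directory changed."""
    return new_dir["birth_txg"] > from_txg


def dir_changes_with_original(old_dir, new_dir):
    """With the old version kept (as a snapshot does), we can name names."""
    old_names, new_names = set(old_dir["entries"]), set(new_dir["entries"])
    return {"removed": sorted(old_names - new_names),
            "added": sorted(new_names - old_names)}


# Toy directories: plain dicts, not real ZFS structures.
old_dir = {"birth_txg": 10, "entries": ["a.txt", "b.txt"]}
new_dir = {"birth_txg": 50, "entries": ["a.txt"]}  # b.txt was deleted

print(dir_changes_without_original(new_dir, from_txg=10))
print(dir_changes_with_original(old_dir, new_dir))
```

The first function is all a bookmark can give us; the second needs the old directory contents, which only a snapshot preserves.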
Bookmarks specifically don't preserve the original versions of things; that's why they take no space. Snapshots do preserve the original versions, but they take up space to do that. We can't get something for nothing here.
(More useful sources on the details of bookmarks are this reddit ZFS entry and a slide deck by Matthew Ahrens. Illumos issue 4369 is the original ZFS bookmarks issue.)
Sidebar: Space estimates versus actually creating the incremental send
Creating the actual incremental send stream works exactly the same
for sends based on snapshots and sends based on bookmarks. If you
look at dmu_send in dmu_send.c, you can see that in the case of a
snapshot it basically creates a synthetic bookmark from the snapshot's
creation information; with a real bookmark, it retrieves the data
through dsl_bookmark_lookup. In both cases, the important piece of
data is zmb_creation_txg, the TXG to start from.
This means that contrary to what I said yesterday, using bookmarks as the origin for an incremental send stream is just as fast as using snapshots.
What is different is if you ask for something that requires estimating
the size of the incremental sends. Space estimates for snapshots
are pretty efficient because they can be made using information
about space usage in each snapshot. For details, see the comment
before dsl_dataset_space_written in dsl_dataset.c.
Estimating the space of a bookmark based incremental send requires
basically doing the same walk over the ZFS object tree that will be
done to generate the send data.
(The walk over the tree will be somewhat faster than the actual send, because in the actual send you have to read the data blocks too; in the tree walk, you only need to read metadata.)
So, you might wonder how you ask for something that requires a space
estimate. If you're sending from a snapshot, you use 'zfs send -v
...'. If you're sending from a bookmark or a resume token, well,
apparently you just don't; sending from a bookmark doesn't accept
-v and -v on resume tokens means something different from what
it does on snapshots. So this performance difference is kind of a
shaggy dog story right now, since it seems that you can never
actually use the slow path of space estimates on bookmarks.