2012-06-08
An (accessible) explanation of the Flame malware's Windows Update compromise
I've been quite curious about the details of how the Flame malware managed to compromise Windows Update, as this sort of crypto wonkery is one of my interests. Unfortunately pretty much all of the coverage that I could find has either been vague and non-technical or written by specialists for other specialists. Since I've recently done a bunch of reading on this in an attempt to force things into my head, I'm going to follow my usual tradition and write up what I think I understand and have guessed.
(The obligatory disclaimer is that I am not a security specialist so I may be misunderstanding parts of this.)
The story starts with Microsoft's Terminal Services licensing mechanisms. Part of enterprise TS licensing involves Microsoft signing a certificate for you which you then use to prove to the TS management software that you're properly licensed (and then it's apparently used to issue sub-certificates for client side licenses, which are tied to your organization through your certificate). This is a perfectly rational design with a number of useful features but Microsoft made at least three mistakes in the implementation.
The first mistake is that Microsoft was signing enterprise TS license certificates with a certificate chain that ran up to a general Microsoft Root certificate, one that is used as the certificate root for all sorts of things. This creates a situation where these enterprise TS certs and any sub-certs that they signed would be seen as 'signed by Microsoft' by general certificate verification code (such as that used by, oh, Windows Update).
(This chain running to a Microsoft Root CA is unnecessary; the TS license verification could perfectly well have its own detached and independent root certificate, one used only for license verification. Microsoft is apparently in the process of changing TS licensing to do just this.)
The second mistake is that these signed enterprise TS certificates were (indirectly) authorized as code signing certificates instead of being restricted to just license verification. In combination with the first mistake this becomes a 'signed by Microsoft' code signing certificate, which is good enough to compromise Windows Update in everything before Windows Vista.
(The authorization is indirect because the enterprise TS cert has no specific usage restrictions; instead the usage restrictions come from two of the intermediate certificates in the chain. Both of them specifically allow code signing plus some other things.)
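To make the second mistake a bit more concrete, here is a minimal sketch of how a verifier winds up treating such a chain as good for code signing. This is my own illustration (written against Python's cryptography package), not Windows' actual verification code; the point is just that the end-entity certificate carries no usage restrictions of its own, so everything hinges on what the intermediates allow.

```python
from cryptography import x509
from cryptography.x509.oid import ExtendedKeyUsageOID


def allows_code_signing(cert: x509.Certificate) -> bool:
    """Does this certificate permit code signing, as a typical verifier sees it?"""
    try:
        eku = cert.extensions.get_extension_for_class(x509.ExtendedKeyUsage).value
    except x509.ExtensionNotFound:
        # No Extended Key Usage extension at all: usage is effectively
        # unrestricted, which was the situation for the enterprise TS
        # end-entity certificates.
        return True
    return ExtendedKeyUsageOID.CODE_SIGNING in eku


def chain_allows_code_signing(chain: list[x509.Certificate]) -> bool:
    # Every certificate in the chain has to permit it; here the intermediates
    # explicitly allowed code signing and the end-entity didn't say otherwise.
    return all(allows_code_signing(cert) for cert in chain)
```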
The reason the basic compromise doesn't work from Vista onwards is that the TS certificate signing system adds a custom TS-related extension (called 'Hydra') to the signed certificate that's marked 'critical'. Critical in X.509 means that if your code doesn't understand the field, you're supposed to reject the certificate even if it otherwise validates. The TS license code presumably understands this field, but apparently the general Vista-and-later crypto library does not and thus would have rejected the certificate. This leads to the third mistake.
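The critical-extension rule itself is simple to state. Here is a sketch of it (again my own illustration using Python's cryptography package, not anyone's real code); the set of 'known' extensions is simply whatever the verifying code happens to implement, which is exactly where pre-Vista and post-Vista behaviour diverged.

```python
from cryptography import x509
from cryptography.x509.oid import ExtensionOID

# Extensions this particular (hypothetical) verifier knows how to process.
KNOWN_EXTENSIONS = {
    ExtensionOID.BASIC_CONSTRAINTS,
    ExtensionOID.KEY_USAGE,
    ExtensionOID.EXTENDED_KEY_USAGE,
}


def reject_unknown_critical_extensions(cert: x509.Certificate) -> None:
    for ext in cert.extensions:
        if ext.critical and ext.oid not in KNOWN_EXTENSIONS:
            # The X.509 rule: a critical extension you don't understand means
            # the certificate must be rejected even if it otherwise validates.
            raise ValueError(f"unrecognized critical extension {ext.oid.dotted_string}")
```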
The third mistake Microsoft made is that the TS cert signing process was still using MD5, which is now quite susceptible to serious collision attacks (although practical ones remain what mathematicians call 'non-trivial'). The Flame authors exploited the weaknesses in MD5 to create a version of their signed TS certificate (with the same MD5 hash) that transformed the critical (must-be-handled) Hydra extension (along with several other fields) into just the data payload of a disused non-critical field that would be ignored by the signature verification code in the Vista-and-later crypto library that Windows Update uses. This finally created a 'signed by Microsoft' code signing certificate that would be accepted by all versions of Windows Update.
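Here is a sketch of why an MD5 collision is all it takes (my own illustration, not Windows' code, with a stand-in for the RSA check since real colliding certificate bodies are far too large to reproduce here): the issuer's signature commits only to the MD5 digest of the to-be-signed certificate body, so any second body with the same digest inherits the signature.

```python
import hashlib


def verify_md5_signature(tbs_bytes: bytes, signature: bytes, rsa_verify) -> bool:
    # The signature commits only to the MD5 digest of the certificate body,
    # never to the body itself.
    return rsa_verify(hashlib.md5(tbs_bytes).digest(), signature)


def fake_rsa_verify(digest: bytes, sig: bytes) -> bool:
    # Stand-in for the real RSA check, purely so this sketch runs.
    return digest == sig


legit_tbs = b"legitimate enterprise TS certificate body"
ms_signature = hashlib.md5(legit_tbs).digest()

print(verify_md5_signature(legit_tbs, ms_signature, fake_rsa_verify))  # True
# If an attacker constructs rogue_tbs with md5(rogue_tbs) == md5(legit_tbs),
# then verify_md5_signature(rogue_tbs, ms_signature, fake_rsa_verify) is also
# True: the rogue certificate is 'signed by Microsoft' as far as anyone can tell.
```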
One of several scary things about this compromise is that this was not a weakness in Windows Update's cryptography. The Windows Update people seem to have done everything or almost everything right; instead, they got compromised by a series of certificate handling mistakes in a completely different area of Microsoft (one that seems to have been neglected if not outright ignored before this).
(You can argue that Windows Update is so crucial that it should have had its own independent root certificate instead of relying on a general use Microsoft Root CA that was also the root of other certificate chains used for other purposes.)
PS: there seem to have been a number of other, less important flaws in how enterprise TS certificates were set up. From my reading so far, most of the other issues weren't important for the compromise; they just suggest that TS licensing never received the cryptography attention it needed. The one exception is that apparently enterprise TS certificate signing used predictable timing and field contents for things like the serial number, which is believed to have made the MD5 collision attack easier.
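Why predictability helps is worth spelling out: a collision of this sort has to be computed against the exact bytes Microsoft will sign, before Microsoft signs them, so the attacker has to guess fields like the serial number and validity time in advance. A toy sketch of the idea (entirely my own illustration; real certificates are DER-encoded, not whatever this is):

```python
import hashlib


def predicted_tbs(serial: int, not_before: str, subject_key: bytes) -> bytes:
    # The attacker's guess at the certificate body the licensing server will
    # produce for their request; only if the guess is byte-exact can a
    # colliding rogue body be precomputed against it.
    return b"|".join([str(serial).encode(), not_before.encode(), subject_key])


guess = predicted_tbs(serial=1001, not_before="2012-06-01 00:00:00",
                      subject_key=b"attacker public key bytes")
print(hashlib.md5(guess).hexdigest())
# Sequential serial numbers and predictable timestamps make this guess feasible;
# random serial numbers would have made the whole attack much harder.
```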
Sidebar: many references
Here is a bunch of the links that I've been relying on:
- The best technical but reasonably understandable entries I've read are from Ryan Hurst: one, two, three, four. He'll probably write more about this. I found his entries through this Marsh Ray mailing list message, which has other links.
- There is also Matthew Green's entry, which covers this whole territory in one entry.
- One important primary source is what Microsoft themselves have said: one, two, three, four (so far). The Microsoft notes are rather vague, but among other things I suspect that anyone at Microsoft who could write something more technical about this is far too busy with analysis and mitigation work to have the time to do so.
- On the challenge of the MD5 hash collision (and Ars Technica).
- Ars Technica's coverage has been one of the few sources of reporting on this that actually links to the primary sources (and to more than just the Microsoft notes): one, two, three. Not all of it is great, though.
- Symantec has more background on Terminal Services here, also in general.
There are probably a number of other accessible explanations of the whole thing that I just didn't find in my web searching before I got overwhelmed by all of the breathless news coverage from the usual suspects.
2012-06-03
Another view of the merge versus rebase debate in version control
One of the persistent debates in modern version control is between merging changes and rebasing them, with the customary pro-merge argument being that rebasing destroys history. From the developer's view this is completely correct; rebasing destroys the history of the changes that are being rebased, causing them to spring into existence fully-formed.
But there's another view you can have on this; you can have the view of a user of the repository, someone (or something) who is pulling and tracking the tip/head of the mainline. And from this user's view, the history that merges preserve effectively doesn't really exist. Before the merge commit it wasn't there at all, and when the merge commit was made all of the changes appeared instantly from nowhere (and they appeared as an indivisible whole; as you step back and forth through the history of the mainline tip, you either have all of the merge or none of it).
The important thing to understand is that this 'user' perspective always sees a linear history of the repository (unless you're doing something very unusual with head/tip), in that every time they update, the head marches forward along a continuous, unbranching line of development. This means that from the user perspective, a merge is functionally equivalent to a single giant rebase commit; both have the same net effect of causing a large indivisible block of changes to show up all at once.
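A toy model may make this 'user view' clearer. This is my own illustration, not how any real VCS stores things; the mainline view is simply what you get by walking first parents back from the tip, and a merge shows up there as a single indivisible step while rebased commits show up one by one.

```python
class Commit:
    def __init__(self, name: str, parents: tuple = ()):
        self.name = name
        self.parents = list(parents)


def mainline_view(tip: Commit) -> list[str]:
    """The history a user tracking the tip sees: follow first parents only."""
    view, c = [], tip
    while c is not None:
        view.append(c.name)
        c = c.parents[0] if c.parents else None
    return view


m1 = Commit("mainline work")
f1 = Commit("fix part 1", [m1])
f2 = Commit("fix part 2", [f1])
merged = Commit("merge the fix branch", [m1, f2])
print(mainline_view(merged))   # ['merge the fix branch', 'mainline work']

r1 = Commit("fix part 1 (rebased)", [m1])
r2 = Commit("fix part 2 (rebased)", [r1])
print(mainline_view(r2))       # ['fix part 2 (rebased)', 'fix part 1 (rebased)', 'mainline work']
```

(Git users can get exactly this view of a real repository with 'git log --first-parent'.)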
(A merge is theoretically superior to a single giant rebase in that you can perhaps bisect back through a merge to find a specific broken change. But you should never do single giant rebases; you should rebase with a sequence of single-change commits, which makes bisect even simpler and more likely to actually work in practice.)
Thus I think it's clear that from the user view the best thing is a series of rebased commits. Although they may appear all at once they look as much as possible like separate changes, and as separate commits you can still look at the changes individually and step through them one by one if you need to (and similarly, bisect through them).
Sidebar: a general problem with merges and bisecting
There are a number of practical problems with bisecting through merges, but a general one is that once you start bisecting into a merge you are no longer working in the mainline code. From the user view this means that you are in completely unfamiliar territory where even if the code theoretically works it may lack changes, bugfixes, or features from the mainline that you need.
(The more long-lived and isolated the branch was, the more you may run into this.)
My view on Mercurial versus Git
I've kind of alluded to my views in passing before but since I've already written a certain amount on these two systems (and a chunk of it sort of in favour of Mercurial) I feel like writing about this explicitly, just to be clear.
(You should insert implicit 'in my view' disclaimers in the following if desired.)
For my own use, Mercurial is easier to start with and use simply, more user friendly, and more 'humane' (in that it generally works more the way people expect). However, Git is technically better, more powerful and complex, and is more willing to be pragmatic and useful. Mercurial people (at least in my perceptions) are still somewhat tied to the 'proper' VCS way to do things; Git people are much more flexible and willing to compromise in order to do the 'impure' but right thing (the primary exhibit of this is git rebase). Git's drawback is that it has far more exposed complexity than Mercurial does; you cannot really learn or use Git without understanding things like the index, the (abstracted) repository format, and so on. But once you do, the good news is that everything makes sense.
(Saying that Git is technically better may irritate people, but I do feel that it's true. Over and over I've found myself persuaded that Git makes the right fundamental choices on things like repository formats, whether or not to try to track file renames, and so on. I don't currently know enough to have an opinion in the great debate over Mercurial branches versus Git branches.)
The net result is that these days I like Git more and it's what I'm focusing on. Mercurial is okay and I know enough about it to get by, but I would rather use Git for future repos and spend my time learning more about it. This marks a change from where I was a few years ago (when I found Git intimidating and Mercurial nicely easy), but I figure I'm allowed to do that. I know that there will be a learning curve and some frustrations in using Git, but I'm okay with that; I think that it will be worth it in the end.
(Things for work will continue to be in Mercurial repos, because that's our standard for good reasons.)
As a side note, I would say that Git's flaw is that it has never been willing to compromise or hide its complexity in order to present people with an interface that feels simple and natural. There have been attempts to do so every so often, but as far as I can tell they've never really caught on (and I don't think they've had enabling support from the Git core). The result is a powerful but complex and deep interface that doesn't necessarily operate the way that people start out expecting. This is why I say that Mercurial is more humane than Git; Mercurial has made an effort to have its interface operate in a simple and natural way, even if that means hiding a certain amount of power or not offering it to people at all.
Sidebar: the pragmatic perspective
On a pragmatic basis Git has won. I say this for one simple reason: Github. If you work with open source projects (even just using them) you will sooner or later wind up dealing with Github. And if you want to share or show people your open source code, even trivial code, Github is again the platform that people will want you to use.
(Yes, there is a Mercurial equivalent, but Github is far bigger and far more dominant. And yes, I believe that you can use Mercurial with a Mercurial-to-Git bridge to interact with Github if you're stubborn enough and really want to, but let's be honest: you're making life harder for yourself.)
Honesty compels me to admit that this was one large reason I finally started putting things into Git repos; I wanted to put them up on Github, so it was time to get serious about using Git.
2012-06-01
The secure boot problem
In theory, UEFI secure booting has the straightforward goal of stopping boot time malware, malware that compromises your machine before Windows boots and thus before any of its protections can kick in (such malware already exists, although it's not very common). In practice, secure boot requires that all privileged code your machine ever runs be signed. Your bootloader must be signed, your operating system (Windows or otherwise) must be signed, your hardware drivers must be signed. Wait, what? How did 'prevent boot time malware' turn into 'only run signed code'?
The core problem with all secure boot schemes and with this general goal of blocking boot time malware is that the OS has no way to be sure that it was booted securely. There is no way for Windows (or any other OS) to reliably detect that it was booted in a compromised environment or by something other than the official boot system and throw up a big warning when it starts to the effect that you're in trouble. If an attacker has control of the machine, they can construct a fake boot environment that lies to the target OS and says 'honest, you were booted securely, everything is fine, I am not boot time malware'. At that point it is game over.
(I'm not convinced that you can get around this in practice even with hardware support.)
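A toy sketch of the problem (my own illustration, obviously not any real firmware interface): any 'was I booted securely?' query the OS makes is answered by an environment the attacker may already control, so the answer proves nothing.

```python
class HonestFirmware:
    def booted_securely(self) -> bool:
        return True   # genuinely verified every step of the boot chain


class MaliciousLoader:
    """Boots the OS inside an environment it controls and answers its queries."""

    def booted_securely(self) -> bool:
        return True   # a lie, and the OS has no independent way to tell


def os_startup(environment) -> None:
    if environment.booted_securely():
        print("OS believes it was booted securely")
    else:
        print("OS warns the user")


os_startup(HonestFirmware())    # OS believes it was booted securely
os_startup(MaliciousLoader())   # exactly the same output
```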
This means that secure boot can never allow the attacker to gain control of the machine through any path, even a long one. A bootloader allows control of the machine, so you can only run approved, presumed secure bootloaders. An operating system allows control of the machine, so bootloaders can only run approved operating systems. Kernel level drivers allow control of the machine (if you abuse them), so operating systems can only allow approved drivers. Direct hardware access to some hardware allows you to take control of the machine (for example by programming DMA to overwrite bits of the OS), so operating systems can only allow that access to approved programs. And so on. Thus we wind up with secure boot requiring that all privileged code be signed, all the way down the line from the bootloader to graphics drivers.
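In other words, every stage has to verify the next stage before handing over control; one unchecked link and the whole chain is worthless. Here is a minimal sketch of that rule (my own model, with a digest whitelist standing in for 'carries a valid signature from a trusted key'):

```python
import hashlib


def is_approved(image: bytes, approved_digests: set[str]) -> bool:
    # Stand-in for 'this image carries a valid signature from a trusted key'.
    return hashlib.sha256(image).hexdigest() in approved_digests


def boot_chain(stages: list[tuple[str, bytes]], approved_digests: set[str]) -> None:
    # firmware -> bootloader -> kernel -> drivers: every link must be checked,
    # because any unapproved stage could hand control to boot time malware.
    for name, image in stages:
        if not is_approved(image, approved_digests):
            raise RuntimeError(f"refusing to run unapproved stage: {name}")
        print(f"running verified stage: {name}")


stages = [("bootloader", b"bootloader image"), ("kernel", b"kernel image")]
approved = {hashlib.sha256(img).hexdigest() for _, img in stages}
boot_chain(stages, approved)
```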
Any opening in this chain of trust allows an attacker to slip in, intercept the process, take over the machine, and boot the target OS in a malware-infected environment. If they can slip in early enough, you're unlikely to notice that your machine takes a few seconds longer to boot than before because it is actually first booting a carefully configured minimal install of some OS that is willing to run an unsigned driver, and the 'driver' is then taking over the machine to start your real OS.
Of course, the corollary of this is that signing things is not really good enough to keep attackers out. It would only be good enough if the signed things had no vulnerabilities that attackers could exploit, but of course they are going to have vulnerabilities and they're going to get compromised. In theory signing things allows things to be de-approved after the fact when they are found to be vulnerable; in practice, well, there's all sorts of potentially explosive issues.
PS: how this interacts with virtualization makes my brain hurt. In theory I think that all virtualization systems (whether or not they require special hardware privileges) are part of the trust chain and so have to be signed. I have no idea how you enforce that.
Sidebar: the theoretical way around this with hardware support
What you need is a piece of hardware that cannot be faked, can be irreversibly disabled by the system, and is essential to boot your OS. The obvious implementation is to have a crypto processor with preloaded keys that is used to decrypt some portion of your OS. At the point where the boot system transitions out of secure booting, it tells the crypto processor to flush the preloaded keys; if your OS is booted after that point, it will be unable to decrypt portions of itself and won't run. Malware is presumed to not have the keys, so it cannot reload them into the crypto processor.
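Here's a toy model of the idea (my own sketch; real hardware would obviously not look like this, and I'm waving my hands about the keys just as above):

```python
class CryptoProcessor:
    """A piece of hardware with preloaded keys that the boot path can discard."""

    def __init__(self, key: bytes):
        self._key = key

    def flush_keys(self) -> None:
        # Irreversible for this boot: once secure booting ends, the keys go away.
        self._key = None

    def decrypt(self, blob: bytes) -> bytes:
        if self._key is None:
            raise RuntimeError("keys flushed; encrypted OS sections are unreadable")
        # Toy XOR 'decryption', purely for illustration.
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(blob))


hw = CryptoProcessor(key=b"preloaded secret")
hw.decrypt(b"encrypted OS bytes")   # works while the keys are still present

hw.flush_keys()                     # the boot path leaves secure booting
try:
    hw.decrypt(b"encrypted OS bytes")
except RuntimeError as exc:
    print(exc)                      # an OS booted outside the secure path can't decrypt itself
```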
(I'm engaging in a certain amount of handwaving here about how the keys would work.)
One pragmatic difficulty with this is the question of what prevents the malware from simply providing the necessary decrypted material directly. Almost all of your OS's code and data is not system dependent and cannot be tied to a particular machine, so the malware can simply carry around a generic copy of the decrypted, live version of anything that normally comes encrypted. My instinct is that it's hard to have system-dependent material that is really crucial and cannot be quietly substituted or patched.