Always sign exactly what you are authenticating

January 28, 2010

I can't claim to know very much about cryptography programming, but I like to think that I have picked up on a few mistakes to avoid. Here's one of them: you want to sign exactly what you are authenticating, not some mangled version of it.

(Note that 'canonicalizing' things is a form of mangling them.)

Suppose, as a not entirely hypothetical example, that you are signing some sort of web request with a bunch of (URL) parameters. In order to deal with annoying software, you define a canonical form for these URL parameters: sort them into alphabetical order and concatenate all of the names and values together, without the '=' and '&' delimiters. You then sign this mangled, canonical result.

Great, except that this canonical form has just allowed an attacker to turn 'login=fred&next=10' into 'login=fredne&xt=10', since both flatten to 'loginfrednext10' (and worse is possible if you do not sort the parameters but use them in the request's order). As Colin Percival puts it, good crypto signatures are designed so that it is very difficult to produce collisions, and when you mangle what you sign, part of what you are doing is creating opportunities for attackers to produce deliberate collisions. This rarely ends well.
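To make the collision concrete, here is a minimal Python sketch of the flawed scheme. It assumes HMAC-SHA256 with a made-up shared secret as the signature (the actual signature scheme isn't specified here), and the function names are hypothetical; the point is only that both query strings flatten to the same bytes, so one signature covers both.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl

# Hypothetical shared secret, for illustration only.
SECRET = b"example-secret-key"


def mangled_canonical_form(query: str) -> bytes:
    """The flawed scheme: sort the parameters, then glue names and
    values together with no delimiters at all."""
    pairs = sorted(parse_qsl(query, keep_blank_values=True))
    return "".join(name + value for name, value in pairs).encode("utf-8")


def sign(query: str) -> str:
    # HMAC-SHA256 stands in for whatever signature is really used.
    return hmac.new(SECRET, mangled_canonical_form(query), hashlib.sha256).hexdigest()


def verify(query: str, signature: str) -> bool:
    return hmac.compare_digest(sign(query), signature)


original = "login=fred&next=10"
forged = "login=fredne&xt=10"

# Both requests collapse to the same bytes, b'loginfrednext10' ...
print(mangled_canonical_form(original))
print(mangled_canonical_form(forged))
# ... so the signature issued for the first one also verifies the second.
print(verify(forged, sign(original)))  # True
```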

The other problem with mangling is that you are no longer necessarily verifying what you think you are verifying. Instead of verifying what you are actually about to act on and use, you are verifying something else, some mangled transformation of it. This is almost invariably a mistake, and attackers will be happy to exploit it by slipping dangerous things into the gap between what you verify and what you act on.
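Here is a hedged sketch of such a gap, using a different, hypothetical canonicalization that keeps the delimiters but collapses repeated parameters into a Python dict (which silently keeps the last value), while the request handler re-parses the raw query and uses the first value. The money-transfer handler, the parameter names, and the secret are all invented for illustration, and HMAC-SHA256 again stands in for whatever signature is actually used.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, parse_qsl, urlencode

SECRET = b"example-secret-key"  # hypothetical key, for illustration only


def canonical_form(query: str) -> bytes:
    # Keeps the delimiters this time, but collapsing into a dict silently
    # keeps only the *last* value of any repeated parameter.
    params = dict(parse_qsl(query, keep_blank_values=True))
    return urlencode(sorted(params.items())).encode("utf-8")


def sign(query: str) -> str:
    return hmac.new(SECRET, canonical_form(query), hashlib.sha256).hexdigest()


def verify(query: str, signature: str) -> bool:
    return hmac.compare_digest(sign(query), signature)


def handle(query: str, signature: str) -> str:
    """Hypothetical request handler."""
    if not verify(query, signature):
        raise PermissionError("bad signature")
    # The handler re-parses the *raw* query and takes the *first* value,
    # so it acts on something other than what was verified.
    params = parse_qs(query, keep_blank_values=True)
    return f"transfer {params['amount'][0]} to {params['to'][0]}"


sig = sign("amount=10&to=fred")
print(handle("amount=10&to=fred", sig))              # transfer 10 to fred
print(handle("amount=1000&amount=10&to=fred", sig))  # transfer 1000 to fred
```

An attacker who has seen one legitimately signed request can prepend a second `amount`; the verifier computes the same canonical form and accepts it, while the handler acts on the attacker's value.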

(It is possible that your mangling will be un-exploitable. But the historical odds are against you; over and over, people who have done this sort of mangling and imprecise verification have turned out to have created exploitable vulnerabilities. And if you are writing crypto code, you should not be betting on things going your way.)

Sidebar: how I think you have to do canonicalization

Disclaimer: you should not necessarily trust what I write about how to do crypto (as opposed to how not to do crypto), because I am not an experienced crypto person.

I believe that the corollary to this is that if you absolutely have to do mangling and canonicalization, you must do it as part of generating the plaintext: you take raw input, transform it into the canonical form, sign the canonical form, and output both the canonical form and the signature. On verification, you canonicalize, verify the signature over the canonical form, and then use that canonical form for further processing, not the raw input.

If you cannot use your canonical form as input to the rest of your processing or as your public plaintext, you need a new canonical form. Try again.
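As a rough sketch of that shape (same assumptions: HMAC-SHA256, a hypothetical secret, and invented function names), the canonical form below sorts the (name, value) pairs but keeps every pair and re-encodes them with `urlencode()`, so it is itself a query string that parses back into exactly those pairs. The receiver canonicalizes what it got, verifies the signature over that canonical string, and then returns the parameters parsed from the canonical string rather than from the raw input.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode

SECRET = b"example-secret-key"  # hypothetical key, for illustration only


def canonicalize(raw_query: str) -> str:
    """Sort the (name, value) pairs but keep every pair, and re-encode them
    with delimiters and percent-escaping, so the result is still a query
    string that parses back into exactly those pairs."""
    pairs = parse_qsl(raw_query, keep_blank_values=True)
    return urlencode(sorted(pairs))


def _mac(canonical: str) -> str:
    return hmac.new(SECRET, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_request(raw_query: str) -> tuple[str, str]:
    """Sender: canonicalize first, sign the canonical form, and send the
    canonical form (not the raw input) along with the signature."""
    canonical = canonicalize(raw_query)
    return canonical, _mac(canonical)


def verify_request(received_query: str, sig: str) -> list[tuple[str, str]]:
    """Receiver: canonicalize, verify the canonical form, and hand only the
    canonical form's parameters to the rest of the code."""
    canonical = canonicalize(received_query)
    if not hmac.compare_digest(_mac(canonical), sig):
        raise PermissionError("bad signature")
    # Parse the verified canonical string, never the raw received one.
    return parse_qsl(canonical, keep_blank_values=True)


canonical, sig = sign_request("next=10&login=fred")
print(canonical)                       # login=fred&next=10
print(verify_request(canonical, sig))  # [('login', 'fred'), ('next', '10')]
for forged in ("login=fredne&xt=10", "login=fred&next=10&next=99"):
    try:
        verify_request(forged, sig)
    except PermissionError:
        print("rejected:", forged)
```

Keeping duplicates rather than collapsing them, and keeping the delimiters (with any '&' or '=' inside names and values percent-escaped by `urlencode()`), is what lets this canonical form serve directly as the input to the rest of the processing.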


Comments on this page:

By nothings at 2010-01-29 01:45:39:

This doesn't really make any sense without reading the example, because I don't think canonicalizing is the problem. I mean, I agree "mangling" gives an opportunity for problems, but normal canonicalization doesn't. If the canonical version of a URL involves re-sorting parameters into alphabetical order (but leaving all the delimiters in), you won't introduce any security flaws. This is actually what I thought you meant, so I wasn't clear exactly how this was a problem.

Removing delimiters obviously causes problems, but this is a pretty well-known and obvious problem. The point of the alphabetizing canonicalization is that you're removing some semantically meaningless information: "X is the canonical form of Y" normally means "X communicates everything Y does". If you strip the delimiters out, you obviously lose the ability to recover the semantics of Y, so it's not really a "canonical" form in any sense I've ever heard the term used. (See, for instance, 'canonical form' on Wikipedia. When I hear 'canonicalizing a URL', I imagine 'the URL in canonical form', not 'a thing that isn't a URL at all'. Dropping the delimiters is more like hashing than it is like canonicalization, since you're allowing multiple semantically-different things to map to the same output, and I'm not sure how this isn't obvious, notwithstanding your claims to the contrary.)

Obviously this matches your comment about being able to do further processing on the canonical form, but I feel like there's something a lot more natural to explain this rather than "mangling and canonicalization are bad" and then a lot of caveats to explain the acceptable limits for canonicalization.

Written on 28 January 2010.

Last modified: Thu Jan 28 02:00:43 2010