Understanding hash length extension attacks

October 30, 2009

Here is a somewhat counterintuitive result (at least for me): suppose that you have the SHA1 hash of some value, SHA1(m). If you know the length of m, it is possible to work out the SHA1 hash of m, plus some magic, plus anything more of your choice. In the literature, this is called an 'extension attack'. I think I understand why this is so and basically why it has to be so, and now I'm going to write it down to make sure of this.

Hash functions like SHA1 operate on fixed-size blocks of input which are successively condensed down to a running internal state (the core operation is to apply a hashing function to the current internal state and the current block to produce the new internal state). This means that all of the input to date is entirely captured in the current internal state.

Thus, to extend a hash, all you need to do is to set up the hash algorithm with the internal state as of the end of the input (more or less) and then start adding more blocks of input. And all you need to recreate the internal state of hashes like SHA1 and MD5 is the public hash value, which you have.

(In fact the SHA1 hash value is exactly the internal state; you can see this in the SHA-1 algorithm.)

The length of m comes into this because of trailing padding. Since the input is in bytes and may not be an exact multiple of the block size, it actually gets padded behind the scenes; part of the padding is the length of the input, for reasons beyond the scope of this entry. So what you really can generate is not SHA1(m || X), where you supply X, but SHA1(m || pad(m) || X), where you supply X and pad(m) is the padding added to m in the original SHA1 computation. In order to generate pad(m), you need to know m's length.

This sounds useless, but suppose that SHA1(m) is a hash signature and m is actually secret-key || plaintext (a 'keyed hash'). Then, if you can know or work out the length of the key, you can manufacture plaintext || pad(m) || your-text plus a valid SHA1 hash signature for it without knowing the secret key. Applications of this ability are left as an exercise.

(You know the length of the plaintext, so all you need to know to determine the length of m and thus generate pad(m) is the length of the key.)

By the way, the moral of this is not that you should make your hash signatures be SHA1(plaintext || secret-key), it is that you should use HMAC and not write your own crypto code.

(I got all of this from Nate Lawson, and finally understood how hash length extension worked through the pictures in the flickr API signature forgery vulnerability description.)

Written on 30 October 2009.
« Why per-process (or per-user) memory resource limits are hard
The risk continuum for standardization success »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Oct 30 01:21:51 2009
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.