**2009-10-30**

## Understanding hash length extension attacks

Here is a somewhat counterintuitive result (at least for me): suppose
that you have the SHA1 hash of some value, *SHA1(m)*. If you know the
length of m, it is possible to work out the SHA1 hash of m, plus
some magic, plus anything more of your choice. In the literature,
this is called an 'extension attack'. I think I understand why this
is so and basically why it has to be so, and now I'm going to write
it down to make sure of this.

Hash functions like SHA1 operate on fixed-size blocks of input which are successively condensed down to a running internal state (the core operation is to apply a hashing function to the current internal state and the current block to produce the new internal state). This means that all of the input to date is entirely captured in the current internal state.

Thus, to extend a hash, all you need to do is to set up the hash algorithm with the internal state as of the end of the input (more or less) and then start adding more blocks of input. And all you need to recreate the internal state of hashes like SHA1 and MD5 is the public hash value, which you have.

(In fact the SHA1 hash value is exactly the internal state; you can see this in the SHA-1 algorithm.)

The length of m comes into this because of trailing padding. Since the
input is in bytes and may not be an exact multiple of the block size,
it actually gets padded behind the scenes; part of the padding is the
length of the input, for reasons beyond the scope of this entry.
So what you really can generate is not *SHA1(m || X)*, where you supply
X, but *SHA1(m || pad(m) || X)*, where you supply X and pad(m) is
the padding added to m in the original SHA1 computation. In order to
generate pad(m), you need to know m's length.

This sounds useless, but suppose that *SHA1(m)* is a hash signature
and *m* is actually *secret-key || plaintext* (a 'keyed hash'). Then,
if you can know or work out the length of the key, you can manufacture
*plaintext || pad(m) || your-text* plus a valid SHA1 hash signature for
it **without knowing the secret key**. Applications of this ability are
left as an exercise.

(You know the length of the plaintext, so all you need to know to
determine the length of *m* and thus generate *pad(m)* is the length of
the key.)

By the way, the moral of this is not that you should make your hash
signatures be *SHA1(plaintext || secret-key)*, it is that you should use
HMAC and not write your own crypto
code.

(I got all of this from Nate Lawson, and finally understood how hash length extension worked through the pictures in the flickr API signature forgery vulnerability description.)

** (Previous day | Next day) **