2009-10-30
Understanding hash length extension attacks
Here is a somewhat counterintuitive result (at least for me): suppose that you have the SHA1 hash of some value, SHA1(m). If you know the length of m, it is possible to work out the SHA1 hash of m, plus some magic, plus anything more of your choice. In the literature, this is called an 'extension attack'. I think I understand why this is so and basically why it has to be so, and now I'm going to write it down to make sure of this.
Hash functions like SHA1 operate on fixed-size blocks of input which are successively condensed down to a running internal state (the core operation is to apply a hashing function to the current internal state and the current block to produce the new internal state). This means that all of the input to date is entirely captured in the current internal state.
Thus, to extend a hash, all you need to do is to set up the hash algorithm with the internal state as of the end of the input (more or less) and then start adding more blocks of input. And all you need to recreate the internal state of hashes like SHA1 and MD5 is the public hash value, which you have.
(In fact the SHA1 hash value is exactly the internal state; you can see this in the SHA-1 algorithm.)
The length of m comes into this because of trailing padding. Since the input is in bytes and may not be an exact multiple of the block size, it actually gets padded behind the scenes; part of the padding is the length of the input, for reasons beyond the scope of this entry. So what you really can generate is not SHA1(m || X), where you supply X, but SHA1(m || pad(m) || X), where you supply X and pad(m) is the padding added to m in the original SHA1 computation. In order to generate pad(m), you need to know m's length.
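To make pad(m) concrete, here is a minimal sketch in C of reconstructing it from nothing more than the length of m, following the SHA-1 padding rule (a 0x80 byte, zero bytes until the length is 56 mod 64, then the original length in bits as a 64-bit big-endian number). The function name and interface are invented for this example:

#include <stdint.h>
#include <stdlib.h>

/* Build the padding SHA-1 appends to a message of msg_len bytes.
   The caller frees the returned buffer; *pad_len is set to its size. */
unsigned char *sha1_padding(uint64_t msg_len, size_t *pad_len)
{
    uint64_t bits = msg_len * 8;
    size_t plen = 1 + (size_t)((55 - msg_len % 64 + 64) % 64) + 8;
    unsigned char *pad = calloc(plen, 1);

    if (pad == NULL)
        return NULL;
    pad[0] = 0x80;                  /* mandatory '1' bit, then zeros */
    for (int i = 0; i < 8; i++)     /* bit length, big-endian, at the end */
        pad[plen - 1 - i] = (unsigned char)(bits >> (8 * i));
    *pad_len = plen;
    return pad;
}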
This sounds useless, but suppose that SHA1(m) is a hash signature and m is actually secret-key || plaintext (a 'keyed hash'). Then, if you can know or work out the length of the key, you can manufacture plaintext || pad(m) || your-text plus a valid SHA1 hash signature for it without knowing the secret key. Applications of this ability are left as an exercise.
(You know the length of the plaintext, so all you need to know to determine the length of m and thus generate pad(m) is the length of the key.)
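To illustrate how little you need beyond the public digest, here is a hedged sketch of the extension step itself using OpenSSL's old low-level SHA-1 interface. It pokes directly at the SHA_CTX fields (h0 through h4 plus the Nl/Nh bit counters), which relies on the struct layout rather than any official API, and the function name is made up for this example:

#include <stddef.h>
#include <stdint.h>
#include <openssl/sha.h>

/* Given SHA1(m) in 'digest' and the byte length of m plus its padding
   (a multiple of 64), compute SHA1(m || pad(m) || suffix) without ever
   knowing m.  'out' must hold SHA_DIGEST_LENGTH bytes. */
void sha1_extend(const unsigned char digest[SHA_DIGEST_LENGTH],
                 uint64_t padded_len,
                 const unsigned char *suffix, size_t suffix_len,
                 unsigned char out[SHA_DIGEST_LENGTH])
{
    SHA_CTX ctx;
    uint64_t bits = padded_len * 8;
    uint32_t h[5];

    /* The digest is just the five 32-bit state words, big-endian. */
    for (int i = 0; i < 5; i++)
        h[i] = ((uint32_t)digest[4*i] << 24) | ((uint32_t)digest[4*i+1] << 16) |
               ((uint32_t)digest[4*i+2] << 8) | (uint32_t)digest[4*i+3];

    SHA1_Init(&ctx);
    /* Overwrite the fresh state with the known digest and tell the
       context how many bits it has 'already' processed. */
    ctx.h0 = h[0]; ctx.h1 = h[1]; ctx.h2 = h[2]; ctx.h3 = h[3]; ctx.h4 = h[4];
    ctx.Nl = (uint32_t)bits;
    ctx.Nh = (uint32_t)(bits >> 32);

    SHA1_Update(&ctx, suffix, suffix_len);
    SHA1_Final(out, &ctx);
}

The message that actually goes with the forged hash is plaintext || pad(m) || suffix, with pad(m) computed as in the earlier sketch once you have guessed the key length.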
By the way, the moral of this is not that you should make your hash signatures be SHA1(plaintext || secret-key), it is that you should use HMAC and not write your own crypto code.
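For contrast, here is a minimal sketch of what 'just use HMAC' looks like with OpenSSL's one-shot HMAC() function; the key and message are placeholders for illustration, not advice on how to manage keys:

#include <stdio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

int main(void)
{
    const unsigned char key[] = "not-a-real-secret";
    const unsigned char msg[] = "some plaintext to authenticate";
    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len;

    /* HMAC-SHA1 over the message; length extension does not apply here
       because the key is mixed into both an inner and an outer hash. */
    HMAC(EVP_sha1(), key, (int)(sizeof(key) - 1),
         msg, sizeof(msg) - 1, mac, &mac_len);

    for (unsigned int i = 0; i < mac_len; i++)
        printf("%02x", mac[i]);
    printf("\n");
    return 0;
}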
(I got all of this from Nate Lawson, and finally understood how hash length extension worked through the pictures in the flickr API signature forgery vulnerability description.)
2009-10-15
One complexity of buffered IO on Unix
It is surprisingly challenging to get buffered IO completely correct on Unix, and one area that trips people up is correct handling of EOF. You see, there's an important difference between EOF on files and EOF on terminals, as a consequence of how terminals generate and signal EOF: EOF on files is persistent, but EOF on terminals is not.
If you read repeatedly from a file that has hit EOF, you will almost always just get another EOF. But EOF on terminals is a transient thing, so if you read again from a terminal, your code will just sit there and the user will have to type another EOF to get you to pay attention.
(The exception for file EOF is if someone else adds more data to the end of the file that you're reading.)
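You can see the difference with a small test program (a sketch with no real error handling). Feed it a file on standard input and it finishes immediately; run it on a terminal and the second pass just sits there until you type another Ctrl-D:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* Read to EOF twice in a row. */
    for (int pass = 1; pass <= 2; pass++) {
        while ((n = read(0, buf, sizeof(buf))) > 0)
            ;
        printf("pass %d: read() returned %ld\n", pass, (long)n);
    }
    return 0;
}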
This means that buffering code on Unix must be careful to remember that it has seen an EOF, and not re-read from the underlying file descriptor or IO stream. You cannot use a simpler, stateless implementation; if you do, it will be irritating.
(You can provide an explicit operation to clear the EOF state if you want to. It probably won't be used very often.)
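Here is a sketch of what remembering EOF looks like in a minimal buffered reader; all of the names and the structure are invented for this example, and a real buffered IO layer has much more to it:

#include <unistd.h>

#define INBUFSZ 8192

struct inbuf {
    int fd;
    int saweof;                 /* sticky: set once EOF has been seen */
    size_t pos, len;
    unsigned char buf[INBUFSZ];
};

/* Return the next byte, or -1 at EOF (or on error).  Once EOF has been
   seen we never issue another read(), so a user at a terminal does not
   have to type another Ctrl-D to get rid of us. */
int inbuf_getc(struct inbuf *b)
{
    if (b->pos >= b->len) {
        ssize_t n;

        if (b->saweof)
            return -1;
        n = read(b->fd, b->buf, sizeof(b->buf));
        if (n <= 0) {
            b->saweof = 1;
            return -1;
        }
        b->pos = 0;
        b->len = (size_t)n;
    }
    return b->buf[b->pos++];
}

/* The optional explicit operation to clear the EOF state again. */
void inbuf_clear_eof(struct inbuf *b)
{
    b->saweof = 0;
}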
Unfortunately this is an easy (and common) mistake to make, because it's so hard to notice. Since extra reads on files, pipes, network connections and so on are harmless, everything works fine until your code or program is used to read from a terminal, and this may be quite a while.
2009-10-11
Why security bugs aren't bugs
To a large extent, we understand how to write bug-free programs and code: you use and test the program until it doesn't crash or break things, produces the correct results, and doesn't do anything surprising (generate extra messages, pop up stray windows, and so on). There are various ways to do this and people disagree about which approach is the most effective, but this is the broad view of how most bug-free program development is done.
('bug-free' is of course a misnomer; the goal is to make sure that the program works in general and has a very low chance of having some undiscovered bug, especially a serious bug.)
Following this general development process will not give you programs without security bugs, except through coincidence or luck. It will not do so even if you specifically attempt to test for security bugs by attacking your program with things such as input fuzzers, although you will eliminate a lot of the low-hanging fruit. We can argue about why this happens (I have my own views), but there's an overwhelming amount of experimental evidence that it does; just look at all of the security bugs in published, reasonably bug-free programs (and how many would not have been found by fuzz testing and so on).
This is why I say that security bugs aren't bugs. You can't render a program free of security bugs with the same techniques that are commonly used to render a program free of ordinary bugs. Security bugs are qualitatively different, for whatever reason, and getting rid of them requires an extra and different effort over and above regular debugging.
(Yes, security bugs are often ultimately caused by regular bugs, but they're clearly not regular bugs that are amenable to being found by our usual development techniques or we wouldn't have so many well debugged programs with serious security holes.)
My personal opinion is that neither testing in general nor obsessive error checking is sufficient to get rid of security bugs (although you will get a certain amount of low-hanging fruit). The really dangerous bugs are subtle issues that we can currently only find through outside code inspection.
2009-10-09
A fun bug I once ran across
Here's an interesting and fun bug (in C) that I once ran across, years ago. Suppose that you have a hash table, with entries chained off each other in case of a hash collision, and a function to remove an element from it that looks like this:
void remove(elem *mp)
{
    elem *np;

    np = table[HASH(mp)];
    if (np == mp) {
        table[HASH(mp)] = mp->next;
        return;
    }
    while (np != NULL) {
        if (np->next == mp) {
            np->next = mp->next;
            break;
        }
    }
    return;
}
Since I have told you that there's a bug here, you can probably spot it; the while loop doesn't actually traverse down the chain (it's missing an 'np = np->next;' at the bottom).
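For completeness, the fixed version of the loop just advances np each time around:

    while (np != NULL) {
        if (np->next == mp) {
            np->next = mp->next;
            break;
        }
        np = np->next;
    }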
What makes this an interesting bug is that this mistake is actually somewhat obscure. The infinite loop only happens if you have a chain that's at least three elements long and you're not trying to remove the first or second element, so how often the bug triggers depends on how sparsely populated the hash table is and on the removal pattern for elements.
In the actual case where I ran across this, it was in fact relatively rare for the infinite loop to trigger, or at least it was rare enough that this code could pass testing and be deployed before it started blowing up (and it did not blow up widely).
(If you check the code coverage of your tests, this is one of the cases where it is important to look for not just 100% code coverage but that both branches of all decision points have been followed. Technically you can get 100% code coverage here without exposing the bug since the problem is missing code, not incorrect code.)
2009-10-05
The problem with security bugs is that they aren't bugs
Here is a thesis that I have been mulling over lately:
One reason that security bugs are hard is that they aren't bugs. In regular bugs, your code doesn't do things that you want it to. In security bugs, the end result is that your code does extra things, things that let an attacker in. Ergo, security bugs are actually features that you didn't intend your code to have.
(Often security bugs are created by low-level mistakes that are regular bugs. I'm taking the high level view here based on how these bugs make the code behave.)
This makes security bugs much harder to find, especially in ordinary development. It's easy to notice when your code doesn't do something that it should (or does it incorrectly), but it's much harder to notice when it does extra things, and even harder to spot that it merely could do extra things (especially when we can be blind about it). As a result, the existence of extra features, security bugs included, rarely surfaces during ordinary testing.
(This is a magnified version of how tests are not very good at proving negatives. Proving that your code doesn't have extra features is even harder than proving that it doesn't have any bugs.)
The immediate, obvious corollary is that most normal techniques for winding up with bug-free code are probably ineffective at making sure that you don't have security bugs. You're likely to need an entirely different approach, which means directly addressing security during development instead of assuming that your normal development process will take care of security bugs too.
(Your normal development process might also catch security bugs, but it depends a lot on what that process is. I suspect that things like code reviews are much more likely to do so than, say, TDD.)