2011-10-24
Salting passwords compared to slow password hashing
When you're storing hashed passwords, you can do two different things to ruin the life of an attacker who got your password hashes, or at least to annoy them: you can salt your passwords in various ways and you can use a slow password hashing function. These two have different but complementary effects.
(I've talked about the effects of the various ways of salting passwords back in SaltingPasswords.)
Salting passwords reduces the payoff an attacker gets for making a single password guess. Salts slow down how fast an attacker can compromise a bunch of your accounts, but they do not slow down how fast an attacker can compromise a specific account. Individually salted passwords with a fast hash function increase the time to mass compromises but do not necessarily increase the time to the first account compromise.
Using a slow password hashing function slows down how fast an attacker can make password guesses but by itself does nothing to reduce the payoff the attacker gets for making those guesses. If you have no salts you're vulnerable to rainbow tables, and if you just have a single site-wide salt every (slow) password guess can be checked against all of your users at once.
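To make that last point concrete, here's a rough Python sketch of the attacker's side; the salt value, iteration count, and function names are made up for illustration. The structure is what matters: with only a site-wide salt, one expensive hash computation per guess is enough to test that guess against every account at once.

import hashlib

SITE_SALT = b"single-site-wide-salt"    # hypothetical shared salt

def test_guess(guess, stored_hashes):
    # One slow PBKDF2 computation per guess...
    h = hashlib.pbkdf2_hmac("sha256", guess.encode(), SITE_SALT, 100_000)
    # ...which can then be compared against every user's stored hash at once.
    return [user for user, stored in stored_hashes.items() if stored == h]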
If you want to delay the time to the first account compromise, the worst situation is a fast password hash (like SHA1 or MD5) with at most a single site-wide salt; an attacker can make very fast hash checks, and all they need is for some person to have a reasonably weak password. If you have per-user salts with a fast hash, the attacker needs to get lucky by picking a user who used a weak password. How much luck they need depends on the password practices of your users; if most users choose weak passwords, the time to first compromise will barely budge between a site-wide salt and a per-user salt.
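For contrast, here's a hedged sketch of what per-user salting with a slow hash looks like on the storage side (again, the parameter choices and names are arbitrary); the point is that an attacker now has to redo the slow hash separately for every guess-and-user combination.

import os
import hashlib

def hash_password(password, salt=None, rounds=100_000):
    # A fresh random salt per user plus a deliberately slow hash (PBKDF2 here).
    if salt is None:
        salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)

def check_password(password, salt, stored):
    # An attacker has to repeat this slow computation per user, per guess.
    return hash_password(password, salt)[1] == stored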
(If you want scary figures on how fast this can be done today, see How To Safely Store A Password.)
It's tempting to summarize this as 'good salting means the attacker can't rapidly break lots of your users at once, while slow hashing means they can't rapidly break any of your users', but it's not quite that simple. Even with slow password hashing, if you don't have per-user salts and you have users who pick (very) weak passwords, the attacker won't need to make very many slow password guesses in order to find a vulnerable account.
(Even with a slow hashing function it doesn't take too much time to try 20,000 to 30,000 guesses; at, say, a tenth of a second per hash, 30,000 guesses take under an hour. That's almost certainly enough to find most seriously weak common passwords.)
(This entry was sparked by reading On cryptography and dogmas (via Hacker News). I wanted to fix some of this in my own head.)
2011-10-11
Why people are attracted to minimal language cores
In the last entry I described how some languages rewrite loops by turning the loop body into an anonymous function that the loop invokes repeatedly. I then mentioned that some languages go all the way to turning loops into tail-recursive function calls. You might wonder why languages do this and why this sort of crazy transformation is considered attractive.
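To make the tail-recursion version concrete, here's a rough sketch of the shape of that transformation, written in Python purely for illustration (Python itself doesn't eliminate tail calls, so this is not how you'd actually write it there):

# A plain loop...
def sum_to(n):
    total, i = 0, 0
    while i < n:
        total += i
        i += 1
    return total

# ...and the same computation with the loop rewritten as a tail-recursive
# call, roughly what a minimal-core language's transformation would produce.
def sum_to_rec(n, i=0, total=0):
    if i >= n:
        return total
    return sum_to_rec(n, i + 1, total + i)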
There are at least two reasons that these things are popular, for some definition of popular. First, the intellectual purity of a minimal core language appeals to a certain sort of language wonk; they tend to call the result simpler. These people are often drawn towards Lisp (especially Scheme, perhaps the canonical illustration of this philosophy in action).
(Lisp is not the only language family that has this sort of minimalism. For example, I think that Forth is just as minimal in its own way, although it gets far less language design attention.)
Second, it means that your code generator or interpreter core only needs to handle a minimal set of things because higher levels have transformed code written in the general version of the language down into this minimal form (often automatically). This has traditionally simplified optimizers; rather than implementing very similar analysis for each of a whole bunch of control flow constructs, they only have to analyze one really well. Which control flow construct you pick as your base depends on what language you're compiling; some languages pick goto, for example.
(Then you can get a PhD thesis or two out of how to do this analysis.)
My understanding is that the pragmatic evidence is mixed on whether this is a good idea or not. There have certainly been some significant successes, but I have also heard stories of compilers where the frontend carefully reduced all control flow constructions down to the single fundamental control flow atom, passed the whole thing to the optimizer, and had the optimizer reverse engineer the high level control flow stuff again from the low-level minimized control flow information.
(The argument for still doing this even when it's inefficient is that this lets the optimizer (re)discover the true high level control flows in your program, regardless of how you actually wrote the code. In a sense it's discovering what you actually meant (or at least, what you created) instead of just what you wrote.)
2011-10-10
Arranging scopes for for loops
Suppose that you have a language where for loop index variables are local to the for loop; they aren't visible outside it. What I'll call the 'for loop closure problem' is that in a straightforward implementation of their lifetime, the following pseudo-code doesn't work the way people think it will:
nums := [];
for (var i = 0; i < 100; i++)
    nums.append(func(x) {i+x});
In straightforward scope and variable lifetime semantics, every lambda returns the same result because each lambda captured the same i (and after the loop finishes, the value of i is 100). When I was discussing what Python closures close over, I mentioned that you could fix this with careful use of how scopes are defined.
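As it happens, Python behaves exactly this way (its loop variables aren't loop-local to begin with), so the problem is easy to demonstrate directly:

funcs = []
for i in range(100):
    funcs.append(lambda x: i + x)

# Every lambda closed over the same i, which is 99 once the loop is over.
print(funcs[0](1), funcs[50](1))    # prints "100 100", not "1 51"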
First, let's rewrite this for loop to show the scope explicitly:
{var i;
 for (i = 0; i < 100; i++)
     nums.append(func(x) {i+x});
}
If we want each iteration of the loop (and thus each lambda) to get a different version of i, we need to explicitly create a scope for each of them. However, we need all of the following to be true:
- the for loop condition has a single sustained scope, in which its version of i is the same over all iterations of the loop. This creates an outer version of i.
- the for loop body has a per-iteration scope which contains an inner version of i.
- the inner version of i is initialized from the outer version of i.
Of these the only tricky requirement is the third one.
You could fix this with specific rules for scopes in for loops, but as it happens we don't have to bother with this; the language already has a way of expressing this outcome within the existing scoping rules. We just say that the loop body acts as if it was an anonymous function (well, closure) that is invoked on each loop iteration. This effectively rewrites the loop as:
{var i;
 for (i = 0; i < 100; i++)
     func(i) {
         nums.append(func(x) {i+x});
     }(i);
}
Anonymous functions already create new scopes when invoked and function calls are the general method for passing variables into a scope, so we're done.
(Various sorts of optimizations can be and generally are applied in languages that define for loops this way, because you really don't want to be creating a call frame for every loop iteration.)
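In Python terms, the usual hand-written workaround is this same trick: pass the loop variable through a function call so that each closure captures its own per-call i. A quick sketch:

funcs = []
for i in range(100):
    def make(i):
        # Each call to make() creates a fresh scope, so the lambda below
        # closes over that call's i instead of the loop's shared i.
        return lambda x: i + x
    funcs.append(make(i))

print(funcs[0](1), funcs[50](1))    # prints "1 51" as intended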
I believe that doing this sort of conceptual rewriting is especially popular in languages (like many versions of Lisp) that consider lambdas and closures to be fundamental building blocks of the language. In these languages, rewriting loops as invocations of anonymous functions makes a lot of sense even if you don't care about the scoping rules. The extreme version is to not even have loop iteration as such in the language's core; instead you transform loops into recursive function calls. (Full tail call optimization is required for obvious reasons.)