**2017-08-22**

## Understanding rainbow tables

In yesterday's entry, I casually misused the term 'rainbow table' because I thought I knew what it meant when I actually didn't; what I was actually talking about was a (reverse) lookup table, which is not at all the same thing. I'm sure that I've read a bit about rainbow tables before but evidently it didn't stick and I didn't really get them. As a result of dozzie's comment pointing out my error, I wound up reading the Wikipedia writeup of rainbow tables, and I think I may now understand them. In an attempt to make this understanding stick, I'm going to write down my version of how they work and some remaining questions I have.

The core ingredient of a rainbow table is a set of reduction
functions, **R1** through **Rn**. Reduction functions take a hash
value and generate a password from it in some magic way. To create
one entry in a rainbow table, you start with a password *p1*, hash
the password to generate hash *h1*, apply the first reduction
function **R1** to generate a new password *p2*, hash that password
to generate *h2*, apply **R2** to generate password *p3*, and so
on along this *hash chain*. Eventually you hit hash *hn*, use your
last reduction function **Rn**, and generate *p(n+1)*. You then
store *p1*, the first password (the *starting point*), and *p(n+1)*,
the *endpoint* (I think you could store the hash of *p(n+1)* instead,
but that would take up more space).

To see if a given hash of interest is in the rainbow table, you
successively pretend that it is every hash *h1* through *hn* and
compute the output of each of the hash chains from that point onward.
When you're pretending the hash is *hn*, this is just applying
**Rn**; when it's *h(n-1)* this is applying **R(n-1)** to generate
*pn*, hashing it to get *hn*
and then applying **Rn**, and so on. You then check to see if any
of the computed endpoint passwords are in your rainbow table. If
any of them are, you recreate that chain starting with your stored
*p1* starting point up to the point where in theory you should find
the hash of interest, call it *hx*. If the chain's computed *hx*
actually is the hash of interest, password *px* from just before
it is the corresponding password.

Matching an endpoint password doesn't guarantee that you've found a password for the hash; instead it could be that you have two hashes that a given reduction function maps to the same next password. If your passwords of interest are shorter than the output of the password hash function this is guaranteed to happen some of the time (and shorter passwords is the usual case, especially with modern hash functions like SHA256 that have large outputs).

The length of the hash chains in your rainbow table is a tradeoff between storage space and compute time. The longer the hash chains are the less you have to store to cover roughly the same number of passwords, but the longer it will take to check each hash of interest because you will have to compute more versions of chains (and check more endpoint passwords). Also, the longer the hash chain is, the more reduction function variations you have to come up with.

(See the Wikipedia page for an explanation of why you have multiple reduction functions.)

As far as I can see, a rainbow table doesn't normally give you exhaustive coverage of a password space (eg, 'all eight character lower case passwords'). Most of the passwords covered by the rainbow table come from applying your reduction functions to cryptographic hashes; the hashes should have randomly distributed values so normally this will mean that your reduction functions produce passwords that are randomly distributed through your password space. There's no guarantee that these randomly distributed passwords completely exhaust that space. To get a good probability of this, I think you'd need to 'oversample' your rainbow table so that the total number of passwords it theoretically covers is greater than your password space.

(Although I haven't attempted to do any math on this, I suspect that oversampled rainbow tables still take up (much) less space than a full lookup table, especially if you're willing to lengthen your hash chains as part of it. Longer hash chains cover more passwords in the same space, at the cost of more computation.)

The total number of passwords a rainbow table theoretically covers is simply the number of entries times the length of the hash chains. If you have hash chains with 10 passwords (excluding the endpoint password) and you have 10,000 entries in your rainbow table, your table covers at most 100,000 passwords. The number of unique passwords that a rainbow table actually contains is not something that can be determined without recording all of the passwords generated by all of the reduction functions during table generation.

### Sidebar: Storing the endpoint password versus the endpoint's hash

Storing the hash of the endpoint password instead of the endpoint password seems superficially attractive and it feels like it should be better, but I've wound up believing that it's usually or always going to be a bad tradeoff. In most situations, your passwords are a lot shorter than your hash values, and often you already have relatively long hash chains. If you have long hash chains, adding one more entry is a marginal gain and you pay a real space penalty for it. Even if you have relatively short chains and a relatively small table, you get basically the same result in less space by adding another reduction function and officially lengthening your chain by one.

(Reduction functions are easily added as far as I can tell;
apparently they're often basically '(common calculation) + i',
where `i`

is the index of the reduction function.)

** (Previous day | Next day) **