## Understanding rainbow tables

August 22, 2017

In yesterday's entry, I casually misused the term 'rainbow table' because I thought I knew what it meant when I actually didn't; what I was actually talking about was a (reverse) lookup table, which is not at all the same thing. I'm sure that I've read a bit about rainbow tables before but evidently it didn't stick and I didn't really get them. As a result of dozzie's comment pointing out my error, I wound up reading the Wikipedia writeup of rainbow tables, and I think I may now understand them. In an attempt to make this understanding stick, I'm going to write down my version of how they work and some remaining questions I have.

To see if a given hash of interest is in the rainbow table, you successively pretend that it is every hash h1 through hn and compute the output of each of the hash chains from that point onward. When you're pretending the hash is hn, this is just applying Rn; when it's h(n-1) this is applying R(n-1) to generate pn, hashing it to get hn and then applying Rn, and so on. You then check to see if any of the computed endpoint passwords are in your rainbow table. If any of them are, you recreate that chain starting with your stored p1 starting point up to the point where in theory you should find the hash of interest, call it hx. If the chain's computed hx actually is the hash of interest, password px from just before it is the corresponding password.

Matching an endpoint password doesn't guarantee that you've found a password for the hash; instead it could be that you have two hashes that a given reduction function maps to the same next password. If your passwords of interest are shorter than the output of the password hash function this is guaranteed to happen some of the time (and shorter passwords is the usual case, especially with modern hash functions like SHA256 that have large outputs).

The length of the hash chains in your rainbow table is a tradeoff between storage space and compute time. The longer the hash chains are the less you have to store to cover roughly the same number of passwords, but the longer it will take to check each hash of interest because you will have to compute more versions of chains (and check more endpoint passwords). Also, the longer the hash chain is, the more reduction function variations you have to come up with.

(See the Wikipedia page for an explanation of why you have multiple reduction functions.)

(Although I haven't attempted to do any math on this, I suspect that oversampled rainbow tables still take up (much) less space than a full lookup table, especially if you're willing to lengthen your hash chains as part of it. Longer hash chains cover more passwords in the same space, at the cost of more computation.)

The total number of passwords a rainbow table theoretically covers is simply the number of entries times the length of the hash chains. If you have hash chains with 10 passwords (excluding the endpoint password) and you have 10,000 entries in your rainbow table, your table covers at most 100,000 passwords. The number of unique passwords that a rainbow table actually contains is not something that can be determined without recording all of the passwords generated by all of the reduction functions during table generation.

### Sidebar: Storing the endpoint password versus the endpoint's hash

(Reduction functions are easily added as far as I can tell; apparently they're often basically '(common calculation) + i', where `i` is the index of the reduction function.)