## GNU bc for the birthday paradox

February 13, 2006

So that I don't have to redo this again if I ever need it, the GNU bc magic for the Birthday paradox stuff is as follows. First, you need to start GNU bc as `bc -l` to load the math library, then:

```define p(n, m) {
return 1 - e( (-(n * (n-1))) / (2*m))
}

define p32(n, m) {
return p(n, 2^m)
}
```

The `p()` function is the straightforward p(N, M) function from the main entry. `p32()` is a version that expresses M in bits, so you can just write things like ```p32(10000, 32)``` to find out how inadequate 32-bit session IDs are.

(Answer: very. There's already a 1.16% collision chance with a mere 10,000 session IDs. At 50,000 it's up to 25%.)

Note that this produces answers as probabilities (0.0 to 1.0), not as percentages. Multiply by 100 to get percentages.

Fiddling with `p32()` shows a relatively intuitive result for thinking about how many bits you need: at small probabilities, every added bit added (or subtracted) more or less halves (or doubles) the probabilities. (The effect is not as simple as that when the probability is high.)

GNU bc is programmable enough that we can write a function to do the full probability expansion for p'(N, M, R):

```define pp(n, m, r) {
auto i, prod; prod = 1
for (i = 0; i < r; i++) {
prod *= 1 - n/(m-i)
}
return 1 - prod
}
```

This is not suitable for large values of N, M, or R, but it will give us exact values for where Daniel Martin's approximation (from the comments on the original entry) may not be well behaved.

Using this it's relatively easy to show experimentally that my proposed approximations for p' of either 'p(N+R,M) - p(N,M)' or 'p(N+R,M) - p(N,M) - p(R,M)' are not particularly accurate, even (or especially) for relatively small values of N, M, and R. So much for any mathematical intuition I might claim to possess. We can also use this to cross-check Daniel Martin's approximation, which seems to hold up quite well even in areas that are not really inside its domain.

Using both this and Daniel Martin's approximation, the good news is that people have to do a lot of session ID probes, even with relatively low numbers of bits in your session IDs. Let's do a little chart, assuming that you have three million valid session IDs and the attacker can make a million attempts:

 Bits Collision chance 32 ~100% 40 93.47% 48 1.06% 56 0.004% 64 0.000016% 72 0.00000006%

(Daniel Martin's approximation is bad only for the first two and is certainly a lot faster than the `bc` version, which is subject to its own rounding issues.)

Hopefully you will detect attacks faster than a million attempts, which significantly improves the numbers. Even with 48 bit session IDs, at a hundred thousand attacks there's only a 0.1% success chance and it drops from there.

PS: Daniel Martin's perl version needs a relatively modern perl (or the right compilation options, or both). The Fedora Core 2 version of perl 5.8.3 just reports '`NaN`' a lot; I had to go to a Fedora Core 4 machine with 5.8.6 to get results.

By DanielMartin at 2006-02-14 23:42:07:

Incidentally, a rough estimate of the error in the approximation I gave is that if the approximation yields `x`, then the error is less than `x^2`. Essentially all of this error is coming from the approximation `-ln(x) ~=~ 1-x`; Stirling's approximation holds to within 1% in the numbers we're dealing with.

If we remove that bit of approximating, we get:

``` perl -le 'use bignum;sub xe{\$_*log(\$_)+log(atan2(1,1)*8*\$_)/2
+ log(1+1/(12*\$_));} my(\$m,\$n,\$r)=map {eval "use bignum;\$_"}
@ARGV;print 1-exp(xe(\$m-\$n)+xe(\$m-\$r)-xe(\$m)-xe(\$m-\$n-\$r))'
'2**64' '3*(10**6)' '1*(10**6)'
```

Which works well over the range of your chart. Of course, there's still a range where it stops working - try 128 bit session ids with the same 3 million and 1 million numbers, and you get a probability of guessing a valid session id of -29.

Written on 13 February 2006.