GNU bc for the birthday paradox
So that I don't have to redo this again if I ever need it, the GNU bc
magic for the Birthday paradox stuff is
as follows. First, you need to start GNU bc as bc l
to load the math
library, then:
define p(n, m) { return 1  e( ((n * (n1))) / (2*m)) } define p32(n, m) { return p(n, 2^m) }
The p()
function is the straightforward p(N, M) function from the
main entry. p32()
is a version that
expresses M in bits, so you can just write things like p32(10000,
32)
to find out how inadequate 32bit session IDs are.
(Answer: very. There's already a 1.16% collision chance with a mere 10,000 session IDs. At 50,000 it's up to 25%.)
Note that this produces answers as probabilities (0.0 to 1.0), not as percentages. Multiply by 100 to get percentages.
Fiddling with p32()
shows a relatively intuitive result for
thinking about how many bits you need: at small probabilities, every
added bit added (or subtracted) more or less halves (or doubles)
the probabilities. (The effect is not as simple as that when the
probability is high.)
GNU bc is programmable enough that we can write a function to do the full probability expansion for p'(N, M, R):
define pp(n, m, r) { auto i, prod; prod = 1 for (i = 0; i < r; i++) { prod *= 1  n/(mi) } return 1  prod }
This is not suitable for large values of N, M, or R, but it will give us exact values for where Daniel Martin's approximation (from the comments on the original entry) may not be well behaved.
Using this it's relatively easy to show experimentally that my proposed approximations for p' of either 'p(N+R,M)  p(N,M)' or 'p(N+R,M)  p(N,M)  p(R,M)' are not particularly accurate, even (or especially) for relatively small values of N, M, and R. So much for any mathematical intuition I might claim to possess. We can also use this to crosscheck Daniel Martin's approximation, which seems to hold up quite well even in areas that are not really inside its domain.
Using both this and Daniel Martin's approximation, the good news is that people have to do a lot of session ID probes, even with relatively low numbers of bits in your session IDs. Let's do a little chart, assuming that you have three million valid session IDs and the attacker can make a million attempts:
Bits  Collision chance 
32  ~100% 
40  93.47% 
48  1.06% 
56  0.004% 
64  0.000016% 
72  0.00000006% 
(Daniel Martin's approximation is bad only for the first two and is
certainly a lot faster than the bc
version, which is subject to its
own rounding issues.)
Hopefully you will detect attacks faster than a million attempts, which significantly improves the numbers. Even with 48 bit session IDs, at a hundred thousand attacks there's only a 0.1% success chance and it drops from there.
PS: Daniel Martin's perl version needs a relatively modern perl (or
the right compilation options, or both). The Fedora Core 2 version
of perl 5.8.3 just reports 'NaN
' a lot; I had to go to a Fedora
Core 4 machine with 5.8.6 to get results.
Comments on this page:

