GNU bc for the birthday paradox

February 13, 2006

So that I don't have to work this out again if I ever need it, here is the GNU bc magic for the Birthday paradox calculations. First, start GNU bc as 'bc -l' to load the math library (which provides e()), then define:

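/* chance of at least one collision among n values drawn from m possibilities */
define p(n, m) {
  return 1 - e( (-(n * (n-1))) / (2*m))
}

/* the same, with the number of possibilities given in bits (2^m values) */
define p32(n, m) {
  return p(n, 2^m)
}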

The p() function is the straightforward p(N, M) function from the main entry. p32() is a version that expresses M in bits, so you can just write things like p32(10000, 32) to find out how inadequate 32-bit session IDs are.

(Answer: very. There's already a 1.16% collision chance with a mere 10,000 session IDs. At 50,000 it's up to 25%.)
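For anyone who wants to cross-check these numbers outside bc, here is a minimal Python sketch of the same two functions. It assumes nothing beyond the formula shown above; the use of math.expm1 is a choice made for this sketch (it keeps precision when the probability is tiny) and is not part of the bc version.

```python
import math

def p(n, m):
    """Approximate chance of at least one collision among n IDs drawn from m values."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-(n * (n - 1)) / (2 * m))

def p32(n, bits):
    """Same as p(), with the size of the ID space given in bits."""
    return p(n, 2 ** bits)

# Print the two session ID counts discussed above as percentages.
for count in (10000, 50000):
    print(count, f"{100 * p32(count, 32):.2f}%")
```

As in bc, the functions return probabilities between 0.0 and 1.0; the loop multiplies by 100 only for display.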

Note that this produces answers as probabilities (0.0 to 1.0), not as percentages. Multiply by 100 to get percentages.

Fiddling with p32() shows a relatively intuitive result for thinking about how many bits you need: at small probabilities, every bit added (or subtracted) more or less halves (or doubles) the probability. (The effect is not as simple as that when the probability is high.)
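The halving effect can be seen directly by dividing the collision chance at one size by the chance with one more bit. The following Python sketch does that; bit_ratio() is a helper name made up for this illustration, and the session ID counts are arbitrary examples.

```python
import math

def p32(n, bits):
    """Birthday-paradox collision chance for n IDs in a 2**bits ID space."""
    return -math.expm1(-(n * (n - 1)) / (2 * 2 ** bits))

def bit_ratio(n, bits):
    """How many times more likely a collision is with `bits` than with `bits + 1`."""
    return p32(n, bits) / p32(n, bits + 1)

print(bit_ratio(10000, 32))    # low-probability regime
print(bit_ratio(200000, 32))   # high-probability regime
```

In the low-probability case the ratio should come out very close to 2; in the high-probability case both chances are close to 1, so the ratio drops well below 2.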

GNU bc is programmable enough that we can write a function to do the full probability expansion for p'(N, M, R), the chance that R attempts hit at least one of N valid IDs out of M possible ones:

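/* chance that r distinct guesses hit at least one of n valid IDs out of m */
define pp(n, m, r) {
  auto i, prod
  prod = 1
  /* prod is the chance that every guess so far has missed */
  for (i = 0; i < r; i++) {
    prod *= 1 - n/(m-i)
  }
  return 1 - prod
}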

This is not suitable for large values of N, M, or R, but it will give us exact values in the areas where Daniel Martin's approximation (from the comments on the original entry) may not be well behaved.
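The same expansion can be sketched in Python for cross-checking. This version sums logarithms instead of multiplying the factors directly, which is a choice made for this sketch to limit floating point loss and not how the bc function works; the early return for r > m - n is also an addition, covering the case where the guesses must hit a valid ID.

```python
import math

def pp(n, m, r):
    """Chance that r distinct guesses hit at least one of n valid IDs out of m."""
    if r > m - n:
        # More guesses than invalid IDs: at least one guess must hit.
        return 1.0
    log_miss = 0.0
    for i in range(r):
        # log of the chance that guess i misses, given all earlier guesses missed
        log_miss += math.log1p(-n / (m - i))
    return -math.expm1(log_miss)

print(pp(1, 10, 2))   # 1 - (9/10)*(8/9)
print(pp(2, 10, 3))   # 1 - (8/10)*(7/9)*(6/8)
```

Like the bc version, this loops R times, so it is only practical for modest numbers of attempts.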

Using this it's relatively easy to show experimentally that my proposed approximations for p' of either 'p(N+R,M) - p(N,M)' or 'p(N+R,M) - p(N,M) - p(R,M)' are not particularly accurate, even (or especially) for relatively small values of N, M, and R. So much for any mathematical intuition I might claim to possess. We can also use this to cross-check Daniel Martin's approximation, which seems to hold up quite well even in areas that are not really inside its domain.
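As an illustration of that comparison, here is a Python sketch that puts the exact value next to the two proposed approximations. The values N=10, M=1000, R=10 are arbitrary small numbers picked for this example, and compare() is a made-up helper name. Daniel Martin's approximation is not reproduced here, since its formula is not given in this entry.

```python
import math

def p(n, m):
    """Birthday-paradox collision chance for n IDs drawn from m values."""
    return -math.expm1(-(n * (n - 1)) / (2 * m))

def pp(n, m, r):
    """Exact chance that r distinct guesses hit at least one of n valid IDs out of m."""
    if r > m - n:
        return 1.0
    return -math.expm1(sum(math.log1p(-n / (m - i)) for i in range(r)))

def compare(n, m, r):
    """Return (exact, first approximation, second approximation)."""
    exact = pp(n, m, r)
    first = p(n + r, m) - p(n, m)
    second = first - p(r, m)
    return exact, first, second

exact, first, second = compare(10, 1000, 10)
print(f"exact={exact:.4f} first={first:.4f} second={second:.4f}")
```

With these inputs the first approximation lands noticeably above the exact value and the second noticeably below it.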

Using both this and Daniel Martin's approximation, the good news is that attackers have to make a lot of session ID probes to have a real chance of success, even with a relatively low number of bits in your session IDs. Here is a little chart, assuming that you have three million valid session IDs and the attacker can make a million attempts:

Bits   Collision chance
32     ~100%
40     93.47%
48     1.06%
56     0.004%
64     0.000016%
72     0.00000006%

(Daniel Martin's approximation is bad only for the first two and is certainly a lot faster than the bc version, which is subject to its own rounding issues.)

Hopefully you will detect attacks faster than a million attempts, which significantly improves the numbers. Even with 48-bit session IDs, at a hundred thousand attempts there's only a 0.1% success chance, and it drops from there as you add bits.
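The chart and the hundred-thousand-attempt figure can be roughly reproduced with a few lines of Python. This sketch uses the simple first-order estimate 1 - exp(-N*R/M), which is an assumption made here for speed; it is not necessarily Daniel Martin's formula, and hit_chance() is a made-up name.

```python
import math

def hit_chance(valid_ids, attempts, bits):
    """Rough chance that `attempts` guesses hit one of `valid_ids` IDs in a 2**bits space."""
    x = valid_ids * attempts / 2 ** bits
    return -math.expm1(-x)

# Three million valid session IDs, one million attempts, as in the chart.
for bits in (32, 40, 48, 56, 64, 72):
    print(bits, f"{100 * hit_chance(3_000_000, 1_000_000, bits):.8f}%")

# The same 48-bit session IDs with only a hundred thousand attempts.
print(f"{100 * hit_chance(3_000_000, 100_000, 48):.2f}%")
```

For these sizes the estimate should agree with the chart to the digits shown there. For small ID spaces or very large numbers of attempts, the exact loop is the safer reference.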

PS: Daniel Martin's perl version needs a relatively modern perl (or the right compilation options, or both). The Fedora Core 2 version of perl 5.8.3 just reports 'NaN' a lot; I had to go to a Fedora Core 4 machine with 5.8.6 to get results.

Written on 13 February 2006.

Last modified: Mon Feb 13 16:38:53 2006