2006-02-13
GNU bc for the birthday paradox
So that I don't have to redo this again if I ever need it, the GNU bc
magic for the Birthday paradox stuff is
as follows. First, you need to start GNU bc as bc -l
to load the math
library, then:
define p(n, m) { return 1 - e( (-(n * (n-1))) / (2*m)) } define p32(n, m) { return p(n, 2^m) }
The p()
function is the straightforward p(N, M) function from the
main entry. p32()
is a version that
expresses M in bits, so you can just write things like p32(10000,
32)
to find out how inadequate 32-bit session IDs are.
(Answer: very. There's already a 1.16% collision chance with a mere 10,000 session IDs. At 50,000 it's up to 25%.)
Note that this produces answers as probabilities (0.0 to 1.0), not as percentages. Multiply by 100 to get percentages.
Fiddling with p32()
shows a relatively intuitive result for
thinking about how many bits you need: at small probabilities, every
added bit added (or subtracted) more or less halves (or doubles)
the probabilities. (The effect is not as simple as that when the
probability is high.)
GNU bc is programmable enough that we can write a function to do the full probability expansion for p'(N, M, R):
define pp(n, m, r) { auto i, prod; prod = 1 for (i = 0; i < r; i++) { prod *= 1 - n/(m-i) } return 1 - prod }
This is not suitable for large values of N, M, or R, but it will give us exact values for where Daniel Martin's approximation (from the comments on the original entry) may not be well behaved.
Using this it's relatively easy to show experimentally that my proposed approximations for p' of either 'p(N+R,M) - p(N,M)' or 'p(N+R,M) - p(N,M) - p(R,M)' are not particularly accurate, even (or especially) for relatively small values of N, M, and R. So much for any mathematical intuition I might claim to possess. We can also use this to cross-check Daniel Martin's approximation, which seems to hold up quite well even in areas that are not really inside its domain.
Using both this and Daniel Martin's approximation, the good news is that people have to do a lot of session ID probes, even with relatively low numbers of bits in your session IDs. Let's do a little chart, assuming that you have three million valid session IDs and the attacker can make a million attempts:
Bits | Collision chance |
32 | ~100% |
40 | 93.47% |
48 | 1.06% |
56 | 0.004% |
64 | 0.000016% |
72 | 0.00000006% |
(Daniel Martin's approximation is bad only for the first two and is
certainly a lot faster than the bc
version, which is subject to its
own rounding issues.)
Hopefully you will detect attacks faster than a million attempts, which significantly improves the numbers. Even with 48 bit session IDs, at a hundred thousand attacks there's only a 0.1% success chance and it drops from there.
PS: Daniel Martin's perl version needs a relatively modern perl (or
the right compilation options, or both). The Fedora Core 2 version
of perl 5.8.3 just reports 'NaN
' a lot; I had to go to a Fedora
Core 4 machine with 5.8.6 to get results.
The problem with <pre>
Generally, <pre> is a fine thing, and it's become the de-facto way of writing any number of 'computer' things; code, Unix sessions, even equations, as as various WanderingThoughts entries illustrate. But there's a problem with it: <pre> text doesn't line-wrap.
The consequence is that if you write a long <pre> line, the browser will happily force a line wider than the browser window, making the reader either widen their browser (if they can widen it enough) or scroll. For WanderingThoughts it's even worse, because a CSS irritation forces me to lay it out using a table. Text inside a table cell is wrapped not at the browser width but at the table cell's width, and a single long line widens the entire cell's width, causing all text to be that wide. The net result is that if you don't (or can't) make your browser wide enough, you can't read anything.
Sometimes this is what's required and damn the torpedoes (and I'll 'line-wrap' by hand to try to avoid too-long lines). But there's a surprisingly large number of times when what you really want is just monospaced text with forced line breaks where the raw text has line breaks; extra line-wrapping doesn't actually hurt (especially if it's clear).
(I might as well admit that this is part of the 'personal aesthetic reasons' I alluded to in my comment on this entry; I browse with a fairly narrow browser window.)
In DWikiText my solution has been to write 'manual' <pre> text using
_
and [[...|]], more or less like this:
_[[... ... ...|]]_ \\ _[[... ... ...|]]_
But this is awkward, doesn't clearly show automatically wrapped lines, and compresses whitespace; plus, it requires manual work. Ideally, DWikiText would have a formatting option that makes it easy to do the right thing. (After all, one of the reason <pre> text gets used here so much is that it's so easy.)
It's possible to use CSS to get most of the way to what I want (not all;
there is no way to not compress whitespace without disallowing automatic
line-wrapping). The CSS that I've come up with so far is a <div> with
'font-family: monospace; margin-left: 1em; text-indent: -1em;', and then
each line inside it is another <div> (empty lines have to be forced with
<br>). This causes wrapped lines to be indented by a bit, to make them
stand out. Of course this looks pretty bad if you aren't using CSS, and
I still value readability in lynx
.
(It's still a temptation to implement it in DWiki.)