2006-02-10
The charm of Sun's Freeware collection
Presented on a stock install Solaris 9:
$ cat hello.cpp
#include <iostream.h>
main()
{
cout << "Hello World!\n";
}
$ g++ hello.cpp
$ ./a.out
ld.so.1: a.out: fatal: libstdc++.so.2.10.0: \
open failed: No such file or directory
Killed
(Line wrapping added for clarity.)
To get the Sun Freeware g++ working, the magic solution is 'g++
-Wl,-R/opt/sfw/lib ...', which adds /opt/sfw/lib (where the necessary
g++ internal library is) to the runtime shared library lookup path.
The irony here is that the Sun Freeware g++ has already been
configured to look in /opt/sfw/lib (or it would report a compile
time error about not being able to find the library). But nobody
went the extra step to add it to the default runtime path, and no
one bothered to test the Freeware g++ before they shipped it.
If I have to use g++ on Solaris much, I expect I'll just write a cover script for gcc and g++ that just always adds that argument. (Or replace the g++ version entirely, since it is gcc 2.95.3.)
Update: courtesy of the htdig FAQ,
another workaround is to set the LD_RUN_PATH environment variable to
/opt/sfw/lib when you compile stuff.
Session IDs and the Birthday Paradox
If you have a database (or in general some writable store), the best way to do authentication in a web app is with session ID cookies. Every time someone logs in, you pick a large random number to be their session ID, give them a cookie with the session ID, and then store all of the details in your database under the session ID. When they come back, you get the session ID cookie, look it up in the database, make sure it's (still) valid, and go.
What prevents attackers from getting in is the difficulty of guessing
a usable session ID. So, assuming you're using a good random number
generator such as /dev/urandom on Linux, how big do your session ID
numbers have to be to make one infeasible to guess?
This is a slight variant on the 'Birthday paradox' problem. With N items
and M possible values they can have, the basic Birthday paradox
approximation is (in Python notation):
1 - exp( (-(N*(N-1))) / (2*M))
(call this p(N, M).)
To start with, let's say that you expect three million valid session IDs at a time; this might seem extreme, but LiveJournal currently has two million active users. I've made a handy little table of results:
| Bits | Collision chance |
| 32 | 100% |
| 40 | 98.3% |
| 48 | 1.59% |
| 56 | 0.006% |
| 64 | 0.00002% |
| 72 | ~ 0% |
(Disclaimer: I think I, and Python, got the math right. Please feel free to correct me if I didn't. '~ 0%' means 'so small that it prints as 0'; the collision chance is clearly never literally 0.)
So 72 bits (9 bytes) should be pretty durn safe. It even remains safe at 30 million valid session IDs, which is a pretty good sign that people trying to guess valid session IDs will be there for a long, long time. If you expect 300 million valid session IDs, you'll want to go to more bits.
On most modern systems, the easy way to get this much good randomness is
just to read however many bytes you need from /dev/urandom. (Please
don't try to make up your own random number generator. The security
world is littered with the shattered remnants of people who tried that.)
More probability and math
Now, this isn't quite the answer to the question I posed at the start, which I can rephrase as:
Starting with
Nunique session IDs out ofMpossible values, what are the odds of finding a duplicate inRadditional guesses?
The figures I gave are somewhat of a handwave about, effectively,
p(N+R, M), which is more or less the same as p(N, M) if R is
(very) small compared to N.
A friend pointed out that this is just the probability of making R
different wrong guesses, so the full expression for the probability
is:
1 - (1 - N/M) * (1 - N/(M-1)) * ... * (1 - N/(M-R))
(Update: actually the terms should run only to (M-(R-1)).)
However, computing this for large Rs is irksome (unless you have a
good symbolic math package around), and I don't know if there's a good
approximation or summation.
I think p(N+R, M) - p(N, M) might be close, although that sort of overcounts by roughly p(R, M) (since the attacker's values will be non-colliding if the attacker has any brains).
Sidebar: giving everyone on earth a randomly assigned unique ID
Let's assume that we wanted to randomly assign everyone on earth a numeric ID, and be relatively sure that there were no collisions. How large a bit space would we need?
To have a margin for error, let's use a world population figure of ten billion.
| Bits | Collision chance |
| 64 | 93.35% |
| 72 | 1.05% |
| 80 | 0.004% |
| 88 | 0.000016% |
| 96 | 0.000000063% |
| 104 | 0.0000000002% |
| 128 | 0.000000000000000014% |
| 160 | ~ 0% |
(These figures are from bc, because python ran out of precision. MD5
hashes are 128 bits and SHA1 hashes are 160 bits.)