## How many bits of information are in a password?

July 9, 2007

The number of bits of information in a password is a function of the size of the alphabet that the password is drawn from and how many characters long it is. The formula is:

nbits = ceil(log2(len(alphabet)) * nchars)

So what does that mean? Let's take the case of 8-character Unix passwords and make a table:

| alphabet | total bits | bits per character |
| --- | --- | --- |
| lower case ASCII | 38 | 4.7 |
| lower case plus digits | 42 | 5.2 |
| upper and lower case ASCII | 46 | 5.7 |
| letters plus digits | 48 | 5.95 |
| letters, digits, and all punctuation characters | 53 | 6.55 |

(The version of 'all punctuation' I'm using is Python's, and has 32 characters.)
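As a rough check, the table can be reproduced with a few lines of Python. This is only a sketch of the formula above; the helper name `password_bits` and the `alphabets` dictionary are made up for illustration, and the alphabets are built from the constants in Python's `string` module.

```python
import math
import string


def password_bits(alphabet_size: int, nchars: int) -> int:
    """Bits of information in nchars characters drawn from an alphabet."""
    return math.ceil(math.log2(alphabet_size) * nchars)


# Illustrative alphabets matching the rows of the table.
alphabets = {
    "lower case ASCII": string.ascii_lowercase,
    "lower case plus digits": string.ascii_lowercase + string.digits,
    "upper and lower case ASCII": string.ascii_letters,
    "letters plus digits": string.ascii_letters + string.digits,
    "letters, digits, and all punctuation characters": (
        string.ascii_letters + string.digits + string.punctuation
    ),
}

for name, alphabet in alphabets.items():
    per_char = math.log2(len(alphabet))
    total = password_bits(len(alphabet), 8)
    print(f"{name}: {total} bits ({per_char:.2f} bits per character)")
```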

As we can see, conventional Unix passwords are not all that strong. Nor does lengthening them help a lot; even under the most generous assumption (letters, digits, and all punctuation), you need 20 characters to get a 128-bit password.

The same result can be applied to passphrases for SSH keys and the like. If your passphrase is lower case plus spaces, you have about 4.75 bits of information per character and you need 27 characters to get 128 bits.
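Turning the formula around gives the length needed for a target number of bits. A minimal sketch, assuming every character is chosen uniformly at random; the function name `chars_needed` is hypothetical:

```python
import math


def chars_needed(alphabet_size: int, target_bits: int = 128) -> int:
    """Smallest number of characters that gives at least target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


# 94 = letters, digits, and Python's 32 punctuation characters.
print(chars_needed(94))
# 27 = lower case letters plus the space character.
print(chars_needed(27))
```

Under those assumptions the two calls give 20 and 27, matching the figures in the text.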

(The number of bits of information in a password is how many bits of randomness it has and thus how many random bits you need to generate as strong a random password as you can get, and an indicator of how strong a cryptographic key it is.)
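To make the 'random bits' point concrete, here is one way to generate a password whose bit count actually matches the formula: pick every character independently and uniformly from the alphabet. This is a sketch using Python's standard `secrets` module, not a recommendation of any particular generator; `random_password` is an illustrative name.

```python
import math
import secrets
import string


def random_password(alphabet: str, nchars: int) -> str:
    """Pick nchars characters independently and uniformly from alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(nchars))


alphabet = string.ascii_letters + string.digits + string.punctuation
password = random_password(alphabet, 20)

# The bit count only holds because each character is a uniform random choice.
bits = math.log2(len(alphabet)) * len(password)
print(password, f"({bits:.1f} bits)")
```

A password that a person made up has the same length and alphabet but generally far fewer bits, because the characters are not independent uniform choices.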

By nothings at 2007-07-10 01:08:09:

On the other hand, we can see the silliness of the admonition for people to use at least one digit and mixed case to make their password unguessable; you can get an equivalent increase in bits by just going from 8 characters to 10 characters.

(Of course, this is assuming brute-force cracking the alphabet, rather than the idea that people's passwords may be dictionary-able, which digits will change.)

By Dan.Astoorian at 2007-07-10 16:11:31:

The number of bits of randomness in the password is really only of consequence in scenarios which admit brute-force or dictionary attacks. For example, the strength of the passphrase which protects an SSH private key is irrelevant unless an attacker is able to obtain a copy of the private keyfile: the passphrase itself is of no value at all to traffic analysis attacks.

Moreover, the algorithms applied to passwords are not necessarily comparable to those applied to cryptographic keys in general, so talking about "128-bit passwords" as though they were somehow comparable to 128-bit symmetric keys in general may leave one with the wrong impression unless one knows exactly how the passwords are being used.

It's also possible for the smaller key space afforded by passwords to be mitigated by other factors, such as the use of more expensive symmetric algorithms to increase the cost of guessing a given key, or using tamper-resistant hardware which limits the number of password attempts.

--Dan

Passwords need to be random. Period.

I wrote a complicated generator in python, then in C. <https://github.com/scholarly/kbsum/tree/master/genpass>

Then I discovered a very complicated one that someone else wrote. I generated exactly one real pass phrase with it, then abandoned it because "pronounceable according to NIST FIPS 181" does not mean memorable. <https://packages.debian.org/stable/apg>

Later, I wrote a simple one in Go. <https://github.com/lernisto/didactic-journey/blob/master/apg/main.go>

I love most the simplicity of the last one, and the memorability of the first one. I'll probably combine them some day.

With (1) and (3) I know exactly how many bits are in the password, because what I type in is a deterministic representation of 128 random bits.

Written on 09 July 2007.