Why I like Python's large integer support

July 18, 2007

The simple explanation of why I like Python's large integer support (somewhat misleadingly called longs for historical reasons) is that they make it convenient to deal with arbitrarily large binary numbers. These crop up surprisingly often, sometimes because a large bunch of bits is the natural representation of things like IPv6 addresses, and sometimes because they make algorithms clearer and simpler.

For example, take the job of generating a random password that's drawn from a given alphabet. The simple approach to doing this is:

from math import log, ceil
def genpass(nchars, alphabet):
    alen = len(alphabet)
    bits = int(ceil(log(alen, 2) * nchars))
    rnum = getnbits(bits)
    pw = ''
    while nchars:
        # extract a character's worth
        # of randomness.
        idx = rnum % alen
        rnum = rnum // alen
        pw += alphabet[idx]
        nchars -= 1
    return pw

This assumes we have a getnbits() routine that returns a random number with (at least) bits bits of randomness.

If Python did not have large integers, getnbits() would have to return a much more complicated data structure to represent all those bits, and extracting a character's worth of randomness would be much more complicated. Because it does have large integers, I can express the algorithm simply and directly and thus be pretty confidant that this is really generating random passwords.

(This is in fact my new random password generator, shorn of the bits that specify the alphabet, let me say how many characters I want, and print the result.)

Sidebar: an implementation of getnbits()

This requires /dev/urandom, which is present on most modern Unixes, and omits error checking in the interests of simplicity:

def getnbits(nbits):
    fp = open("/dev/urandom", "r")
    bytes = (nbits + 7) // 8
    buf = fp.read(bytes)
    base = 0L
    for c in list(buf):
        base = (base << 8L) + ord(c)
    return base

Again we can see how simple this is once we have arbitrarily large integers.

(Using list(buf) is strictly speaking unnecessary but makes it clear that we are deliberately using that string as a list of characters.)

If we wanted to be even trickier, the whole for loop could be rewritten as reduce(lambda a, b: (a << 8L) + ord(b), buf, 0L). This probably serves as a good illustration of why Python 3K is trying to kill reduce off.

Comments on this page:

From at 2007-07-19 09:17:20:
Written on 18 July 2007.
« Random passwords are not necessarily good passwords
A safety tip: keep your different sorts of source trees separate »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Jul 18 23:14:43 2007
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.