Wandering Thoughts archives

2007-07-18

Why I like Python's large integer support

The simple explanation of why I like Python's large integer support (somewhat misleadingly called longs for historical reasons) is that they make it convenient to deal with arbitrarily large binary numbers. These crop up surprisingly often, sometimes because a large bunch of bits is the natural representation of things like IPv6 addresses, and sometimes because they make algorithms clearer and simpler.

For example, take the job of generating a random password that's drawn from a given alphabet. The simple approach to doing this is:

from math import log, ceil
def genpass(nchars, alphabet):
    alen = len(alphabet)
    bits = int(ceil(log(alen, 2) * nchars))
    rnum = getnbits(bits)
    pw = ''
    while nchars:
        # extract a character's worth
        # of randomness.
        idx = rnum % alen
        rnum = rnum // alen
        pw += alphabet[idx]
        nchars -= 1
    return pw

This assumes we have a getnbits() routine that returns a random number with (at least) bits bits of randomness.

If Python did not have large integers, getnbits() would have to return a much more complicated data structure to represent all those bits, and extracting a character's worth of randomness would be much more complicated. Because it does have large integers, I can express the algorithm simply and directly and thus be pretty confidant that this is really generating random passwords.

(This is in fact my new random password generator, shorn of the bits that specify the alphabet, let me say how many characters I want, and print the result.)

Sidebar: an implementation of getnbits()

This requires /dev/urandom, which is present on most modern Unixes, and omits error checking in the interests of simplicity:

def getnbits(nbits):
    fp = open("/dev/urandom", "r")
    bytes = (nbits + 7) // 8
    buf = fp.read(bytes)
    fp.close()
    base = 0L
    for c in list(buf):
        base = (base << 8L) + ord(c)
    return base

Again we can see how simple this is once we have arbitrarily large integers.

(Using list(buf) is strictly speaking unnecessary but makes it clear that we are deliberately using that string as a list of characters.)

If we wanted to be even trickier, the whole for loop could be rewritten as reduce(lambda a, b: (a << 8L) + ord(b), buf, 0L). This probably serves as a good illustration of why Python 3K is trying to kill reduce off.

LargeIntegersLike written at 23:14:43; Add Comment

2007-07-13

You can't change a Python function's local variables from outside

In an aside in a recent entry, I wrote:

With sufficient cleverness, you can construct a version of capture() that is passed store's name and puts the result in it directly, without needing you to make it an array and use store[0].

I'm wrong. In Python, no outside power can change a function's local variables (more precisely, no outside power can change the name bindings, doing the equivalent of 'localvar = something'). While you can make a version of capture() that sets global variables, setting a local variable in the function that called it is beyond its power.

Once I thought about it, this limitation is a logical consequence of how Python implements local variables. Since local variables aren't stored in a Python data structure, the CPython core would have to implement special functions to modify their values, which would only be useful for debuggers and people who want to commit dubious hacks.

(This does mean that not even Python debuggers can change local variables, although not all of them will clearly tell you this.)

FrozenLocalVariables written at 13:17:46; Add Comment

2007-07-11

Why I wish Python had assignment in conditionals

There's a lot of times where a function wants to return more than a plain boolean; for example, validation routines often want to return some sort of explanation of the validation error (if there is one). At the same time, the most natural way to use the function is often in the flow of a conditional:

if blah(o):
    ....
elif not validates(o):
    # whoops, lost the error reason

There's a variety of traditional answers to this, but I don't really like any of them because they all obscure (to some degree) the logic of what is going on. The right way of saying 'I am using this as a condition but capturing its return value for later use' is an assignment in a conditional; it is explicit and unambiguous (apart from the bit where = and == are too close to each other).

Python being Python, we can of course actually do this via a suitable hack:

def capture(store, res):
    store[0] = res
    return res

store = [None,]
if blah(o):
    ...
elif capture(store, validates(o)):
    ... use store[0] ...

Just like the last one, I suspect that this is not going to be considered really Pythonic.

(With sufficient cleverness, you can construct a version of capture() that is passed store's name and puts the result in it directly, without needing you to make it an array and use store[0]. To appeal to the Lisp crowd, call the routine let().)

AssignmentInConditionals written at 22:50:31; Add Comment

2007-07-10

Thinking about the Python equivalents of C's !! double negation

C's !! double negation idiom is a way of compressing all non-zero values of an expression down into a 1. There's a number of things that you can use to get the same effect in Python:

  • if you know that your result is always zero or positive, you can use cmp(). I'm ambivalent about whether this is obscure usage; you're at least making it clear what you're doing, although probably not why.

    (If your result can be negative, you can use abs(cmp(...)).)

  • the minimalist version is just bool(...), because (currently) True and False can be directly used in arithmetic expressions.
  • writing int(bool(...)) has the same effect but makes it clear to readers (and programs like pylint and pychecker) that you're doing this deliberately.

I'm not sure which one I like better. Consider an expression like:

nbytes = nbits // 8 + int(bool(nbits % 8))

My C heritage says that this version is the clearest (and indeed it's the first one I thought of). I'd have to pause to think about the cmp() version, partly because I've yet to memorize which way cmp()'s arguments run so I always have to think about what its result will be.

I suspect that none of these would be considered good Python style, and that the proper Pythonic way to do this is to just use an if, or to write the whole thing as math.ceil(nbits / 8.0) and trust that floating point precision is not going to bite you.

(For some reason I have a small addiction to little overly clever Python idioms like this.)

DoubleNegationEquivalent written at 21:30:48; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.