2010-08-05
Doing unsigned 32-bit integer math in Python
Modern versions of Python helpfully have arbitrary-precision integers. This is often useful (and they are intelligent), but there are occasional situations where I really do want to operate with fixed-size integers, rollover and all. My most recent need was computing a hash that was specified in terms of unsigned 32-bit operations (well, it was specified in C code that was doing such operations).
(It's common for hashes, checksums, and so on to be specified with this sort of arithmetic. If you need to duplicate the hash or checksum in pure Python, you're going to be faced with this.)
Embarrassingly, every time I run into this I go through the same exercise of working out how to do it and digging up my old code and convincing myself that it works. So I'm going to write this down once and hope that it sticks (and if not, I can look it up here later).
In pure Python, unsigned 32-bit arithmetic only requires masking numbers with 0xffffffff after every potentially overflowing operation. In my recent case, I wound up defining some convenience functions:

    M32 = 0xffffffff

    def m32(n):
        return n & M32

    def madd(a, b):
        return m32(a+b)

    def msub(a, b):
        return m32(a-b)

    def mls(a, b):
        return m32(a<<b)

(There is no need for an equivalent function for >>, as it can't overflow.)
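
To illustrate how helpers like these get used, here is a minimal sketch of a 32-bit version of the well-known djb2 string hash (h = h * 33 + byte, written with a shift and an add). djb2 is only a stand-in chosen for illustration, and djb2_32 is a made-up name; the helper definitions are repeated so the example runs on its own.

```python
M32 = 0xffffffff

def m32(n):
    # Keep only the low 32 bits, which emulates unsigned 32-bit rollover.
    return n & M32

def madd(a, b):
    return m32(a + b)

def msub(a, b):
    return m32(a - b)

def mls(a, b):
    return m32(a << b)

def djb2_32(data):
    # Illustrative only: djb2 with every step wrapped to 32 bits.
    h = 5381
    for byte in bytearray(data):
        # h * 33 + byte, expressed as (h << 5) + h + byte
        h = madd(madd(mls(h, 5), h), byte)
    return h

# Rollover behaves the way C unsigned arithmetic would:
print(madd(0xffffffff, 1))     # wraps around to 0
print(msub(0, 1) == M32)       # underflow wraps to the maximum value
print(djb2_32(b"a"))
```

Masking after each step instead of once at the end is not strictly necessary for addition and left shifts, but it keeps the intermediate values small and mirrors what the C code is doing.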
I don't know if there's an equivalent version for signed 32-bit integer math, rollover and all; so far I haven't needed one.
If you have NumPy available, the simpler version is just:
    from numpy import uint32
    from numpy import int32
NumPy also provides signed and unsigned 8-bit, 16-bit, and 64-bit integers under the obvious names.
Note that you don't want to mix these types with regular Python integers
if you want things to come out just right; you need to make sure that
all numbers involved in your code are turned into the appropriate NumPy
type. But if you do a bunch of computation with only a few original
numbers, this may well be the easiest, most convenient approach. If
you're copying code from another language, it will also likely keep your
Python code looking as much like the original as possible instead of
littering it with madd() and msub() calls.
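
As a rough sketch of what the NumPy approach looks like, here is the same sort of calculation with every value converted to uint32 up front. The hash is djb2, used purely as an illustration, and djb2_np is a made-up name. One assumption to flag: NumPy may emit a RuntimeWarning when scalar integer arithmetic overflows, so the sketch wraps the arithmetic in np.errstate(over="ignore") to keep the intended rollover quiet.

```python
import numpy as np
from numpy import uint32

def djb2_np(data):
    # Illustrative only: every operand is a uint32, so results stay uint32
    # and wrap around the way C unsigned 32-bit arithmetic does.
    h = uint32(5381)
    five = uint32(5)
    with np.errstate(over="ignore"):  # rollover is intended here
        for byte in bytearray(data):
            h = (h << five) + h + uint32(byte)
    return h

with np.errstate(over="ignore"):
    print(uint32(0xffffffff) + uint32(1))   # wraps around to 0
    print(uint32(0) - uint32(1))            # wraps to the maximum value

print(djb2_np(b"a"))
```

The expression inside the loop reads almost exactly like the C original, which is the main attraction; the cost is having to wrap every literal and input in uint32().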