A Python length gotcha
Python calls the __len__ method on your objects to implement len(), and in a few other situations (for example, as one way of iterating through the elements of a sequence-like object). Surprisingly, there's an under-documented restriction on what your __len__ can return: objects can't be larger than sys.maxint.
In fact, it's stricter than that: your __len__ method must return a literal integer. You cannot return a Python 'long', even if the long is less than sys.maxint. (This is arguably a bug and may be fixed in future versions of Python, since it's a wart in the int/long unification.)
If you return a non-int, including something larger than sys.maxint, you'll get a TypeError from len() (with helpful explanatory text, at least).
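A minimal sketch of the failure, assuming a hypothetical HugeRange class that is not from the original code. It uses 2 ** 64 as the oversized length because that is larger than sys.maxint on any platform. The exception type depends on the Python version: the behaviour described here is a TypeError, while Python 3 raises OverflowError instead, so the sketch catches both.

```python
class HugeRange(object):
    """Hypothetical container covering a range of integers without storing them."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __len__(self):
        # The natural definition, which breaks once the span gets too large.
        return self.end - self.start


def try_len(obj):
    """Return len(obj), or the name of the exception that len() raised."""
    try:
        return len(obj)
    except (TypeError, OverflowError) as exc:
        return type(exc).__name__


small = try_len(HugeRange(0, 10))        # a length that fits
huge = try_len(HugeRange(0, 2 ** 64))    # a length that len() rejects
```

Here small is an ordinary integer, while huge ends up holding the name of whichever exception len() raised.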
It's impossible to hit this with ordinary container objects, since you can't really create that many objects. However, you can run into this if you have container classes that use some efficient internal encoding. In my case it was a class representing sets of IP address ranges (built on top of a class to do efficient sets of (positive) integer ranges). I decided that the right definition of 'length' was 'how many IP addresses are in this set', and then discovered that IP addresses had to be stored as longs, not ints, which caused me to hit both aspects of the problem at once.
I don't really have a good workaround. My number range sets class uses:
    def len(self):
        ....

    def __len__(self):
        return int(self.len())
Then I tell people to use rset.len() instead of len(rset), and if they still use len() it might still work.
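To make the workaround concrete, here is a sketch of how the two methods behave on small and huge sets. The RangeSet class, its list of (start, end) pairs, and the way it counts are all assumptions made for illustration; the real number range class is not shown in this entry.

```python
class RangeSet(object):
    """Hypothetical set of inclusive integer ranges, kept as (start, end) pairs."""

    def __init__(self, ranges):
        self.ranges = list(ranges)

    def len(self):
        # The real element count; it can be arbitrarily large.
        return sum(end - start + 1 for start, end in self.ranges)

    def __len__(self):
        # Only succeeds when the count is small enough for len() to accept.
        return int(self.len())


small = RangeSet([(1, 10), (20, 29)])
huge = RangeSet([(0, 2 ** 64)])

small_count = small.len()    # same answer as len(small)
huge_count = huge.len()      # works even though len(huge) would raise
```

The explicit len() method always gives an answer, while len(rset) only works for sets small enough to pass the restriction.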
Sidebar: xrange(), the other thing that cares
As I found out when writing my iterator for the number sets classes (it returns every number in the set, one by one), xrange() also requires its arguments to be less than sys.maxint (even if the span would be smaller than sys.maxint).
This forced me to write a rare explicit iterating loop in Python and led to a certain amount of muttering.
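An explicit loop of this kind might look like the following sketch. The iter_range name and its inclusive end point are assumptions for illustration, not the actual iterator from the number sets classes.

```python
def iter_range(start, end):
    """Yield every integer from start to end inclusive, without xrange()."""
    n = start
    while n <= end:
        yield n
        n += 1


# The start is far above sys.maxint, but the span itself is tiny.
base = 2 ** 64
values = list(iter_range(base, base + 3))
```

Because the loop only does ordinary integer arithmetic, it does not care how large the starting value is.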