2006-01-19
A Python length gotcha
Python calls the __len__ method on your objects to implement
len(), and in a few other situations (for example, as one way of
iterating through the elements of a sequence-like
object). Surprisingly, there's an under-documented restriction on what
your __len__ can return: objects can't be larger than
sys.maxint.
In fact, it's stricter than that: your __len__ method must return
a literal integer. You cannot return a Python 'long', even if the long
is less than sys.maxint. (This is arguably a bug and may be fixed in
future versions of Python, since it's a wart in the int/long
unification.)
If you return a non-int, including something larger than sys.maxint,
you'll get a TypeError from len() (with helpful explanatory text, at
least).
It's impossible to hit this with ordinary container objects, since you can't really create that many objects. However, you can run into this if you have container classes that use some efficient internal encoding. In my case it was a class representing sets of IP address ranges (built on top of a class to do efficient sets of (positive) integer ranges). I decided that the right definition of 'length' was 'how many IP addresses are in this set', and then discovered that IP addresses had to be stored as longs, not ints, which caused me to hit both aspects of the problem at once.
I don't really have a good workaround. My number range sets class uses:
def len(self): .... def __len__(self): return int(self.len()))
Then I tell people to use rset.len() instead of len(rset),
and if they still use len() it might still work.
Sidebar: xrange(), the other thing that cares
As I found out when writing my iterator for the number sets classes
(it returns every number in the set, one by one), xrange() also
requires its arguments to be less than sys.maxint (even if the span
would be smaller than sys.maxint).
This forced me to write a rare explicit iterating loop in Python and led to a certain amount of muttering.
2006-01-11
Recording attribute access and method calls
Here's a little bit of magic, demonstrated in the Python interpreter:
>>> from record import Record
>>> r = Record()
>>> r2 = r.a.b('abc').c
>>> print r2
a.b('abc').c
>>> print r.d.e(r2, t=20).f()
d.e(Record().a.b('abc').c, t = 20).f()
Rather than making function calls or giving you real attributes, the
Record class just records what you did (and in this case, lets it be
dumped as a string). It's smart enough to know when some of the
function arguments are also recorded bits and treat them specially.
(Here it just sticks 'Record().' in front of them, since this is an
example.)
A version of this cropped up in PingingWeblogsInPython, where it's the underlying mechanism the xmlrpclib module uses to let people invoke XML/RPC procedures as if they were writing plain Python calls.
Python allows objects to customize access to themselves (covered here), so the implementation is pretty simple:
- every
Recordobject remembers the path to itself. __getattr__returns a kidRecordobject with the attribute name appended to the path.__call__returns a newRecordobject with the function call arguments added to the path.__str__just returns the current path.
This creates a new object for each step of the path, but because
Record objects never change their own path they can be safely saved
and reused by outside code (eg, in the example we reuse r as the
starting point for two different paths; if we returned the same object
all the time and just mutated its current path, this would explode
spectacularly).
Because Python doesn't distinguish between 'call attribute' and
'access attribute', all of the attributes you get back are potentially
callable (eg, r2 could be called later).
Sidebar: the actual code
Here's a simple implementation of the Record class.
class Record(object):
def __init__(self, name=''):
self._n = name
def __str__(self):
return self._n
def __getattr__(self, attr):
nn = attr
if self._n:
nn = '%s.%s' % (self._n, attr)
return self.__class__(nn)
def _gr(self, x):
if isinstance(x, Record):
return "Record().%s" % str(x)
else:
return repr(x)
def __call__(self, *a, **kw):
al = [self._gr(x) for x in a]
k = kw.keys(); k.sort()
al.extend(["%s = %s" % (x, self._gr(kw[x]))
for x in k])
nn = "%s(%s)" % (self._n, ", ".join(al))
return self.__class__(nn)
(As usual, some names have been shortened to make the code not take up too much space.)
In a real version, __getattr__ probably should have a cache, so
that getting '.foo' twice gives you the same object back.
2006-01-04
Python synergies in list addressing
Something I took from this Ian Bicking entry is that synergies and elegance don't just happen; someone usually worked hard to make it all come out neatly. Python lists have an interesting case of this; some apparently odd decisions in other places turn out to be needed to create useful (and error-avoiding) synergies.
Python lists (and sequences in general) are indexed from 0. Zero-based
indexing presents a problem, which can be succinctly stated as this:
list element indexes run from 0 to len(lst)-1. That -1 is ugly
and error prone.
So Python has quietly arranged things so that you never have to write it
(or +1, its kissing cousin), by making 'slice' addressing of lists and
range() end-exclusive asymmetric (instead of 'i:j' running from i
to j, it runs from i to j-1). This means:
range(len(lst))generates indexes that exactly cover the list, since they run from 0 tolen(lst)-1.lst[:len(lst)]is the entire list.- if
prefis at the start of the list,lst[len(pref):]is the list withprefremoved from the start. - if
sublstis inlststarting atpos,lst[:pos]is everything beforesublstandlst[pos + len(sublst):]is everything aftersublst. - if
sufis at the end oflst,lst[:-len(suf)]is the list withsufremoved from the end. - avoiding a subtler error,
lst[len(pref):]works even ifprefis zero length (althoughlst[:-len(suf)]does not; can't win them all).
All of these are straightforward expressions, with nary a stray -1
or +1 in sight and no chance for off by one errors. (Is it elegant
or does the asymmetry cancel out the lack of +1 and -1? That's in
the eye of the beholder, but I like it.)
Python is not the first language to notice this issue; Scheme's
substring is end-exclusive, for example. Other things duck the issue
by having their substring operation take a start and a length, instead
of a start and and end position.
(This entry's genesis came from comments made on AClosureConfusion.)