Wandering Thoughts archives

2006-01-19

A Python length gotcha

Python calls the __len__ method on your objects to implement len(), and in a few other situations (for example, as one way of iterating through the elements of a sequence-like object). Surprisingly, there's an under-documented restriction on what your __len__ can return: objects can't be larger than sys.maxint.

In fact, it's stricter than that: your __len__ method must return a literal integer. You cannot return a Python 'long', even if the long is less than sys.maxint. (This is arguably a bug and may be fixed in future versions of Python, since it's a wart in the int/long unification.)

If you return a non-int, including something larger than sys.maxint, you'll get a TypeError from len() (with helpful explanatory text, at least).

It's impossible to hit this with ordinary container objects, since you can't really create that many objects. However, you can run into this if you have container classes that use some efficient internal encoding. In my case it was a class representing sets of IP address ranges (built on top of a class to do efficient sets of (positive) integer ranges). I decided that the right definition of 'length' was 'how many IP addresses are in this set', and then discovered that IP addresses had to be stored as longs, not ints, which caused me to hit both aspects of the problem at once.

I don't really have a good workaround. My number range sets class uses:

def len(self):
  ....
def __len__(self):
  return int(self.len()))

Then I tell people to use rset.len() instead of len(rset), and if they still use len() it might still work.

Sidebar: xrange(), the other thing that cares

As I found out when writing my iterator for the number sets classes (it returns every number in the set, one by one), xrange() also requires its arguments to be less than sys.maxint (even if the span would be smaller than sys.maxint).

This forced me to write a rare explicit iterating loop in Python and led to a certain amount of muttering.

ALengthGotcha written at 02:42:25; Add Comment

2006-01-11

Recording attribute access and method calls

Here's a little bit of magic, demonstrated in the Python interpreter:

>>> from record import Record
>>> r = Record()
>>> r2 = r.a.b('abc').c
>>> print r2
a.b('abc').c
>>> print r.d.e(r2, t=20).f()
d.e(Record().a.b('abc').c, t = 20).f()

Rather than making function calls or giving you real attributes, the Record class just records what you did (and in this case, lets it be dumped as a string). It's smart enough to know when some of the function arguments are also recorded bits and treat them specially. (Here it just sticks 'Record().' in front of them, since this is an example.)

A version of this cropped up in PingingWeblogsInPython, where it's the underlying mechanism the xmlrpclib module uses to let people invoke XML/RPC procedures as if they were writing plain Python calls.

Python allows objects to customize access to themselves (covered here), so the implementation is pretty simple:

  • every Record object remembers the path to itself.
  • __getattr__ returns a kid Record object with the attribute name appended to the path.
  • __call__ returns a new Record object with the function call arguments added to the path.
  • __str__ just returns the current path.

This creates a new object for each step of the path, but because Record objects never change their own path they can be safely saved and reused by outside code (eg, in the example we reuse r as the starting point for two different paths; if we returned the same object all the time and just mutated its current path, this would explode spectacularly).

Because Python doesn't distinguish between 'call attribute' and 'access attribute', all of the attributes you get back are potentially callable (eg, r2 could be called later).

Sidebar: the actual code

Here's a simple implementation of the Record class.

class Record(object):
  def __init__(self, name=''):
    self._n = name
  def __str__(self):
    return self._n

  def __getattr__(self, attr):
    nn = attr
    if self._n:
      nn = '%s.%s' % (self._n, attr)
    return self.__class__(nn)

  def _gr(self, x):
    if isinstance(x, Record):
      return "Record().%s" % str(x)
    else:
      return repr(x)

  def __call__(self, *a, **kw):
    al = [self._gr(x) for x in a]
    k = kw.keys(); k.sort()
    al.extend(["%s = %s" % (x, self._gr(kw[x]))
               for x in k])
    nn = "%s(%s)" % (self._n, ", ".join(al))
    return self.__class__(nn)

(As usual, some names have been shortened to make the code not take up too much space.)

In a real version, __getattr__ probably should have a cache, so that getting '.foo' twice gives you the same object back.

RecordingAccesses written at 01:33:50; Add Comment

2006-01-04

Python synergies in list addressing

Something I took from this Ian Bicking entry is that synergies and elegance don't just happen; someone usually worked hard to make it all come out neatly. Python lists have an interesting case of this; some apparently odd decisions in other places turn out to be needed to create useful (and error-avoiding) synergies.

Python lists (and sequences in general) are indexed from 0. Zero-based indexing presents a problem, which can be succinctly stated as this: list element indexes run from 0 to len(lst)-1. That -1 is ugly and error prone.

So Python has quietly arranged things so that you never have to write it (or +1, its kissing cousin), by making 'slice' addressing of lists and range() end-exclusive asymmetric (instead of 'i:j' running from i to j, it runs from i to j-1). This means:

  • range(len(lst)) generates indexes that exactly cover the list, since they run from 0 to len(lst)-1.
  • lst[:len(lst)] is the entire list.
  • if pref is at the start of the list, lst[len(pref):] is the list with pref removed from the start.
  • if sublst is in lst starting at pos, lst[:pos] is everything before sublst and lst[pos + len(sublst):] is everything after sublst.
  • if suf is at the end of lst, lst[:-len(suf)] is the list with suf removed from the end.
  • avoiding a subtler error, lst[len(pref):] works even if pref is zero length (although lst[:-len(suf)] does not; can't win them all).

All of these are straightforward expressions, with nary a stray -1 or +1 in sight and no chance for off by one errors. (Is it elegant or does the asymmetry cancel out the lack of +1 and -1? That's in the eye of the beholder, but I like it.)

Python is not the first language to notice this issue; Scheme's substring is end-exclusive, for example. Other things duck the issue by having their substring operation take a start and a length, instead of a start and and end position.

(This entry's genesis came from comments made on AClosureConfusion.)

PythonListSynergies written at 03:32:15; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.