Wandering Thoughts archives

2009-03-09

What list methods don't make sense for heterogeneous lists

If one is going to use lists for heterogeneous data (per yesterday's entry), it makes sense to ask what list methods don't make sense any more. Opinions will probably differ, but here is my take on it.

First, I think that we can skip all methods that are common between tuples and lists; if tuples have them, they are presumably considered fine for heterogeneous data. Looking at what remains, I see:

  • .sort() clearly makes no sense; there is no real ordering among heterogeneous elements.

  • .reverse() doesn't make much sense to me, because if you have heterogeneous data I tend to think that their order is important.
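
As a small illustration of the .sort() point, here is a minimal sketch that assumes Python 3, where ordering comparisons between unrelated types raise TypeError. (Python 2, which was current when this entry was written, would instead sort such a list using an arbitrary but consistent cross-type ordering.) The record contents are made up for the example.

```python
# A hypothetical heterogeneous record: name, age, group memberships.
record = ["alice", 42, ["admin", "staff"]]

try:
    record.sort()
    outcome = "sorted"
except TypeError:
    # Python 3 refuses to order str, int, and list against each other.
    outcome = "TypeError"

print(outcome)
```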

I'm unlikely to use .index(), .insert(), .remove(), or list multiplication, but I can't rule out that they make sense for some ways of building and manipulating heterogeneous lists. The same is true for .append() and .extend(), and I actually use them in situations where I accumulate elements instead of creating the list in one big bang.

In thinking about this, I've come to the obvious realization that there are two sorts of heterogeneous lists. Sometimes, nominally heterogeneous lists actually contain conceptually homogeneous data that is just most conveniently represented in different Python types (or, to put it the other way, that you have not bothered to create a class to encapsulate). For instance, in processing a language you might have a list of parser nodes or lexer tokens that have varying representations; at a mechanical level the list is heterogeneous, but at a conceptual level everything in it is the same sort of thing.

With this sort of conceptually homogeneous list, you can use all of the list methods (even .sort(), with a custom comparison function) and have them all make sense, even though in some sense you are mingling apples and oranges.
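
Here is a rough sketch of sorting such a list with a custom comparison function. The token representation (bare strings mixed with (kind, text) tuples) and the helper names are assumptions invented for the example. It is written for Python 3, where a comparison function has to be wrapped with functools.cmp_to_key(); Python 2's .sort() accepted the comparison function directly.

```python
from functools import cmp_to_key

# Hypothetical lexer tokens: bare strings are keywords, tuples are
# (kind, text) pairs. Mechanically mixed, conceptually all tokens.
tokens = ["while", ("NUMBER", "10"), "if", ("STRING", "abc")]

def token_text(tok):
    # Pull out the text regardless of the representation.
    return tok if isinstance(tok, str) else tok[1]

def compare_tokens(a, b):
    # Classic cmp-style function: negative, zero, or positive.
    ta, tb = token_text(a), token_text(b)
    return (ta > tb) - (ta < tb)

tokens.sort(key=cmp_to_key(compare_tokens))
print(tokens)
```

The comparison only ever looks at the extracted text, which is what makes the list behave as homogeneous for sorting purposes.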

Sidebar: finding all of the list-only methods

Here is yet another appreciation of Python's introspection abilities. I decided that I wanted to know the methods that lists didn't share with tuples, so:

# Use actual instances of each type.
t = tuple()
l = list()
s1 = set(dir(t))
s2 = set(dir(l))
# Names that lists have and tuples lack, in sorted order.
l2 = sorted(s2 - s1)
print(l2)

I used actual instances as a precaution, but some experimentation shows that I didn't need to; you get the same result if you take the dir() of list and tuple directly.
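
A short sketch of that check, comparing the instance-based and type-based results. The exact set of list-only names depends on the Python version, so nothing here assumes a particular output beyond the two sets being equal.

```python
# List-only names found via instances and via the types themselves.
from_instances = set(dir(list())) - set(dir(tuple()))
from_types = set(dir(list)) - set(dir(tuple))

same = (from_instances == from_types)
list_only = sorted(from_types)

print(same)
print(list_only)
```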

(Updated: fixed the code to actually sort the list, as pointed out by a commentator. Whoops.)

HeterogeneousLists written at 00:47:15

2009-03-08

Python's theoretically missing core data type

In theory in Python, tuples are for heterogeneous data and lists are for homogeneous data. Except, well, tuples and lists have 'side effects': tuples are read-only and lists are not. Which means that if you take the restriction on lists seriously, you do not have a core data type for ordered writeable heterogeneous data (and maybe also one for read-only homogeneous data).
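
To make the 'side effect' concrete, here is a minimal sketch with made-up data: the same heterogeneous fields stored both ways, where only the list accepts an in-place update.

```python
fields_tuple = ("x", 10, 2.5)   # heterogeneous, read-only
fields_list = ["x", 10, 2.5]    # same data, writeable

fields_list[1] = 11             # lists support item assignment

try:
    fields_tuple[1] = 11
    tuple_writeable = True
except TypeError:
    # Tuples do not support item assignment.
    tuple_writeable = False

print(fields_list, tuple_writeable)
```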

(While this is a theory that's probably more honored in the breach than in the observance, I think that it does have some consequences, especially for determining what goes into the standard library.)

All of which gets us to the issue of whether one should hijack lists for the situation where you need ordered writeable heterogeneous data. On the one hand, if lists are for homogeneous data and this is their most important attribute, then people who hijack them are doing it wrong; not only are they writing un-Pythonic code, but they are probably not going to get the support they'd like from the standard library.

On the other hand, I think that ordered writeable heterogeneous data is a common enough pattern that there should be a core type that supports it; there are a fair number of situations with ordered, modifiable fields. (If nothing else, I note that any serialization format imposes a field order on otherwise unordered structures.)

In theory you could add another core type to Python for this. In practice I think it would be the wrong answer; there would be little functional difference between lists and the new data type, so you'd be adding it merely for intellectual purity. This is unlikely to be attractive to either language designers or Python programmers.

All of which leads me to the opinion that the list purists should yield a bit and accept that for the overall good of Python, lists can be legitimately used for heterogeneous data as well as homogeneous data. Carrying this through to what gets added to the standard library would be nice, too.

(Yes, I would like a 'namedlist' class in the collections module, especially since I wrote one.)
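
For concreteness, here is one way such a thing might look. This is purely a hypothetical sketch, not the implementation mentioned above and not anything in the collections module; the namedlist() factory, its signature, and the _fields attribute are all assumptions modeled loosely on namedtuple.

```python
def namedlist(typename, field_names):
    """Build a list subclass with named, writeable fields (a sketch)."""
    fields = list(field_names)

    def __init__(self, *values):
        if len(values) != len(fields):
            raise TypeError("expected %d values" % len(fields))
        list.__init__(self, values)

    def make_property(pos):
        # Each field name reads and writes one list position.
        def getter(self):
            return self[pos]
        def setter(self, value):
            self[pos] = value
        return property(getter, setter)

    namespace = {"__init__": __init__, "_fields": tuple(fields)}
    for pos, name in enumerate(fields):
        namespace[name] = make_property(pos)
    return type(typename, (list,), namespace)

Point = namedlist("Point", ["x", "y"])
p = Point(1, 2)
p.y = 5          # modify by name
p[0] = 10        # or by position
print(list(p))
```

Because the result is still a real list, methods such as .append() remain available; a more careful version would probably want to restrict the ones that change the length.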

MissingType written at 01:25:44

2009-03-07

What past problems of mine the collections module solves

The somewhat new collections module has a number of useful things that obsolete several of the things I've done by hand in the past. For my own reference if nothing else, here's more or less what its solutions obsolete and when each solution was introduced.

  • defaultdict obsoletes my use of dict.setdefault(). It was introduced in Python 2.5.

  • namedtuple obsoletes some but not all of my various forms of structures in Python. It was introduced in Python 2.6.
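
As an illustration of the defaultdict item, here is a minimal sketch of the by-hand dict.setdefault() idiom next to its defaultdict equivalent. The sample data and variable names are invented for the example.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]

# The by-hand idiom with dict.setdefault().
by_hand = {}
for kind, name in pairs:
    by_hand.setdefault(kind, []).append(name)

# The same grouping with defaultdict.
grouped = defaultdict(list)
for kind, name in pairs:
    grouped[kind].append(name)

print(dict(grouped) == by_hand)
```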

The reason namedtuple doesn't cover all of my uses of (abstract) structures with named fields is right there in the name. Since it is based on tuples, you can't have modifiable structures; this makes namedtuple a good match for functions that want to return structured read-only information, but not for more general uses where you want ordered named fields that you can change.
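
A short sketch of the limitation, using a made-up FileInfo structure. Assigning to a field fails, and the closest namedtuple offers is _replace(), which builds a new tuple rather than changing the existing one.

```python
from collections import namedtuple

# Hypothetical structure for a function's return value.
FileInfo = namedtuple("FileInfo", ["size", "mtime"])
info = FileInfo(size=1024, mtime=1236400000)

try:
    info.size = 2048
    modified = True
except AttributeError:
    # namedtuple fields are read-only.
    modified = False

# _replace() returns a new tuple; the original is untouched.
info2 = info._replace(size=2048)
print(modified, info.size, info2.size)
```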

(I suspect that the Pythonic view is that one should just go straight to full structures and add ordering there, instead of trying to hijack lists. A proper discussion of the issue does not fit within the margins of this entry.)

Unfortunately, I'm not going to get to use the collections module very much any time soon; most of our systems are old enough that they are running Python 2.4, and that's not likely to change. Even the more recent and fast-moving ones are only on Python 2.5, and based on the speed of updates around here, it will probably be at least a year before I'm using Python 2.6 anywhere.

(For the curious: Solaris 10, Red Hat Enterprise 5, and Ubuntu 6.06 all have Python 2.4 and are almost sure to stay that way for their lifetimes. Ubuntu 8.04 and Fedora 10 have Python 2.5, although it looks like the next version of Fedora will have 2.6.)

CollectionsSolutions written at 02:16:17


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.