Wandering Thoughts archives

2011-06-23

Milter tools for Python and an experiment with coding in public

I've decided to buckle down and write some Python code to deal with sendmail milters, in order to satisfy a long-standing desire of mine. The current name of this code is 'python-milter-tools', and so far I have encoders and decoders for all v2 milter protocol messages and some code to make it easier to talk the milter protocol over sockets. It is mostly focused on the 'MTA' side of the protocol, since that's what I'm interested in.

(I should note that although this code has survived talking with a real milter client, it has not been put into actual use and may have issues.)

As an experiment in the modern way of doing things I've decided to use git and put the code up on github. I've taken to calling this 'coding in public', and it's an interesting feeling so far. It's probably caused me to write more tests and more documentation than I otherwise would have at this point in the code's life. The tests are not complete, but do demonstrate some of my general approaches to testing Python code.

(The observant will notice that I renamed the repository once already. Names are hard.)

PyMilterTools written at 17:22:27

2011-06-22

A basic Namespace metaclass for Python

Suppose that you want to conveniently abuse classes as namespaces. For me, this means avoiding having to splatter @staticmethod all over my class definitions; it's both repetitive make-work and distracting when I'm looking at the source code.

Clearly the solution to this problem is a metaclass to do all of the work for me. Here's one, which is deliberately somewhat simple:

from types import FunctionType
class NSMeta(type):
  def __new__(meta, cname, bases, cdict):
    ndict = {}
    for name, aval in cdict.items():
      if type(aval) == FunctionType:
        aval = staticmethod(aval)
      ndict[name] = aval
    return type.__new__(meta, cname, bases, ndict)

# Inherit from this class:
class Namespace(object):
  __metaclass__ = NSMeta

# An example:
class uint16(Namespace):
  def encode(val):
    return ...

  def decode(data):
    return ...

(To fully exhibit my twitch about aesthetics and not repeating myself, I set up the Namespace class purely so that 'namespace' classes would not have to add an extra line to set __metaclass__. Inheriting from Namespace is free, in terms of code layout, since namespace classes need to inherit from something anyway.)

A really superintelligent metaclass would only remap functions and would peek at what the first argument of every function was called. If it was 'self', the function would be left alone (and would thus become a method function); if it was 'cls' (or your preferred equivalent), the function would be made into a classmethod; otherwise, the function would be made into a staticmethod.

Writing such a superintelligent metaclass is left as an exercise for the reader.
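As a rough sketch of what that exercise could look like (written in Python 3 metaclass syntax, unlike the Python 2 code above; the class names here are my own invention):

```python
from types import FunctionType

class SmartNSMeta(type):
  def __new__(meta, cname, bases, cdict):
    ndict = {}
    for name, aval in cdict.items():
      if type(aval) == FunctionType:
        # peek at the name of the function's first argument
        args = aval.__code__.co_varnames[:aval.__code__.co_argcount]
        first = args[0] if args else None
        if first == 'self':
          pass                        # leave it as an ordinary method
        elif first == 'cls':
          aval = classmethod(aval)
        else:
          aval = staticmethod(aval)
      ndict[name] = aval
    return type.__new__(meta, cname, bases, ndict)

# Inherit from this class:
class SmartNamespace(metaclass=SmartNSMeta):
  pass
```

With this, a function whose first argument is 'val' becomes a staticmethod while one whose first argument is 'cls' becomes a classmethod, with no decorators in sight.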

Update: code in haste, repent slightly later. I have realized that this metaclass should at least only apply staticmethod to functions, since there are sensible uses for non-function things in a namespace (such as constants). So I've fixed it to do this right.

NamespaceMetaclass written at 01:51:27

Abusing Python classes as namespaces

Suppose, not entirely hypothetically, that you are dealing with binary structures and so have a bunch of functions to encode Python values to various different sorts of binary fields and decode those fields back to Python values. These functions clearly come in pairs.

There are various ways of organizing these functions; for example, you can just give them names like encode_uint16 and decode_uint16. One attractive way would be to group the pairs of functions together in namespaces, for example as uint16.encode and uint16.decode; this becomes even more useful if you add more per-field-type functions (for example, a validator function).

Unfortunately, Python does not have general namespace support. There is no convenient way to open up a new namespace and start defining things in it; all of the ways of creating namespaces come with various sorts of baggage. The closest Python comes to pure namespaces without side effects is modules, but you have to put them in separate files and I think that this gets ugly and hard to follow because the modules (and thus the files) are just too small.

(In theory one answer is to add some methods to your value types. However, even if you have modifiable classes for your value types, it may be inappropriate to add .encode and .decode methods to them because the same value type may have multiple different possible encodings. And if you're using basic Python values (ints, strings, etc), you don't even have modifiable classes to add methods to.)

The only real way to conveniently define multiple namespaces in a single file is to make them classes. But not conventional classes. Classes used purely as namespaces will never be instantiated; with no instances and no per-instance state, none of the 'methods' will actually ever use self and so there is no point in having them called with it. In short, we want a class that has only staticmethod functions, because that makes them into regular plain old functions:

class uint16(object):
  @staticmethod
  def encode(val):
    return ...

  @staticmethod
  def decode(data):
    return ...

(This class deliberately breaks the usual Python naming convention for classes, because it's nothing like a normal class.)

In some cases it may be useful to have per-class data and some sort of class hierarchy in order to reuse code. That's the time for @classmethod.
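A sketch of what the @classmethod variant might look like (the struct format codes and class names here are my own illustration, not from this entry's actual code):

```python
import struct

class basecodec(object):
  FMT = None        # per-class data; subclasses set this

  @classmethod
  def encode(cls, val):
    return struct.pack(cls.FMT, val)

  @classmethod
  def decode(cls, data):
    return struct.unpack(cls.FMT, data)[0]

class uint16(basecodec):
  FMT = '>H'        # big-endian unsigned 16-bit

class uint32(basecodec):
  FMT = '>I'        # big-endian unsigned 32-bit
```

Here the shared encode/decode code lives in basecodec and each 'namespace' class only has to supply its format code.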

PS: having tried this approach out in some of my code, I confess that I'm not sure I like it. The problem for me is an aesthetic one; I think I got the 'namespace' names wrong (I made them too class-like) and I'm not sure I like the @staticmethod's that are sprayed all over the place.

ClassesAsNamespaces written at 01:28:34

2011-06-12

Why [], {}, and () are cheap in Python

Here's a trick question: are the following two lines semantically equivalent?

a = []
a = list()

In a normal Python environment they certainly have the same effect, but the answer is that no, they are not semantically equivalent because there is always the possibility that the name list has been rebound in the current namespace. As a result, when Python executes the second line it's obliged to go through the bother of looking up list's name to determine that it actually is the built-in. By contrast, [] here can only ever mean one thing in Python; its semantics are fixed and so CPython is free to fast-path things based on that.

(And it does. [] compiles straight down to a specific bytecode operation, BUILD_LIST. list() compiles to a lookup and a function call.)
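You can see this for yourself with the dis module (the exact opcode sequence varies between CPython versions, but the shape stays the same):

```python
import dis

# [] compiles to a single BUILD_LIST operation
dis.dis(compile('[]', '<demo>', 'eval'))

# list() compiles to a name lookup plus a function call
dis.dis(compile('list()', '<demo>', 'eval'))
```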

That's half of the reason that [] is a cheap operation. The other half is that CPython keeps a stash of preallocated list objects ready to be reused. As a result, when this stash is not empty, CPython needs no memory allocation at all to give you an empty list.

(For a non-empty list, it needs to allocate and zero the memory block for the array of pointers to list items.)

Dictionaries have similar optimizations, although the dictionary setup code has slightly more work to do in order to give you a valid empty dictionary.

Tuple creation has even more optimization. CPython keeps a much larger stash of various-sized tuples so that it can almost certainly immediately give you, e.g., a 4-element tuple without any fuss. Since tuples are immutable, () can be a very special case:

>>> () is ()
True

In other words, there is only one empty tuple in CPython. Once something creates it, it will live forever. Giving you an empty tuple is as simple as increasing this one empty tuple's reference count and returning it.
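A quick check of the contrast with mutable containers:

```python
a, b = (), ()
print(a is b)    # True in CPython: both are the one empty tuple
c, d = [], []
print(c is d)    # False: every [] is a freshly allocated list
```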

(This entry was sparked by a recent comment on this entry; it caused me to get curious about just how efficient things like '[]' were. I had assumed that CPython had optimized them, but why assume when I could go find out.)

Sidebar: how much this matters

; python -m timeit 'a = ()'
10000000 loops, best of 3: 0.0315 usec per loop
; python -m timeit 'a = []'
10000000 loops, best of 3: 0.0573 usec per loop
; python -m timeit 'a = list()'
1000000 loops, best of 3: 0.198 usec per loop

And here's some more, with some interesting results:

; python -m timeit -s 'l = list' 'a = l()'
10000000 loops, best of 3: 0.176 usec per loop
; python -m timeit -s 'class A(object): pass' 'a = A()'
10000000 loops, best of 3: 0.16 usec per loop

If you give the A class a do-nothing __init__, it slows down to around 0.38 usec per loop. I have no idea why 'A()' is so fast without an __init__; before I measured, I would have expected it to be clearly slower than list().

(This is all from a 64-bit Fedora 14 machine with Python 2.7.)

PS: this is where Zed Shaw gets very irritated with me, so don't bet the farm on these timing results.

CheapListDictTupleCreation written at 01:13:05

2011-06-11

Some notes on __slots__ and class hierarchies

How __slots__ interacts with class inheritance is a little bit complicated (although in general everything is covered in the official documentation). There are a couple of issues.

First, once a class has a dictionary, it's permanent; you cannot take the dictionary away in subclasses. This means that if you have a class hierarchy involving slots-using classes, all of their parents need to not have a dictionary, i.e. they must use __slots__ without including "__dict__".

(By the way, it's just struck me that this is fundamentally why you can't add attributes to instances of object(), as I noticed a long time ago. To have arbitrary attributes added, object() instances would have to have a dictionary, but that would mean that you couldn't subclass object() to get a dictionary-less class.)

However, you can relocate a dictionary-using parent class's instance variables into slots (which may save you memory), because slots don't have to be defined in the same class as where they're used. To make this clear, suppose that you have:

class A(object):
  def __init__(self):
    self.a = 10

class B(A):
  __slots__ = ('a',)

An instance of A will have a __dict__ with 'a' in it, but an instance of B will have an empty __dict__; a is now stored in a slot.

Using this very much will probably make everyone's head hurt, especially if you start introspecting things. (Introspection in the face of slots is one of those interesting topics. For example, there is no simple way to see what attributes an object currently has.)

Since instances of a class normally have a dictionary, inheriting from a slots-using class without specifying __slots__ yourself results in your instances having dictionaries, as I discovered. As with the example above, the actual instance variables from your parent class will still wind up in slots. It is easy to miss this.
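A sketch of this silent failure mode (the class names here are mine):

```python
class Slotted(object):
  __slots__ = ('a',)

class Oops(Slotted):    # no __slots__ here...
  pass

o = Oops()
o.anything = 1          # ...so arbitrary attributes work again
print(hasattr(o, '__dict__'))   # True; Oops instances have dictionaries
```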

It's certainly possible to combine slots with having a class hierarchy, but it's clearly complex (and has limits). I would avoid it if possible, partly because some failure modes are basically silent (your instances have dictionaries when you didn't expect them to; nothing breaks, but you use more memory than you should). The simple approach is the easiest; just have all your slots-using classes be subclasses of object(). Among other things, this makes it easy to be sure of exactly what slots behavior you're getting, because the class itself is the only thing that matters.

PS: I'm sure that someone, somewhere, has written a 'slotizer' metaclass that analyzes the __init__ function in a new class to determine all instance variables that it sets and then turns them all into slots.
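As a rough sketch of the idea (in Python 3, using the dis module; this naively assumes that every attribute store in __init__ is a store on self, which is not true in general):

```python
import dis

class SlotifyMeta(type):
  def __new__(meta, cname, bases, cdict):
    init = cdict.get('__init__')
    if init is not None and '__slots__' not in cdict:
      slots = []
      # scan __init__'s bytecode for 'self.<name> = ...' stores
      for ins in dis.get_instructions(init):
        if ins.opname == 'STORE_ATTR' and ins.argval not in slots:
          slots.append(ins.argval)
      cdict['__slots__'] = tuple(slots)
    return type.__new__(meta, cname, bases, cdict)

class Point(metaclass=SlotifyMeta):
  def __init__(self, x, y):
    self.x = x
    self.y = y
```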

SlotsInheritance written at 01:20:20

2011-06-10

What __slots__ are good for (or, how to use slots to save memory)

First off and to echo what everyone says, slots are for saving memory. The direct corollary of this is that when you are considering using slots you should measure the memory savings you get with and without them; if it is not significant (to you), you should not use slots because slots have various odd side effects and drawbacks.

(Sometimes people are attracted to those side effects. Please don't be one of those people.)

As the documentation says, slots save you memory by eliminating (or at least minimizing) the per-instance dictionary that objects of your class would normally have. How much memory this saves you is complicated (because dictionary space usage is complicated); a lot depends on how many entries your instance dictionary would normally have. There are two different but closely related things you do with slots: you can disable the creation of an instance dictionary and you can preallocate space for specific instance attributes. These things lead to three different ways of using slots.

The first way is subclassing built-in types when all you're doing is redefining or adding methods. Instances of most built-in types don't have instance dictionaries, but instances of your subclass normally will (even if you never use it for anything). Using an empty __slots__ will turn this off, making your instances as memory-efficient as the parent type, and you can do this even for variable-size types like str and tuple. This is the only thing you can do with slots if you're subclassing a variable-sized type, one that has a non-zero __itemsize__ (see here and here for more discussion of this).

(This also applies if you are subclassing a slots-using class. Basically, you can do this any time you're subclassing a class that doesn't already have an instance dictionary.)
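For instance, an empty __slots__ keeps a tuple subclass as lean as tuple itself (the class here is my own illustration):

```python
import sys

class point(tuple):
  __slots__ = ()          # no instance dictionary added

  @property
  def x(self):
    return self[0]

  @property
  def y(self):
    return self[1]

p = point((3, 4))
print(sys.getsizeof(p) == sys.getsizeof((3, 4)))  # same size in CPython
print(hasattr(p, '__dict__'))                     # False
```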

Second, if you have a fixed set of instance attributes you can preallocate space for all of them and entirely eliminate the instance dictionary. This is the classical case of a non-empty __slots__ with attribute names in it. In many situations it makes sense to preallocate instance attributes that you will only rarely use so that you can get rid of the instance dictionary (and slots arrange for hasattr() et al to keep on working in this case; a slot that has never had a value set raises AttributeError when you try to access it).

(Each slot attribute costs you a pointer, either 4 or 8 bytes depending on your platform. An instance dictionary costs you at least a pointer plus sys.getsizeof(dict()) bytes. On my 32-bit and 64-bit Linux machines, this works out to about 35 or 36 attributes taking as much memory as an empty instance dictionary.)
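The classical case looks something like this (the class and attribute names are my own invention):

```python
class Host(object):
  __slots__ = ('name', 'addr', 'note')   # 'note' is only rarely set

  def __init__(self, name, addr):
    self.name = name
    self.addr = addr

h = Host('fred', '127.0.0.1')
print(hasattr(h, 'note'))    # False until something assigns it
h.note = 'testing'
print(hasattr(h, 'note'))    # True
```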

Third, if you have a subset of instance attributes that are always present plus some unpredictable attributes that are added dynamically, you can potentially save space overall by preallocating space for the always-present attributes while still having an instance dictionary for the dynamic attributes. To do this you use a __slots__ with your attribute names plus "__dict__". This is the least straightforward and most conditional memory savings.

In the abstract, a slot uses about a third of the memory of a dictionary entry. In practice, well, dictionary space usage is complicated so how much memory this will save depends on the typical number of entries in the instance dictionary and how many entries you're able to preallocate. In current versions of CPython 2.x, the breakpoints you're likely to hit are 5 entries and 21 entries. Moving from six entries to five will save you a chunk of memory, but moving from five to four won't. In particular, an instance dictionary with no entries takes up just as much memory as one with five entries.

(Yes, this means that adding slots while keeping an instance dictionary can actually cost you more memory. That's why you really want to measure this sort of stuff.)
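A sketch of the third pattern (again with invented names):

```python
class Record(object):
  # preallocate the always-present attributes, but keep a
  # dictionary for the unpredictable ones
  __slots__ = ('key', 'value', '__dict__')

  def __init__(self, key, value):
    self.key = key        # stored in a slot
    self.value = value    # stored in a slot

r = Record('a', 1)
r.extra = 'dynamic'       # lands in the instance dictionary
print(r.__dict__)         # {'extra': 'dynamic'}; slots don't appear here
```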

PS: I'm ignoring __weakref__ in this discussion. If you need weak references, you probably already know it and you can add it to your __slots__.

WhatSlotsAreGoodFor written at 01:56:23

2011-06-06

How much space a Python dictionary entry uses

There are a number of built-in Python types that have variable sized instances, such as tuples, lists, and longs (or ints, in Python 3). For some of these, there is a simple answer to how much space a given instance will take up, based on how many items you put in an instance; you can determine this by using sys.getsizeof() on an empty instance to get the base size and then looking at <type>.__itemsize__ to see the per-item space usage.
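For tuples, that recipe looks like this:

```python
import sys

base = sys.getsizeof(())          # base size of an empty tuple
per_item = tuple.__itemsize__     # per-item space usage
t = (1, 2, 3)
print(sys.getsizeof(t) == base + len(t) * per_item)   # True
```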

Dictionaries are not this simple, because like lists they do not simply add their entries on the end of their basic C-level blob of data (well, it's a C structure). As a result a number of things affect the size of a particular dictionary, and in turn affect when adding entries will actually make the dictionary use more memory.

In the abstract, a dictionary entry is three pointers; one for the key, one for the value, and a third to store the value of the key's hash. On a 32-bit machine this makes it 12 bytes and on a 64-bit machine, 24 bytes. However, dictionaries do not add entries one by one; instead there are two complications.

First, the basic dictionary structure comes with enough room for up to five entries. If you watch sys.getsizeof() as you add keys to a dictionary, you can see this clearly; an empty dictionary and one with five entries use the same modest amount of space, but once you add a sixth entry the space usage jumps.

Second, dictionaries enlarge (and shrink) in discontinuous jumps. Each dictionary has a table for entries, which is always a power of two in size and at most 2/3rds used; when you add an entry that would push the table over 2/3rds used, CPython enlarges the dictionary table instead. Dictionary tables that are not that large quadruple when they're resized; large tables merely double.

Let's put this in concrete terms. On a 64-bit machine with Python 2.7, sys.getsizeof() reports that a dictionary uses 280 bytes when it has from zero to five elements. When you add a sixth entry, the reported size jumps to 1048 bytes. We can actually work out exactly where this memory is going; the size of the dictionary's C structure is 280 bytes, and then we need a 32-entry table (four times the size of the 8-entry table we started with) where every entry is 24 bytes, for 768 bytes total. This adds up to exactly the 1048 bytes of memory usage that was reported.
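You can watch the jump happen for yourself (the exact sizes differ between CPython versions, but the discontinuous growth is still visible):

```python
import sys

d = {}
last = sys.getsizeof(d)
for i in range(8):
  d[i] = i
  size = sys.getsizeof(d)
  if size != last:
    # report each point where adding an entry changed the dict's size
    print('size changed at entry', i + 1, '->', size, 'bytes')
    last = size
```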

Note that all of this is of course implementation dependent, although the CPython dictionary implementation has been refined enough that it doesn't seem to change much. Python 3.1.2 reports the same numbers, for example. The real lesson here is that sys.getsizeof() means that you can measure things for yourself if you think it matters in your program.

(Also note that size reported by sys.getsizeof() does not include the space required for the keys and the values themselves, for various reasons.)

DictionarySpaceUsage written at 00:46:19

2011-06-05

A subtle difference between tuples and lists in CPython

There are of course any number of differences between tuples and lists, but today I discovered a subtle and interesting one: tuples have a non-zero __itemsize__, and lists do not. Or to put it in a more comprehensible way, in CPython tuples are variable-size blobs and lists are not.

Wait, what? Lists are clearly variable sized things, in fact more variable sized than tuples since a tuple is immutable once created. It turns out that that difference is exactly why tuples are variable-sized blobs and lists are not.

Both lists and tuples must store a variable-sized array of items somewhere. As covered in HowSlotsWorkI, the CPython runtime allows the fundamental blob of storage for a Python object instance to have a variable amount of additional data tacked on the end. However, you cannot resize an instance blob once it has been created. Because tuples are immutable, the size of their item array doesn't change once a tuple instance has been created and they can directly store their item array on the end of their C-level blob, saving a separate memory allocation and a pointer. However, lists can and do change the size of their item array after creation and so must store their item array in a separate chunk of storage so they can freely resize it as the particular list grows and shrinks.

One of the consequences of this is the traditional one: you can't subclass tuple with anything that has a non-empty __slots__.

(In theory this makes tuples slightly more memory efficient than lists. In practice the differences are small enough that they're unlikely to matter unless you have a huge number of items.)
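The traditional consequence from above is easy to demonstrate:

```python
try:
  class bad(tuple):
    __slots__ = ('extra',)
except TypeError as e:
  # nonempty __slots__ not supported for subtype of 'tuple'
  print(e)
```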

PS: as you might expect, dict is another variable sized Python thing that has a zero __itemsize__.

TupleListSlots written at 00:55:03

2011-06-03

Ints, __slots__, and Python 3

Today, one of my visitors came here after doing a Google search for the error message:

nonempty __slots__ not supported for subtype of 'int'

Right away, I can tell you something about this person; they are either using Python 3 or porting code to Python 3.

(I use 'porting' instead of 'migrating' deliberately.)

At one level, the reason you get this error message is simple and I covered it in HowSlotsWorkI; in short, you are trying to subclass a type with a non-zero __itemsize__, one whose C-level storage is already variable sized, and that doesn't work because of how CPython implements slots. I can tell that this is a Python 3 user because only in Python 3 does int have a non-zero __itemsize__.

Or to put it differently, in Python 2 ints are fixed-size objects and in Python 3 they are not. So, you might ask, why is this so? The answer is that Python 3 fully unified ints and longs.

Python 2's long type is what other languages would call a 'bignum', an arbitrary-length integer number. Like everyone else's bignum implementation, Python 2 stores longs in variable sized objects; this is clearly necessary since a bignum can be arbitrarily large and the actual length of bignums varies considerably. Python 2's int is your ordinary good old fashioned fixed-length integer type (either 32 bits or 64 bits, depending on the size of the underlying C long type on your platform).

In Python 2, ints and longs are thus different types but are partially unified in that you can freely mix them in arithmetic and an int that grows sufficiently large is automatically converted to a long (however, small longs do not automatically convert back into ints). In Python 3, they have been completely unified to the point where there is only one type, int. Since this type has to represent bignums as well as ordinary small integers, its underlying storage must be variable sized; since it has a variable-sized storage, it can't be used with slots.

(In a sense the way to deal with this issue when porting to Python 3 is simple; you just stop using __slots__ on your subclass of int. Everything works the same and your instances just take up a bit more memory.)
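The error itself is trivial to reproduce in Python 3 (in Python 2 this class definition works fine, since int has a zero __itemsize__ there):

```python
try:
  class myint(int):
    __slots__ = ('tag',)
except TypeError as e:
  # nonempty __slots__ not supported for subtype of 'int'
  print(e)
```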

By the way, before anyone freaks out over the clear memory waste of representing all small integers with bignums, well:

>>> import sys
>>> sys.getsizeof(10)
24
>>> sys.getsizeof(10L)
28

(This is on a 64-bit Linux machine with Python 2.7; a 32-bit machine with 2.6.5 reports 12 bytes and 16 bytes respectively. Using Python 3 on the same 64-bit machine reports a size of 28 bytes.)

(Some people would quibble over this being called 'unification', in that it's more like the Python 2 'long' type was just renamed to be 'int'. In fact this is pretty much exactly what happened at the C API layer.)

IntSlotsPython3k written at 00:14:23

