2011-06-23
Milter tools for Python and an experiment with coding in public
I've decided to buckle down and write some Python code to deal with sendmail milters, in order to satisfy a long-standing desire of mine. The current name of this code is 'python-milter-tools', and so far I have encoders and decoders for all v2 milter protocol messages and some code to make it easier to talk the milter protocol over sockets. So far it is mostly focused on the 'MTA' side of the protocol, since that's what I'm interested in.
(I should note that although this code has survived talking with a real milter client, it has not been put into actual use and may have issues.)
As an experiment in the modern way of doing things I've decided to use git and put the code up on GitHub. I've taken to calling this 'coding in public', and it's an interesting feeling so far. It's probably caused me to write more tests and more documentation than I otherwise would have at this point in the code's life. The tests are not complete, but do demonstrate some of my general approaches to testing Python code.
(The observant will notice that I renamed the repository once already. Names are hard.)
2011-06-22
A basic Namespace metaclass for Python
Suppose that you want to conveniently abuse classes as namespaces. For me, this means avoiding having to splatter
@staticmethod all over my class definitions; it's both repetitive
make-work and distracting when I'm looking at the source code.
Clearly the solution to this problem is a metaclass to do all of the work for me. Here's one, which is deliberately somewhat simple:
from types import FunctionType

class NSMeta(type):
    def __new__(meta, cname, bases, cdict):
        ndict = {}
        for name, aval in cdict.items():
            if type(aval) == FunctionType:
                aval = staticmethod(aval)
            ndict[name] = aval
        return type.__new__(meta, cname, bases, ndict)

# Inherit from this class:
class Namespace(object):
    __metaclass__ = NSMeta

# An example:
class uint16(Namespace):
    def encode(val):
        return ...
    def decode(data):
        return ...
(To fully exhibit my twitch about aesthetics and not repeating myself,
I set up the Namespace class purely so that 'namespace' classes would
not have to add an extra line to set __metaclass__. Inheriting from
Namespace is free, in terms of code layout, since they need to
inherit from something.)
A really superintelligent metaclass would only remap
functions and would peek at what the first argument of every function
was called. If it was 'self', the function would be left alone (and
would thus become a method function); if it was 'cls' (or your preferred
equivalent), the function would be made into a classmethod; otherwise,
the function would be made into a staticmethod.
Writing such a superintelligent metaclass is left as an exercise for the reader.
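(For illustration, here is one minimal sketch of such a metaclass, written for modern Python 3, where the metaclass is given with the metaclass= keyword instead of __metaclass__. The Example class and its function names are made up for demonstration; a production version would want to handle decorated functions and other corner cases.)

```python
import inspect
from types import FunctionType

class SmartNSMeta(type):
    def __new__(meta, cname, bases, cdict):
        ndict = {}
        for name, aval in cdict.items():
            if isinstance(aval, FunctionType):
                args = inspect.getfullargspec(aval).args
                first = args[0] if args else None
                if first == 'self':
                    pass                      # leave it as a normal method
                elif first == 'cls':
                    aval = classmethod(aval)  # conventional classmethod name
                else:
                    aval = staticmethod(aval)
            ndict[name] = aval
        return type.__new__(meta, cname, bases, ndict)

class Example(metaclass=SmartNSMeta):
    def double(x):            # first argument is not self or cls
        return x * 2
    def whoami(cls):          # first argument is 'cls'
        return cls.__name__

print(Example.double(3))      # 6: double became a staticmethod
print(Example.whoami())       # 'Example': whoami became a classmethod
```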
Update: code in haste, repent slightly later. I have realized that
this metaclass should at least only apply staticmethod to functions,
since there are sensible uses for non-function things in a namespace
(such as constants). So I've fixed it to do this right.
Abusing Python classes as namespaces
Suppose, not entirely hypothetically, that you are dealing with binary structures and so have a bunch of functions to encode Python values to various different sorts of binary fields and decode those fields back to Python values. These functions clearly come in pairs.
There are various ways of organizing these functions; for
example, you can just give them names like encode_uint16 and
decode_uint16. One attractive way would be to group the pairs of
functions together in namespaces, for example as uint16.encode and
uint16.decode; this becomes even more useful if you add more per
field type functions (for example, a validator function).
Unfortunately, Python does not have general namespace support. There is no convenient way to open up a new namespace and start defining things in it; all of the ways of creating namespaces come with various sorts of baggage. The closest Python comes to pure namespaces without side effects is modules, but you have to put them in separate files and I think that this gets ugly and hard to follow because the modules (and thus the files) are just too small.
(In theory one answer is to add some methods to your value
types. However, even if you have modifiable classes for your value
types, it may be inappropriate to add .encode and .decode
methods to them because the same value type may have multiple different
possible encodings. And if you're using basic Python values (ints,
strings, etc), you don't even have modifiable classes to add methods
to.)
The only real way to conveniently define multiple namespaces in a single
file is to make them classes. But not conventional classes. Classes
used purely as namespaces will never be instantiated; with no instances
and no per-instance state, none of the 'methods' will actually ever use
self and so there is no point in having them called with it. In short
we want a class that has only staticmethod functions, because that
makes them into regular plain old functions:
class uint16(object):
    @staticmethod
    def encode(val):
        return ...

    @staticmethod
    def decode(data):
        return ...
(This class deliberately breaks the usual Python naming convention for classes, because it's nothing like a normal class.)
In some cases it may be useful to have per-class data and some sort of class hierarchy in order to reuse code. That's the time for @classmethod.
PS: having tried this approach out in some of my code, I confess that I'm not sure I like it. The problem for me is an aesthetic one; I think I got the 'namespace' names wrong (I made them too class-like) and I'm not sure I like the @staticmethods that are sprayed all over the place.
2011-06-12
Why [], {}, and () are cheap in Python
Here's a trick question: are the following two lines semantically equivalent?
a = []
a = list()
In a normal Python environment they certainly have the same effect, but
the answer is that no, they are not semantically equivalent because
there is always the possibility that the name list has been rebound
in the current namespace. As a result, when Python executes the second
line it's obliged to go through the bother of looking up list's name
to determine that it actually is the built-in. By contrast, [] here
can only ever mean one thing in Python; its semantics are fixed and so
CPython is free to fast-path things based on that.
(And it does. [] compiles straight down to a specific bytecode
operation, BUILD_LIST. list() compiles to a lookup and a function
call.)
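(You can check this for yourself with the dis module. The sketch below is Python 3; the exact opcodes vary between CPython versions, but [] always becomes a single list-building opcode while list() needs a name lookup plus a call.)

```python
import dis

def f():
    a = []
    a = list()
    return a

# '[]' compiles to a single BUILD_LIST opcode; 'list()' compiles
# to a LOAD_GLOBAL (the name lookup) followed by a call opcode.
dis.dis(f)
```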
That's half of the reason that [] is a cheap operation. The other half
is that CPython keeps a stash of preallocated list objects ready to be
reused. As a result, CPython needs to do no memory allocations if it
needs an empty list when this stash is not empty.
(For a non-empty list, it needs to allocate and zero the memory block for the array of pointers to list items.)
Dictionaries have similar optimizations, although the dictionary setup code has slightly more work to do in order to give you a valid empty dictionary.
Tuple creation has even more optimization. CPython keeps a much larger
stash of various sized tuples so that it can almost
certainly immediately give you, eg, a 4-element tuple without any fuss.
Since tuples are immutable, () can be a very special case:
>>> () is ()
True
In other words, there is only one empty tuple in CPython. Once something creates it, it will live forever. Giving you an empty tuple is as simple as increasing this one empty tuple's reference count and returning it.
(This entry was sparked by a recent comment on this entry; it caused me to get curious about just how
efficient things like '[]' were. I had assumed that CPython had
optimized them, but why assume when I could go find out.)
Sidebar: how much this matters
; python -m timeit 'a = ()'
10000000 loops, best of 3: 0.0315 usec per loop
; python -m timeit 'a = []'
10000000 loops, best of 3: 0.0573 usec per loop
; python -m timeit 'a = list()'
1000000 loops, best of 3: 0.198 usec per loop
And here's some more, with some interesting results:
; python -m timeit -s 'l = list' 'a = l()'
10000000 loops, best of 3: 0.176 usec per loop
; python -m timeit -s 'class A(object): pass' 'a = A()'
10000000 loops, best of 3: 0.16 usec per loop
If you give the A class a do-nothing __init__, it slows down to
around 0.38 usec per loop. I have no idea why 'A()' is so fast without
an __init__; before I measured, I would have expected it to be
clearly slower than list().
(This is all from a 64-bit Fedora 14 machine with Python 2.7.)
PS: this is where Zed Shaw gets very irritated with me, so don't bet the farm on these timing results.
2011-06-11
Some notes on __slots__ and class hierarchies
How __slots__ interacts with class inheritance is a little bit
complicated (although in general everything is covered in the official
documentation).
There are a couple of issues.
First, once a class has a dictionary, it's permanent; you cannot take
the dictionary away in subclasses. This means that if you have a
class hierarchy involving slots-using classes, all of their parents
need to not have a dictionary, ie use __slots__ without including
"__dict__".
(By the way, it's just struck me that this is fundamentally why you
can't add attributes to instances of object(), as I noticed a long
time ago. To have arbitrary attributes added,
object() instances would have to have a dictionary, but that would
mean that you couldn't subclass object() to get a dictionary-less
class.)
However, you can relocate a dictionary-using parent class's instance variables into slots (which may save you memory), because slots don't have to be defined in the same class as where they're used. To make this clear, suppose that you have:
class A(object):
    def __init__(self):
        self.a = 10

class B(A):
    __slots__ = ('a',)
An instance of A will have a __dict__ with 'a' in it, but an
instance of B will have an empty __dict__; a is now stored in a
slot.
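(A quick check of this; the sketch repeats the classes so it runs standalone. Python 3 is shown, but Python 2 behaves the same way here.)

```python
class A(object):
    def __init__(self):
        self.a = 10

class B(A):
    __slots__ = ('a',)

a, b = A(), B()
print(a.__dict__)   # {'a': 10}: stored in the instance dictionary
print(b.__dict__)   # {}: 'a' now lives in the slot defined by B
print(b.a)          # 10: attribute access is unchanged
```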
Using this very much will probably make everyone's head hurt, especially if you start introspecting things. (Introspection in the face of slots is one of those interesting topics. For example, there is no simple way to see what attributes an object currently has.)
Since a class normally has a dictionary, inheriting from a slots-using
class without specifying __slots__ yourself results in your
instances having dictionaries, as I discovered. As
with the example above, the actual instance variables from your parent
class will still wind up in slots. It is easy to miss this.
It's certainly possible to combine slots with having a class hierarchy,
but it's clearly complex (and has limits). I would
avoid it if possible, partly because some failure modes are basically
silent (your instances have dictionaries when you didn't expect them to;
nothing breaks, but you use more memory than you should). The simple
approach is the easiest; just have all your slots-using classes be
subclasses of object(). Among other things, this makes it easy to be
sure of exactly what slots behavior you're getting, because the class
itself is the only thing that matters.
PS: I'm sure that someone, somewhere, has written a 'slotizer' metaclass
that analyzes the __init__ function in a new class to determine all
instance variables that it sets and then turns them all into slots.
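(As a proof of concept, here is one crude way such a slotizer could work in Python 3, by scanning __init__'s bytecode for attribute stores. This is only a sketch: it blindly assumes that every STORE_ATTR in __init__ targets self, which a real slotizer would have to verify.)

```python
import dis

class SlotizerMeta(type):
    def __new__(meta, cname, bases, cdict):
        init = cdict.get('__init__')
        if init is not None and '__slots__' not in cdict:
            # Every 'self.<name> = ...' in __init__ shows up as a
            # STORE_ATTR instruction; collect the attribute names.
            # (Crude: this also catches stores to objects other
            # than self.)
            names = [ins.argval
                     for ins in dis.get_instructions(init)
                     if ins.opname == 'STORE_ATTR']
            cdict['__slots__'] = tuple(dict.fromkeys(names))
        return type.__new__(meta, cname, bases, cdict)

class Point(metaclass=SlotizerMeta):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.x, p.y)                # 1 2
print(hasattr(p, '__dict__'))  # False: x and y became slots
```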
2011-06-10
What __slots__ are good for (or, how to use slots to save memory)
First off and to echo what everyone says, slots are for saving memory. The direct corollary of this is that when you are considering using slots you should measure the memory savings you get with and without them; if it is not significant (to you), you should not use slots because slots have various odd side effects and drawbacks.
(Sometimes people are attracted to those side effects. Please don't be one of those people.)
As the documentation says, slots save you memory by eliminating (or at least minimizing) the per-instance dictionary that objects of your class would normally have. How much memory this saves you is complicated (because dictionary space usage is complicated); a lot depends on how many entries your instance dictionary would normally have. There are two different but closely related things you do with slots: you can disable the creation of an instance dictionary and you can preallocate space for specific instance attributes. These things lead to three different ways of using slots.
The first way is subclassing built in types when all you're doing is
redefining or adding methods. Instances of most built in types don't
have instance dictionaries, but instances of your subclass normally will
(even if you never use it for anything). Using an empty __slots__
will turn this off, making your instances as memory-efficient as
the parent type, and you can do this even for variable-size types
like str and tuple. This is the only thing you can do with
slots if you're subclassing a variable-sized type, one that has a
non-zero __itemsize__ (see here and here for more discussion of this).
(This also applies if you are subclassing a slots-using class. Basically, you can do this any time you're subclassing a class that doesn't already have an instance dictionary.)
Second, if you have a fixed set of instance attributes you can
preallocate space for all of them and entirely eliminate the instance
dictionary. This is the classical case of a non-empty __slots__ with
attribute names in it. In many situations it makes sense to preallocate
instance attributes that you will only rarely use so that you can get
rid of the instance dictionary (and slots arrange for hasattr() et al
to keep on working in this case; a slot that has never had a value set
raises AttributeError when you try to access it).
(Each slot attribute costs you a pointer, either 4 or 8 bytes depending
on your platform. An instance dictionary costs you at least a pointer
plus sys.getsizeof(dict()) bytes. On my 32-bit and 64-bit Linux
machines, this works out to about 35 or 36 attributes taking as much
memory as an empty instance dictionary.)
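(The second way in sketch form, with made-up attribute names; note how an unset slot behaves exactly like a missing attribute.)

```python
class Conn(object):
    # Preallocate every attribute; no instance dictionary exists.
    __slots__ = ('host', 'port', 'timeout')

    def __init__(self, host, port):
        self.host = host
        self.port = port
        # 'timeout' is preallocated but deliberately left unset.

c = Conn('localhost', 25)
print(hasattr(c, 'timeout'))   # False: an unset slot raises AttributeError
c.timeout = 30
print(c.timeout)               # 30
print(hasattr(c, '__dict__'))  # False: no instance dictionary
```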
Third, if you have a subset of instance attributes that are always
present plus some unpredictable attributes that are added dynamically,
you can potentially save space overall by preallocating space for the
always-present attributes while still having an instance dictionary for
the dynamic attributes. To do this you use a __slots__ with your
attribute names plus "__dict__". This is the least straightforward
and most conditional memory savings.
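(The third way looks like this in sketch form; the attribute names are again just for illustration.)

```python
class Hybrid(object):
    # Slots for the always-present attributes, plus an instance
    # dictionary for whatever gets added dynamically.
    __slots__ = ('host', 'port', '__dict__')

h = Hybrid()
h.host = 'localhost'   # stored in a slot
h.extra = 'dynamic'    # stored in the instance dictionary
print(h.__dict__)      # {'extra': 'dynamic'}: slots don't appear here
```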
In the abstract, a slot uses about a third of the memory of a dictionary entry. In practice, well, dictionary space usage is complicated so how much memory this will save depends on the typical number of entries in the instance dictionary and how many entries you're able to preallocate. In current versions of CPython 2.x, the breakpoints you're likely to hit are 5 entries and 21 entries. Moving from six entries to five will save you a chunk of memory, but moving from five to four won't. In particular, an instance dictionary with no entries takes up just as much memory as one with five entries.
(Yes, this means that adding slots while keeping an instance dictionary can actually cost you more memory. That's why you really want to measure this sort of stuff.)
PS: I'm ignoring __weakref__ in this discussion. If you need
weak references, you probably already know it and you can add it to
your __slots__.
2011-06-06
How much space a Python dictionary entry uses
There are a number of built in Python types that have variable sized
instances, such as tuples, lists, and longs (or ints, in Python 3).
For some of these, there is a simple answer to how much space a given
instance will take up, based on how many items you put in an instance;
you can determine this from using sys.getsizeof() on an empty instance
to get the base size and then looking at <type>.__itemsize__ to see
the per-item space usage.
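(For example, this relationship is easy to verify for tuples; the per-item size is one pointer, so 8 bytes on a 64-bit build.)

```python
import sys

# For a simple variable-sized type like tuple, the total size is
# the base size plus the per-item size times the number of items.
base = sys.getsizeof(())
per_item = tuple.__itemsize__   # one pointer per item

t = (1, 2, 3)
print(sys.getsizeof(t) == base + len(t) * per_item)  # True
```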
Dictionaries are not this simple, because like lists they do not simply add their entries on the end of their basic C level blob of data (well, it's a C structure). As a result a number of things affect the size of a particular dictionary, and in turn affect when adding entries will actually make the dictionary use more memory.
In the abstract, a dictionary entry is three pointers; one for the key, one for the value, and a third to store the value of the key's hash. On a 32-bit machine this makes it 12 bytes and on a 64-bit machine, 24 bytes. However, dictionaries do not add entries one by one; instead there are two complications.
First, the basic dictionary structure comes with enough room for up
to five entries. If you watch sys.getsizeof() as you add keys to a
dictionary, you can see this clearly; an empty dictionary and one with
five entries use the same modest amount of space, but once you add a
sixth entry the space usage jumps.
Second, dictionaries enlarge (and shrink) in discontinuous jumps. Each dictionary has a table for entries, which is always a power of two in size and at most 2/3rds used; when you add an entry that would push the table over 2/3rds used, CPython enlarges the dictionary table instead. Dictionary tables that are not that large quadruple when they're resized; large tables merely double.
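(You can watch these jumps for yourself. The exact sizes and thresholds depend on the CPython version; what matters is the pattern of flat stretches punctuated by jumps.)

```python
import sys

d = {}
last = sys.getsizeof(d)
print(0, last)
for i in range(12):
    d[i] = i
    size = sys.getsizeof(d)
    if size != last:
        # The reported size only changes when the entry table is
        # resized, not on every insertion.
        print(len(d), size)
        last = size
```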
Let's put this in concrete terms. On a 64-bit machine with Python 2.7,
sys.getsizeof() reports that a dictionary uses 280 bytes when it has
from zero to five elements. When you add a sixth entry, the reported
size jumps to 1048 bytes. We can actually work out exactly where this
memory is going; the size of the dictionary's C structure is 280 bytes,
and then we need a 32-entry table (four times the size of the 8-entry
table we started with) where every entry is 24 bytes, for 768 bytes
total. This adds up to exactly the 1048 bytes of memory usage that was
reported.
Note that all of this is of course implementation dependent, although
the CPython dictionary implementation has been refined enough that it
doesn't seem to change much. Python 3.1.2 reports the same numbers, for
example. The real lesson here is that sys.getsizeof() means that you
can measure things for yourself if you think it matters in your program.
(Also note that size reported by sys.getsizeof() does not include
the space required for the keys and the values themselves, for various
reasons.)
2011-06-05
A subtle difference between tuples and lists in CPython
There are of course any number of differences between tuples and lists,
but today I discovered a subtle and interesting one: tuples have a
non-zero __itemsize__, and lists do not. Or to put it in a more
comprehensible way, in CPython tuples are variable-size blobs and
lists are not.
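(This is easy to see directly:)

```python
# Tuples store their items inline in the object itself, so they
# have a non-zero per-item size; lists keep their item array in a
# separately allocated block, so their per-item size is zero.
print(tuple.__itemsize__)   # 8 on a 64-bit build (one pointer)
print(list.__itemsize__)    # 0
```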
Wait, what? Lists are clearly variable sized things, in fact more variable sized than tuples since a tuple is immutable once created. It turns out that this difference is exactly why tuples are variable-size blobs and lists are not.
Both lists and tuples must store a variable-sized array of items somewhere. As covered in HowSlotsWorkI, the CPython runtime allows the fundamental blob of storage for a Python object instance to have a variable amount of additional data tacked on the end. However, you cannot resize an instance blob once it has been created. Because tuples are immutable, the size of their item array doesn't change once a tuple instance has been created and they can directly store their item array on the end of their C-level blob, saving a separate memory allocation and a pointer. However, lists can and do change the size of their item array after creation and so must store their item array in a separate chunk of storage so they can freely resize it as the particular list grows and shrinks.
One of the consequences of this is the traditional one: you can't
subclass tuple with anything that has a non-empty __slots__.
(In theory this makes tuples slightly more memory efficient than lists. In practice the differences are small enough that they're unlikely to matter unless you have a huge number of items.)
PS: as you might expect, dict is another variable sized Python thing
that has a zero __itemsize__.
2011-06-03
Ints, __slots__, and Python 3
Today, one of my visitors came here after doing a Google search for the error message:
nonempty __slots__ not supported for subtype of 'int'
Right away, I can tell you something about this person; they are either using Python 3 or porting code to Python 3.
(I use 'porting' instead of 'migrating' deliberately.)
At one level, the reason you get this error message is simple and I
covered it in HowSlotsWorkI; in short, you are trying to subclass a
type with a non-zero __itemsize__, one whose
C-level storage is already variable sized, and that doesn't work
because of how CPython implements slots. I can tell that this is a
Python 3 user because only in Python 3 does int have a non-zero
__itemsize__.
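(A quick Python 3 demonstration of both the error and its cause; the class name here is made up.)

```python
print(int.__itemsize__)   # non-zero: Python 3 ints are variable sized

# An empty __slots__ on an int subclass is fine, but a non-empty
# one is rejected at class creation time.
try:
    class Tagged(int):
        __slots__ = ('tag',)
except TypeError as e:
    print(e)   # nonempty __slots__ not supported for subtype of 'int'
```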
Or to put it differently, in Python 2 ints are fixed-size objects and in Python 3 they are not. So, you might ask, why is this so? The answer is that Python 3 fully unified ints and longs.
Python 2's long type is what other languages
would call a 'bignum', an arbitrary-length integer number. Like everyone
else's bignum implementation, Python 2 stores longs in variable sized
objects; this is clearly necessary since a bignum can be arbitrarily
large and the actual length of bignums varies considerably. Python 2's
int is your ordinary good old fashioned fixed-length integer type
(either 32 bits or 64 bits depending on what the underlying C int
type is on your platform).
In Python 2, ints and longs are thus different types but are partially
unified in that you can freely mix them in arithmetic and an int that
grows sufficiently large is automatically converted to a long (however,
small longs do not automatically convert back into ints). In Python
3, they have been completely unified to the point where there is only
one type, int. Since this type has to represent bignums as well as
ordinary small integers, its underlying storage must be variable sized;
since it has variable-sized storage, it can't be used with slots.
(In a sense the way to deal with this issue when porting to Python 3 is
simple; you just stop using __slots__ on your subclass of int.
Everything works the same and your instances just take up a bit more
memory.)
By the way, before anyone freaks out over the clear memory waste of representing all small integers with bignums, well:
>>> import sys
>>> sys.getsizeof(10)
24
>>> sys.getsizeof(10L)
28
(This is on a 64-bit Linux machine with Python 2.7; a 32-bit machine with 2.6.5 reports 12 bytes and 16 bytes respectively. Using Python 3 on the same 64-bit machine reports a size of 28 bytes.)
(Some people would quibble over this being called 'unification', in that it's more like the Python 2 'long' type was just renamed to be 'int'. In fact this is pretty much exactly what happened at the C API layer.)