Wandering Thoughts archives

2008-07-28

What you can (probably) count on for concurrency in Python

A comment on the previous entry about how builtins are atomic asked a really good question:

Is there actually a guarantee in CPython that they're atomic, or is that a side-effect of implementation?

There are two answers, at least in my opinion. The legalistic answer is that there is no guarantee at all, because there's nothing in the documentation. But this is because the CPython documentation doesn't talk about concurrency issues at all; even the thread module documentation doesn't really say anything. In theory you can go from 'the documentation doesn't promise anything' to 'the documentation allows an implementation to do anything if you don't explicitly lock'. However, this is robot logic, not human logic; the real truth is that threading is not really a core Python feature (even today it is in the 'optional operating system' modules section), and the documentation reflects that.

I maintain that the practical answer is that it is guaranteed, at least for CPython, despite the lack of explicit documentation to that effect. Whether legalistic purists like it or not, in practice if you do not document something and then expose a single implementation to people, that implementation's long-standing behavior becomes implicit documentation. This is especially so if the behavior is quite useful and the alternatives painful.

Or in short: it's guaranteed because there's no documentation about it either way and that's how CPython has always behaved. And it's really handy.

And as a pragmatic matter, anything that wants to be compatible with existing CPython threaded code is going to have to mimic this behavior (given Python 3K, this is less reassuring than it seems). However, IronPython, Jython, and any other similar projects out there may not be all that interested in supporting existing CPython threaded code; I suspect that they feel that they have better native threading models that you should use instead.

(Although for what it is worth IronPython does not document any differences in what is and isn't atomic for basic types in their list of differences between IronPython and CPython.)

(This is one of those entries that is going to get me pilloried.)

BuiltinsConcurrencyGuarantee written at 23:23:28

Another advantage of Python builtins

I've talked before about the speed advantage that Python builtins have. But speed isn't the only way that Python privileges things written at the C level; as dict.setdefault() illustrates, Python makes a useful atomicity guarantee for them that it does not make for methods written in Python itself.

Does this guarantee matter? I think that it does, because it is simultaneously useful and cheap. A concurrent program can avoid locking when dealing with shared data built carefully from builtin types, and duplicating the effects of this in your own Python code would be fairly expensive, especially in a non-threaded program.

(Given the limitations of Python concurrency and thus that most Python programs aren't threaded, performing very well in a non-threaded environment is quite important in practice.)
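
To make this concrete, here is a minimal sketch of my own (the names are made up, and it assumes CPython): several threads append to one shared list with no locking at all, relying on list.append being a single builtin operation.

import threading

shared = []

def worker(n):
    for i in range(1000):
        # list.append is a builtin implemented in C, so under CPython's
        # GIL each call happens atomically; no explicit lock is needed.
        shared.append((n, i))

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# In CPython this reliably holds.
assert len(shared) == 4000

(Only single builtin operations get this; a compound update like d[k] += 1 is several bytecodes and does not.)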

This also illustrates once again that it can matter a lot to know how things are implemented. If you are writing a threaded program, knowing whether method calls on a shared data structure are concurrency safe or need to be guarded with locks is vital. If a module's documentation gives you no information on thread-safety (and few do), you really need to know how it is implemented, and a straightforward Python implementation of it is not at all equivalent to a C implementation.

(There are some tricky cases where a module is effectively implemented partly in C and partly in Python. Fortunately most such commingled modules seem to do relatively little in Python.)
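
To make the contrast concrete, here is a sketch of my own (not code from any real module): dict.setdefault() does its test and its store inside one C-level call, while the straightforward Python version of the same thing spans several bytecodes, so the interpreter can switch threads in the middle and it has to take a lock.

import threading

d = {}
d_lock = threading.Lock()

# The builtin: the 'is the key there?' test and the store happen inside
# a single C call, so in CPython no other thread can run in between.
v1 = d.setdefault("stats", [])

# A plain Python equivalent needs a lock to get the same effect.
def py_setdefault(mapping, key, default):
    d_lock.acquire()
    try:
        if key not in mapping:
            mapping[key] = default
        return mapping[key]
    finally:
        d_lock.release()

v2 = py_setdefault(d, "errors", [])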

BuiltinsConcurrencyAdvantage written at 00:27:30

2008-07-26

dict.setdefault() as a concurrency primitive

One of the things I do in the back of my mind is to keep my eyes open for Python things that could be useful for concurrency (especially since I still have a concurrency problem that I have to solve one of these days). Because of what the GIL protects, one of the useful things to look for is interesting operations on builtin types.

All of which is a roundabout way of saying that it recently struck me that dict.setdefault() is a limited test-and-set operation. All the ingredients are there: setdefault() tests the 'value' of something, will replace it with your value if it matches the test, and returns the result. It is limited in that you can only test for a single value, 'unset'.

As an example, we can use this to write acquire and release as:

def acquire(d, k, me):
    # setdefault() only stores me under k if k is not already set, and
    # returns whatever ends up stored there; we hold the 'lock' only if
    # that turns out to be our own value.
    r = d.setdefault(k, me)
    return r == me

def release(d, k):
    del d[k]

(This has one danger; acquire() will succeed when you already have acquired the resource, but release() does not guard against that. You could solve this by having acquire() make up a random identifier instead of using one that the caller comes up with.)
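
Here is a sketch of that variant (the token scheme and names are my own illustration, not anything standard):

import uuid

def acquire(d, k):
    # Make up an identifier the caller never chooses, so a second
    # acquire() by the same caller cannot be mistaken for success.
    token = uuid.uuid4()
    r = d.setdefault(k, token)
    return token if r == token else None

def release(d, k, token):
    # Only the holder of the matching token gets to release.
    if d.get(k) == token:
        del d[k]
    else:
        raise RuntimeError("trying to release something you do not hold")

A caller holds on to the token that acquire() hands back and passes it to release() later.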

While it may seem perverse to come up with new concurrency primitives when Python already has some, these work even if you are not using threads (for example, if you are using signals inside a single process). And yes, it is a bit perverse, but I like the challenge of this sort of thing.

(The thread module doesn't say that its locks have thread-related side effects if you always invoke them as non-blocking, but it doesn't guarantee that they don't. And forgetting to make an acquire non-blocking will blow your foot off in a non-threaded program.)
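
For the record, the non-blocking usage I mean looks something like this (a sketch using the Python 2 thread module, which Python 3 renames to _thread):

import thread

lk = thread.allocate_lock()

def try_lock():
    # acquire(0) never blocks; it returns a true value only if we actually
    # got the lock. A plain blocking acquire() would hang a single-threaded,
    # signal-driven program forever if the lock were already held.
    return lk.acquire(0)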

SetdefaultAsLockingPrimitive written at 00:22:25

