2008-07-28
What you can (probably) count on for concurrency in Python
A comment on the previous entry about how builtins are atomic asked a really good question:
Is there actually a guarantee in CPython that they're atomic, or is that a side-effect of implementation?
There are two answers, at least in my opinion. The legalistic answer is that there is no guarantee at all, because there's nothing in the documentation. But this is because the CPython documentation doesn't talk about concurrency issues at all; even the thread module documentation doesn't really say anything. In theory you can go from 'the documentation doesn't promise anything' to 'the documentation allows an implementation to do anything if you don't explicitly lock'. However, this is robot logic, not human logic; the real truth is that threading is not really a core Python feature (even today it is in the 'optional operating system' modules section), and the documentation reflects that.
I maintain that the practical answer is that it is guaranteed, at least for CPython, despite the lack of explicit documentation to that effect. Whether legalistic purists like it or not, in practice if you do not document something and then expose a single implementation to people, that implementation's long-standing behavior becomes implicit documentation. This is especially so if the behavior is quite useful and the alternatives are painful.
Or in short: it's guaranteed because there's no documentation about it either way and that's how CPython has always behaved. And it's really handy.
And as a pragmatic matter, anything that wants to be compatible with existing CPython threaded code is going to have to mimic this behavior (given Python 3K, this is less reassuring than it seems). However, IronPython, Jython, and any other similar projects out there may not be all that interested in supporting existing CPython threaded code; I suspect that they feel that they have better native threading models that you should use instead.
(Although for what it is worth IronPython does not document any differences in what is and isn't atomic for basic types in their list of differences between IronPython and CPython.)
(This is one of those entries that is going to get me pilloried.)
Another advantage of Python builtins
I've talked before about the speed advantage that Python builtins have. But speed isn't the only way that Python privileges things written at the C level; as dict.setdefault() illustrates, Python makes a useful atomicity guarantee for them that it does not make for methods written in Python itself.
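To make the difference concrete, here is a minimal sketch, assuming CPython and a key type (such as str) whose hashing and comparison are themselves done in C. The names python_setdefault and collect are made up for illustration; python_setdefault is what a straightforward Python-level version looks like, and it is shown only for contrast.

```python
import threading


def python_setdefault(d, key, default):
    """Hypothetical pure-Python look-alike of dict.setdefault(); NOT atomic."""
    # The membership check and the store are separate steps, so another
    # thread may run between them and install a different default first.
    if key not in d:
        d[key] = default
    return d[key]


def collect(n_threads=8):
    """Have n_threads race to install a default list under one key."""
    shared = {}
    seen = [None] * n_threads

    def worker(i):
        # The C-level dict.setdefault() does the lookup and the store as a
        # single step as far as other threads are concerned (assumption:
        # CPython, with a builtin key type such as str).
        seen[i] = shared.setdefault("jobs", [])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, seen


shared, seen = collect()
# Check that every thread got back the one list that won the race.
print(all(lst is shared["jobs"] for lst in seen))
```

With python_setdefault in place of the builtin, two threads could each pass the membership check and each store their own list, so one of them could end up holding a list that is no longer in the dictionary.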
Does this guarantee matter? I think that it does, because it is simultaneously useful and cheap. A concurrent program can avoid locking when dealing with shared data built carefully from builtin types, and duplicating the effects of this in your own Python code would be fairly expensive, especially in a non-threaded program.
(Given the limitations of Python concurrency, most Python programs aren't threaded, so performing very well in a non-threaded environment is quite important in practice.)
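As a rough illustration of avoiding locks with shared data built from builtin types, here is a sketch in which several threads append to one shared list with no lock at all. It assumes CPython, where list.append() on a builtin list is treated as atomic; the function name fill and the sizes used are arbitrary choices for the example.

```python
import threading


def fill(n_threads=4, per_thread=10000):
    """Append from several threads to one builtin list without locking."""
    results = []

    def worker(tag):
        for i in range(per_thread):
            # No lock: this relies on list.append() being atomic in
            # CPython (an implementation behavior, not a documented
            # language guarantee).
            results.append((tag, i))

    threads = [threading.Thread(target=worker, args=(tag,))
               for tag in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = fill()
# No appends are lost, although the order of entries varies run to run.
print(len(results))
```

The same code run in a single thread pays nothing extra for this safety, which is the cheap half of 'useful and cheap'.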
This also illustrates once again that it can matter a lot to know how things are implemented. If you are writing a threaded program, knowing whether method calls on a shared data structure are concurrency safe or need to be guarded with locks is vital. If a module's documentation gives you no information on thread-safety (and few do), you really need to know how it is implemented, and a straightforward Python implementation of it is not at all equivalent to a C implementation.
(There are some tricky cases where a module is effectively implemented partly in C and partly in Python. Fortunately most such commingled modules seem to do relatively little in Python.)
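For contrast, here is a sketch of what a shared data structure written in Python itself typically needs. The Tally class is hypothetical, not taken from any real module; its add() method is a read-modify-write sequence spread over several bytecodes, so it is guarded with an explicit lock.

```python
import threading


class Tally:
    """Hypothetical pure-Python shared structure: a count per key."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def add(self, key):
        # Reading the old count and storing the new one are separate
        # steps at the Python level, so another thread could interleave
        # between them; the lock makes the pair behave as one step.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


def hammer(tally, n_threads=4, per_thread=5000):
    """Call tally.add() from several threads at once."""
    def worker():
        for _ in range(per_thread):
            tally.add("hits")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


tally = Tally()
hammer(tally)
print(tally.get("hits"))
```

Without the lock, add() could lose updates under threading, and with it every caller pays for acquiring the lock even in a program that never starts a second thread.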