Revisiting Python's string concatenation optimization
Back in Python 2.4, CPython introduced an optimization for string concatenation that was designed to reduce memory churn, and at the time I got curious enough about it to examine it in some detail. Python 2.4 was a long time ago, and I was recently prompted to wonder what, if anything, has changed since then in both Python 2 and Python 3.
To quickly summarize my earlier entry: CPython only optimizes string concatenation by attempting to grow the left side in place instead of making a new string and copying everything. It can only do this if the left side string has (or clearly will have) a reference count of one, because otherwise it would be breaking the promise that strings are immutable. Generally this requires code of the form 'avar = avar + ...' or 'avar += ...'.
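To make the reference count condition concrete, here is a small sketch (this is a CPython implementation detail, not a language guarantee, so the exact behavior depends on your version and build):

    s = "a" * 100
    s = s + "b"    # eligible: 's' is the only reference to the old string
    s += "b"       # eligible: the same thing in augmented-assignment form

    t = s
    s = s + "b"    # not eligible: 't' still refers to the old string, so
                   # growing it in place would mutate a shared 'immutable'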
As of Python 2.7.8, things have changed only slightly. In particular, concatenation of Unicode strings is still not optimized; this remains a byte string only optimization. For byte strings there are two cases. Strings somewhat under 512 bytes can sometimes be grown in place by a few bytes, depending on their exact sizes; strings over that can be grown only if the system realloc() can find empty space after them.
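One rough way to watch this from Python 2 is to see whether a string's id() survives a small append. This is only a heuristic (an eligible string can still be moved by realloc()), but it shows the byte string versus Unicode difference:

    # Python 2 sketch; id() stability is only a rough signal
    s = b"x" * 40
    before = id(s)
    s += b"y"
    print(id(s) == before)   # often True: the byte string grew in place

    u = u"x" * 40
    before = id(u)
    u += u"y"
    print(id(u) == before)   # False: Unicode concatenation makes a new object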
(As a trivial case, CPython also optimizes concatenating an empty string to something by just returning the other string with its reference count increased.)
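This particular case is easy to observe directly, in both Python 2 and Python 3, although it too is a CPython implementation detail rather than a language promise:

    s = "hello"
    print((s + "") is s)   # True in CPython: the original object is returned
    print(("" + s) is s)   # True in CPython, for the same reason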
In Python 3, things are more complicated, but the good news is that this optimization does work on Unicode strings. Python 3.3+ has a complex implementation of (Unicode) strings, but it does attempt to do in-place resizing on them under appropriate circumstances. The first complication is that internally Python 3 has a hierarchy of Unicode string storage, and you can't do an in-place concatenation of a more complex sort of Unicode string into a less complex one.
Once you have compatible strings in this sense, the relevant byte sizes are the same as for Python 2.7.8; Unicode string objects that are less than 512 bytes can sometimes be grown in place by a few bytes, while ones larger than that are at the mercy of the system realloc(). However, how many bytes a Unicode string takes up depends on what sort of string storage it is using, which I think mostly depends on how big your Unicode characters are (see this section of the Python 3.3 release notes and PEP 393 for the gory details).
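You can see the different storage kinds indirectly through sys.getsizeof(); the per-character cost jumps as the widest character in the string gets wider (a sketch; the exact overheads vary between Python 3 versions):

    import sys
    for ch in ("a", "\u00e9", "\u2603", "\U0001f600"):
        print(repr(ch), sys.getsizeof(ch * 10))
    # roughly 1 byte per character for latin-1 strings, 2 bytes for
    # other BMP strings, and 4 bytes for strings with characters
    # outside the BMP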
So my overall conclusion remains as before: this optimization is chancy and should not be counted on. If you are doing repeated concatenation you're almost certainly better off using .join() on a list; if you think you have a situation where you aren't, you should benchmark it.
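A minimal benchmark sketch of the usual comparison (the numbers will vary with your CPython version, platform, and string sizes):

    import timeit

    def with_concat(n=1000):
        s = ""
        for _ in range(n):
            s += "x"
        return s

    def with_join(n=1000):
        return "".join("x" for _ in range(n))

    print("concat:", timeit.timeit(with_concat, number=10000))
    print("join:  ", timeit.timeit(with_join, number=10000))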
(In Python 3, the place to start is PyUnicode_Append() in Objects/unicodeobject.c. You'll probably also want to read Include/unicodeobject.h and PEP 393 to understand this, and then see Objects/obmalloc.c for the small object allocator.)
Sidebar: What the funny 512-byte breakpoint is about
Current versions of CPython 2 and 3 allocate 'small' objects using an internal allocator that I think is basically a slab allocator. This allocator is used for all objects whose overall size is 512 bytes or less, and it rounds object sizes up to the next 8-byte boundary. This means that if you ask for, say, a 41-byte object you actually get one that can hold up to 48 bytes and thus can be 'grown' in place up to that size.
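The size class arithmetic looks roughly like this (an illustrative sketch of the idea, not CPython's actual code):

    def pymalloc_slot(nbytes):
        # objects over 512 bytes fall through to the system malloc()/realloc()
        if nbytes > 512:
            return None
        # small objects are rounded up to the next 8-byte boundary
        return (nbytes + 7) & ~7

    print(pymalloc_slot(41))   # 48: a 41-byte object can 'grow' to 48 in place
    print(pymalloc_slot(513))  # None: handled by the system allocator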