Revisiting Python's string concatenation optimization

October 20, 2014

Back in Python 2.4, CPython introduced an optimization for string concatenation that was designed to reduce memory churn in this operation, and at the time I got curious enough about it to examine it in some detail. Python 2.4 is a long time ago now, and I was recently prompted to wonder what, if anything, had changed since then in both Python 2 and Python 3.

To quickly summarize my earlier entry: CPython optimizes string concatenation by attempting to grow the left side in place instead of making a new string and copying everything. It can only do this if the left-side string has (or clearly will have) a reference count of one, because otherwise it would break the promise that strings are immutable. In practice this requires code of the form 'avar = avar + ...' or 'avar += ...'.
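As a concrete illustration, here is a sketch of the kind of code that is eligible for the optimization versus code that defeats it (the variable names are just for illustration):

    # Eligible: 'buf' is immediately rebound to the result, so CPython
    # can try to grow the old string in place.
    buf = "start"
    buf += " more"          # or: buf = buf + " more"

    # Not eligible: 'alias' holds a second reference to the string, so
    # its reference count is above one and CPython must copy it.
    buf = "start"
    alias = buf
    buf += " more"          # a new string object is allocated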

As of Python 2.7.8, things have changed only slightly. In particular, concatenation of Unicode strings is still not optimized; this remains a byte-string-only optimization. For byte strings there are two cases. Strings somewhat under 512 bytes can sometimes be grown in place by a few bytes, depending on their exact sizes. Strings over that size can be grown only if the system realloc() can find empty space after them.
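One crude, CPython-specific way to see whether a given concatenation grew a string in place is to check whether id() is stable across the operation; this is suggestive rather than definitive, since the result depends on exact sizes and the platform allocator:

    # If id() is unchanged after '+=', the string was probably grown
    # in place instead of being reallocated and copied.
    s = "x" * 16
    before = id(s)
    s += "y"
    print(before == id(s))   # often True for small strings, but
                             # not guaranteed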

(As a trivial case, CPython also optimizes concatenating an empty string to something by just returning the other string with its reference count increased.)
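You can see this trivial case directly (it's a CPython implementation detail, not a language guarantee):

    s = "hello"
    # Concatenating an empty string just hands back the original object.
    print((s + "") is s)   # True in CPython; not promised by the language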

In Python 3, things are more complicated, but the good news is that this optimization does work on Unicode strings. Python 3.3+ has a complex implementation of (Unicode) strings, but it does attempt to do in-place resizing on them under appropriate circumstances. The first complication is that internally Python 3 has a hierarchy of Unicode string storage, and you can't do an in-place concatenation of a more complex sort of Unicode string into a less complex one. Once you have compatible strings in this sense, the relevant byte sizes are the same as for Python 2.7.8: Unicode string objects that are less than 512 bytes can sometimes be grown in place by a few bytes, while ones larger than that are at the mercy of the system realloc(). However, how many bytes a Unicode string takes up depends on what sort of string storage it is using, which I think mostly depends on how big your Unicode characters are (see this section of the Python 3.3 release notes and PEP 393 for the gory details).
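You can get a feel for the different storage kinds by watching how object sizes change with the widest character in the string; sys.getsizeof() shows the effect (the exact numbers vary by CPython version and platform):

    import sys

    # Under PEP 393, per-character storage depends on the widest
    # character present: 1, 2, or 4 bytes per character.
    print(sys.getsizeof("a" * 100))           # 1 byte/char (latin-1 range)
    print(sys.getsizeof("\u0394" * 100))      # 2 bytes/char (BMP character)
    print(sys.getsizeof("\U0001F600" * 100))  # 4 bytes/char (astral character)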

So my overall conclusion remains as before: this optimization is chancy and should not be counted on. If you are doing repeated concatenation you're almost certainly better off using .join() on a list; if you think you have a situation that's an exception, you should benchmark it.
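For repeated concatenation, the dependable pattern is to accumulate the pieces and join them once at the end:

    # The reliable approach: one allocation for the final string,
    # regardless of how the pieces were accumulated.
    pieces = []
    for i in range(1000):
        pieces.append(str(i))
    result = "".join(pieces)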

(In Python 3, the place to start is PyUnicode_Append() in Objects/unicodeobject.c. You'll probably also want to read Include/unicodeobject.h and PEP 393 to understand this, and then see Objects/obmalloc.c for the small object allocator.)

Sidebar: What the funny 512-byte breakpoint is about

Current versions of CPython 2 and 3 allocate 'small' objects using an internal allocator that I think is basically a slab allocator. This allocator is used for all objects that are 512 bytes or less in total size, and it rounds object sizes up to the next 8-byte boundary. This means that if you ask for, say, a 41-byte object you actually get one that can hold up to 48 bytes, and thus the object can be 'grown' in place up to that size.
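The rounding rule itself is simple; as a sketch, the usable size for a small request of n bytes is n rounded up to the next multiple of 8:

    def usable_size(n):
        # Round a small-object request up to the next 8-byte boundary,
        # mirroring what the small object allocator does for requests
        # of 512 bytes or less.
        return (n + 7) & ~7

    print(usable_size(41))   # 48: a 41-byte object can grow to 48 bytes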
