Wandering Thoughts archives

2011-11-25

Python instance dictionaries, attribute names, and memory use

In a comment on my entry on what __slots__ are good for, Max wrote:

On the other hand, having __slots__ saves the strings that the instance dictionary entries would point to for the attribute names. On a 4 byte string platform, that adds up quickly too.

Although it's natural to think this, CPython is clever enough that it isn't so. Using __slots__ saves you no memory for attribute names, because the string value of each attribute name is already stored only once, no matter how many instances use it. However, understanding how and why requires a reasonable amount of knowledge about CPython internals.

(Or you have to know to look at the documentation for the intern() builtin, which is sys.intern() in Python 3; it mentions this in passing.)

Like many similar languages, Python has string interning, and the CPython internals make liberal use of interned strings for any code-related string that looks likely to be repeated. Attribute names are one such case: starting with the compiled code itself, every attribute name is fully interned. So the same set of interned strings serves as attribute names regardless of how the attributes are stored and regardless of how many instances of the class you have.
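
One quick way to see this in CPython is to create many instances of an ordinary dict-based class and count how many distinct key objects their instance dictionaries actually hold. This is a sketch for Python 3 (where intern() lives on as sys.intern()); the Point class and its attribute names are made up for illustration, and object identity here is a CPython implementation detail rather than a language guarantee.

```python
import sys


# Point and its attribute names are invented for this illustration.
class Point:
    def __init__(self, x, y):
        self.xcoord = x
        self.ycoord = y


points = [Point(i, -i) for i in range(1000)]

# Gather the identity of every key object in every instance dictionary.
key_ids = {id(key) for p in points for key in vars(p)}
print(len(key_ids))  # prints 2: one object per attribute name, not per instance

# Those shared key objects are the interned versions of the names.
first_keys = list(vars(points[0]))
print(all(key is sys.intern(key) for key in first_keys))  # prints True
```

What each instance does pay for is its own dictionary, which is what __slots__ removes; the name strings themselves are shared either way.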

(This is quite similar to part of the concept of 'symbols' in languages like Lisp and Ruby, although both of those expose symbols directly to user-level code.)

More specifically, every name used directly as an attribute is interned. There are a number of ways to use real string values as attribute names, and those strings will not be interned. The most prominent example is actually __slots__ itself, although things get confusing here. Consider:

class A(object):
  __slots__ = ('attrone', 'attrtwo')

  def __init__(self):
    self.attrone = 10

  def report(self):
    return self.attrone

The two string literals in __slots__ are not interned. However, the same string value ('attrone') is interned in both __init__ and report(). If lots of code refers to '<something>.attrone', all of it does its attribute lookups with the same interned string.
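
One way to check the claim about __init__ and report() is to look at the names recorded in each method's compiled code object. This sketch assumes CPython 3, where a function exposes those names as __code__.co_names; it repeats class A so that it runs on its own, and the name_object() helper is hypothetical, written just for this check.

```python
class A(object):
    __slots__ = ('attrone', 'attrtwo')

    def __init__(self):
        self.attrone = 10

    def report(self):
        return self.attrone


def name_object(func, name):
    """Hypothetical helper: return the string object func's bytecode uses for name."""
    return next(n for n in func.__code__.co_names if n == name)


init_name = name_object(A.__init__, 'attrone')
report_name = name_object(A.report, 'attrone')

# Both methods do their lookups through one shared string object.
print(init_name is report_name)  # prints True
```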

(Note that attribute names are interned globally, not on a per-class basis or the like. The 'attrone' in the attribute name module1.cls1.attrone is the same interned string value as in module2.cls2.attrone.)
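
The global nature of this interning shows up if you compile the same attribute access twice, as if it appeared in two unrelated modules. In this sketch the filenames module1.py and module2.py are only labels passed to the built-in compile(); nothing is imported or executed, and the identity result reflects CPython 3 behaviour.

```python
# Compile the same attribute access as if it lived in two separate modules.
# The filenames are only labels; no real modules are involved.
code1 = compile("cls1.attrone", "module1.py", "eval")
code2 = compile("cls2.attrone", "module2.py", "eval")

# Pull out the 'attrone' name object that each code object refers to.
name1 = code1.co_names[code1.co_names.index("attrone")]
name2 = code2.co_names[code2.co_names.index("attrone")]

print(name1 is name2)  # prints True: one interned string for both
```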

An even more complicated example is 'setattr(obj, "astring", value)'. If you write this in two different functions, the two "astring" literals are not interned (and thus are different string objects). However, 'astring' as the attribute name in obj.astring is interned, because setattr() interns the name it is given. If you call one function with one object and the other function with a different object, the attribute name is still a common interned string.
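
Here is a sketch of that two-function setattr() case. To be sure the names do not begin life as interned strings, it builds them at run time with str.join() instead of writing them as literals. The classes First and Second and the two helper functions are invented for the example, and the identity checks reflect CPython 3 behaviour, not a language guarantee.

```python
import sys


# Two unrelated, made-up classes so the objects share nothing but the name.
class First:
    pass


class Second:
    pass


def set_in_one(obj):
    # Build the name at run time so it is not an interned literal.
    name = "".join(["ast", "ring"])
    setattr(obj, name, 1)


def set_in_other(obj):
    name = "".join(["as", "tring"])
    setattr(obj, name, 2)


a, b = First(), Second()
set_in_one(a)
set_in_other(b)

key_a = next(iter(vars(a)))
key_b = next(iter(vars(b)))

# setattr() interned both names, so both objects use one key string.
print(key_a is key_b)                  # prints True
print(key_a is sys.intern("astring"))  # prints True
```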

(In theory, directly manipulating obj.__dict__ might let you create a non-interned attribute name on an instance, although code that then accesses it as obj.attr would still use an interned version.)
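
In CPython 3 this does appear to work, as the following sketch suggests: assigning through obj.__dict__ with a string built at run time leaves that exact non-interned object as the dictionary key, while ordinary attribute access still finds the value by comparing string values. The Holder class and the attribute name rawkey are made up for illustration, and this is implementation behaviour rather than anything the language promises.

```python
import sys


# Holder is a made-up, empty class used only for this demonstration.
class Holder:
    pass


canonical = sys.intern("rawkey")  # the interned version of the name
raw = "".join(["raw", "key"])     # an equal but separate string object

obj = Holder()
obj.__dict__[raw] = 42            # plain dict assignment does not intern

stored = next(iter(vars(obj)))
print(stored is raw)        # prints True: the non-interned string is the key
print(stored is canonical)  # prints False

# Normal attribute access still works, because lookups compare by value.
print(obj.rawkey)           # prints 42
```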

If you are testing this, note that all single-character strings are interned for you; you need to use multi-character attribute names to avoid false positives.
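
The false-positive hazard is easy to see in CPython 3: creating a one-character string twice hands back the same cached object (this caching covers at least the Latin-1 range), while building a longer string twice does not. The specific characters chosen here are arbitrary.

```python
# Single-character strings are cached, so separately created ones are identical.
one_a = chr(120)
one_b = chr(120)
print(one_a is one_b)      # prints True

# Multi-character strings built at run time are separate objects.
multi_a = "".join(["at", "tr"])
multi_b = "".join(["at", "tr"])
print(multi_a is multi_b)  # prints False
print(multi_a == multi_b)  # prints True
```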

(This is undoubtedly far more about this issue than most people want to know. I'm peculiar that way; I can't resist peeking under the hood.)

Sidebar: interned versus non-interned versions of a string value

In some languages, once you intern a string value, all future occurrences of that string value, anywhere, are automatically converted to the interned version. CPython doesn't work this way; instead, something has to explicitly convert a string value into an interned version of it, and otherwise string values are left alone. It's thus entirely possible, even easy, to have an interned version of a string value as well as one or more non-interned versions of it.
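
In Python 3 terms (where intern() became sys.intern()), the coexistence looks like this; the string value itself is arbitrary.

```python
import sys

interned = sys.intern("widgetname")  # explicitly interned version
plain = "".join(["widget", "name"])  # same value, built separately

print(plain == interned)  # prints True: equal values
print(plain is interned)  # prints False: two distinct objects coexist

# Nothing converts plain automatically; sys.intern() has to be asked to.
plain = sys.intern(plain)
print(plain is interned)  # prints True
```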

python/InstanceStringUsage written at 00:11:49

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks
