When I've interned Python strings

September 9, 2012

One day, I read the following on reddit's r/python:

Is manually interning a string every a good idea? I'm having trouble thinking of a use case where the cost is justified outside the compilation cycle.

and a reply:

I've yet to run into a situation where it was, but I guess it could be in a very restricted number of situations, such as a small set of non-literal strings generated over and over again (e.g. some sort of parser), interning could reduce memory pressure.

This is exactly the case that I ran into at one point, with one of my Python daemons which was being used in a fairly demanding situation where I wanted to minimize the memory usage and memory churn over time. My program had three important features for having this make sense: it had big files to parse, there was a lot of repeated text in the files (text that had to be saved), and the files were changed a bit and reloaded on a relatively frequent basis.

Interning repeated text is an obvious win for memory usage, if you have a decent amount of it (measuring helps to know this). In my situation it also helped avoid memory churn during reloads of the files. When you reload a configuration file, the usual case is that almost all of the text is the same as it was the last time; this creates a lot of text and string duplication as you re-parse the file and get the same results as last time for most of it. Interning strings here insures that you do not create a boatload of new strings every time you reload the configuration file (and discard a boatload of old ones); instead you're likely to create only a few new ones and discard a few old ones.

Of course all of this care and interning may be a micro-optimization that doesn't make any difference in your actual circumstance. Interning strings is a performance optimization, so like any other optimization you should measure it to see if it gives you any benefits.

(For my program in our specific situation, it was one of a number of things that did make a visible difference.)

Written on 09 September 2012.
« When you can log bad usernames for failed authentications
The core difference between Debian source packages and RPMs »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Sep 9 00:53:22 2012
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.