When I've interned Python strings
September 9, 2012
One day, I read the following on reddit's r/python:
and a reply:
This is exactly the case that I ran into at one point, with one of my Python daemons which was being used in a fairly demanding situation where I wanted to minimize the memory usage and memory churn over time. My program had three important features for having this make sense: it had big files to parse, there was a lot of repeated text in the files (text that had to be saved), and the files were changed a bit and reloaded on a relatively frequent basis.
Interning repeated text is an obvious win for memory usage, if you have a decent amount of it (measuring helps to know this). In my situation it also helped avoid memory churn during reloads of the files. When you reload a configuration file, the usual case is that almost all of the text is the same as it was the last time; this creates a lot of text and string duplication as you re-parse the file and get the same results as last time for most of it. Interning strings here insures that you do not create a boatload of new strings every time you reload the configuration file (and discard a boatload of old ones); instead you're likely to create only a few new ones and discard a few old ones.
Of course all of this care and interning may be a micro-optimization that doesn't make any difference in your actual circumstance. Interning strings is a performance optimization, so like any other optimization you should measure it to see if it gives you any benefits.
(For my program in our specific situation, it was one of a number of things that did make a visible difference.)
* * *
Atom feeds are available; see the bottom of most pages.