I like the Python 3 string .translate() method

October 20, 2016

Suppose, hypothetically, that you wanted to escape the & character in text as an HTML entity:

txt = txt.replace('&', '&amp;')

Okay, maybe there's a character or two more:

...
txt = txt.replace('<', '&lt;')

And so it goes. The .replace() string method is an obvious and long-standing hammer, and I've used it for years to do any number of single-character replacements (as well as some more complicated multi-character ones, such as replacing \r\n with \n).
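Here is a minimal sketch of that kind of chained .replace() escaping, plus the multi-character \r\n case. The function names and the choice of '>' as a third character are illustrative assumptions, not the original code. One subtlety of chaining is that order matters: '&' has to go first.

```python
def html_escape_replace(txt: str) -> str:
    # '&' must be replaced first; otherwise the '&' inside an '&lt;'
    # produced by an earlier step would be escaped again into '&amp;lt;'.
    txt = txt.replace('&', '&amp;')
    txt = txt.replace('<', '&lt;')
    txt = txt.replace('>', '&gt;')
    return txt

def normalize_newlines(txt: str) -> str:
    # .replace() also handles multi-character targets such as CRLF.
    return txt.replace('\r\n', '\n')

print(html_escape_replace('a < b & c'))            # a &lt; b &amp; c
print(repr(normalize_newlines('one\r\ntwo\r\n')))  # 'one\ntwo\n'
```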

Recently I was working on my Exim attachment type logger, and more specifically I was fixing its handling of odd characters in the messages that it logged as part of making it work in Python 3. My Python 2 approach to this was basically to throw repr() at the problem and forget about it, but using repr() for this is a hack (especially in Python 3). As part of thinking about just what I actually wanted, I decided that I wanted control characters to be explicitly turned into some sort of clear representation of themselves. This required explicitly remapping and replacing them, and I needed to do this to a fair number of characters.

At first I thought that I would have to do this with .replace() (somehow) or a regular expression with a complicated substitution or something equally ugly, but then I ran across the Python 3 str.translate() method. In Python 2 this method is clearly very optimized but also only useful for simple things, since you can only replace a character with a single other character. In Python 3, .translate() has become much more general; it takes a dictionary of translations, and the values in the dictionary don't have to be single characters.
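As a minimal sketch, here is the same hypothetical HTML escaping done with Python 3's str.translate(). The table name html_table is my own. Because .translate() maps each input character exactly once and never rescans its own output, an '&' that it emits as part of '&lt;' is not escaped a second time, whatever order the table entries are in.

```python
# Translation table: keys are code points (ints), values are
# replacement strings of any length.
html_table = {
    ord('&'): '&amp;',
    ord('<'): '&lt;',
    ord('>'): '&gt;',
}

def html_escape_translate(txt: str) -> str:
    # A single pass over txt; characters not in the table are copied as-is.
    return txt.translate(html_table)

print(html_escape_translate('a < b & c'))  # a &lt; b &amp; c
```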

So here's what my handling of control characters now looks like:

# ctrl-<chr> -> \xNN escape
ctrldict = {c: "\\x%02x" % c for c in range(0, 32)}
ctrldict[127] = "\\x7f"
# A few special characters get special escapes
ctrldict[ord("\n")] = "\\n"
ctrldict[ord("\r")] = "\\r"
ctrldict[ord("\t")] = "\\t"
# Escape backslash itself so the output stays unambiguous
ctrldict[ord("\\")] = "\\\\"

def dectrl(msg):
    return msg.translate(ctrldict)

That was quite easy to put together, it's pretty straightforward to understand, and it works. The only tricky bit was having to read up on how the keys for the translation dictionaries are not characters but integers: the Unicode code point of each character, as returned by ord(). Once I found .translate(), the whole exercise was much less annoying than I expected.
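To see why the key type matters, here is a minimal demonstration (the table names wrong and right are illustrative). A dictionary keyed by one-character strings is accepted without complaint but never matches anything, because .translate() looks up the integer code point of each character.

```python
# Keys must be integer code points; a str key is silently never matched.
wrong = {"\t": "\\t"}
right = {ord("\t"): "\\t"}   # ord("\t") == 9

print(len("a\tb".translate(wrong)))  # 3: the tab was left alone
print(len("a\tb".translate(right)))  # 4: the tab became backslash + 't'
```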

Python 2's string .translate() still leaves me mostly unenthused, but now that I've found it, Python 3's has become an all-purpose tool that I'm looking forward to making more use of. I have any number of habitual uses of .replace() that should probably become .translate() in Python 3 code. That you can replace a single character by multiple characters makes .translate() much more versatile and useful, and the simplified calling sequence is nice.

(Python 3's version merges the Python 2 deletechars into the translation map, since you can just map characters to None to delete them.)
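A minimal sketch of deletion in Python 3, mixing deletions and ordinary replacements in one table. Which characters get deleted or expanded here is purely illustrative.

```python
# Mapping a code point to None deletes that character; other entries
# in the same table still replace characters as usual.
table = {
    ord("\r"): None,     # delete carriage returns
    ord("\x00"): None,   # delete NUL characters
    ord("\t"): "    ",   # expand tabs to four spaces
}

print(repr("a\tb\r\n\x00".translate(table)))  # 'a    b\n'
```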

PS: Having read the documentation a bit, I now see that str.maketrans() is the simple way to get around the whole ord() stuff that I'm doing in my code. Oh well, the original code is already written. But I'll have to remember maketrans() for the future.
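As a minimal sketch, the control-character table could be rebuilt with str.maketrans() so that the keys are written as characters. When given a single dict, str.maketrans() converts one-character string keys into the integer code points that .translate() expects. The names escapes and ctrltable are my own; the result should match the ord()-based ctrldict.

```python
# Build the mapping with character keys instead of ord() calls.
escapes = {chr(c): "\\x%02x" % c for c in range(32)}
escapes["\x7f"] = "\\x7f"
escapes.update({"\n": "\\n", "\r": "\\r", "\t": "\\t", "\\": "\\\\"})

# str.maketrans() turns the character keys into integer code points.
ctrltable = str.maketrans(escapes)

def dectrl(msg: str) -> str:
    return msg.translate(ctrltable)

print(dectrl("tab\there\x1b[0m\\"))  # tab\there\x1b[0m\\
```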

(The performance and readability of .replace() versus .translate() is something that can be measured (for performance) and debated (for readability). I haven't made any performance measurements and I don't really care for most of my code. As far as readability, probably I'll conclude that .translate() wins if I'm doing more than one or two substitutions.)
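For anyone who does want numbers, a minimal timeit sketch like this one would compare the two approaches on a given input. The sample text, the three escaped characters, and the iteration count are arbitrary assumptions. No results are given here, since they depend on the Python version, the input, and the machine.

```python
import timeit

# Hypothetical test input: the same short line repeated many times.
sample = "a < b & c > d\n" * 1000

table = {ord("&"): "&amp;", ord("<"): "&lt;", ord(">"): "&gt;"}

def with_replace(txt: str) -> str:
    # '&' first so later entities are not double-escaped.
    return txt.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def with_translate(txt: str) -> str:
    return txt.translate(table)

# Sanity check: both approaches must agree before timing them.
assert with_replace(sample) == with_translate(sample)

for fn in (with_replace, with_translate):
    secs = timeit.timeit(lambda: fn(sample), number=200)
    print(f"{fn.__name__}: {secs:.4f}s for 200 calls")
```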

Written on 20 October 2016.


Last modified: Thu Oct 20 00:07:24 2016