I like the Python 3 string .translate() method
Suppose, hypothetically, that you wanted to escape the & character in
text as an HTML entity:
txt = txt.replace('&', '&amp;')
Okay, maybe there's a character or two more:
...
txt = txt.replace('<', '&lt;')
And so it goes. The .replace() string method is an obvious and
long-standing hammer, and I've used it to do any number of
single-character replacements over the years (as well as some more
complicated multi-character ones, such as replacing \r\n with \n).
Recently I was working on my Exim attachment type logger, and more
specifically I was fixing its handling of odd characters in the
messages that it logged as part of making it work in Python 3. My
Python 2 approach to this was basically to throw repr() at the problem
and forget about it, but using repr() for this is a hack (especially
in Python 3). As
part of thinking about just what I actually wanted, I decided that
I wanted control characters to be explicitly turned into some sort
of clear representation of themselves. This required explicitly
remapping and replacing them, and I needed to do this to a fair
number of characters.
At first I thought that I would have to do this with .replace()
(somehow) or a regular expression with a complicated substitution or
something equally ugly, but then I ran across the Python 3
str.translate() method. In Python 2 this method is clearly very
optimized but also only useful for simple things, since you can only
replace a character with a single other character. In Python 3,
.translate() has become much more general; it takes a dictionary of
translations and the values in the dictionary don't have to be single
characters.
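To make the difference concrete, here is a minimal sketch of the HTML escaping example from earlier, redone with a translation dictionary. The names html_escapes and escape_html are made up for illustration, and for real HTML escaping the standard library's html.escape() is probably the better choice.

```python
# Hypothetical names for illustration. Keys are code point ordinals,
# values are replacement strings of any length.
html_escapes = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
}

def escape_html(txt):
    # Each input character is looked up once in a single pass, so the
    # '&' inside an inserted '&lt;' is never escaped a second time.
    return txt.translate(html_escapes)

print(escape_html("a < b && c > d"))   # a &lt; b &amp;&amp; c &gt; d
```

With chained .replace() calls the '&' replacement has to come first, or it will re-escape the entities that the later calls produced. The single pass of .translate() has no such ordering concern.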
So here's what my handling of control characters now looks like:
# ctrl-<chr> -> \xNN escape
ctrldict = {c: "\\x%02x" % c for c in range(0, 32)}
ctrldict[127] = "\\x7f"
# A few special characters get special escapes
ctrldict[ord("\n")] = "\\n"
ctrldict[ord("\r")] = "\\r"
ctrldict[ord("\t")] = "\\t"
ctrldict[ord("\\")] = "\\\\"

def dectrl(msg):
    return msg.translate(ctrldict)
That was quite easy to put together, it's pretty straightforward
to understand, and it works. The only tricky bit was having to read
up on how the keys for the translation dictionaries are not characters
but the integer ordinal of each character (its Unicode code point,
if you want to be precise). Once I found .translate(), the whole
exercise was much less annoying than I expected.
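The point about keys is easy to trip over, so here is a small sketch of it. The two table names are invented for the example; the behavior shown is that .translate() looks characters up by their integer ordinal, so a dictionary keyed by one-character strings never matches and the text comes back unchanged.

```python
# Illustrative tables: the same mapping keyed two different ways.
by_ordinal = {ord("\t"): "\\t"}    # key is the integer 9
by_character = {"\t": "\\t"}       # key is a one-character string

msg = "a\tb"

# Lookup is by ordinal, so this table matches the tab.
print(repr(msg.translate(by_ordinal)))

# No integer key 9 exists here, so the tab passes through untouched.
print(repr(msg.translate(by_character)))
```

There is no error in the second case, which is what makes the mistake easy to miss.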
Python 2's string .translate() still leaves me mostly unenthused,
but now that I've found it, Python 3's has become an all-purpose
tool that I'm looking forward to making more use of. I have any
number of habitual uses of .replace() that should probably become
.translate() in Python 3 code. That you can replace a single
character by multiple characters makes .translate() much more
versatile and useful, and the simplified calling sequence is nice.
(Python 3's version merges the Python 2 deletechars argument into the
translation map, since you can just map characters to None to
delete them.)
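As a quick sketch of deletion and replacement sharing one table (the table contents and the tidy name are arbitrary choices for the example):

```python
# One table handles both jobs: None deletes, a string replaces.
table = {
    ord("\r"): None,      # drop carriage returns entirely
    ord("\t"): "    ",    # expand each tab to four spaces
}

def tidy(line):
    return line.translate(table)

print(repr(tidy("one\ttwo\r\n")))   # 'one    two\n'
```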
PS: Having read the documentation a bit, I now see that str.maketrans()
is the simple way to get around the whole ord() stuff that I'm
doing in my code. Oh well, the original code is already written.
But I'll have to remember maketrans() for the future.
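For the record, a maketrans() version might look something like the following. This is a sketch rather than the code the logger actually uses, and ctrl_escapes, ctrl_table and dectrl_mt are names invented here. It relies on the one-argument form of str.maketrans(), which accepts a dictionary keyed by single characters and converts the keys to ordinals.

```python
# Build the mapping with characters as keys instead of ordinals.
ctrl_escapes = {chr(c): "\\x%02x" % c for c in range(0, 32)}
ctrl_escapes["\x7f"] = "\\x7f"
# A few special characters get special escapes.
ctrl_escapes.update({"\n": "\\n", "\r": "\\r", "\t": "\\t", "\\": "\\\\"})

# maketrans() turns the character keys into the ordinals translate() wants.
ctrl_table = str.maketrans(ctrl_escapes)

def dectrl_mt(msg):
    return msg.translate(ctrl_table)

print(dectrl_mt("a\tb\x01\\\n"))
```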
(The performance and readability of .replace() versus .translate()
is something that can be measured (for performance) and debated
(for readability). I haven't made any performance measurements and
I don't really care for most of my code. As far as readability goes,
I'll probably conclude that .translate() wins if I'm doing more
than one or two substitutions.)
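If someone did want to measure, a rough harness with the standard timeit module could look like this. The sample text, the repeat count and the function names are all arbitrary assumptions, and no results are claimed here; the numbers will depend on the input and the Python version.

```python
import timeit

# Hypothetical three-character escaping task, done both ways.
table = str.maketrans({"&": "&amp;", "<": "&lt;", ">": "&gt;"})

def with_replace(txt):
    # '&' must go first so later entities are not re-escaped.
    return txt.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def with_translate(txt):
    return txt.translate(table)

sample = "if a < b && c > d: pass\n" * 1000   # arbitrary test input

# Make sure both approaches agree before timing them.
assert with_replace(sample) == with_translate(sample)

for func in (with_replace, with_translate):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print(func.__name__, round(seconds, 4))
```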