What the members of a Unicode conversion error object are

November 13, 2008
The codecs module's
The substring

Also worth noting is that when bytes are being decoded to Unicode, the first element of the tuple your error handler returns (assuming that you want to keep going) has to be a unicode string.

Given all of this, we can put together a really simple decoding error handler that just replaces undecodable bytes with backslashed hex versions of themselves:
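A minimal sketch of such a handler, written for Python 3 as an assumption (in the Python 2 of this entry's era the replacement would have to be a `unicode` object instead of a `str`). The function name `hex_replace_first_byte` and the registered name `hexreplace-sketch` are made up for illustration; `codecs.register_error` and the `object`, `start` and `end` attributes of `UnicodeDecodeError` are the standard ones.

```python
import codecs


def hex_replace_first_byte(err):
    """Replace the first undecodable byte with a backslashed hex escape."""
    # This sketch only deals with decoding errors; anything else is re-raised.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    # err.object is the bytes being decoded and err.start is the index of
    # the first bad byte. Indexing bytes in Python 3 gives an int.
    bad_byte = err.object[err.start]
    # Return (replacement text, position at which to resume decoding).
    # Resuming at start + 1 skips only one byte, not the whole bad sequence.
    return ("\\x%02x" % bad_byte, err.start + 1)


# Hypothetical handler name, chosen for this example only.
codecs.register_error("hexreplace-sketch", hex_replace_first_byte)

decoded = b"abc\xffdef".decode("utf-8", "hexreplace-sketch")
print(decoded)
```

Here the lone `0xff` byte is not valid UTF-8, so `decoded` ends up with a literal backslash, `x`, `f`, `f` in its place and the surrounding ASCII untouched.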
Note that we are playing fast and loose with multi-byte sequences; instead of handling the entire sequence, we just replace the first byte and restart decoding after it. In some situations this can malfunction and produce garbled output.

(One would think that the standard "backslashreplace" error handler would already do this, but unfortunately it doesn't handle decoding errors, only encoding errors.)
Written by Chris Siebenmann, as part of WanderingThoughts on CSpace.