Wandering Thoughts archives

2014-11-12

A wish: setting Python 3 to do no implicit Unicode conversions

In light of the lurking Unicode conversion issues in my DWiki port to Python 3, one of the things I've realized I would like in Python 3 is some way to turn off all of the implicit conversions to and from Unicode that Python 3 currently does when it talks to the outside world.

The goal here is the obvious one: since any implicit conversion is a place where I need to consider how to handle errors, character encodings, and so on, making them either raise errors or produce bytestrings would allow me to find them all (and to force me to handle things explicitly). Right now many implicit conversions can sail quietly past because they're only having to deal with valid input or simple output, only to blow up in my face later.

(Yes, in a greenfield project you would be paying close attention to all places where you deal with the outside world. Except of course for the ones that you overlook because you don't think about them and they just work. DWiki is not in any way a greenfield project and in Python 2 it arrogantly doesn't use Unicode at all.)

It's possible that you can fake this by setting your (Unix) character encoding to either an existing encoding that is going to blow up on utf-8 input and output (including plain ASCII) or to a new Python encoding that always errors out. However this gets me down into the swamps of default Python encodings and how to change them, which I'm not sure I want to venture into. I'd like either an officially supported feature or an easy hack. I suspect that I'm dreaming on the former.

(I suspect that there are currently places in Python 3 that always both always perform a conversion and don't provide an API to set the character encoding for the conversion. Such places are an obvious problem for an official 'conversion always produces errors' setting.)

python/Python3NoImplictUnicodeOption written at 01:31:01; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.