Wandering Thoughts archives

2022-02-27

Python's os.environ is surprisingly liberal in some ways

The way you access and modify Unix environment variables in Python programs is generally through os.environ; Python 3 being Python 3, sometimes you need os.environb. In Unix, what can go in the environment is somewhat fuzzy and while Python has some issues with character encodings, it's otherwise surprisingly liberal in a number of ways.

The first way that os.environ is liberal is that it allows environment variables to have blank values:

>>> import os, subprocess
>>> os.environ["FRED"] = ""
>>> subprocess.run("printenv")
[...]
FRED=
[...]

It's possible to do this with some Unix shells as well, but traditionally environment variables are generally assumed to have non-blank values. Quite a lot of code is likely to assume that a blank value is the same as the variable being unset, although in Python you can tell the difference since os.environ raises KeyError if the environment variable doesn't exist at all.
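As a small sketch of telling the two cases apart in Python (the describe() helper is a made-up name for illustration, and FRED is just the example variable from above):

```python
import os

def describe(name):
    """Classify an environment variable as 'unset', 'blank', or 'set'."""
    try:
        value = os.environ[name]
    except KeyError:
        # Indexing os.environ raises KeyError when the variable doesn't exist.
        return "unset"
    return "blank" if value == "" else "set"

# Start from a known state: make sure FRED is not in the environment.
os.environ.pop("FRED", None)
state_unset = describe("FRED")

# A blank value is accepted and is distinct from the variable being unset.
os.environ["FRED"] = ""
state_blank = describe("FRED")

# A plain truthiness test on .get() can't tell the two cases apart,
# which is how code ends up treating blank as unset.
looks_unset = not os.environ.get("FRED")
```

Here state_unset should come out as "unset" and state_blank as "blank", while looks_unset is true even though FRED exists.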

A bigger way that os.environ is liberal is that it will allow you to use non-traditional characters in the names of environment variables:

>>> os.environ["FRED/BAR"] = "Yes"
>>> subprocess.run("printenv")
[...]
FRED/BAR=Yes

On Unix, setting an environment variable uses setenv(), which generally only requires that you avoid '='. Python specifically checks for an '=' in your name so that it can generate a specific error, and otherwise passes things through.

Python itself doesn't particularly restrict environment variable names beyond that. As a result you can do all sorts of odd things with environment variable names, including putting spaces and Unicode into them (at least in a UTF-8 environment). Some or many of these environment variables won't be accessible to a shell program, but not everything that interprets the environment follows the shell's rules.
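Here is a rough sketch of both behaviors, assuming a Unix system with CPython. The variable names are arbitrary examples, and the child process is another Python interpreter instead of printenv so that the result is easy to inspect:

```python
import json
import os
import subprocess
import sys

# An '=' in the name is the one thing that is rejected up front.
try:
    os.environ["FRED=BAR"] = "Yes"
    rejected = False
except ValueError:
    rejected = True

# Names with '/' or a space in them are passed through as-is.
odd_names = ["FRED/BAR", "FRED BAR"]
env = dict(os.environ)
for name in odd_names:
    env[name] = "Yes"

# The child reports what it sees for each name given on its command line.
child_code = (
    "import json, os, sys; "
    "print(json.dumps({n: os.environ.get(n) for n in sys.argv[1:]}))"
)
result = subprocess.run(
    [sys.executable, "-c", child_code, *odd_names],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
seen = json.loads(result.stdout)
```

The child is handed an explicit env dictionary here so that the odd names don't linger in the parent's own environment; assigning to os.environ directly, as in the interactive examples, should behave the same way for programs you run afterward.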

The case where this came up for me recently was in Dovecot post-login scripting, which in some cases can require you to create environment variables with '/' in their names. Typical shells disallow this, but I was quite happy to find that Python was perfectly willing to go ahead and everything worked fine.

OsEnvironLiberal written at 23:24:29;

2022-02-21

Python's Global Interpreter Lock is not there for Python programmers

I recently read Evan Ovadia's Data Races in Python, Despite the Global Interpreter Lock (via), which discusses what its title says. Famously, the Global Interpreter Lock only covers the execution of individual Python bytecodes (more or less), and what this does and doesn't cover is tricky, subtle, and depends on the implementation details of Python code. For example, making a Python class better and more complete can reduce what's safe to do with it without explicit locking.

These days, I've come to feel that the Global Interpreter Lock is not really for Python programmers. Instead, the GIL is for the authors of CPython packages that are written in C (or in general any compiled language). The GIL broadly allows authors of those packages to not implement any sort of locking in their own code, even when they're manipulating C level data structures, because they're guaranteed that their code will never be called concurrently or in parallel. This extends to the Python standard objects themselves, so that (in theory) Python dicts don't need any sort of internal locks in order to avoid your CPython process dumping core or otherwise malfunctioning spectacularly. Concurrency only enters into your CPython extension if you explicitly release the GIL, and the rules of the CPython API make you re-take the GIL before doing much with interpreter state.

(There are probably traps lurking even for C level extensions that allow calls back into Python code to do things like get attributes. Python code can come into the picture in all sorts of places. But for simple operations, you have a chance.)

Avoiding internal locks while looking into or manipulating objects matters a lot for single threaded performance (Python code looks into objects and updates object reference counts quite frequently). It also makes the life of C extensions simpler. I'm not sure when threading was added to Python (it was a very long time ago), but there might have been C extensions that predated it and which would have been broken in multi-threaded programs if CPython added a requirement for internal locking in C-level code.

The Global Interpreter Lock can be exploited by Python programmers; doing so is even fun. But we really shouldn't do it, because it's not designed for us and it doesn't necessarily work that well when we try to use it anyway. Python has a variety of explicit locks available in the standard library threading module, and we should generally use them even if it's a bit more annoying.
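A minimal sketch of what using explicit locking looks like, with a made-up Counter class and arbitrary thread and iteration counts. The increment is a read-modify-write that spans more than one bytecode, so it's the sort of operation the GIL doesn't promise to protect:

```python
import threading

class Counter:
    """A counter whose increment is protected by an explicit lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # 'self.value += 1' is a load, an add, and a store; the lock makes
        # the whole sequence atomic with respect to other threads.
        with self._lock:
            self.value += 1

def worker(counter, count):
    for _ in range(count):
        counter.increment()

counter = Counter()
threads = [
    threading.Thread(target=worker, args=(counter, 10000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = counter.value
```

With the lock in place, total is always 40000 no matter how the threads are scheduled. Without it, whether you ever see a lost update depends on the CPython version and where it decides to switch threads, which is exactly the sort of implementation detail it's better not to rely on.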

(Honesty compels me to admit that I will probably never bother to use locking around 'simple' operations like appending to a list or adding an entry to a dict. I suspect at least some people would even see using explicit locks for that (in threaded code) to be un-Pythonic.)

GILWhoItIsFor written at 22:28:58;


