Unix environment variables (and the environment) are a fuzzy thing

February 26, 2022

Unix environment variables look straightforward at first, but I have recently been reminded that in practice they are fuzzier and more under-defined than you might think.

Generally speaking, the kernel API's only requirement is that the environment be an array of null-terminated strings, usually with a limit on their total size. Further interpretation of the contents of these strings is left up to user level programs. Almost everything interprets these strings as the names and values of 'environment variables', with the name and value separated by an '='. Although the kernel API allows for strings of 'STRING' or 'STRING=', I think most Unix programs will either ignore them or give you odd results if you ask about them.
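As a rough illustration of how user level code tends to interpret these strings, here is a minimal Python sketch. The function name parse_environ_block is made up for this example, and its choices are assumptions rather than what any particular C library or shell does: the first entry for a name wins, 'STRING=' is kept as a name with an empty value, and entries with no '=' or with an empty name are set aside instead of being interpreted.

```python
def parse_environ_block(entries):
    """Split raw environment strings (bytes) into names and values.

    Returns (variables, malformed). Assumptions of this sketch:
    the first entry for a given name wins, and entries without
    an '=' or with an empty name are reported as malformed.
    """
    variables = {}
    malformed = []
    for entry in entries:
        # Only the first '=' separates name from value; later ones
        # are part of the value.
        name, sep, value = entry.partition(b"=")
        if not sep or not name:
            malformed.append(entry)
            continue
        variables.setdefault(name, value)
    return variables, malformed
```

Feeding it something like [b"HOME=/root", b"BARE", b"A=b=c"] puts HOME and A in the variables and leaves b"BARE" in the malformed list, which is roughly the 'ignore it' behavior described above.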

Given the 'name=value' format expected of the environment, in theory the only character you can't put in the name is an '=' (and a null, which can't be put in environment variables at all). In practice most Unix shells limit the characters they will accept in the names of environment variables to a relatively small set. POSIX sets a minimum requirement on this (I have now looked it up; it's in the POSIX chapter on environment variables): the names used by the standard utilities consist only of upper case letters, digits, and underscores, and don't begin with a digit. Other programs that manipulate the environment (or create it from scratch) may be more liberal about what characters they allow. Unix shells (and other programs) may or may not pass through such oddly named environment variables, so your wisest course is probably not to count on it.

(There's no way of quoting environment variable names or special characters in them, although there could be. Probably no one's ever seen the need.)
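To make the gap between 'allowed in theory' and 'accepted by shells' concrete, here is a small Python sketch with two checks. Both function names are hypothetical. The first only enforces what the 'name=value' format itself requires; the second uses the POSIX shell definition of a name (letters, digits, and underscores, not starting with a digit), which is an assumption about what a typical shell accepts and not a guarantee about any specific one.

```python
import re

# POSIX shell "name": letters, digits and underscores from the
# portable character set, not starting with a digit.
_SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_representable_name(name: str) -> bool:
    """True if the name can appear in a 'name=value' string at all."""
    return name != "" and "=" not in name and "\0" not in name


def is_shell_safe_name(name: str) -> bool:
    """True if a typical POSIX shell would accept this as a variable name."""
    return _SHELL_NAME.fullmatch(name) is not None
```

A name such as 'my-var.1' passes the first check but fails the second, which is the sort of oddly named variable that may or may not survive a trip through a shell.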

By convention, the names of environment variables are in upper case. This is only a convention; pretty much every Unix shell is happy to deal with lower case environment variables. It's a social expectation among Unix people that pretty much all officially documented environment variables are in upper case (which is to say, environment variables that are part of the API of your system). I suspect that people think of lower case environment variables as being for local, internal use only, at best.

Many programs will interact with the environment (to the extent that they do) through the C library getenv() function, and will inherit any quirks, limitations, or peculiarities that it has. Some, like Python, can have additional restrictions like character set encoding issues (see os.environ). Others, like Go, have their own implementation that's independent of the C library one.
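As one example of the encoding issue in Python, the sketch below stores raw bytes through os.environb and reads them back through the string view in os.environ. The helper name roundtrip_raw_value is hypothetical, and this assumes a Unix system where os.supports_bytes_environ is true. On such systems undecodable bytes show up in os.environ as surrogate escapes, and os.fsencode() turns them back into the original bytes.

```python
import os


def roundtrip_raw_value(name: bytes, raw: bytes) -> bytes:
    """Set a variable as raw bytes, read it as str, and re-encode it.

    Assumes a Unix platform where os.environb is available.
    The variable is removed again before returning.
    """
    if not os.supports_bytes_environ:
        raise RuntimeError("os.environb is not available on this platform")
    os.environb[name] = raw
    try:
        # os.environ decodes with the filesystem encoding and the
        # 'surrogateescape' error handler, so this lookup does not fail
        # even when the bytes are not valid text in that encoding.
        text = os.environ[os.fsdecode(name)]
        return os.fsencode(text)
    finally:
        del os.environb[name]
```

The string you get from os.environ in between may contain lone surrogates, so it can fail if you try to print or encode it as ordinary UTF-8; that is the kind of additional quirk a language runtime can layer on top of the plain bytes.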

Dynamically linked programs almost always use the standard C library runtime loader (even if they're written in other languages), and on most Unixes that will check environment variables through the C library getenv() and similar functions. In programs that are setuid or otherwise executing in what the runtime loader and the C library think is a special situation, this may result in the C library sanitizing the environment in various ways.
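A program that starts other programs can do a similar kind of sanitizing itself. The Python sketch below is only an illustration of the idea: the function name scrub_environment and the default lists of prefixes and names are assumptions picked for the example, not the actual list that any runtime loader or C library uses.

```python
def scrub_environment(env, unsafe_prefixes=("LD_",), unsafe_names=("IFS",)):
    """Return a copy of env without variables considered risky.

    The default prefixes and names are illustrative assumptions,
    not the real list used by any particular C library.
    """
    return {
        name: value
        for name, value in env.items()
        # str.startswith accepts a tuple of prefixes.
        if not name.startswith(unsafe_prefixes) and name not in unsafe_names
    }
```

The result could then be passed as the env argument of subprocess.run() so that the child starts with the reduced environment.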

