2012-08-06
The theoretical legality of shadowing builtins in Python
Here is a variant of an example I wrote a few years ago:
    eval = eval
    class A(object):
        file = file
This creates a module-level eval name binding that is the same as the builtin eval, and a class variable A.file that is the same as the builtin file. All of this works in CPython because of how names and scopes are used in the CPython bytecode.
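As a quick check of the behavior described here, the following is a minimal sketch for Python 3 run as an ordinary script. Python 3 no longer has a file builtin, so open stands in for it; that substitution is mine and not part of the original example.

```python
import builtins

# Module-level rebinding: when the right-hand side is evaluated there is
# no module-level 'eval' yet, so the lookup finds the builtin.
eval = eval

class A(object):
    # Class-level rebinding. 'open' is a stand-in for Python 2's 'file',
    # which does not exist in Python 3 (an assumption of this sketch).
    open = open

# Both new bindings are the very same objects as the builtins.
same_eval = eval is builtins.eval
same_open = A.__dict__["open"] is builtins.open
print(same_eval, same_open)
```

The class check goes through A.__dict__ to show that the name really was bound in the class namespace, not merely found through some fallback lookup.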
Which leads to the theoretical question: is this actually 'portable' in
the sense that this behavior is required by the Python specification?
(As a practical matter I think that any alternate Python interpreter will include this behavior, making it portable in practice; I believe that a certain amount of code out there in the world relies on it.)
I will cut to the chase: the real result of this exercise is that the Python language reference is essentially an informal document, not a standard. You can't use it for language lawyering, not only for the pragmatic reasons mentioned above but also because it's not an attempt at a complete formal specification of Python for implementors; it is more an attempt at some sort of semantic description for Python programmers (combined with a grammar). The rest of this entry is an illustration of that.
The place to look for the answer to our question is the Naming and Binding section of the Language Reference (Python 3 version). Having peered into the Python 2 version, as far as I can tell this behavior is ambiguous for module-level code but apparently not theoretically correct in class-level code. For class-level code, the crucial two sentences are:
Each assignment or import statement occurs within a block defined by a class or function definition or at the module level (the top-level code block).
If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. [...]
The second sentence is only correct in CPython for function code blocks; it's false for other blocks, as we can see in the example with class A. The case of module-level code is more ambiguous, because the same section contains a description of the global statement which includes:
[...] Names are resolved in the top-level namespace by searching the global namespace, i.e. the namespace of the module containing the code block, and the builtins namespace, [...]
Although this is in a paragraph about global, it's sensible to read it as a general description of how names are resolved in the top-level (module) namespace. One reading of this combined with the 'name binding' sentence allows for module-level rebinding; in 'eval = eval', the right side eval may be a reference to the version in the module level block scope but the lookup rules for such references allow you to find the builtin eval. Another reading is that the two sentences contradict each other.
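The difference between function blocks and other blocks can be seen directly in CPython. This is a rough sketch assuming Python 3; the names rebind_in_function and RebindInClass are made up for illustration, and len is just a convenient builtin to rebind.

```python
import builtins

def rebind_in_function():
    # The assignment makes 'len' local to the entire function body, so
    # the right-hand side reads a local variable that has no value yet.
    len = len
    return len

class RebindInClass(object):
    # In a class body the same statement falls through to the builtin.
    len = len

try:
    rebind_in_function()
    function_result = "no error"
except UnboundLocalError:
    function_result = "UnboundLocalError"

class_result = RebindInClass.__dict__["len"] is builtins.len
print(function_result, class_result)
```

In the function, the 'name binding' sentence holds exactly as written: every use of the name in the block refers to the block's own binding. In the class body it does not, which is what the example with class A relies on.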
By the way, this shows one of the problems with standards in practice: you have to read most actual standards for complex things extremely closely and carefully in order to get the right answers. Doing this is unnatural and hard, even more so than reading Unix manpages; mistakes are easy to make and the consequences potentially significant (and hard to test).
PS: given this view of the language reference, you might wonder why I want it to include a description of the attribute lookup order. My answer is that such a description is useful for a Python programmer, if only to put all of the pieces in one place. By contrast painstaking and nitpicking descriptions of arcane bits of namespace oddness are not so useful.
The (possibly apocryphal) story of System V IPC's origins
Back in SystemVSHMLimits I mentioned in passing that I don't have a high opinion of System V IPC in general. This goes back to the general attitude to it from (some) old Unix hands that I absorbed as a young Unix person (I'm not entirely sure where from by now), and especially the story of its origins. Unfortunately I don't have handy access to the necessary research materials right now, so let's call the following story folklore.
(The interesting thing about Unix folklore is that it continues influencing people whether or not it is actually true. This case is one example of that, where a possibly apocryphal story has shaped my dismissive attitude towards a Unix feature.)
If you've looked at System V IPC, you may have scratched your head a bit over it. What we call 'System V IPC' is actually three separate IPC mechanisms which seem barely related to each other, and none of them are very Unixy. The story I heard starts when Bell Labs released V7 Unix. One of the things that happened with V7 is that various groups inside AT&T picked it up for their projects and, in the way of things at the time (especially given V7's not entirely finished state), each of them modified the system a bit to meet their local needs (thus spawning a constellation of internal AT&T Unix versions). As part of this fragmentation, three of the groups doing things with Unix all decided that they needed an IPC mechanism. Working independently, each came up with something that was more or less like one of the current System V IPC systems, implemented it in their version, and was happy.
(Many of these AT&T modifications were not very Unixy because they were often designed by people who just wanted to get something done with the system and who had not really absorbed the Unix philosophy. Of course this was not unique to AT&T; many groups made not-very-Unixy changes to V7. Some would say that this included UC Berkeley when they made BSD Unix.)
Then AT&T management showed up; someone decreed that AT&T was spending too much effort maintaining too many different versions of Unix. All of the AT&T Unixes were to be unified into one, managed and maintained by a single group. Of course, as part of this all of the changes that all of the groups had made needed to be merged back into a common version. This was, shall we say, a somewhat political process; many groups were not happy at the thought of losing 'their' features and so pushed for them to be included in the official AT&T Unix, no matter how well they fit. This included all three groups that had invented their own IPC mechanism; none of them were willing to compromise and each of them insisted that their mechanism just had to be included. In the end the gatekeepers for AT&T Unix threw up their hands and said 'okay, you win, we'll support all three different IPC mechanisms'.
And that, so the story goes, is what gave us System V IPC. Because of a political compromise we got stuck with three different and partially overlapping IPC mechanisms, all of them not very Unixy and all of them with oddities. Is it any wonder that old Unix hands look down on the whole mess?
(Or at least that one can make a story where old Unix hands look down on the mess.)