2006-04-17
What do variable names mean (in Python and elsewhere)?
What are variables?
In many languages, variables are just labels for storage locations, for a spot in memory. This ranges from pure machine level storage, in languages like assembler, up through storage locations augmented with size and type information, such as in C, all the way to Perl, where the 'storage locations' are actually fairly abstract. But they're still there, even in Perl, partly because it's a comforting way to think about it: variables are where you put things.
In Python and some other languages, variables are bindings; they are references to something (often the somethings get called 'objects', which can result in confusion with the sort of objects that are instances of classes). The objects have a life independent of the variables: multiple variables can be bound to the same object, and you can have objects that aren't 'in' any variables at all.
In the storage model, you make copies of things all the time: 'a = b'
puts a copy of the contents of b's storage location into a's storage
location (possibly scrunching b up a lot if it doesn't fit). In the
binding model, copies are rare and always explicit; 'a = b' simply
makes a have the same binding as b, so they both refer to the same
object.
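As a minimal sketch of what 'a = b' does in the binding model (the variable names and list contents here are just for illustration):

```python
import copy

b = [1, 2, 3]
a = b                 # binds a to the same list object; nothing is copied
a.append(4)           # the change is visible through both names

shared = a is b       # both names refer to one object

c = copy.copy(b)      # copies have to be asked for explicitly
c.append(5)           # only the copy changes
```

After this runs, b is [1, 2, 3, 4] even though only a was appended to, while c is a separate list.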
The binding model is a lot more abstract than the storage location model; since the storage model is how computers work, it's often the easier of the two for people to wrap their minds around. It's also harder to clearly see the binding model, because the two models look the same as long as you're only dealing with immutable objects. (Sharing immutable objects instead of copying them may save you memory, but that's not something that most people notice.)
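A small illustration of why immutable objects hide the difference, and where a mutable object exposes it (the names are made up for the example):

```python
# Immutable objects: the result looks just like the storage model.
s = "abc"
t = s          # t and s are bound to the same string object
t += "d"       # strings can't change, so t is rebound to a new string

# Mutable objects: the binding model shows through.
xs = [1]
ys = xs        # ys and xs are bound to the same list object
ys += [2]      # lists are extended in place, so xs sees the change too
```

Here s is still "abc", exactly as the storage model would predict, but xs has quietly become [1, 2].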
(You can even confuse the two models when you're programming in a binding language and write code that assumes the storage model when it's getting the binding model; this causes peculiar bugs.)
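One common instance of this sort of bug, sketched under the assumption that you wanted a 3x3 grid of independent rows:

```python
# Written with the storage model in mind: "three copies of a row".
# In fact this is three references to one row object.
grid = [[0] * 3] * 3
grid[0][0] = 1                    # shows up in every "row"

# Building each row separately gives three distinct objects.
fixed = [[0] * 3 for _ in range(3)]
fixed[0][0] = 1                   # only the first row changes
```

The first version ends up with a 1 at the start of all three rows, because there was only ever one row.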
Capable storage model languages almost always grow explicit bindings of some sort, whether they call them pointers (C) or references (Perl). Once you've got explicit bindings, you can of course implement a binding language, because all a binding language really is is a language where all variables are actually pointers to otherwise anonymous blobs. (At this point you really want some sort of automatic garbage collection.)
Only languages with storage models have 'pass by value' versus 'pass by reference' issues with subroutine calls. When you have a binding language all subroutine calls should implicitly be pass by reference, since that's what passing a binding around is. (You can create a perverse binding language where subroutine calls make a copy of the objects the arguments point to and then pass in bindings to the copies. But you can make all sorts of perverse languages.)
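A rough sketch of what passing a binding looks like in Python (the function and variable names are hypothetical). The function receives a reference to the caller's object, so mutation is visible to the caller; rebinding the parameter name only changes the function's own binding:

```python
def append_item(items):
    items.append("added")     # mutates the object the caller also refers to

def rebind(items):
    items = ["new list"]      # rebinds the local name only
    return items

shopping = ["eggs"]
append_item(shopping)         # shopping is changed
result = rebind(shopping)     # shopping is not changed
```

No copy of the list is made at any point; both calls are handed the same object that shopping is bound to.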
Python is a 'context-free' binding language, where a variable's
binding does not depend on its context. There are binding languages
where this is not the case, so for example foo is one thing when taken
as a function and another thing when taken as a variable; I believe
Common Lisp is one big example. I personally prefer the Python way,
because it's simpler, more regular, and easier to understand.
(For one technical discussion of the Lisp issues, see here.)
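To illustrate the Python side of this (only the Python side; the names are invented for the example), a function name is an ordinary binding like any other, so it can be aliased and rebound:

```python
def foo():
    return "called"

alias = foo        # the function is just an object that foo is bound to
result = alias()   # calling through the other name works the same way

foo = 42           # rebinding foo; there is no separate function namespace
```

After this, foo is simply 42, and the function lives on only because alias still refers to it.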
There are probably hybrid storage/binding languages out there. I think the easiest sort to construct is a statically typed language where 'primitive' types use a storage model and everything else uses a binding model. (To some extent this happens under the hood in most high performance binding language implementations.)
(Trivia: this entry is the 'another blog entry' from my comment on this entry. Sometimes my entry-writing mills grind very, very slowly.)
Sidebar: explanations of Python's object model
If you want to read explanations of Python's object model itself, try Python Objects, How to think like a Pythonista, or this discussion.
2006-04-14
Why del has to be a Python builtin
A posting today
to the LiveJournal python_dev community wound up sort of asking
why del has to be a Python builtin instead of a method. At first I
thought this had a nice simple answer, but the more I thought about it
the less obvious it became, and by the time I'd worked out a satisfying
answer I'd had to really think about how Python fits together.
The short summary is that del needs to be a builtin because Python's
object model is based on references and namespaces, and Python doesn't
have a way to refer to your current namespaces (they have no explicit
name).
Unlike in some languages, del does not literally delete objects;
instead it removes references to them (which may as a side effect
delete the object). In other words, del is not an object manipulation
operation, it is a namespace manipulation operation.
So the correct translation of 'del d' to a method call version cannot
be 'd.delete()'; it has to be 'namespace.delete("d")'. But Python
provides no magic namespace name for you to use for this operation,
because del finds it implicitly. (Compound cases give the namespace;
'del d.c' is explicitly operating on d's namespace and duly
translates to d.__delattr__("c").)
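The compound case can be watched in action with a small sketch; the Noisy class and the deleted list are hypothetical names made up for this example:

```python
deleted = []

class Noisy(object):
    def __init__(self):
        self.c = 1

    def __delattr__(self, name):
        deleted.append(name)              # record what del asked for
        object.__delattr__(self, name)    # then do the normal removal

d = Noisy()
del d.c      # turns into a call to d.__delattr__("c")
```

Afterwards deleted holds "c" and d no longer has a c attribute, which is the namespace operation the text describes.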
There are at least three such anonymous Python namespaces: globals,
function locals (including closures), and the namespace of a class
as it is being built. Of these, only the global namespace is really
accessible; the latter two can only be gotten with magic, and even
then you can't use __delattr__ on them. del itself does internal
interpreter magic to make it all go.
(Note that locals() is explicitly not a reference to the function's
namespace; it is a dictionary copy of the function's current namespace.
This is because functions don't implement their namespaces using Python
dictionaries. Even for things that do implement their namespace using
a Python dict, it's best considered an implementation detail.)
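A quick sketch of what this means in practice (the function name is made up): deleting a key from the dictionary that locals() hands back does nothing to the function's actual variable.

```python
def try_to_delete_via_locals():
    d = "still here"
    snapshot = locals()     # a dictionary view of the current local names
    del snapshot["d"]       # removes the key from that dictionary only
    return d                # the real local variable is untouched

result = try_to_delete_via_locals()
```

The function returns normally, which it could not do if the deletion had reached the real namespace.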
Sidebar: all but deleting yourself
While an object cannot literally delete itself in Python, it can try
real hard, by scrubbing its __dict__ and changing its __class__ to
something that does nothing. (Unfortunately you can't set your __class__
to plain object, as far as I can tell; you'll need to make an actual
do-nothing class.)
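Here is a minimal sketch of the trick, assuming ordinary classes without __slots__; Husk, Connection, and self_destruct are hypothetical names invented for the example:

```python
class Husk(object):
    """A do-nothing class for objects that have scrubbed themselves."""

class Connection(object):
    def __init__(self, host):
        self.host = host

    def self_destruct(self):
        self.__dict__.clear()     # drop all instance attributes
        self.__class__ = Husk     # drop all behaviour as well

conn = Connection("example.org")
conn.self_destruct()
```

The object still exists as long as something refers to it, but it no longer has any of its old attributes or methods.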
2006-04-07
A pleasing Python regularity with __future__
The other night I was writing a Python program that wanted to divide two integers and get a floating point result. Normally integer division in Python produces integers, following a C-style model of distinct numeric types; however, Python is slowly migrating towards a model where the types of numbers are more of an implementation detail.
The general way to get early access to an incompatible change like
this is a magic statement at the start of your module: from
__future__ import whatever. I knew that the change in number
behavior could be gotten this way, but I couldn't remember what
the magic whatever for it was.
After a moment's thought, I decided to try something:
>>> import __future__
>>> dir(__future__)
Despite the magic involved with __future__, this worked; I got a
list of all of the magic stuff I could enable, and easily picked
'division' out as what I wanted.
It turns out that in addition to the magic in the CPython interpreter,
there is a real __future__.py module. When you import it normally
you get the regular module instead of the special magic interpreter
handling, and get to introspect it and so on as usual.
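Beyond dir(), the regular module can be poked at from code. This sketch uses the module's all_feature_names list and the methods on its feature objects:

```python
import __future__

# The names that can be used in "from __future__ import ...".
names = __future__.all_feature_names
has_division = "division" in names

# Each feature records when it became available and when it becomes
# (or became) the default behaviour.
feature = __future__.division
optional = feature.getOptionalRelease()
mandatory = feature.getMandatoryRelease()
```

Both release values are version tuples in the same style as sys.version_info.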
And talking of special magic:
>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance
(Other nonexistent future features get a different error message.
And you specifically can't do 'from __future__ import *'.)
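The two error messages can be compared without leaving your own program by compiling the import statements from strings. This is only a sketch; future_import_error and no_such_feature are names made up for the example:

```python
def future_import_error(feature):
    """Return the SyntaxError message for importing feature, or None."""
    source = "from __future__ import " + feature
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None

braces_msg = future_import_error("braces")
other_msg = future_import_error("no_such_feature")
```

Only braces gets the special message; the made-up feature name gets a more ordinary complaint.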
2006-04-05
Keeping up with new Python features
I have a new resolution: every so often, I'm going to read over the current builtins section of the Python documentation.
I've come to this because recently I was reading through a page on
Python idioms to see if
it had anything new, and stumbled over the mention of an enumerate()
builtin, new in Python 2.3. Well, I'm using Python 2.3, and I hadn't
remembered enumerate(), and I could have used it recently. Whoops.
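For the record, here is the sort of thing I could have used it for, sketched with made-up data: pairing each item with its index without managing the index by hand.

```python
names = ["alpha", "beta", "gamma"]

# The way I had been writing it.
manual = []
for i in range(len(names)):
    manual.append((i, names[i]))

# The same thing with the builtin.
with_enumerate = []
for i, name in enumerate(names):
    with_enumerate.append((i, name))
```

Both loops build the same list of (index, item) pairs.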
I do try to keep up with release notes and other sources of Python news and discussion (eg, Planet Python). But it's easy to forget about smaller things (or only remember them vaguely) in the time between when I read about a new bit and when I can use it. Clearly I need to give myself a refresher every so often.
(If I were really ambitious I would periodically scan the entire Python Library Reference, at least reading the one sentence description of all the modules. I don't think I'm that energetic, though.)