What do variable names mean (in Python and elsewhere)?

April 17, 2006

What are variables?

In many languages, variables are just labels for storage locations, for a spot in memory. This ranges from pure machine level storage, in languages like assembler, up through storage locations augmented with size and type information, such as in C, all the way to Perl, where the 'storage locations' are actually fairly abstract. But they're still there, even in Perl, partly because it's a comforting way to think about it: variables are where you put things.

In Python and some other languages, variables are bindings; they are references to something (often the somethings get called 'objects', which can result in confusion with the sort of objects that are instances of classes). The objects have a life independent of the variables: multiple variables can be bound to the same object, and you can have objects that aren't 'in' any variables at all.

In the storage model, you make copies of things all the time: 'a = b' puts a copy of the contents of b's storage location into a's storage location (possibly scrunching b up a lot if it doesn't fit). In the binding model, copies are rare and always explicit; 'a = b' simply makes a have the same binding as b, so they both refer to the same object.
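
As a minimal sketch of the binding-model half of this in Python (the variable names are purely illustrative), assignment shares one object between two names, and a copy only happens when you ask for one:

```python
import copy

b = [1, 2, 3]
a = b                    # binding model: a and b now name the same list object
same_object = a is b     # True: no copy was made

c = copy.copy(b)         # copies have to be requested explicitly
c.append(4)              # changes only the copy

a.append(99)             # visible through b too, since it is the same object
```

After this runs, b has picked up the 99 that was appended through a, while c has gone its own way.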

The binding model is a lot more abstract than the storage location model; since the storage model is close to how computers actually work, it's often the easier of the two for people to wrap their minds around. It's also harder to clearly see the binding model, because the two models look the same as long as you're only dealing with immutable objects. (The binding model may save you memory, but that's not something that most people notice.)

(You can even confuse the two models when you're programming in a binding language and write code that assumes the storage model when it's getting the binding model; this causes peculiar bugs.)
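
One well-known Python instance of this kind of bug, offered as an illustrative sketch rather than the specific case the entry has in mind, is building a nested list by repetition and expecting each row to be its own copy:

```python
# Storage-model expectation: three independent rows, each with its own contents.
grid = [[0] * 3] * 3

# What actually happens: all three slots are bound to the same inner list.
grid[0][0] = 1
shared_rows = grid[1][0] == 1 and grid[2][0] == 1

# Building each row separately creates three distinct list objects.
fixed = [[0] * 3 for _ in range(3)]
fixed[0][0] = 1
independent_rows = fixed[1][0] == 0 and fixed[2][0] == 0
```

Both shared_rows and independent_rows come out True: the first grid has one row seen three times, the second has three real rows.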

Capable storage model languages almost always grow explicit bindings of some sort, whether they call them pointers (C) or references (Perl). Once you've got explicit bindings, you can of course implement a binding language, because all a binding language really is is a language where all variables are actually pointers to otherwise anonymous blobs. (At this point you really want some sort of automatic garbage collection.)

Only languages with storage models have 'pass by value' versus 'pass by reference' issues with subroutine calls. When you have a binding language all subroutine calls should implicitly be pass by reference, since that's what passing a binding around is. (You can create a perverse binding language where subroutine calls make a copy of the objects the arguments point to and then pass in bindings to the copies. But you can make all sorts of perverse languages.)
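
A small Python sketch of what passing a binding around looks like in practice (the function names here are made up for illustration). The function gets the caller's object, so mutation shows through; rebinding the parameter inside the function only changes the function's own name:

```python
def append_item(items):
    # items is bound to the caller's list, so this mutation is visible outside.
    items.append("new")

def rebind_item(items):
    # Rebinding the local name leaves the caller's binding and object alone.
    items = ["replaced"]
    return items

shopping = ["eggs"]
append_item(shopping)
result = rebind_item(shopping)
```

Afterwards shopping is ['eggs', 'new'] and result is a separate ['replaced'] list.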

Python is a 'context-free' binding language, where a variable's binding does not depend on its context. There are binding languages where this is not the case, so for example foo is one thing when taken as a function and another thing when taken as a variable; I believe Common Lisp is one big example. I personally prefer the Python way, because it's simpler, more regular, and easier to understand.
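
To illustrate the Python side of this (a sketch only; foo is just the placeholder name from the paragraph above), a name has one binding regardless of whether you call it or read it, so rebinding it replaces the function too:

```python
def foo():
    return "called as a function"

first = foo()        # foo is currently bound to a function object

foo = 42             # same name, new binding; there is no separate function slot

try:
    foo()            # the name now refers to an int, which is not callable
    outcome = "still callable"
except TypeError:
    outcome = "not callable any more"
```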

(For one technical discussion of the Lisp issues, see here.)

There are probably hybrid storage/binding languages out there. I think the easiest sort to construct is a statically typed language where 'primitive' types use a storage model and everything else uses a binding model. (To some extent this happens under the hood in most high performance binding language implementations.)

(Trivia: this entry is the 'another blog entry' from my comment on this entry. Sometimes my entry-writing mills grind very, very slowly.)

Sidebar: explanations of Python's object model

If you want to read explanations of Python's object model itself, try Python Objects, How to think like a Pythonista, or this discussion.

Comments on this page:

By DanielMartin at 2006-04-21 22:57:13:

Oh, man, where do I begin? Unfortunately, I'll have to cut this short as I'd like some sleep.

At the most charitable interpretation, you appear to have taken a standard CS term - binding - and given it an almost completely (but not quite completely) different meaning. Then, you've mixed that with a difference between Perl and Python that is much subtler than you make it out to be.

First off, binding. I'll let the wikipedia article speak to this: http://en.wikipedia.org/wiki/Name_binding

Now the difference between python and perl: basically, we're talking here about what value objects the language has, and perl has a (just barely) larger repertoire of interesting value objects, whereas python keeps all its interesting objects as reference objects.

First off, both languages have this behavior; that is, variables can be rebound at any time:

 a = b = {}
 a = 1
 # b does not now refer to "1", but to an empty hash

They also have this behavior; that is, what we normally think of as "changing" numbers is nothing of the sort, but changing numeric placeholders:

 a = b = 5
 a += 1
 # a and b are now different

And strings too:

 a = b = 'g'
 a += 'g'

Also, for complicated objects, both languages give the same semantics to something like this:

 a = b = FTP('ftp.python.org')
 # both a and b now refer to a logged-in ftp
 # session (in fact, to the same logged-in session)

In fact, you only get to a difference when you begin to talk about dictionaries/hashes and lists/arrays. Now, these happen to be amazingly useful structures, so they come up often, but here's the deal: idiomatic perl treats arrays and hashes with the same semantics accorded by both perl and python to strings and integers, whereas idiomatic python treats lists and dictionaries with the same semantics accorded to arbitrary user-defined objects. ("same" is a slight bit of a stretch in perl because you actually have to decorate your variable with different prefixes to tell if you're referencing the array slot, the hash slot, or the "scalar" slot, but the a = b = c case shows the same semantics)

This makes python's behavior more uniform and should make it potentially easier to understand, though many people coming from languages where strings are just a special kind of list or array (C, Haskell, OCaml, J, etc.) will find it a bit odd that strings are treated so differently from other lists. Also, many beginning programs do not involve objects more complex than strings, hashes, and lists. However, my point is that this difference between python and perl is simply a matter of where one draws the line between what things have value semantics and what things have reference semantics, and not the kind of fundamental language difference you make it out to be.

Closely related to this whole mess is the concept of immutability. Consider:

 >>> a = b = ('g',)
 >>> a += ('t',)
 >>> a
 ('g', 't')
 >>> b
 ('g',)
 >>> a = b = ['g']
 >>> a += ['t']
 >>> a
 ['g', 't']
 >>> b
 ['g', 't']

Now why should that be? The standard explanation is "because tuples are immutable". That is, an immutable object has to have "value" semantics, rather than "reference" semantics.

Also, as further evidence that this ties into immutability, consider that Haskell doesn't have even the slightest variant of this issue; everything always has value semantics, because everything is immutable. (Also, rebinding is prohibited, so you have to use new variable names or a new scope to refer to the result of an operation)

(I'm reminded here of something of a computing urban legend I heard once; I don't know whether it's true, but supposedly in early Fortran compilers it was possible to change the values of numbers by reference, so that you could, for example, set "4" to have the value "10" if you wanted. Apparently this was also something it was quite possible to do by accidentally getting some things in the wrong columns in the first few lines of your program, causing the predictable bughunting nastiness)

By cks at 2006-04-21 23:24:02:

In Python, everything has reference (aka binding) semantics, even primitive numbers. It looks like storage semantics only because you cannot mutate a number object in place, so even 'a += 10' rebinds a. (You can see this clearly with id().)
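
A short sketch of the id() check mentioned above (the value 1000 is arbitrary):

```python
a = b = 1000
before = id(a)
shared = id(a) == id(b)      # one int object, two names bound to it

a += 10                      # ints cannot be mutated in place, so a is rebound
rebound = id(a) != before    # a now names a different int object
b_unchanged = id(b) == before and b == 1000
```

All three flags end up True: a moved to a new object, and b still names the original one.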

Python's += is an odd case, because it rebinds only sometimes. (It rebinds whenever the object doesn't have in-place operation methods, which covers all of the built-in immutable types plus any classes that didn't bother to implement them or didn't want in-place mutation.)
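
As a sketch of the two behaviours, here are two hypothetical classes (Rebinding and InPlace are invented for this example). The first defines only __add__, so += falls back to ordinary addition plus rebinding; the second adds __iadd__ and mutates in place:

```python
class Rebinding:
    """Defines only __add__, so 'x += y' behaves like 'x = x + y'."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return Rebinding(self.items + list(other))

class InPlace(Rebinding):
    """Adds __iadd__, so 'x += y' mutates the existing object."""
    def __iadd__(self, other):
        self.items.extend(other)
        return self

a = b = Rebinding(["g"])
a += ["t"]
rebinding_split = a is not b     # a was rebound to a newly created object

c = d = InPlace(["g"])
c += ["t"]
inplace_shared = c is d          # same object, mutated in place
```

In the first case b.items is still ['g']; in the second, d.items sees the added 't'.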

Perl does not behave at all like this. Consider:

@a = @b = ();
$a[0] = 10;

$b[0] is undef, not 10. This is storage semantics; there is an 'array' storage area for @a and @b, and they are distinct, and all the first line did was store an empty array in each of them.
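
For contrast, a rough Python counterpart of the Perl fragment above. It is only an approximation: assigning to index 0 of an empty Python list raises IndexError, so append() stands in for the Perl assignment here.

```python
a = b = []             # one list object with two names bound to it
a.append(10)           # stands in for Perl's $a[0] = 10 on an empty array
shared_value = b[0]    # b sees the 10, because it is bound to the same list
```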

By cks at 2006-04-22 00:39:09:

(A parenthetical 'honesty in blog operation' note: I used magic site owner power to edit Daniel Martin's comment to split his comment in the FTP() example into two lines. (Whether or not explicitly noting this sort of comment edit is neurotic is a topic for another entry or something.))

From at 2006-05-08 20:28:33:

-- RE:"but the a = b = c case shows the same semantics".

-- a or b or c are an alphabetical variant of identity !

((((a = b_) = c ) v ( a = (b_ = c_)))) v ((a = c) = b_) <= x = x

By cks at 2006-05-10 15:46:29:

I think you're talking about something entirely different from what this entry is talking about. (I'm not sure what you're talking about, to be honest.)

Written on 17 April 2006.

Last modified: Mon Apr 17 00:48:49 2006