Something C programmers should not write in Python code

September 29, 2005

In C, a common idiom for initializing multiple variables to the same value is serial assignment:

a = b = c = 0;

Python allows the same thing: remove the semicolon from the above and it's valid Python syntax, and it even works. So when I was starting out in Python, I wrote things like that. Then one day I wrote almost the same thing in a class's __init__ method:

self.fooDict = self.barDict = {}

The resulting debugging experience was very educational.

A C programmer expects multiple assignment to work using what I'll call 'value semantics', where the result of an assignment is a value and that value is then copied into the next variable. Thus I had (subconsciously) expected the Python assignment to be the same as:

self.fooDict = {}
self.barDict = {}

However, Python operates using reference semantics: the right-hand side is evaluated once, and each name on the left just gets another reference to that same object. In other words, what I was actually getting was the same as:

self.barDict = {}
self.fooDict = self.barDict
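A minimal sketch that reproduces the bug; the class name Tracker and the dictionary key are hypothetical, chosen only for illustration:

```python
class Tracker:
    def __init__(self):
        # Chained assignment: the single {} is created once, and both
        # attributes end up referring to that one dictionary object.
        self.fooDict = self.barDict = {}

t = Tracker()
t.fooDict["key"] = "foo value"

# Both names are bound to the same object, so the entry added through
# fooDict is also visible through barDict.
print(t.fooDict is t.barDict)   # True
print(t.barDict)                # {'key': 'foo value'}
```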

Since my code required the two dictionaries to be distinct, and used the same keys in both, the result didn't work too well. (It also caused me to spend some time trying to hunt down where various strange entries were getting added to each dictionary before the penny dropped.)
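The plain fix is the two separate assignments. A more compact alternative, sketched here with the same hypothetical Tracker class, is tuple unpacking, which evaluates each {} separately and so produces two distinct dictionaries:

```python
class Tracker:
    def __init__(self):
        # Each {} on the right is its own expression, so each attribute
        # gets a separate, independent dictionary.
        self.fooDict, self.barDict = {}, {}

t = Tracker()
t.fooDict["key"] = "foo value"
t.barDict["key"] = "bar value"   # same key, but no collision now
print(t.fooDict is t.barDict)               # False
print(t.fooDict["key"], t.barDict["key"])   # foo value bar value
```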

I was led down this garden path partly because this does 'work' in the case of multiple assignment of a lot of simple Python things (including numbers). This is because they're immutable. If the object one variable points at can't change, it doesn't matter how many other people also point at it; the value can't change out from underneath you.
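A short sketch of why numbers get away with it: both names really do share one object, but since an int can't be changed, an operation like += has to produce a new int and rebind only the one name:

```python
a = b = 0
# Both names start out referring to the very same int object.
print(a is b)   # True

# Ints are immutable, so += cannot modify the shared object; it computes
# a new int and rebinds only the name a.
a += 1
print(a, b)     # 1 0
```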


Comments on this page:

From 63.203.73.35 at 2005-09-29 01:51:47:

I'm not sure that I buy that this is Python vs. C. C gives you value semantics for value types, but usually for interesting things (containers, dictionaries, and the like) you'll be using pointers, and if in C you said

a = b = createDictionary();

you'd have the same bug.
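For comparison, the Python version of that C line behaves the same way: the right-hand side, function call and all, is evaluated exactly once and the one result is bound to every target. In this sketch create_dictionary is a hypothetical stand-in for the C createDictionary(), with a counter added to show how many times it runs:

```python
calls = 0

def create_dictionary():
    # Hypothetical factory; counts how often it is invoked.
    global calls
    calls += 1
    return {}

# One call, one dictionary, two names bound to it.
a = b = create_dictionary()
print(calls)    # 1
print(a is b)   # True
```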

I think in practice people only ever use this in C for numerical constants and NULL. I think if anything what this gets at is the vast difference between NULL and an empty dictionary, a vast difference that LISP obfuscated long ago by making the empty list "nil" be used as "false".

Of course, LISP lists are normally used in a side-effect free fashion, and in particular, an empty list can never become a longer list via side-effects.

(I wanted to put a declaration for a and b as pointers to 'Dict' above, but I can't figure out how to escape asterisk so it doesn't emphasize; that is, 'Dict *a, *b' is wrong, but ! doesn't seem to escape asterisk, e.g. 'Dict !*a, !*b;' isn't right either, and in fact looks suspiciously buggy, since the / seems to come from nowhere. Of course, I realize there are things like pre-formatted mode and such, but I just wanted to do the most convenient thing and was surprised to discover how broken it was.)

From 63.203.73.35 at 2005-09-29 01:52:18:

Oops, forgot to sign that.

-- nothings

By cks at 2005-09-29 03:19:22:

I think that C makes the mistake very clear, because it makes the pointers explicit. With explicit pointers and value semantics, you're clearly assigning the same pointer value to both a and b and it's immediately apparent that you're getting two pointers to the same thing.

(And in part I think this comes down to the difference of what names mean in Python and in C, which is another blog entry.)

I think the intuition for C is that multiple assignment is useful for 'primitive types', things the language has direct syntax for. In C this is more or less numbers and explicit strings (which are often immutable these days). Python just happens to have a larger set of primitive types that include mutable lists and dictionaries.

As for getting a literal '*' character into the text: it depends on whether you want it plain or in typewriter text. The former is written '[[*|]]', the latter '((*))'. Unfortunately, as they could say, 'all the good escape characters are taken'.

(See DWikiText for the gory details. There's a link to it around the bottom of the 'Write comments' page, but it's probably not too clear.)

I usually just write anything over a snatch of code as a preformatted block, usually 'quoted' to indent it.

From 192.88.60.254 at 2005-09-29 10:15:47:

Actually, that particular mistake looks more like evidence of someone who's been practicing Perl than C. Perl has two interesting structures - arrays and hashes - for which something like that is perfectly legit.

That is, the following is valid perl and doesn't alias anything; it results in three distinct arrays:

@a = @b = @c = ();

Likewise for hashes: this results in %a, %b, and %c being three separate blank hashes:

%a = %b = %c = ();

(Actually, technically all that those do is blank out the hashes/arrays behind those variables; the hashes or arrays were created when the variable was first encountered, which could be on this statement or could be elsewhere.)
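A rough Python counterpart, as a sketch: it doesn't mimic Perl's semantics (Perl clears the existing containers in place), it just shows two common ways to end up with distinct empty containers in Python, since each list display or comprehension iteration builds a fresh object:

```python
# Three separate list displays on the right: three distinct lists.
a, b, c = [], [], []
a.append(1)
print(a, b, c)   # [1] [] []

# A comprehension creates a new dict on each iteration.
x, y, z = [{} for _ in range(3)]
print(x is y, y is z)   # False False
```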

Of course, if you're dealing with perl references you get the semantics you talk about in Python; the following assigns $a, $b, and $c as references pointing to the same hash:

$a = $b = $c = {};

Except that then you have to deal with nasty stuff like using $a->{'key'} every time you want to de-reference the hash, instead of the normal hash syntax of $a{'key'}, so people don't use perl references unless there's a good reason to do so.

This "value syntax for primitives, reference syntax for others" is a generic problem not just with multiple assignment in a single statement but with programming in general - as you pointed out, you can easily simulate this with multiple statements. I've seen Java code suffer from similar confusion.

As I think about it, I think that the basic confusion is between objects that can change after assignment because of some action on a different variable assigned to the same object, and objects that cannot. That is, more on the confusion between mutable and immutable than the confusion between "primitive" and "reference type", which is really just an implementation detail. For example, in python (and in Java) strings behave "well" with multiple assignment just as integers do:

a = b = "x"
a += "y"
b += "z"
print("a is '%s'; b is '%s'" % (a, b))
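A sketch contrasting that string case with a list, which is mutable: for a str, += builds a new object and rebinds one name, while for a list it extends the shared object in place, so the alias created by chained assignment sees the change:

```python
# Strings are immutable: += creates a new string and rebinds one name.
a = b = "x"
a += "y"
b += "z"
print(a, b)      # xy xz

# Lists are mutable: += extends the one shared list in place,
# so the change is visible through both names.
p = q = ["x"]
p += ["y"]
print(p, q)      # ['x', 'y'] ['x', 'y']
print(p is q)    # True
```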

I don't know how to get away from this problem, since it seems that either you need to prevent any aliasing anywhere (do any languages do this? Early Fortran, maybe?) or eliminate mutable objects entirely. (Well, Haskell is kind of fun)

Hrm. Except... it's not really mutable vs immutable because arrays and hashes in perl are mutable, and you can see that by taking a reference to the object, which then allows you to modify the original...

Okay, so I've found something where it would be very useful to have had some formal CS theory background ground into me, only I'm not at all convinced that majoring in CS would put me in a position to tease apart these distinctions easily.

From 63.203.72.98 at 2005-09-29 11:41:07:

(See DWikiText for the gory details. There's a link to it around the bottom of the 'Write comments' page, but it's probably not too clear.)

Well, right, I read that and that's where I got the "!-is-the-escape-character" idea from, except it isn't.

I'd suggest that two asterisks in a row should output an asterisk.

-- nothings

By cks at 2005-09-30 01:04:06:

The more I think about it, the more I think what makes the difference is explicit pointers/references. In either C or Perl, you're pretty much guaranteed to be quite aware of using references instead of values, since you have to explicitly dereference them and often explicitly form them. In Python references are ubiquitous and therefore all but invisible. Unless you see it in explicit syntax, I think it's easy to forget. (I suspect that this is the case in most garbage collected languages.)

PS: that's a good idea about two asterisks in a row, nothings; I'll see if I can find the time to put it in.

By cks at 2006-02-08 00:56:27:

In case future readers are confused about how nothings' aside about escaping asterisk looks, it is part of the baroque charm of having a really dynamic website. Namely, the DWikiText to HTML renderer has changed since then. The raw text of his comment looks like:

[...] that is, 'Dict *a, *b' is wrong, but ! doesn't seem to escape asterisk, e.g. 'Dict !*a, !*b;' isn't right either [...]

When he wrote the comment, it came out looking like this:

[...] that is, 'Dict a, b' is wrong, but ! doesn't seem to escape asterisk, e.g. 'Dict !a, !b;' isn't right either [...]

(Well, like that unless the renderer drifts again.)

I ultimately decided not to have !* escape * because I decided that I didn't like how things like 'this is important!' looked in source; you would have had to double the ! or otherwise escape it. About when I started thinking about grep'ing the entire page tree to find any existing instances, I decided it was a no-go.

Oh well. All the good escape characters remain taken, darnit.

Written on 29 September 2005.

Last modified: Thu Sep 29 00:39:52 2005
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.