The periodic strangeness of idiomatic Python

July 29, 2012

Suppose that you want to do something N times, for whatever reason. In C, the straightforward and idiomatic way to do this is a for loop; 'for (i = 0; i < times; i++) { .... }'. Since Python doesn't have this form of a for loop, the Python equivalent is a while loop. However, many people would probably say that this isn't idiomatic Python. What I think of as the idiomatic Python way to do 'do something N times' is:

for _ in range(0, times):
    pass  # .... the loop body goes here ....

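(Some people will use xrange() instead of range() here, since in Python 2 range() builds the entire list up front while xrange() produces the numbers lazily.)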
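For comparison, here is a minimal sketch of the while loop that the C for loop translates to most literally, next to the range() idiom. The function names and the `body` callback are hypothetical, made up only for illustration; both versions are meant to call `body` exactly `times` times.

```python
def repeat_with_while(times, body):
    # Literal translation of C's 'for (i = 0; i < times; i++)':
    # initialization, test, and increment end up on three separate lines.
    i = 0
    while i < times:
        body()
        i += 1


def repeat_with_range(times, body):
    # The conventional Python idiom; '_' marks the loop variable as unused.
    for _ in range(0, times):
        body()


calls = []
repeat_with_while(3, lambda: calls.append("while"))
repeat_with_range(3, lambda: calls.append("range"))
print(calls)  # ['while', 'while', 'while', 'range', 'range', 'range']
```

The while version needs a counter that nothing else uses; the range() version hides the counter inside the iterator, which is the trade-off discussed below.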

This is certainly what instantly popped into my head when I ran into this situation recently and at first I didn't think any more of it. But once I began actually looking at this it started getting stranger and stranger, less like a clear language idiom and much more like a convention. Let me run down a number of the ways that this is strange:

  • It's a rather indirect way of expressing 'do something N times'. The C for loop is pretty direct by contrast.

    (With that said, I'm not sure a while loop would be that much more direct. The directness advantage that C has is that all parts of the for loop's control are there in one chunk; a while loop spreads them out in three different lines.)

  • We're doing things in this odd way partly to use as many builtins as possible, often in the name of (nominal) efficiency. Yes, this avoids a couple of extra lines to initialize and increment an otherwise unused counter, but I don't think that really makes it clearer.
  • In the pursuit of this idiom we're creating a list or at least an iterator and walking it, throwing away the result. In many languages this would be wince-inducingly inefficient (or at least much worse than basic integer arithmetic with a variable). It's a (probable) win in CPython because of the whole builtins vs non-builtins issue.

    (Not only is range() a builtin, but for with iterators has direct bytecode support.)

  • You pretty much need to know this idiom in order to understand this code without a bunch of thought (which is not the case for the C version). A special tricky point is the use of `_' as a special variable name used to indicate 'I don't care about this variable, I just have to have something here'; this is entirely a convention in (some) Python programming circles, with no special meaning in the language itself.

    (As a corollary, I doubt that this is an idiom that would naturally occur to people who are not already immersed in Python.)

  • When using this idiom you'd better remember the exact effects of range()/xrange(), since eg 'range(1, times)' is very much not what you want (it starts at 1 but still stops before times, so the loop body runs one time too few).

    (Again the C equivalent has this clearly visible.)

The overall summary of this is that the Python idiom really is close to being an idiom, in the literal definition of the word: it is an expression whose meaning is not clearly and immediately understandable from a quick read of its component parts. By contrast the C idiom is much clearer (at least for me).

(I don't think that all of this makes the Python idiom bad; it remains the most compact and probably the most efficient way of expressing this. And even without knowing this idiom off the top of your head I think it's reasonably clear roughly what it does (and it's reasonably easy to work out all of the details).)
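Whether the idiom really is the most efficient option can be checked with a rough timing sketch using the standard timeit module. The three function names are made up for illustration, and `itertools.repeat(None, times)` is a third variant not discussed above (timeit itself uses it internally to run its loop `number` times). No numbers are given here, since the results depend on the Python version and the machine.

```python
import itertools
import timeit


def with_while(times):
    # Explicit counter: the comparison and the increment are separate
    # operations that the interpreter executes on every pass.
    i = 0
    while i < times:
        i += 1


def with_range(times):
    # The usual idiom: the counting happens inside the range iterator.
    for _ in range(times):
        pass


def with_repeat(times):
    # Hands back the same None object each time instead of a new integer.
    for _ in itertools.repeat(None, times):
        pass


if __name__ == "__main__":
    for fn in (with_while, with_range, with_repeat):
        seconds = timeit.timeit(lambda: fn(100000), number=20)
        print("%-12s %.3f s" % (fn.__name__, seconds))
```

On CPython one would expect the while loop to come out slowest, because more of its work happens in interpreted bytecode rather than inside builtins, but it is worth measuring rather than assuming.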


Comments on this page:

From 87.194.56.231 at 2012-07-29 03:35:06:

Discussion of the most efficient way to loop N times: http://rhodesmill.org/brandon/2012/counting-without-counting/

Although that is embracing idioms rather than clarity.

David B.

By nothings at 2012-07-29 04:38:22:

I believe _ is a conventional (or even official) "unused" variable in other languages; I believe I've seen it used in ML (or at least OCaml) for unused elements in pattern matches. (Sometimes you want to discard multiple slots of a matched tuple, which means you use _ multiple times in the same statement; since normally this would mean binding a variable multiple times, I suspect it's actually a language feature, but I'm not positive.) And I suspect I've seen it in other languages.

I'm not sure how you can call the Python version conventional rather than idiomatic but consider the C version idiomatic. I don't think that "repeat something 10 times" naturally involves a counter going from 0 to 9, or 1 to 10; that isn't natural or idiomatic at all. (To some extent this is semantics; I certainly do consider the C approach idiomatic, but I think that just means I'm using the word "idiom" in a different way.)

Certainly it is a property of C that idioms are easier to decipher, because the mapping from C operators and statements to underlying behaviors is always so trivial. But I think that's more just something inherent to C vs higher-level languages, as opposed to something specific about Python counting vs C counting.

(Also note that DWiki's inconsistent behavior with underscores versus asterisks and other style markers means that a bare underscore surrounded by whitespace is still treated as starting typewriter text, and despite the explanation in the docs I can't really see any practical reason for the behavior. Also, I could not find any documented way to escape them. Inconsistently, double underscore appears to output as double underscore, not single or empty. I used the ".pn no" system to disable them entirely, but this seems crazy to have to enable/disable rather than having a one-off mechanism; also way too much effort for someone commenting on a blog entry to have to go to. Why not just allow \-escaping (possibly with some other character, given the existing weird use of \).)

From 173.72.85.26 at 2012-07-30 01:53:41:

I have to admit, I don't find the C version that natural either. To get natural you need to go for the logo version:

 repeat 4 [forward 50 right 90]

Also, I think that the idiomatic python version is:

for _ in range(times):
    pass  # .... the loop body goes here ....

(Or xrange)

-- DanielMartin

From 87.79.236.202 at 2012-07-30 08:54:49:

Place the _ within a (()) pair.

Aristotle Pagaltzis

From 72.207.242.240 at 2012-08-07 23:32:25:

"_" is idiomatic for a variable you don't care about in Haskell as well.

You pretty much need to know this idiom in order to understand this code without a bunch of thought (which is not the case for the C version).

I don't find this: "for (i = 0; i < times; i++) { .... }" idiomatic at all ... unless you are familiar with a language that writes its loops that way. Theoretically you can reason through it, but to do that you have to know that the first statement is a declaration executed only once, the 2nd a conditional tested once per loop, and the 3rd a normal statement executed once per loop. Not really any more intuitive than Python's version. I'm with the posters above, unless the code says "repeat" it's always going to be a bit strange. Not that that's a problem.

By cks at 2012-08-22 13:00:49:

My reply about my view on understanding idioms got long enough that I turned it into an entry, IdiomUnderstandability.

Written on 29 July 2012.
Last modified: Sun Jul 29 01:21:56 2012
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.