Arranging scopes for for loops

October 10, 2011

Suppose that you have a language where for loop index variables are local to the for loop; they aren't visible outside it. What I'll call the 'for loop closure problem' is that in a straightforward implementation of their lifetime, the following pseudo-code doesn't work the way people think it will:

nums := [];
for (var i = 0; i < 100; i++)
  nums.append(func(x) {i+x});

In straightforward scope and variable lifetime semantics, every lambda returns the same result because each lambda captured the same i (and after the loop finishes, the value of i is 100). When I was discussing what Python closures close over, I mentioned that you could fix this with careful use of how scopes are defined.

First, let's rewrite this for loop to show the scope explicitly:

{var i;
 for (i = 0; i < 100; i++)
   nums.append(func(x) {i+x});
}

If we want each iteration of the loop (and thus each lambda) to get a different version of i, we need to explicitly create a scope for each of them. However, we need all of the following to be true:

  • the for loop condition has a single sustained scope, in which its version of i is the same over all iterations of the loop. This creates an outer version of i.
  • the for loop body has a per-iteration scope which contains an inner version of i.
  • the inner version of i is initialized from the outer version of i.

Of these the only tricky requirement is the third one.

You could fix this with specific rules for scopes in for loops, but as it happens we don't have to bother with this; the language already has a way of expressing this outcome within the existing scoping rules. We just say that the loop body acts as if it was an anonymous function (well, closure) that is invoked on each loop iteration. This effectively rewrites the loop as:

{var i;
 for (i = 0; i < 100; i++)
   func(i) {
     nums.append(func(x) {i+x});
   }(i);
}

Anonymous functions already create new scopes when invoked and function calls are the general method for passing variables into a scope, so we're done.

(Various sorts of optimizations can and generally are applied in languages that define for loops this way, because you really don't want to be creating a call frame for every loop iteration.)

I believe that doing this sort of conceptual rewriting is especially popular in languages (like many versions of Lisp) that consider lambdas and closures to be fundamental building blocks of the language. In these languages, rewriting loops as invocation of anonymous functions makes a lot of sense even if you don't care about the scoping rules. The extreme version is to not even have loop iteration as such in the language's core; instead you transform loops into recursive function calls. (Full tail call optimization is required for obvious reasons.)


Comments on this page:

From 78.35.25.18 at 2011-10-10 07:32:48:

As a side note, Perl has block scope, it sets up a new scope each time it enters a block, and it treats en-passant declarations inside control flow statements as being declared inside the following block. So it works the way you describe by default.

Aristotle Pagaltzis

By cks at 2011-10-11 11:27:55:

As I see things, Perl can't work the way you describe it. If the scope of an en-passant declaration inside control flow statements is really in the scope of the loop's block, it would not be visible for the control flow operations themselves. You would have something that looks like:

for (i = 0; i < 100; i++) {
  var i;
  nums.append(func(x) {i+x});
}

Here i is clearly out of scope for 'i < 100' and 'i++', since both are outside of the loop body block where i is declared. Perl can only work if it magically allows the for loop to access the inner block variables and in fact do so even before the inner block's scope has been created, and also persists the loop body scope so that despite the loop body being repeatedly entered the same scope is reused each time.

(If it does not persist the loop body scope then something even weirder is going on, because it is clearly persisting the value of i despite making its scope go away and reappear each time.)

From 78.35.25.22 at 2011-10-11 16:40:03:

By en passant I am referring to the case of

   for my $i ( 0 .. 99 ) { … }

or

   while ( my $line = readline $fh ) { … }

or even

   if ( my $i = foo() ) { … }

In each case $i is declared only for the duration of the block, but it is also re-bound every time the block is entered. In other words,

   my @num;
   for my $i ( 0 .. 99 ) {
       push @num, sub { $i + shift };
   }

will do the expected thing without any further gymnastics.

In fact, because Perl has block scope and because declarations take effect only after the statement in which they are declared, you can even reconstruct this effect fully manually, as a thought exercise:

   my @num;
   {
       my $i = 0;
       {
           my $i = $i++; # pass the baton
           push @num, sub { $i + shift };
           redo if $i < 99;
       }
   }
   # $i is undeclared here

In the baton line, $i++, since it is not a declaration, refers to the $i declared in the outer block. At the end of the statement, the my $i declaration takes effect, so the following mentions of $i refer to the newly bound variable. And every time redo re-enters the block, this happens again.

In general, the fact that Perl has block scope makes many things Just Work that other languages make you work for. I am puzzled that no other language I know of gets it so right, because it seems a fairly trivial and fairly obvious design choice that yields big and obvious irritations when done wrong.

Aristotle Pagaltzis

Written on 10 October 2011.
« My view of the state of graphics cards for Linux (in fall 2011)
Why people are attracted to minimal language cores »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Oct 10 00:30:35 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.