2011-10-11
Why people are attracted to minimal language cores
In the last entry I described how some languages rewrite loops with loop bodies by turning the loop body into an anonymous function that the loop invokes repeatedly. I then mentioned that some languages go all the way to turning loops into tail-recursive function calls. You might wonder why languages do this and why these sorts of crazy transformations are considered attractive.
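(As a rough illustration of those two transformations, here is a minimal sketch in Python. The names `for_each` and `loop_as_tail_call` are made up for this example, and Python is used only because it is easy to read; it does not eliminate tail calls itself, so a real implementation of the second form would rely on a language that does.)

```python
# Hypothetical helpers, for illustration only.

def for_each(items, body):
    """A loop whose body is an anonymous function invoked once per item."""
    for item in items:
        body(item)


def loop_as_tail_call(i, limit, acc):
    """The loop `while i < limit: acc += i; i += 1` written as a tail call.

    Each 'iteration' is a call to the function itself with updated state.
    """
    if i >= limit:
        return acc
    return loop_as_tail_call(i + 1, limit, acc + i)


# The loop body is passed in as a lambda.
seen = []
for_each([1, 2, 3], lambda x: seen.append(x * 2))

# Sums 0 + 1 + 2 + 3 + 4 with no explicit loop construct.
total = loop_as_tail_call(0, 5, 0)
```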
There are at least two reasons that these things are popular, for some definition of popular. First, the intellectual purity of a minimal core language appeals to a certain sort of language wonk; they tend to call the result simpler. These people are often drawn towards Lisp (especially Scheme, perhaps the canonical illustration of this philosophy in action).
(Lisp is not the only language family that has this sort of minimalism. For example, I think that Forth is just as minimal in its own way, although it gets far less language design attention.)
Second, it means that your code generator or interpreter core only needs to handle a minimal set of things because higher levels have transformed code written in the general version of the language down into this minimal form (often automatically). This has traditionally simplified optimizers; rather than implementing very similar analysis for each of a whole bunch of control flow constructs, they only have to analyze one really well. Which control flow construct you pick as your base depends on what language you're compiling; some languages pick goto, for example.
(Then you can get a PhD thesis or two out of how to do this analysis.)
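To make the 'higher levels transform things down' idea concrete, here is a small sketch, again in Python, of lowering a while loop into a core that only understands a conditional jump and goto. Everything in it (the `lower_while` and `run` functions, the tuple-based instruction format, the `jump_unless` name) is invented for this illustration and is not taken from any particular compiler.

```python
# Hypothetical mini-core: the only control flow it knows is
# ("jump_unless", cond, target) and ("goto", target).
# ("do", step) runs one straight-line step.

def lower_while(cond, body):
    """Lower `while cond: body` into a flat instruction list.

    cond is a function env -> bool; body is a list of functions env -> None.
    Layout: 0 = conditional exit, 1..n = body steps, n+1 = jump back to 0.
    """
    end = len(body) + 2  # index just past the final goto
    code = [("jump_unless", cond, end)]
    code += [("do", step) for step in body]
    code.append(("goto", 0))
    return code


def run(code, env):
    """Interpreter core: it never sees a 'while', only jumps."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op[0] == "do":
            op[1](env)
            pc += 1
        elif op[0] == "goto":
            pc = op[1]
        elif op[0] == "jump_unless":
            pc = pc + 1 if op[1](env) else op[2]
    return env


# Source-level loop: while i < 5: total += i; i += 1
code = lower_while(
    lambda e: e["i"] < 5,
    [
        lambda e: e.update(total=e["total"] + e["i"]),
        lambda e: e.update(i=e["i"] + 1),
    ],
)
result = run(code, {"i": 0, "total": 0})
```

The point of the sketch is that `run` stays the same size no matter how many looping constructs the surface language grows; each new construct only needs its own lowering function.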
My understanding is that the pragmatic evidence is mixed on whether this is a good idea or not. There have certainly been some significant successes, but I have also heard stories of compilers where the frontend carefully reduced all control flow constructs down to the single fundamental control flow atom, passed the whole thing to the optimizer, and had the optimizer reverse engineer the high-level control flow again from the low-level minimized control flow information.
(The argument for still doing this even when it's inefficient is that this lets the optimizer (re)discover the true high-level control flows in your program, regardless of how you actually wrote the code. In a sense it's discovering what you actually meant (or at least, what you created) instead of just what you wrote.)