Why languages like 'declare before use' for variables and functions

August 17, 2015

I've been reading my way through Lisp as the Maxwell's equations of software and ran into this 'problems for the author' note:

As a general point about programming language design it seems like it would often be helpful to be able to define procedures in terms of other procedures which have not yet been defined. Which languages make this possible, and which do not? What advantages does it bring for a programming language to be able to do this? Are there any disadvantages?

(I'm going to take 'defined' here as actually meaning 'declared'.)

To people with certain backgrounds (myself included), this question has a fairly straightforward set of answers. So here's my version of why many languages require you to declare things before you use them. We'll come at it from the other side, by asking what your language can't do if it allows you to use things before declaring them.

(As a digression, we're going to assume that we have what I'll call an unambiguous language, one where you don't need to know what things are declared as in order to know what a bit of code actually means. Not all languages are unambiguous; for example C is not (consider how '(a)*(b)' parses: it's a cast of *(b) to the type a if a is a typedef name, and a multiplication otherwise). If you have an ambiguous language, it absolutely requires 'declare before use' because you can't understand things otherwise.)

To start off, you lose the ability to report a bunch of errors at the time you're looking at a piece of code. Consider:

lvar = ....
res = thang(a, b, lver, 0)

In basically all languages, we can't report the lver for lvar typo (we have to assume that lver is an unknown global variable), we don't know if thang is being called with the right number of arguments, and we don't even know if thang is a function instead of, say, a global variable. Or if it even exists; maybe it's a typo for thing. We can only find these things out when all valid identifiers must have been declared; in fully dynamic languages like Lisp and Python, that's 'at the moment where we reach this line of code during execution'. In other languages we might be able to emit error messages only at the end of compiling the source file, or even when we try to build the final program and find missing or wrong-typed symbols.
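
To make this concrete, here's a small illustrative Python sketch using the made-up names from above (it's just the example fleshed out, not real code from anywhere). Python compiles it without a murmur; the problems only show up when it actually runs:

def thang(x, y, z):
    return x + y + z

def compute(a, b):
    lvar = a + b
    # 'lver' is a typo for 'lvar', and thang() is called with four arguments
    # instead of three.  Neither mistake is reported when this file is
    # compiled; both only surface when compute() is actually called.
    res = thang(a, b, lver, 0)
    return res

compute(1, 2)   # raises NameError: name 'lver' is not defined

Even if lver did exist, the wrong number of arguments to thang() would similarly only be caught at the moment the call itself happens.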

In languages with typed variables and arguments, we don't know if the arguments to thang() are the right types and if thang() returns a type that is compatible with res. Again we'll only be able to tell when we have all identifiers available. If we want to do this checking before runtime, the compiler (or linker) will have to keep track of the information involved for all of these pending checks so that it can check things and report errors once thang() is defined.

Some typed languages have features for what is called 'implicit typing', where you don't have to explicitly declare the types of some things if the language can deduce them from context. We've been assuming that res is pre-declared as some type, but in an implicit typing language you could write something like:

res := thang(a, b, lver, 0)
res = res + 20

At this point, if thang() is undeclared, the type of res is also unknown. This will ripple through to any code that uses res, for example the second line above; is that line valid, or is res perhaps a complex structure that can in no way have 20 added to it? We can't tell until later, perhaps much later.
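
In a fully dynamic language this uncertainty simply moves to runtime. As an illustrative Python sketch (assuming some thang() exists by the time this function is called), whether the second line works depends entirely on what thang() happened to return:

def use_thang(a, b, lver):
    res = thang(a, b, lver, 0)
    # Whether this line is valid depends entirely on what thang() returned:
    # fine if it was a number, a TypeError at runtime if it was, say, a dict
    # or some other structure that can't have 20 added to it.
    res = res + 20
    return res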

In a language with typed variables and implicit conversions between some types, we don't know what type conversions we might need in either the call (to convert some of the arguments) or the return (to convert thang()'s result into res's type). Note that in particular we may not know what type the constant 0 is. Even languages without implicit type conversions often treat constants as being implicitly converted into whatever concrete numeric type they need to be in any particular context. In other words, thang()'s last argument might be a float, a double, a 64-bit unsigned integer, a 32-bit signed integer, or whatever, and the language will convert the 0 to it. But it can only know what conversion to do once thang() is declared and the types of its arguments are known.

This means that a language with any implicit conversions at all (even for constants like 0) can't actually generate machine code for this section until thang() is declared even under the best of circumstances. However, life is usually much worse for code generation than this. For a start, most modern architectures pass and return floating point values in different ways than integer values, and they may pass and return more complex values in a third way. Since we don't know what type thang() returns (and we may not know what types the arguments are either, cf lver), we basically can't generate any concrete machine code for this function call at the time we parse it even without implicit conversions. The best we can do is generate something extremely abstract with lots of blanks to be filled in later and then sit on it until we know more about thang(), lver, and so on.

(And implicit typing for res will probably force a ripple effect of abstraction on code generation for the rest of the function, if it doesn't prevent it entirely.)

This 'extremely abstract' code generation is in fact what things like Python bytecode are. Unless the bytecode generator can prove certain things about the source code it's processing, what you get is quite generic and thus slow (because it must defer a lot of these decisions to runtime, along with checks like 'do we have the right number of arguments').
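
As an illustration, you can ask Python to disassemble a function that calls a thang() which isn't defined anywhere; the bytecode you get back is perfectly happy and completely generic (this is only a sketch, and the exact opcode names vary between Python versions):

import dis

# thang is deliberately not defined anywhere; Python compiles the call anyway.
def call_thang(a, b):
    return thang(a, b, 0)

dis.dis(call_thang)
# On many Python 3 versions the output looks roughly like:
#   LOAD_GLOBAL    thang    <- the name is looked up at runtime, every time
#   LOAD_FAST      a
#   LOAD_FAST      b
#   LOAD_CONST     0
#   CALL_FUNCTION  3        <- the argument count is also checked at runtime
#   RETURN_VALUE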

So far we've been talking about thang() as a simple function call. But there are a bunch of more complicated cases, like:

res = obj.method(a, b, lver, 0)
res2 = obj1 + obj2

Here we have method calls and operator overloading. If obj, obj1, and/or obj2 are undeclared or untyped at this point, we don't know if these operations are valid (the actual obj might not have a method() method) or what concrete code to generate. We need to generate either abstract code with blanks to be filled in later or code that will do all of the work at runtime via some sort of introspection (or both, cf Python bytecode).
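
Here's an illustrative Python sketch of this (Point and combine() are names I'm making up for the illustration); both the method call and the + are only resolved when the code runs:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    # Operator overloading: obj1 + obj2 only means something because Point
    # happens to define __add__, and that's only discovered at runtime.
    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

def combine(obj, obj1, obj2):
    # obj.method(...) is a runtime attribute lookup followed by a call;
    # nothing checks that obj actually has a method() method until this
    # line executes.
    res = obj.method(1, 2, 0)
    res2 = obj1 + obj2
    return res, res2

p1, p2 = Point(1, 2), Point(3, 4)
combine(p1, p1, p2)   # AttributeError: 'Point' object has no attribute 'method'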

All of this prepares us to answer the question about what sort of languages require 'declare before use': languages that want to do good error reporting or (immediately) compile to machine code or both without large amounts of heartburn. As a pragmatic matter, most statically typed languages require declare before use because it's simpler; such languages either want to generate high quality machine code or at least have up-front assurances about type correctness, so they basically fall into one or both of those categories.

(You can technically have a statically typed language with up-front assurances about type correctness but without declare before use; the compiler just has to do a lot more work and it may well wind up emitting a pile of errors at the end of compilation when it can say for sure that lver isn't defined and you're calling thang() with the wrong number and type of arguments and so on. In practice language designers basically don't do that to compiler writers.)

Conversely, dynamic languages without static typing generally don't require declare before use. Often the language is so dynamic that there is no point. Carefully checking the call to thang() at the time we encounter it in the source code is not entirely useful if the thang function can be completely redefined (or deleted) by the time that code gets run, which is the case in languages like Lisp and Python.

(In fact, given that thang can be redefined by the time the code is executed we can't even really error out if the arguments are wrong at the time when we first see the code. Such a thing would be perfectly legal Python, for example, although you really shouldn't do that.)
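
As an illustrative sketch of just how legal this is in Python, the call in caller() below doesn't match the first definition of thang() at all, but by the time caller() actually runs, thang has been rebound to something the call does fit:

def caller():
    # When Python compiles this, thang doesn't exist at all yet.
    return thang(1, 2)

def thang(a, b, c):
    return a + b + c

# Perfectly legal: rebind thang to a different function with a different
# signature before caller() ever runs.
def thang(a, b):
    return a * b

print(caller())   # prints 2, using the second definition of thang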


Comments on this page:

This is something I feel fairly strongly about.

The absolute minimum standard for “serious Perl programmer” is to start off any program with use strict (or, since 5.12, use 5.012 or higher, which turns on strictures implicitly).

It catches a large class of mistakes that are trivial to make but may be decidedly less trivial to spot, simply because using the wrong variable often only makes the code work wrong – there is no hard failure. So a simple typo ends up manifesting as a (sometimes bewildering) logic bug.

Mistype one character, end up chasing a “bug” for an hour, then groan and smack your forehead when you finally cotton on: that’s just a terrible way of working.

So I’ve grown to resent languages that mandatorily permit use before declaration. There can be mitigating factors in the design that make it not that bad (e.g. sigils for scoping in Ruby seem to contain the worst of the problem), but I’ve never felt any language to be better for it.

Partly that’s because I’ve seen it done so well in Perl, which has the lowest-friction declaration construct that I’ve seen anywhere. Not only is the declaration keyword the shortest I know of (“my”), it also doesn’t need to be a statement (so you can declare variables on first use, right in the middle of an expression, if that’s most convenient). I have some sympathy when people complain about the need to declare things in languages that make it cumbersome, even just by rules like C89’s top-of-scope requirement.

But even when done badly it’s generally better than the alternative.

N.B.: there are languages that cannot require declaration before use, e.g. shell and some Lisps, because they have nothing but dynamic scoping. But that is in itself a wart. In the case of shell it just happens to be an inevitable one.

By Jeremy at 2015-08-17 15:00:29:

I'm a bit confused. You say that "most statically typed languages require declare before use because it's simpler", and yet the following go program:

package main

import "fmt"

func main() {
    stuff(0x100)
}

func stuff(num uint8) {
    fmt.Printf("you passed: %v\n", num)
}

... produces the following sane error message:

./declare.go:6: constant 256 overflows uint8

... while passing a valid uint8 works as expected. But one of the points of go was to optimize compilation times while saving developers from having to maintain header files (or otherwise maintain function prototypes).

Is using an AST and more than one compiler pass really that bad? More likely I'm not understanding your definition of "declare before use"...

By cks at 2015-08-18 00:02:48:

That's interesting; I hadn't realized (or perhaps noticed) that Go was that nice. As for how bad a multi-pass approach is, it depends on your view. Go demonstrates that it certainly can be done, but you wind up with additional complications. For instance, you have to hold on to all of the function ASTs in some way (usually in memory) until the end of the compilation unit when all names must be resolved. You're also generally going to want to buffer up errors to the end instead of emitting some of them when they're detected, so you can emit them in line order.

(I suppose modern compilers are already holding some function ASTs in memory so they can inline them.)

It's possible that I'm out of touch with modern compiled languages and they now frequently support 'use before declare' through more sophisticated compilers (a la Go). I haven't looked to see if things like the .NET languages, Swift, Scala, Rust, and so on are 'declare before use' or not. Being able to use things before they're declared is certainly more convenient for the programmer, since you can order your code in the source files however makes sense.

(It can also make it easier to declare certain sorts of intertwined data structures.)

By Jeremy at 2015-08-18 09:40:19:

It's also impossible to write mutually recursive functions under strict declare-before-use (without something like function prototypes). I'd be surprised if any of those languages didn't support mutual recursion with plain functions/methods, and I don't think they support function prototypes (except through adherence to an interface).

Speaking from experience, java certainly doesn't require declare-before-use, so I assume the same about scala, and I don't remember having to worry about the issue when I last worked in c#.

Rust: http://is.gd/0T8ryu Produces a warning message, which seems less clear than go's, but still accurate. But it also casts and throws away data and still executes. Just running in the web playground, I'm not sure at what stage (compilation vs runtime) the warning is actually detected, but I'm assuming at compilation.

In the case of Common Lisp, you have a choice on most of these things. It doesn't require declaration before use, but it allows it. If you declare a function's ftype prior to use, the compiler can make use of that information to generate static errors/warnings, to optimize compilation, etc.
