Why languages like 'declare before use' for variables and functions
I've been reading my way through Lisp as the Maxwell's equations of software and ran into this 'problems for the author' note:
As a general point about programming language design it seems like it would often be helpful to be able to define procedures in terms of other procedures which have not yet been defined. Which languages make this possible, and which do not? What advantages does it bring for a programming language to be able to do this? Are there any disadvantages?
(I'm going to take 'defined' here as actually meaning 'declared'.)
To people with certain backgrounds (myself included), this question has a fairly straightforward set of answers. So here's my version of why many languages require you to declare things before you use them. We'll come at it from the other side, by asking what your language can't do if it allows you to use things before declaring them.
(As a digression, we're going to assume that we have what I'll call an unambiguous language, one where you don't need to know what things are declared as in order to know what a bit of code actually means. Not all languages are unambiguous; for example C is not (also). If you have an ambiguous language, it absolutely requires 'declare before use' because you can't understand things otherwise.)
To start off, you lose the ability to report a bunch of errors at the time you're looking at a piece of code. Consider:
lvar = ....
res = thang(a, b, lver, 0)
In basically all languages, we can't report the lver-for-lvar typo (we have to assume that lver is an unknown global variable), we don't know if thang is being called with the right number of arguments, and we don't even know if thang is a function instead of, say, a global variable. Or if it even exists; maybe it's a typo for thing. We can only find these things out once all valid identifiers must have been declared; in fully dynamic languages like Lisp and Python, that's 'at the moment where we reach this line of code during execution'. In other languages we might be able to emit error messages only at the end of compiling the source file, or even when we try to build the final program and find missing or wrongly typed symbols.
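To make this concrete in Python (a hedged sketch; thang and lver are the made-up names from the example above):

    def compute():
        lvar = 10
        # 'lver' is a typo for 'lvar', and 'thang' doesn't exist at all,
        # but Python will happily compile this function without complaint.
        res = thang(1, 2, lver, 0)
        return res

    # The errors only surface when the line actually executes:
    compute()  # NameError: name 'thang' is not defined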
In languages with typed variables and arguments, we don't know if the arguments to thang() are the right types and if thang() returns a type that is compatible with res. Again we'll only be able to tell when we have all identifiers available. If we want to do this checking before runtime, the compiler (or linker) will have to keep track of the information involved for all of these pending checks so that it can check things and report errors once thang() is defined.
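As an illustrative sketch of that bookkeeping (not any real compiler's data structures; every name here is invented), the deferred information might look something like this in Python:

    pending_checks = []

    def defer_call_check(callee, arg_types, result_var, line):
        # Remember everything needed to validate a call once the
        # callee is finally declared.
        pending_checks.append({
            "callee": callee,
            "arg_types": arg_types,
            "result_var": result_var,
            "line": line,
        })

    defer_call_check("thang", ["int", "int", "<unknown: lver>", "literal 0"],
                     "res", line=2)

    # Later, when thang() is declared, the compiler replays pending_checks
    # and reports any mismatches, possibly far from the original call site.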
Some typed languages have features for what is called 'implicit typing', where you don't have to explicitly declare the types of some things if the language can deduce them from context. We've been assuming that res is pre-declared as some type, but in an implicitly typed language you could write something like:
res := thang(a, b, lver, 0)
res = res + 20
At this point, if thang() is undeclared, the type of res is also unknown. This will ripple through to any code that uses res, for example the following line here; is that line valid, or is res perhaps a complex structure that can in no way have 20 added to it? We can't tell until later, perhaps much later.
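A hedged Python rendering of that ripple, with a whole-module type checker such as mypy in mind (thang is still our invented, not-yet-seen function):

    def pipeline():
        res = thang(1, 2, 3, 0)   # inferred type of res: whatever thang returns
        res = res + 20            # valid only if that type supports '+ int'
        return res * 2            # ...and the uncertainty ripples onward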
In a language with typed variables and implicit conversions between some types, we don't know what type conversions we might need in either the call (to convert some of the arguments) or the return (to convert thang()'s result into res's type). Note that in particular we may not know what type the constant 0 is. Even languages without implicit type conversions often treat constants as being implicitly converted into whatever concrete numeric type they need to be in any particular context. In other words, thang()'s last argument might be a float, a double, a 64-bit unsigned integer, a 32-bit signed integer, or whatever, and the language will convert the 0 to match. But it can only know what conversion to do once thang() is declared and the types of its arguments are known. This means that a language with any implicit conversions at all (even for constants like 0) can't actually generate machine code for this section until thang() is declared, even under the best of circumstances.
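The literal 0 really does have several incompatible machine representations; this Python sketch uses ctypes to make them visible (the point applies to any compiled language):

    import ctypes

    # The same source-level constant 0, in four concrete machine shapes.
    for ctype in (ctypes.c_float, ctypes.c_double,
                  ctypes.c_uint64, ctypes.c_int32):
        value = ctype(0)
        print(ctype.__name__, ctypes.sizeof(value), bytes(value).hex())

    # Until thang()'s parameter types are known, the compiler can't pick
    # which of these representations the call should actually pass.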
However, life is usually much worse for code generation than this. For a start, most modern architectures pass and return floating point values in different ways than integer values, and they may pass and return more complex values in a third way. Since we don't know what type thang() returns (and we may not know what types the arguments are either, cf lver), we basically can't generate any concrete machine code for this function call at the time we parse it, even without implicit conversions. The best we can do is generate something extremely abstract with lots of blanks to be filled in later, and then sit on it until we know more about thang(), lver, and so on.
(And implicit typing for res will probably force a ripple effect of abstraction on code generation for the rest of the function, if it doesn't prevent it entirely.)
This 'extremely abstract' code generation is in fact what things like Python bytecode are. Unless the bytecode generator can prove certain things about the source code it's processing, what you get is quite generic and thus slow (because it must defer a lot of these decisions to runtime, along with checks like 'do we have the right number of arguments').
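You can see this genericness directly with Python's dis module (thang and lver are still undefined, and the exact opcodes vary by Python version):

    import dis

    def f(a, b):
        return thang(a, b, lver, 0)

    dis.dis(f)
    # The output is roughly: LOAD_GLOBAL thang, LOAD_FAST a, LOAD_FAST b,
    # LOAD_GLOBAL lver, LOAD_CONST 0, CALL 4. Nothing here checks that
    # thang exists, is callable, or takes four arguments; all of that is
    # deferred to the moment the CALL instruction actually runs.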
So far we've been talking about thang() as a simple function call. But there are a bunch of more complicated cases, like:
res = obj.method(a, b, lver, 0)
res2 = obj1 + obj2
Here we have method calls and operator overloading. If obj, obj1, and/or obj2 are undeclared or untyped at this point, we don't know if these operations are valid (the actual obj might not have a method() method) or what concrete code to generate. We need to generate either abstract code with blanks to be filled in later, or code that will do all of the work at runtime via some sort of introspection (or both, cf Python bytecode).
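Python resolves both cases at runtime, which is easy to demonstrate (the class and method names here are invented for illustration):

    class Obj:
        def method(self, a, b, c, d):
            return a + b + c + d

        def __add__(self, other):
            return "combined"

    obj, obj1, obj2 = Obj(), Obj(), Obj()

    # Both lines compile to generic 'look it up and see' bytecode:
    res = obj.method(1, 2, 3, 0)   # attribute lookup happens at call time
    res2 = obj1 + obj2             # '+' dispatches to Obj.__add__ at runtime

    # If Obj lacked method() or __add__, we'd get AttributeError or
    # TypeError only when these lines execute, not before.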
All of this prepares us to answer the question about what sort of languages require 'declare before use': languages that want to do good error reporting or (immediately) compile to machine code or both without large amounts of heartburn. As a pragmatic matter, most statically typed languages require declare before use because it's simpler; such languages either want to generate high quality machine code or at least have up-front assurances about type correctness, so they basically fall into one or both of those categories.
(You can technically have a statically typed language with up-front assurances about type correctness but without declare before use; the compiler just has to do a lot more work, and it may well wind up emitting a pile of errors at the end of compilation, when it can say for sure that lver isn't defined and you're calling thang() with the wrong number and type of arguments and so on. In practice, language designers basically don't do that to compiler writers.)
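Python with a whole-module type checker such as mypy is a rough approximation of this arrangement: the checker makes a full pass over the file before complaining, so functions can freely call things defined later (a sketch under that assumption, not a claim about any particular compiler):

    def first() -> int:
        # 'second' hasn't been defined yet at this point in the file,
        # but a whole-module checker sees the later definition and is happy.
        return second() + 1

    def second() -> int:
        return 41

    print(first())  # 42; the call only runs after second() exists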
Conversely, dynamic languages without static typing generally don't require declare before use. Often the language is so dynamic that there is no point. Carefully checking the call to thang() at the time we encounter it in the source code is not entirely useful if the thang function can be completely redefined (or deleted) by the time that code gets run, which is the case in languages like Lisp and Python.
(In fact, given that thang can be redefined by the time the code is executed, we can't even really error out if the arguments are wrong at the time when we first see the code. Such a thing would be perfectly legal Python, for example, although you really shouldn't do that.)
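Here's the kind of perfectly legal (if inadvisable) Python that last paragraph is talking about, using our invented thang:

    def thang(a, b):
        return a + b

    def caller():
        # Wrong arity for the thang defined above...
        return thang(1, 2, 3)

    # ...but thang is rebound before caller() ever runs.
    def thang(a, b, c):
        return a + b + c

    print(caller())  # 6: by call time, thang takes three arguments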