2010-12-23
More on the Unix interpreter startup problem
To expand on the previous entry, there are a number of factors that influence how fast Unix script languages and programs written in them can start and whether it matters.
First off, people are generally not going to notice interpreter startup delays today, on normal hardware. Modern hardware is more than fast enough to get script startup time below our perception threshold, and most modern Unix machines are under low load with plenty of free memory (when they aren't, people sort of accept that everything is slow). There was a time in the past when interpreted programs were so slow to start that people noticed, but those days are over unless you have a really heavyweight and slow to start interpreter.
(It's possible that Java still qualifies, but I'm not sure.)
However, programs may well notice this extra delay and be affected by it, and this includes shell scripts. How much so depends both on how much slower the interpreter is to start and on how many times the other program runs your interpreted program. If you're writing a program that will get used all over the place in your shell scripts, even a relatively small difference can add up. And this obviously matters if you're in a critical path that needs to run as fast as possible for latency reasons.
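To put rough numbers on 'can add up', here is a minimal sketch that times how long a command takes to start and exit, then scales that by a number of invocations. The function names are made up for illustration, and the measurement includes fork/exec and Python's own subprocess overhead, so treat the result as an approximation rather than the pure interpreter startup cost.

```python
import subprocess
import sys
import time


def mean_startup_seconds(argv, runs=20):
    """Average wall-clock time to start argv and wait for it to exit.

    This is a hypothetical helper; it measures the whole spawn-and-exit
    cycle, not just the interpreter's own initialization.
    """
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) / runs


def added_cost_seconds(per_start, invocations):
    """Total time spent on startup if a script runs the program repeatedly."""
    return per_start * invocations


if __name__ == "__main__":
    # Time an interpreter that does nothing at all once it has started.
    per_start = mean_startup_seconds([sys.executable, "-c", "pass"])
    print(f"per start: {per_start * 1000:.1f} ms")
    print(f"1000 invocations: {added_cost_seconds(per_start, 1000):.1f} s")
```

Swapping in a different command (say `["/bin/true"]`, assuming it exists on your system) gives a baseline for what a small compiled program costs on the same machine.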
Resource constrained environments magnify this interpreter performance difference, possibly drastically. The simplest way to get a resource constrained environment is to have your system be under load, especially significant load; this is the classical 'Slashdot melted down my inefficient blogging system' problem. These days you may also be resource constrained because you're trying to run in a small virtual machine instance.
As commenters on the previous entry noted, this penalty varies with the specific interpreter and with the size of your program, because the interpreter generally has to load and parse all of your program (and libraries that it depends on) even if it will never run most of that code. Various sorts of lazy loading can help, but language semantics sometimes rule them out (or at least require you to code them explicitly instead of taking the easy, language supported approach). This sort of slow startup isn't exclusive to interpreted programs, but it's generally rare that a relatively small compiled program has to do a lot of hidden initialization.
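As one illustration of coding lazy loading explicitly, here is a small Python sketch. `LazyModule` is a hypothetical helper of my own, not a standard library class; it puts off the real import until the first time something is looked up on it, so a program that never touches the module never pays to load it.

```python
import importlib


class LazyModule:
    """Stand-in for a module that is only imported on first attribute use.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing has been loaded yet

    def __getattr__(self, attr):
        # Python only calls __getattr__ when normal lookup fails, so
        # _name and _module (set in __init__) never come through here.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# No import cost is paid on this line...
json = LazyModule("json")


def dump_report(data):
    # ...the real import happens here, the first time json is used.
    return json.dumps(data)
```

The cost is that import errors show up at first use instead of at startup, which is part of why this is not the easy, language supported approach.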
2010-12-22
The Unix interpreter problem(s)
A while back I mentioned that one reason I'm interested in Go is that it is compiled, not interpreted. Now it's time to elaborate on what I called 'the interpreter issue'.
First off, on Unix there are some things that interpreted languages simply cannot do (I've mentioned this before). One potentially significant limitation for an otherwise attractive high-level language is that you cannot write (script) interpreters in an interpreted language; the target of a '#! ...' in a shell script must be directly executable. Also, interpreted programs cannot be setuid.
(You can of course solve at least the first problem with another level of indirection, but.)
For me, the big issue is differences in what I'll call 'system impact' (instead of just 'performance'). A program written in an interpreted language necessarily needs to load and start the language interpreter itself before it can do anything. This costs some amount of time and some amount of memory; how much time and memory depends on the interpreter. If the program runs for a long time and uses a lot of memory itself, the interpreter's overhead is unlikely to matter. If the program is small, runs only briefly, and you want it to work fast even when the system is under significant memory pressure and load, this may matter a lot.
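For the memory side of this, a rough sketch of how you might check what an interpreter costs just to start is to run it with a do-nothing program and look at the peak resident set size the kernel reports for that child. `peak_rss_of` is a made-up name, and the units are an assumption to check on your system: `ru_maxrss` is in kilobytes on Linux but in bytes on some other Unixes such as macOS.

```python
import os
import sys


def peak_rss_of(argv):
    """Run argv to completion and return the child's reported peak RSS.

    Hypothetical helper. The value comes from the rusage that wait4()
    returns for this specific child; its units are platform dependent
    (kilobytes on Linux).
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    if status != 0:
        raise RuntimeError(f"{argv[0]} did not exit cleanly (status {status})")
    return usage.ru_maxrss


if __name__ == "__main__":
    # An interpreter that starts up and immediately exits.
    print("peak RSS:", peak_rss_of([sys.executable, "-c", "pass"]))
```

This only gives you a single peak number, not how much of that memory is shared with other processes, so it overstates the real cost on a machine that is already running the same interpreter elsewhere.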
A compiled program is not necessarily better in this respect; the runtime environment required by an odd language (especially a high level one) may take just as much time and memory to start up as an interpreter does. But the odds are generally better that you will be able to write a fast starting program that uses only a bit of memory in a compiled language than in an interpreted one.
(In fact, these days even the standard C runtime can do quite a lot before your program gets control. But at least you have alternatives in things like dietlibc.)
Whether this matters to you depends on what sort of program you're writing, what sort of system it will run on, and what sort of situations you're going to run it in. The discussion of this deserves a separate entry; the short summary is that I don't think it's an issue any more for programs that people will run directly themselves under ordinary circumstances.