The dynamic linking tax on fork()
Most people will tell you that dynamic linking is an unalloyed good, or at least that any effects on performance are small (for simple programs that don't cascade to a huge set of shared libraries). This isn't necessarily so.
A long time ago, back when dynamic linking was new and people were suspicious of it, I conducted some timings of how dynamic linking affected the speed of fork(). To my surprise, the impact was significant, and on the hardware of the day it was actually worth static-linking my shell.
Today, I dug up my test program from 1991 (which just repeatedly fork()s, has the child immediately exit(), and the parent wait()s for the child), and measured how much slower the dynamically linked version ran on various systems that I have convenient access to. The results are:
- Solaris 9 on an Ultra 10: 2.82 times worse.
- FreeBSD 4.10 on an SMP Pentium II: 2.12 times worse.
- FreeBSD 3.4 on an SMP Pentium III: 2.48 times worse.
Linux needs a table:
| Kernel | CPU | How much worse |
|---|---|---|
| 22.214.171.124 | SMP Pentium III | 2.45 times |
| RHEL 4 2.6.9-derived | 64-bit SMP Opteron | 2.72 times |
| FC4 2.6.13-derived | Pentium 4 | 1.70 times |
| 2.4.32-rc1 | Pentium III | 1.84 times |
As an experiment, I statically linked the program with dietlibc as well as glibc on the two 126.96.36.199 machines. On the Athlon the dietlibc version was 7.7% faster; on the SMP P3 it was 10% faster. (The ratios in the table are against the static glibc version.)
I'm surprised that the SMP machines didn't pay a worse penalty than the uniprocessor machines. It's annoying that Linux still has about the same penalty as Solaris, but on the other hand Linux forks pretty fast to start with; it's in the hundred-microsecond range even for dynamically linked code.
One anomaly is that odd things seem to start happening on the FreeBSD 4.10 machine as the number of forks gets higher and higher; the execution times don't scale the way they should. (It's possible that some sort of PID or resource wrapping issue is responsible.)