== The dynamic linking tax on _fork()_

Most people will tell you that dynamic linking is an unalloyed good, or at least that any effects on performance are small (for simple programs that don't cascade to a huge set of shared libraries). This isn't necessarily so.

A long time ago, back when dynamic linking was new and people were suspicious of it, I conducted some timings of how dynamic linking affected the speed of _fork()_. To my surprise, the impact was significant, and on the hardware of the day it was actually worth static-linking my shell.

Today, I dug up my [[test program from 1991 ]] (which just repeatedly fork()s, has the child immediately exit(), and the parent wait()s for the child), and measured how much slower the dynamically linked version ran on various systems that I have convenient access to. The results are:

* Solaris 9 on an Ultra 10: 2.82 times worse.
* FreeBSD 4.10 on an SMP Pentium II: 2.12 times worse.
* FreeBSD 3.4 on an SMP Pentium III: 2.48 times worse.

Linux needs a table:

| Kernel | CPU | How much worse
| 2.6.14.4 | Athlon | 3 times
| 2.6.14.4 | SMP Pentium III | 2.45 times
| {{AB:RHEL:Red Hat Enterprise Linux}} 4 2.6.9-derived | 64-bit SMP Opteron | 2.72 times
| {{AB:FC4:Fedora Core 4}} 2.6.13-derived | Pentium 4 | 1.70 times
| 2.4.32-rc1 | Pentium III | 1.84 times

As an experiment, I statically linked the program with [[dietlibc http://www.fefe.de/dietlibc/]] as well as glibc on the two 2.6.14.4 machines. On the Athlon the [[dietlibc]] version was 7.7% faster, on the SMP P3 it was 10% faster. (The ratios in the table are against the static glibc version.)

I'm surprised that the SMP machines didn't pay a worse penalty than uniprocessor machines. It's annoying that Linux still has about the same penalty as Solaris, but on the other hand Linux forks pretty fast to start with; it's in the hundred microsecond range even for dynamically linked code.

One anomaly is that odd things seem to start happening on the FreeBSD 4.10 machine as the number of forks gets higher and higher; the execution times don't scale the way they should. (It's possible that some sort of PID or resource wrapping issue is responsible.)