How x86 Linux executes ELF programs

December 19, 2007

Yesterday I said that the kernel directly executes programs in place. Because I feel like walking through the details, here is what the kernel does to start ELF programs on x86 Linux; for simplicity, I'm going to talk about 32-bit programs.

  • First, the kernel maps the program's text, data, and BSS into memory. Almost all programs require these to be mapped at fixed addresses starting from 0x08048000 (just over 128 MB) and going on up.

  • If the program is dynamically linked, the kernel also maps the dynamic linker's text, data, and BSS into memory. Dynamic linkers are generally willing to be loaded anywhere in memory, so they get wedged into the first spot the kernel considers available.

    (ELF executables specify the full path of their dynamic linker, which is confusingly called the 'ELF interpreter' in various places.)

  • The kernel sticks an 'auxiliary table' of various information on the top of the stack.
  • The environment and the arguments are copied onto the stack.
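The first two steps are driven by the executable's program headers. Each PT_LOAD entry describes a chunk of the file to map at a fixed virtual address (the BSS is the tail where the in-memory size exceeds the file size), and a PT_INTERP entry, if present, holds the path of the dynamic linker. Here is a minimal Python sketch that reads those headers from a 32-bit little-endian ELF image; `parse_elf32` and `Segment` are hypothetical names, and it does only a fraction of the validation the kernel does.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

PT_LOAD = 1    # a chunk of the file to map: text, or data plus BSS
PT_INTERP = 3  # NUL-terminated path of the dynamic linker (the 'ELF interpreter')
PF_X, PF_W, PF_R = 1, 2, 4

class Segment(NamedTuple):
    vaddr: int   # fixed virtual address, e.g. 0x08048000 for classic i386 text
    filesz: int  # bytes copied from the file
    memsz: int   # bytes in memory; memsz - filesz is zero-filled (the BSS)
    perms: str   # e.g. "r-x" for text, "rw-" for data

def parse_elf32(image: bytes) -> Tuple[List[Segment], Optional[str]]:
    """Return the PT_LOAD segments and the PT_INTERP path (None if statically linked)."""
    if len(image) < 52 or image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF file")
    # Elf32_Ehdr fields after the 16-byte e_ident; only the program header fields are used.
    (_type, _machine, _version, _entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(
        "<HHIIIIIHHHHHH", image, 16)
    segments: List[Segment] = []
    interp: Optional[str] = None
    for i in range(phnum):
        # Elf32_Phdr: eight 32-bit fields (in the 32-bit layout p_flags follows p_memsz).
        (p_type, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, p_flags, _align) = struct.unpack_from(
            "<8I", image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            perms = "".join(c if p_flags & bit else "-"
                            for c, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
            segments.append(Segment(p_vaddr, p_filesz, p_memsz, perms))
        elif p_type == PT_INTERP:
            interp = image[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return segments, interp
```

Fed the contents of a 32-bit x86 binary (read with `open(path, "rb").read()`), this would typically show a read-only, executable segment at 0x08048000 and, for a dynamically linked program, an interpreter path such as /lib/ld-linux.so.2; it raises `ValueError` on 64-bit binaries.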

If the program is statically linked, the kernel sets the user-level program counter for the process to the start address in the program's ELF header, which is somewhere after 0x08048000. When the kernel returns to user space, the program will wind up running directly.

(What the start address is depends on how much stuff has to go at the start of the program's text area, so it varies from program to program.)
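The start address is the e_entry field of the ELF header, which in a 32-bit header sits at byte offset 24. Pulling it out takes only a few lines; `elf32_entry` is a hypothetical helper, and it only handles the 32-bit little-endian case discussed here.

```python
import struct

def elf32_entry(image: bytes) -> int:
    """Return e_entry, the start address recorded in a 32-bit little-endian ELF header."""
    if len(image) < 52 or image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF file")
    # Elf32_Ehdr: e_ident[16], e_type (2 bytes), e_machine (2), e_version (4), then e_entry (4),
    # so e_entry starts at byte offset 16 + 2 + 2 + 4 = 24.
    (e_entry,) = struct.unpack_from("<I", image, 24)
    return e_entry
```

For a classic fixed-address i386 executable this typically returns an address a little above 0x08048000. For a dynamically linked program it is still the program's own entry point, which the dynamic linker jumps to once it is done, not the address where the process starts running.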

If the program is dynamically linked, the kernel instead sets the program counter to the start address of the dynamic linker, and the process will start running the dynamic linker's code directly. The dynamic linker uses information in the auxiliary table to find the real program's code and data, and eventually starts it.
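On 32-bit x86, when the process starts the stack pointer points at argc, followed by the argv pointers, a NULL, the envp pointers, another NULL, and then the auxiliary table as (type, value) pairs ending with AT_NULL; the argument and environment strings themselves sit higher up the stack. A dynamic linker finds the auxiliary table by walking past argv and envp, then uses entries such as AT_PHDR and AT_PHNUM (where the program's headers are in memory), AT_ENTRY (the program's entry point) and AT_BASE (the dynamic linker's own load address). The sketch below models that walk over a list of 32-bit stack words; `split_initial_stack` is a hypothetical name, this is a simplified model rather than how any real dynamic linker is written, and the addresses in the test are made up.

```python
from typing import Dict, List, Sequence, Tuple

# Auxiliary table tags, with the values used in <elf.h>
AT_NULL, AT_PHDR, AT_PHNUM, AT_BASE, AT_ENTRY = 0, 3, 5, 7, 9

def split_initial_stack(words: Sequence[int]) -> Tuple[List[int], List[int], Dict[int, int]]:
    """Split the initial stack (words[0] is where %esp points) into argv, envp and auxv."""
    argc = words[0]
    argv = list(words[1:1 + argc])   # pointers to the argument strings
    i = 1 + argc
    if words[i] != 0:
        raise ValueError("argv is not NULL-terminated")
    i += 1
    envp: List[int] = []             # pointers to the "NAME=value" strings
    while words[i] != 0:
        envp.append(words[i])
        i += 1
    i += 1                           # skip the NULL that ends envp
    auxv: Dict[int, int] = {}
    while words[i] != AT_NULL:       # (a_type, a_val) pairs
        auxv[words[i]] = words[i + 1]
        i += 2
    return argv, envp, auxv
```

To look at a real auxiliary table without writing code, glibc's dynamic linker prints it when the `LD_SHOW_AUXV=1` environment variable is set (for example `LD_SHOW_AUXV=1 /bin/true`), and /proc/<pid>/auxv exposes the raw pairs for a running process.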

(From this we can see why calling the dynamic linker an 'interpreter' is a misnomer; it works nothing like an interpreter for a script, although it is itself a regular ELF executable.)

Technically, you could make dynamically linked ELF executables that contained no actual machine code but instead had a 'dynamic linker' that actually was an interpreter. However, this would be tricky to pull off, because dynamic linkers cannot themselves be dynamically linked, so your interpreter would need either to not use any shared libraries (including the normal C and Unix runtime) or to bootstrap the regular dynamic linker somehow.

Last modified: Wed Dec 19 00:53:10 2007