C's main() is one of the places where Unix's user and kernel APIs differ
Modern Unixes often like to draw a legalistic distinction between
the API provided to user space by the kernel and the Unix API
provided to programs by the 'standard library', by which they mean
the standard C library. Some people, me included, don't entirely
like this (I've written about whether the C runtime and library
is a legitimate part of the Unix API). However,
regardless of what I might think about it, Unix has long had at
least one place where there was a real difference between the normal
API that everyone used and the API that the kernel actually
implemented. I'm talking about the traditional C style main()
entry point that starts your program.
Everyone knows the basic form of main(), with argc and argv;
you're called with a count of the arguments and an array of strings.
In slightly more advanced usage there is a third argument, envp,
an array of environment variables. This format is very old in Unix.
The two argument version of main() goes back to at least Research
Unix V4's exec(2), while the three argument form with environment
variables seems to appear in V7's exec(2).
However, this is not the actual program entry point that the V7 Unix
kernel used when starting your program, and the actual entry point had
a somewhat different API than main(). Conventionally, V7 C programs
actually started at an assembly symbol called start; the simplest
version of the assembly code involved is in crt0.s, and it clearly does
a certain amount of setup work. There are other versions of this
startup in /usr/src/libc/csu that do varying amounts of additional
work, such as arranging to profile your program.
(Research Unix V6 also had a crt0.s,
but it's rather different; I think there are no loops, for example.
If I understood PDP-11 assembly language I might have a better idea of
what it was actually doing.)
In V7, the differences between the user API for main() and the
kernel API are not huge. In current Unixes, there's often rather
more going on, especially once you include dynamic loaders and
things like the 'auxiliary vector' present in some Unixes. I suspect
that the simplest version of a modern one to look at is musl libc for
Linux, where crt1.c and the main libc bootstrap functions are
relatively straightforward.
(Some of the code is because the C runtime environment needs to be
set up (and yes, modern C has a runtime), but a certain amount of
it is converting between how the kernel invokes programs and how
main() wants to be invoked. For example, notice how musl libc's
main start function isn't called with argc as an explicit argument;
instead it retrieves argc from memory.)
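To make that conversion concrete, here is a minimal sketch of the idea, loosely modeled on the shape of a C-level start function rather than copied from musl or any kernel. It assumes the common layout where the startup code is handed a single pointer to a block of words holding argc, then the argv pointers, a NULL, the environment pointers, and a final NULL. The names demo_main, demo_start and run_demo are hypothetical, and the block is simulated with an ordinary array of char * instead of a real initial stack, so the argc slot is an integer squeezed into a pointer (which works on common platforms but is implementation-defined C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for main(); returns argc plus the number of
   environment strings so a caller can check what it was handed. */
static int demo_main(int argc, char **argv, char **envp)
{
    int nenv = 0;
    assert(argv[argc] == NULL);   /* argv is terminated by a NULL pointer */
    while (envp[nenv] != NULL)
        nenv++;
    return argc + nenv;
}

/* Sketch of a start function: it receives only a pointer to the block
   and has to dig argc, argv and envp out of memory itself. */
static int demo_start(char **sp)
{
    int argc = (int)(intptr_t)sp[0];  /* first word is the argument count */
    char **argv = sp + 1;             /* argv pointers follow immediately */
    char **envp = argv + argc + 1;    /* envp starts after argv's NULL */
    return demo_main(argc, argv, envp);
}

/* Build a simulated startup block (an assumption for this demo; a real
   kernel puts this on the new process's stack) and run through it. */
static int run_demo(void)
{
    static char a0[] = "prog", a1[] = "-v";
    static char e0[] = "HOME=/", e1[] = "TERM=vt100";
    char *block[] = {
        (char *)(intptr_t)2,   /* argc, stored as a raw word */
        a0, a1, NULL,          /* argv[0], argv[1], terminator */
        e0, e1, NULL,          /* envp[0], envp[1], terminator */
    };
    return demo_start(block);
}
```

Calling run_demo() from a test program walks the simulated block and hands demo_main() the familiar three arguments; nothing here passes argc as an explicit argument to the start function, which is the point of the sketch.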
Sidebar: The interesting V7 trick with data address 0
At the end of every version of V7's crt0.s is a little bit that initially puzzled me:
    .data
    .=.+2 / loc 0 for I/D; null ptr points here.
What this is doing is reserving two bytes of space at the start of the data section. V7 Unix ran on PDP-11s that supported split instruction and data address spaces, so the data section starts at (data) address 0. Reserving two bytes at the start ensures that no variable or other thing in the data section can be located at address 0, and so C NULL is always distinct from valid pointers.
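The same trick can be illustrated in C with an analogy; this is not how V7 itself did it, since there the reservation happens in assembly at link time. The sketch below assumes a hypothetical toy 'data segment' (struct data_seg, seg_init and seg_alloc are made-up names) that hands out offsets instead of pointers. Skipping the first two bytes before allocating anything guarantees that offset 0 never names a real object, so it is safe to use as the 'null' value.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_SIZE 64
#define NULL_OFF 0   /* offset 0 plays the role of the null pointer */

/* Hypothetical toy data segment that hands out offsets, not pointers. */
struct data_seg {
    unsigned char bytes[SEG_SIZE];
    size_t next;     /* next free offset */
};

static void seg_init(struct data_seg *s)
{
    /* The analogue of ".=.+2": reserve two bytes at the very start so
       that no object can ever be placed at offset 0. */
    s->next = 2;
}

/* Returns the offset of n fresh bytes, or NULL_OFF on failure. */
static size_t seg_alloc(struct data_seg *s, size_t n)
{
    size_t off;

    if (n == 0 || n > SEG_SIZE - s->next)
        return NULL_OFF;
    off = s->next;
    s->next += n;
    assert(off != NULL_OFF);   /* guaranteed by the reserved bytes */
    return off;
}
```

Because of the reserved bytes, a caller can treat a return value of NULL_OFF as failure without worrying that it might also be the location of a legitimate object, which is the property V7 wanted for C's NULL.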