Why C uninitialized global variables have an initial value of zero

January 17, 2019

In C, uninitialized local variables have indeterminate values, but uninitialized global variables (whether static or not) are defined to start out as zero. This difference periodically strikes people as peculiar, and you might wonder why C is this way. As it happens, there is a fairly simple answer.

One answer is certainly 'because the ANSI C standard says that global variables behave that way', and in some ways this is the right answer (but we'll get to that). Another answer is 'because C was documented to behave that way in "The C Programming Language" and so ANSI C had no choice but to adopt that behavior'. But the real answer is that C behaves this way because it was the most straightforward way for it to behave in Unix on PDP-11s, which was its original home.

In a straightforward compiled language like the early versions of C, all global variables have a storage location, which is to say that they have a fixed permanent address in memory. This memory comes from the operating system and when operating systems give you memory, they don't give it to you with random contents; for good reasons they have to set it to something and they tend to fill it with zero bytes. Early Unix was no exception, so the memory locations for uninitialized global variables were known to start out as all zero bytes. Hence early K&R C could easily and naturally declare that uninitialized global variables were zero, as they were located in memory that had been zero-filled by the operating system.
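(As a minimal sketch of what this guarantee looks like in practice: none of the variables below has an initializer, and all of the names are made up for illustration. Any conforming C compiler should start every one of them at zero.)

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example variables. None has an initializer, and all have
   static storage duration, so C defines them to start out as zero. */
int counter;                         /* ordinary global */
static long totals[4];               /* file-scope static array */
struct point { int x, y; } origin;   /* global struct */

/* Counts how many of the values above are nonzero. */
int nonzero_globals(void)
{
    int n = 0;
    if (counter != 0) n++;
    if (origin.x != 0) n++;
    if (origin.y != 0) n++;
    for (int i = 0; i < 4; i++)
        if (totals[i] != 0)
            n++;
    return n;
}

/* Prints the initial values; call this before anything writes to them. */
void show_globals(void)
{
    assert(nonzero_globals() == 0);
    printf("counter = %d, totals[3] = %ld, origin = (%d, %d)\n",
           counter, totals[3], origin.x, origin.y);
}
```

Calling show_globals() at the very start of main() should report zero for everything.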

(Programs did not explicitly ask Unix for this memory. Instead, executable files simply had a field that said 'I have <X> bytes of bss', and the kernel set things up when it loaded the executable.)
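(To illustrate the bss idea, here is a hedged sketch with a hypothetical 1 MiB scratch buffer. On a typical Unix toolchain I would expect an array like this to be placed in bss, so that the executable only records its size instead of storing a megabyte of zeros, but where things actually go is up to the compiler and linker.)

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_SIZE (1024 * 1024)

/* Hypothetical buffer with no initializer. A typical toolchain puts this
   in bss (an assumption, not something the C standard requires). */
static unsigned char scratch[SCRATCH_SIZE];

/* Counts nonzero bytes in scratch; 0 means it is still in its initial state. */
size_t scratch_nonzero_bytes(void)
{
    size_t n = 0;
    for (size_t i = 0; i < SCRATCH_SIZE; i++)
        if (scratch[i] != 0)
            n++;
    return n;
}

/* Call at startup, before anything has written to scratch. */
void check_scratch_is_zeroed(void)
{
    assert(scratch_nonzero_bytes() == 0);
}
```

If you build this into a program on a typical Unix system, running size on the result should show the megabyte in the bss column while the file on disk stays small.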

The fly in the ointment for this simple situation is that there are some uncommon architectures where zero-filled memory doesn't give you zero-valued variables for all types; instead, the 0 value for some types has some of its bits turned on in memory. When this came up, people decided that C meant what it said: uninitialized values of these types were still zero, even though you could no longer implement this with no effort by just putting these variables in zero-filled memory. This is where 'the ANSI C standard says so' is basically the answer, although it is also really the only good answer, since any other answer would make the initial value of uninitialized global variables non-portable.

(You can read more careful discussion of this on Wikipedia, and probably in many C FAQs. The comp.lang.c FAQ section 5.17 lists some architectures where null pointers are not all-bits-zero values. I suspect that there have been C compilers on architectures where floating point 0 is not all-bits-zero, although it is in IEEE 754 floating point, which pretty much everyone uses today.)
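(The distinction between 'has the value zero' and 'is all zero bits in memory' can be sketched in code. The variable and function names here are my own invention. The comparisons in check_zero_values() are what C promises everywhere; what is_all_bits_zero() reports depends on the platform, although on common modern machines I would expect it to say yes for both variables.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical uninitialized statics of the two interesting kinds. */
static char *unset_ptr;       /* guaranteed to start as a null pointer */
static double unset_double;   /* guaranteed to start as 0.0 */

/* Returns 1 if every byte of the object's representation is zero, else 0.
   Inspecting an object through unsigned char is well defined in C. */
int is_all_bits_zero(const void *obj, size_t len)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* The portable part: these hold whatever the in-memory representation is. */
void check_zero_values(void)
{
    assert(unset_ptr == NULL);
    assert(unset_double == 0.0);
}

/* The platform-dependent part: is the null pointer stored as all zero bits? */
int null_pointer_is_all_bits_zero(void)
{
    return is_all_bits_zero(&unset_ptr, sizeof unset_ptr);
}
```

On an architecture with a non-zero null pointer representation, check_zero_values() would still pass but null_pointer_is_all_bits_zero() would return 0, and the compiler or startup code would have had to do real work to make that happen.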

As a side note, the reason that this logic doesn't work for uninitialized local variables is that in a straightforward C implementation, they go on the stack and the stack is reused. The very first time you use a new section of stack, it's fresh memory from the operating system, so it's been zero-filled for you and your uninitialized local variables are zero, just like globals. But after that the memory has 'random' values left over from its previous use. And for various reasons you can't be sure when a section of the stack is being used for the first time.
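(Actually reading an uninitialized local variable is not something a correct C program can do, so the sketch below models the stack with an ordinary static array instead. Everything in it is a made-up illustration of the reuse pattern, not how any real compiler lays out its stack.)

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 64

/* Toy model of a stack. It has static storage, so it starts zero-filled,
   standing in for fresh memory from the operating system. */
static int stack_mem[STACK_WORDS];
static size_t stack_top;   /* number of words currently in use */

/* "Function entry": reserve n words for locals WITHOUT clearing them. */
int *push_frame(size_t n)
{
    assert(stack_top + n <= STACK_WORDS);
    int *frame = &stack_mem[stack_top];
    stack_top += n;
    return frame;
}

/* "Function return": release n words, leaving their contents in place. */
void pop_frame(size_t n)
{
    assert(n <= stack_top);
    stack_top -= n;
}

/* Looks at its one "uninitialized local", then stores 42 in it. */
int first_call(void)
{
    int *locals = push_frame(1);
    int seen = locals[0];
    locals[0] = 42;
    pop_frame(1);
    return seen;
}

/* Gets the same slot as first_call() and just looks at what is there. */
int second_call(void)
{
    int *locals = push_frame(1);
    int seen = locals[0];
    pop_frame(1);
    return seen;
}
```

In this model, the very first first_call() sees 0 because the slot has never been used, and a second_call() made right afterward sees the leftover 42.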

(In a modern C environment, even completely untouched sections of the stack may not be zero. For security reasons, they may have been filled with random values or with specific 'poison' ones.)

Comments on this page:

By Icarus Sparry at 2019-01-17 12:46:44:

From the department of obvious comments (at least in hindsight).

When using C for embedded programming it is up to the programmer to ensure the memory is set to zero.

Most C compilers & linkers arrange for values which are not explicitly set to zero to be allocated in the bss region (which as Chris notes is not stored), and then get the C startup code to zero this very early on, but some explicitly create a memory image with all the zeros filled in.

And it should be noted that a static variable inside a function is essentially a global that's accessible only within that function. But it means that it is also in .bss.

#include <stdio.h>
void foo(void)
{
        static int zero;        /* no initializer, so it starts at 0 */
        printf("zero = %d\n", zero);
}
Written on 17 January 2019.

Last modified: Thu Jan 17 00:42:22 2019