My understanding of modern C undefined behavior and its effects

August 15, 2013

Back in the old days, it was famously said that invoking undefined behavior in your C program gave the compiler license to delete all of your files if it felt like it. When we heard that, we laughed, nodded sagely, and went cheerfully on our way, because of course no actual compiler was ever going to react to undefined behavior that way and everyone knew it. (The closest real compilers ever came to that was how early versions of GCC reacted to #pragma.)

This left a whole generation of programmers with the attitude that C's large collection of undefined and implementation-defined behavior was no big deal. Different CPUs or compilers might behave differently, but the overall result would be fundamentally sane and often even predictable in advance (given knowledge of CPU behavior).

In the modern world, as John Regehr has taught me, this is both wrong and dangerous. Modern compilers do not delete your files or launch ICBMs when they encounter undefined behavior, because that would still be very stupid. Instead they do something much more dangerous: modern compilers will assume that undefined behavior can't happen. This knowledge that certain things can't happen is then used in optimization. For example, the compiler may deduce things about variable values, which then get fed through into dead code elimination, and pretty soon the compiler is removing a security check because it knows the check can 'never' trigger (in proper code).

(That led to a cute Linux kernel security vulnerability, by the way.)
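To make the 'deduce things about variable values' step concrete, here is a minimal sketch using signed integer overflow, which is undefined behavior in C. The function names are made up for illustration. The fragile version tries to detect overflow after the fact; since a correct program never overflows a signed int, an optimizing compiler is allowed to conclude that x + 1 < x is always false and drop the check entirely. Whether a particular compiler actually does so depends on the compiler, its version, and the optimization level.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical overflow check written the 'obvious' way.
 * If x == INT_MAX, x + 1 is undefined behavior, so the compiler
 * may assume it never happens and treat this as always false. */
bool increment_overflows_fragile(int x)
{
    return x + 1 < x;
}

/* The same check written so that no undefined behavior can occur:
 * compare against the limit before doing any arithmetic. */
bool increment_overflows_safe(int x)
{
    return x == INT_MAX;
}
```

The safe version never performs the overflowing addition, so there is nothing for the compiler to assume away.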

The practical upshot is that it is now basically impossible to reason about how a chunk of code will behave in the face of undefined behavior, and in any case that behavior changes as compilers change. Even to start requires a thorough understanding of modern compiler optimizations and a ruthlessly objective skeptic's eye, so that you can see what the code actually says, not what you think it says. Only then are you in a position to start following the implications of, say, dereferencing a structure pointer as part of local variable initialization before you explicitly check said pointer to see if it's NULL.
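The structure pointer case looks roughly like the following sketch. The struct and function names are hypothetical, not taken from any real code. In the fragile version the dereference happens first, so the compiler may reason that dev cannot be NULL at that point (otherwise the program already has undefined behavior) and is then free to delete the later NULL check as dead code.

```c
#include <stddef.h>

/* Hypothetical structure, purely for illustration. */
struct device {
    int flags;
};

/* Dereferences dev during initialization, then checks it.
 * Calling this with NULL is undefined behavior, and the compiler
 * may remove the 'dev == NULL' test because of the earlier dereference. */
int get_flags_fragile(struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Checks first, dereferences only afterwards, so the check must stay. */
int get_flags_safe(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

Both functions behave identically for a valid pointer. They differ only in what the compiler is entitled to do with the NULL case, which is exactly the part you can't see by reading the source casually.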

Or in short, modern C compilers do terrifying things with undefined behavior.

PS: I recommend you read John Regehr's blog. It's hair-raising.

(This was inspired by C J Silverio pointing to this HN comment.)
