Wandering Thoughts archives

2006-12-29

What can go wrong if your compiler is not thread aware

Courtesy of Pete Zaitcev, here's a great example of what happens when optimizing compilers aren't thread aware.

Start with code of the form:

struct a b;
b.pos = *ppos;
ret = foo(&b, ..., b.pos);

Modern versions of gcc 4 on x86 will optimize the function call into:

ret = foo(&b, ..., *ppos);

(This is a less stupid optimization than it looks; ppos is a function parameter and I believe it's in a register, so it may well be faster to perform an indirect load from it than a computed indirect load off the stack pointer.)

What goes wrong in a multi-threaded environment (in this case, the Linux kernel) is that another thread can change the value of *ppos between the store into b.pos and the compiler's second load for the function call, but the foo function expects the two values to be the same. Of course, the authors of the function that this appeared in didn't think that they needed to do any locking, because after all their code only dereferences *ppos once.

(To head off one possible nitpick about the code: the Linux kernel makes liberal use of implicit atomic reads and writes, relying on plain loads and stores of suitably sized, aligned values being atomic. This probably makes purists cringe, but the odds of a major CPU architecture ever lacking reasonably large atomic reads and writes are pretty small by now.)
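A minimal sketch of one common way to restore the programmers' assumption, assuming (as the kernel does) that a plain aligned load of a long is atomic: copy *ppos into a local exactly once, through a volatile-qualified access, and use only the local afterward. The struct a layout and the functions foo(), read_pos_once() and caller() are hypothetical stand-ins for the elided kernel code. Compilers do not duplicate or re-perform volatile accesses, so the second use cannot quietly turn back into a fresh load of *ppos; the Linux kernel later grew ACCESS_ONCE() and READ_ONCE() macros built on this same idea. Note that this relies on compiler behavior, not on guarantees from the C standard.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel structure; only 'pos' matters. */
struct a {
    long pos;
};

/* Hypothetical callee: like foo() in the entry, it assumes that the
   position stored in *b and the separately passed position agree. */
static long foo(struct a *b, long pos)
{
    assert(b->pos == pos);   /* the implicit assumption, made explicit */
    return pos;
}

/* Load *ppos exactly once. The volatile-qualified access may not be
   repeated by the compiler, so callers get one consistent snapshot. */
static long read_pos_once(const long *ppos)
{
    return *(const volatile long *)ppos;
}

static long caller(const long *ppos)
{
    struct a b;
    long pos = read_pos_once(ppos);  /* single load into a local */

    b.pos = pos;
    return foo(&b, pos);             /* both arguments come from 'pos' */
}
```

With this version the two arguments to foo() cannot disagree, because both come from one load. Whether that snapshot of *ppos is still current is a separate question that needs locking if it matters.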

I don't think we can blame either the compiler or the programmers for this. What's really happening is that we've been tripped up because our implicit assumptions about the code differ from the compiler's. And a good part of the reason that these sorts of assumptions stay implicit is that we don't have good tools for making them explicit in non-annoying ways.

(So while we can blithely talk about a 'thread-aware compiler', I'm not sure we know what one should actually look like.)

CompilerThreadAwareness written at 23:45:34

2006-12-25

The problem with #ifdef

Something that crystallized in the process of writing the earlier Bourne shell quoting entry is that #ifdef abuse leads to bad results for the same fundamental reason that multiple levels of escaping are bad: after a certain point, people can no longer clearly see what the real code will look like.

When you can't see what the code looks like, it's really hard to make sensible changes; either you make blind stabs or you have to carefully reconstruct the actual code, usually by hand. Both approaches are error-prone, it's easy to fool yourself, and you are effectively doing remote-control programming (with mushy feedback).

Ironically, I suspect that the really dangerous #ifdef'd code is not the code that is completely snarled up, but the code that is halfway there. Code that is a complete mess is clearly beyond understanding, but code that is only half-overgrown with #ifdefs tempts you into thinking that you can follow it when you actually can't.
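A small hypothetical illustration of the half-overgrown case (HAVE_WIDE_TERMINALS and OLD_SYSV are made-up configuration macros, and neither is defined here). In line_width(), the bare else only exists in one configuration and binds to whatever statement follows the #endif, so a line later inserted there in good faith would silently change the logic. line_width_clean() sketches one way out: confine #ifdef to choosing constants, so the real control flow is always visible.

```c
#include <assert.h>

/* Tangled style: to know which statements survive, you have to run
   the preprocessor in your head for every combination of macros. */
static int line_width(int cols)
{
#ifdef HAVE_WIDE_TERMINALS
    if (cols > 132)
        return 132;
    else
#endif
#ifndef OLD_SYSV
    if (cols < 20)
        return 20;
#else
    if (cols < 40)
        return 40;
#endif
    return cols;
}

/* Untangled style: #ifdef only picks values; the logic is plain C. */
#ifdef OLD_SYSV
#define MIN_WIDTH 40
#else
#define MIN_WIDTH 20
#endif

#ifdef HAVE_WIDE_TERMINALS
#define MAX_WIDTH 132
#else
#define MAX_WIDTH 0     /* 0 means "no upper limit" */
#endif

static int line_width_clean(int cols)
{
    if (MAX_WIDTH > 0 && cols > MAX_WIDTH)
        return MAX_WIDTH;
    if (cols < MIN_WIDTH)
        return MIN_WIDTH;
    return cols;
}
```

Both functions compute the same result in every combination of the two macros; the difference is only in how much of that a reader can see.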

(The worst #ifdef'd code that I've personally encountered and haven't frantically scrubbed out of my brain is the xterm code, which is a shining example of what not to do to make a portable program.)

IfdefProblem written at 23:48:25

2006-12-22

Another example of why Bourne shell quoting makes me grumpy

Following up on the previous case, here's something we ran into recently.

Imagine that you want to pull lines out of /etc/shadow for specific login names; you'd use, for example, 'grep "^${UNAME}:" /etc/shadow'. Now imagine that some of your login names include '$', which can be special in regular expressions (it's the end-of-line anchor), so you want to generate a version of $UNAME with the '$' quoted.

So what do you have to write? It turns out that you need this:

nname=`echo ${UNAME} | sed 's/\\$/\\\\$/'`

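You might innocently expect that you can write the sed expression just as you would on the command line, as sed 's/\$/\\$/', but if you try that, it doesn't work. This is because backquotes perform one level of backslash de-escaping on their own, so that you can escape backquotes inside them: '\\' turns into '\' and '\$' into '$' before the command runs. The naive version thus hands sed 's/$/\$/', which just tacks an extra '$' onto the end of the line.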

(And you have to be able to escape backquotes because otherwise you couldn't nest backquote expansions. Eliminating this problem is why modern versions of the Bourne shell allow you to write $(...) instead; paired delimiters can nest without confusion.)
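For comparison, a minimal sketch of the same lookup done in C with the POSIX regcomp(3) and regexec(3) functions. The helper names quote_dollars() and shadow_line_matches() are made up for illustration, and the sketch uses extended regular expressions (REG_EXTENDED), where an unescaped '$' is always an anchor, so the quoting is required rather than merely prudent. There are still two layers here, C string escapes and regex escapes, but no third layer added by backquotes.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of name with a backslash before every '$'.
   Like the entry, this quotes only '$'; other regex metacharacters in
   a login name would need the same treatment. Caller must free(). */
static char *quote_dollars(const char *name)
{
    size_t n = strlen(name), extra = 0;
    for (size_t i = 0; i < n; i++)
        if (name[i] == '$')
            extra++;

    char *out = malloc(n + extra + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        if (name[i] == '$')
            *p++ = '\\';    /* C's escape layer: '\\' is one backslash */
        *p++ = name[i];
    }
    *p = '\0';
    return out;
}

/* Return 1 if line starts with "<name>:", 0 if not, -1 on error.
   This mirrors grep "^${UNAME}:" with UNAME quoted as above. */
static int shadow_line_matches(const char *line, const char *name)
{
    char *quoted = quote_dollars(name);
    if (quoted == NULL)
        return -1;

    size_t len = strlen(quoted) + 3;    /* '^' + quoted + ':' + NUL */
    char *pattern = malloc(len);
    if (pattern == NULL) {
        free(quoted);
        return -1;
    }
    snprintf(pattern, len, "^%s:", quoted);
    free(quoted);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        free(pattern);
        return -1;
    }
    int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    free(pattern);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}
```

Called as shadow_line_matches(line, "host$"), this matches only lines whose login field is literally host$.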

Any time you have multiple levels of escaping and de-escaping at work, you have entered into a land of pain. People are not good at counting escapes, or at keeping track of what each level of processing will do and what the results will look like, or even at remembering when quoting is and isn't needed. Making them do it anyway results in bugs, voodoo programming (add escapes until the code magically starts working), and often security bugs.

And this is why I get grumpy about any language that requires multiple levels of escaping and de-escaping, the Bourne shell included. (I more or less permanently soured on TCL after similar experiences with an early version, although I've heard that current versions have fixed this.)

BourneQuotingII written at 23:01:30


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.