What can go wrong if your compiler is not thread aware

December 29, 2006

Courtesy of Pete Zaitcev, here's a great example of what happens when optimizing compilers aren't thread aware.

Start with code of the form:

struct a b;
b.pos = *ppos;
ret = foo(&b, ..., b.pos);

Modern versions of gcc 4 on x86 will optimize the function call into:

ret = foo(&b, ..., *ppos);

(This is a less stupid optimization than it looks; ppos is a function parameter and I believe it's in a register, so it may well be faster to perform an indirect load from it than a computed indirect load off the stack pointer.)

What goes wrong in a multi-threaded environment (in this case, the Linux kernel) is that the value of *ppos can change between the store into b.pos and the function call, while foo expects b.pos and the value passed as its argument to be the same. Of course, the authors of the function this appeared in didn't think they needed any locking, because after all they only dereference *ppos once.
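As a minimal sketch of one way to make the "dereference once" assumption visible to the compiler, the C below reads *ppos through a volatile-qualified pointer. This is the same idea the Linux kernel later packaged as ACCESS_ONCE() and READ_ONCE(). Everything here is hypothetical: struct a, foo(), read_pos_once() and call_foo() stand in for the real kernel code (whose extra arguments the post elides), and pos is assumed to be a word-sized long so that a single load of it is atomic.

```c
#include <assert.h>

/* Hypothetical stand-in for the post's struct a; pos is assumed to be
   a word-sized long so that a single load of it is atomic. */
struct a {
    long pos;
};

/* Hypothetical foo(): like the real one, it expects its argument to
   equal b->pos, and here it aborts (via assert) if they differ. */
static int foo(struct a *b, long pos)
{
    assert(b->pos == pos);
    return 0;
}

/* Load *p exactly once through a volatile-qualified pointer, in the
   spirit of the kernel's later ACCESS_ONCE()/READ_ONCE(). */
static long read_pos_once(const long *p)
{
    return *(const volatile long *)p;
}

int call_foo(const long *ppos)
{
    struct a b;

    b.pos = read_pos_once(ppos);
    /* The volatile load's result is opaque to the optimizer, so it has
       no basis for passing a fresh load of *ppos in place of b.pos. */
    return foo(&b, b.pos);
}
```

In practice GCC and Clang treat such an access as a real, single load, which is why the kernel relies on this idiom; strictly speaking, the C standard's treatment of volatile access to a non-volatile object is less clear-cut.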

(To deal with one possible code nitpick: the Linux kernel makes liberal use of implicit atomic reads and writes. This probably makes purists cringe, but the odds of a major CPU architecture ever lacking atomic word-sized reads and writes are pretty small by now.)
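C11, standardized years after this entry, added a way to spell such implicit atomic accesses explicitly. What follows is a minimal sketch, assuming a hypothetical shared position variable shared_pos with hypothetical accessors set_pos() and get_pos(). memory_order_relaxed asks for an indivisible access without imposing any ordering, which is roughly what the kernel's plain word-sized loads and stores rely on. As we read the C11 model, a relaxed load is still a single read, so the compiler may not quietly replace its saved result with a second load.

```c
#include <stdatomic.h>

/* Hypothetical shared position that other threads may update. */
static _Atomic long shared_pos;

/* Indivisible store with no ordering constraints: the explicit
   C11 spelling of a plain word-sized write. */
void set_pos(long v)
{
    atomic_store_explicit(&shared_pos, v, memory_order_relaxed);
}

/* Indivisible load with no ordering constraints. Callers should save
   the result once and reuse that saved copy. */
long get_pos(void)
{
    return atomic_load_explicit(&shared_pos, memory_order_relaxed);
}
```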

I don't think we can blame either the compiler or the programmers for this. What's really happening is that we've been tripped up because we have different implicit assumptions about the code than the compiler does. And a good part of the reason that these sorts of assumptions stay implicit is that we don't have good tools for making them explicit in non-annoying ways.

(So while we can blithely talk about a 'thread aware compiler', I'm not sure we know what one should actually look like.)

Comments on this page:

From at 2006-12-31 01:58:43:

I don't think this is a good example. One way to look at it is that Linux intentionally violates its own locking model here. If access to the offset were guarded with a spinlock, there would be no problem, because the lock primitives in the Linux kernel provide the necessary barriers (both compiler barriers and CPU barriers). The AIO code is very performance critical, especially in the scalability department, which is why someone skipped the locking. Whether they made the right call, well, everyone can judge for themselves.
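The locked version described above can be sketched in user space, with a POSIX mutex standing in for the kernel spinlock (a substitution: kernel spinlocks are a different API). The names struct shared_file, snapshot_pos() and set_pos_locked() are hypothetical. POSIX specifies that pthread_mutex_lock() and pthread_mutex_unlock() synchronize memory, so the compiler cannot later re-read f->pos and substitute it for the snapshot.

```c
#include <pthread.h>

/* Hypothetical shared file object; pos_lock plays the role of the
   kernel spinlock mentioned in the comment. */
struct shared_file {
    pthread_mutex_t pos_lock;
    long pos;
};

/* Take a snapshot of the position under the lock. The lock calls act
   as compiler and CPU barriers, so f->pos cannot be re-read after the
   unlock in place of the local copy. */
long snapshot_pos(struct shared_file *f)
{
    long pos;

    pthread_mutex_lock(&f->pos_lock);
    pos = f->pos;
    pthread_mutex_unlock(&f->pos_lock);
    return pos;
}

/* Writers update the position under the same lock. */
void set_pos_locked(struct shared_file *f, long pos)
{
    pthread_mutex_lock(&f->pos_lock);
    f->pos = pos;
    pthread_mutex_unlock(&f->pos_lock);
}
```

The cost is a lock round trip on every read, which is precisely the overhead the performance-critical code was trying to avoid.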

But the point is, what kind of thread-aware compiler would help here? If it were something like Java, where you designate objects as shared, then mandatory locking would have been inserted, which is exactly what the author tried to avoid.

I think a better example might be a case where the code would be OK but is broken by a preemptible kernel (CONFIG_PREEMPT). That is where a thread-aware compiler would help.

By the way, non-cache-coherent SMP is just not viable. It pushes your granularity coarser, because managing cache lines explicitly in software tilts your costs that way. I can only remember Solbourne, and that was a long time ago. Perhaps also Be, although I am not sure how that worked. That one had two 601 CPUs, which have small on-chip caches, I think.

Hunting bugs in such a system is expensive enough to sink it. Also, it might have been possible to abstract the incoherency away in the days of BSD 4.3, but once we have true multithreading, it immediately becomes visible in user mode.

-- Pete

By cks at 2007-01-01 21:53:07:

Purely from the perspective of locking, I think the unoptimized code is legitimate (provided that dereferencing *ppos is atomic): with or without locking around the dereference, you get some valid value, you store it, and afterwards you only use the stored value, so things stay coherent.

(Using locking would avoid the problem, but that's because it has the necessary side effect of forcing an optimization barrier.)
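That optimization-barrier side effect can also be had without a lock. What follows is a minimal sketch, assuming GCC or Clang: an empty asm statement with a "memory" clobber (which is how the Linux kernel defines its barrier() macro), placed between the store into b.pos and the call. The names compiler_barrier(), struct a, foo() and call_foo_with_barrier() are hypothetical. Note that this constrains only the compiler, not the CPU.

```c
/* GCC/Clang compiler barrier: emits no instructions, but the compiler
   must assume any memory may have changed across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the post's struct a. */
struct a {
    long pos;
};

/* Hypothetical consumer that needs its argument to equal b->pos;
   returns 0 if they match and -1 otherwise. */
static int foo(struct a *b, long pos)
{
    return b->pos == pos ? 0 : -1;
}

int call_foo_with_barrier(const long *ppos)
{
    struct a b;

    b.pos = *ppos;
    /* After the barrier the compiler may no longer assume that *ppos
       still equals b.pos, so it cannot pass a fresh load of *ppos in
       place of b.pos. */
    compiler_barrier();
    return foo(&b, b.pos);
}
```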

As for what sort of thread awareness we want, I need to go reread the original paper on the subject to see what its authors proposed (if anything). I don't think they were going to mandate automatic locking (at least not for atomic things that are safe to read and write without locking).

Written on 29 December 2006.


Last modified: Fri Dec 29 23:45:34 2006