When I might expect simultaneous multithreading to help

January 19, 2022

Let's accept for the moment the idea that simultaneous multithreading can help under the right circumstances (Intel apparently sometimes claims a potential 30% benefit from hyper-threading, for example). It then becomes interesting to ask when you might expect SMT to help out your machines and when it's probably not going to do anything, partly because if you don't expect much from SMT you should ignore the extra CPUs you theoretically get from SMT when considering choices of machines.

(There are at least two stories of why and how SMT might help.)

The first thing I would expect is that SMT mostly only helps if you have as many running processes as cores. Broadly speaking, a separate core is (or should normally be) superior to the second SMT CPU for unrelated processes, and you'd hope that operating systems schedule processes accordingly. Because SMT CPUs usually share caches and some other resources, it might be useful to schedule two threads from the same process into the same core so that they could take advantage of that, with one thread getting cache hits from the other thread's activities. Of course this won't work for all threads, since sometimes threads don't have any cache sharing.
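(On Linux you can see which CPU numbers are SMT siblings of each other by reading sysfs. Here's a sketch of doing that; the sysfs path is the standard one, but `parse_cpu_list` and `smt_siblings` are just my names, and this is illustrative rather than production code.)

```python
# Sketch: discover SMT sibling CPUs on Linux via sysfs.
# The thread_siblings_list file holds entries like "0,8" or "0-1".

def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as "0,8" or "0-3,8-11" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def smt_siblings(cpu: int) -> list[int]:
    """Return the SMT siblings of a CPU (including itself), or [cpu] if unknown."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return [cpu]
```

(On a machine without SMT, each CPU's siblings list is just itself, so this also tells you whether SMT is on at all.)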

(And if both threads are trying to access the same external resources at the same time, both could stall on, eg, RAM latency, which sort of defeats one of the points of SMT. You could be better off scheduling uncorrelated processes on each CPU of an SMT pair.)

If a process is paused briefly and then resumed, the operating system will normally try to re-schedule it on the same CPU it was on, so that it can take advantage of the hot caches there. If this CPU is busy, it might be a win to schedule the process onto the CPU's SMT sibling if that's free, rather than push the process off to a different core; from the cache point of view, the two CPUs in an SMT pair are usually basically the same. This is not a sure thing, since whatever is now running on the original CPU may have trashed the caches.
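(If you want to play with this by hand, Linux lets a process restrict itself to a CPU, or to a CPU and its SMT sibling, through `sched_setaffinity()`; Python exposes this as `os.sched_setaffinity`. A Linux-only sketch; `pin_to_cpus` is my name for the helper, and in real use you'd pass a CPU plus its sibling from the sysfs topology rather than a single CPU.)

```python
import os

def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU set (Linux-only).

    Returns the previous affinity so the caller can restore it later.
    """
    previous = os.sched_getaffinity(0)  # 0 means "this process"
    os.sched_setaffinity(0, cpus)
    return previous

# Illustrative use: pin ourselves to one CPU we already have access to,
# then put the original affinity back afterward.
some_cpu = min(os.sched_getaffinity(0))
original = pin_to_cpus({some_cpu})
os.sched_setaffinity(0, original)
```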

I don't know if SMT could be expected to reduce the latency of how long runnable processes wait before executing (and then how long before they produce useful work, which is what really matters). SMT does provide extra CPUs to put processes on, but this only matters if you're busy enough that you don't have any full cores still idle. And once the processes are running they probably need (slow) memory accesses to do useful work.
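(As a back-of-envelope illustration of the memory-stall story, here is a toy model of my own, not a claim about real hardware: assume each thread can do useful work only a fraction 1-p of the time and stalls on memory the rest, assume the two SMT threads stall independently, and assume the core can always run whichever thread isn't stalled. Then the core sits idle only when both threads stall at once.)

```python
def core_utilization(p: float, threads: int = 1) -> float:
    """Toy model: fraction of time the core does useful work when each
    resident thread independently stalls on memory a fraction p of the time."""
    # The core is idle only if every resident thread is stalled at once.
    return 1.0 - p ** threads

def smt_speedup(p: float) -> float:
    """Throughput of two SMT threads relative to one, under the toy model."""
    return core_utilization(p, 2) / core_utilization(p, 1)

# With p = 0.3 (threads stall 30% of the time), two SMT threads give
# 1 - 0.09 = 0.91 core utilization versus 0.7, about a 1.3x speedup,
# which is roughly the sort of gain the Intel figure above suggests.
```

(This model is obviously too optimistic; real threads also fight over execution resources and cache space. But it shows why SMT gains depend so much on how often threads stall.)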

Given the stories about SMT, I'd expect that you would get bad results if two carefully optimized computational kernels were scheduled onto SMT siblings. Each kernel would make basically maximal use of a core's available resources, and a core doesn't have two copies of all of those resources for its SMT pair, so the two sides would be fighting each other over them. You might do well if you could schedule an integer compute kernel on one side and a floating point compute kernel on the other, but that probably depends on a lot of things.

Comments on this page:

An integer-heavy thread plus an FP-heavy thread is an obvious win, since they only conflict on control and load/store functional units.

But in practice, the win is usually less sharing unused functional units, and more about hiding memory latency and increasing the number of parallel requests out to the memory subsystem. If one thread blocks waiting for a slow DRAM access, hopefully the other thread can continue working out of cache. Basically 2 threads that both use less than 1/2 the available memory bandwidth, and end up stalling on memory latency reasonably frequently.
