Discussion x86 CPU cluster configurable for optimized ST or MT - feasible?

Jul 27, 2020
13,158
7,815
106
Imagine a cluster of two CPU cores, each with its own L2 cache. They share an L3 V-cache of an optimal size (whatever that may be). Both cores in the cluster are capable of SMT. However, there is something different about this cluster: it has two modes of operation. In multithread-optimized (MTO) mode, it executes four threads. But in single-thread-optimized (STO) mode, it runs a single thread in four different stages of execution. Anytime a misprediction occurs, the focus switches to the thread that has already taken the correct branch. This way, the misprediction penalty can be minimized.

In terms of die space, this is an expensive way to increase ST throughput, but maybe it would be feasible in the future, when it's possible to cram dozens of such dual-core clusters economically into the available die space? Could this also sidestep Amdahl's law, by using double the resources to boost each thread to maximum performance so that the ST-to-MT performance ratio increases much more? For a 16-core CPU, we could have 8 dual-core clusters. Is it possible that in STO mode, 8 threads could get more work done more quickly than 32 hyperthreads in MTO mode?
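To put rough numbers on that last question, here is a minimal sketch based on Amdahl's law. The parallel fractions and both per-thread speed factors (`MTO_SPEED`, `STO_SPEED`) are made-up assumptions for illustration, not measurements of any real CPU.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup over one thread when a fraction p of the work
    is spread across n equally fast threads."""
    return 1.0 / ((1.0 - p) + p / n)


def relative_perf(p: float, n_threads: int, per_thread_speed: float) -> float:
    """Performance relative to one baseline thread, when every thread runs
    at per_thread_speed times the baseline speed."""
    return per_thread_speed * amdahl_speedup(p, n_threads)


# Hypothetical per-thread speeds (assumptions, not benchmarks):
MTO_SPEED = 0.6   # each of 32 SMT threads shares a core with a sibling
STO_SPEED = 1.3   # each of 8 threads gets a whole dual-core cluster

for p in (0.50, 0.90, 0.99):
    mto = relative_perf(p, 32, MTO_SPEED)
    sto = relative_perf(p, 8, STO_SPEED)
    print(f"parallel fraction {p:.2f}: MTO {mto:.2f}x, STO {sto:.2f}x")
```

With these assumed numbers, STO comes out ahead when half or 90% of the work is parallel, and MTO comes out ahead at 99%. The law itself is not sidestepped in this model; the serial part simply runs faster, so the answer depends on how parallel the workload is and on how much STO mode really speeds up one thread.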

DISCLAIMER: I've no idea what I'm talking about. Feel free to ruthlessly drill holes in my idea, provided the reasons given are sound.
 

JustViewing

Member
Aug 17, 2022
108
208
76
First off, modern processors are good at predicting branches. If code contains lots of unpredictable branches your suggestion might work, but branches don't happen in isolation. For example, branches may be part of complex if/else-if code, so the number of possible branch combinations grows exponentially, to the point that running the different paths simultaneously is not practical. Secondly, running 2 threads and discarding one will waste a lot of power, maybe even double the power requirement. Thirdly, and most importantly, there is a dependency chain: it would be impractical to share register state with other cores without very significant latency.
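The exponential growth is easy to quantify. A minimal sketch, assuming every unresolved branch is two-way and that both sides of each one are followed; the function names and the 95% predictor accuracy are illustrative assumptions, not figures from any real processor.

```python
def paths_in_flight(unresolved_branches: int) -> int:
    """Paths to execute if both sides of every unresolved two-way branch
    are followed."""
    return 2 ** unresolved_branches


def covered_depth(hw_contexts: int) -> int:
    """Branch levels that hw_contexts (>= 1) parallel paths can fully cover,
    i.e. floor(log2(hw_contexts))."""
    return hw_contexts.bit_length() - 1


def predicted_path_correct(accuracy: float, unresolved_branches: int) -> float:
    """Chance that a single predicted path is right at every branch, assuming
    independent predictions that all have the same accuracy."""
    return accuracy ** unresolved_branches


for n in (1, 2, 4, 8):
    print(n, paths_in_flight(n), round(predicted_path_correct(0.95, n), 3))
print("4 contexts cover", covered_depth(4), "branch levels")
```

Under these assumptions, the four hardware threads of one dual-core cluster can only cover two branch levels, while the number of paths needed doubles with every further unresolved branch.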

Dependencies are the main thing preventing higher IPC. x64 has only 16 general-purpose registers, and some of them have to be used as memory pointers. If there is a dependency, there is not much that register renaming can do, and you have to save/load registers constantly. From my personal experience, 16 is not enough. I think 32 integer registers would be ideal. 16 AVX registers are somewhat OK, and with AVX-512 the register count is increased to 32. So the potential is there to increase IPC in AVX code. The bottleneck will be the memory ports and L1/L2/L3 bandwidth.
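The dependency point can be illustrated with a toy dataflow model: with unlimited execution width, perfect renaming and one-cycle latency per instruction, the best possible IPC is the instruction count divided by the longest dependency chain. This is a sketch under those assumptions; the `(dest, sources)` instruction encoding is hypothetical.

```python
def critical_path(instrs):
    """Length of the longest true-dependency chain.

    instrs is a list of (dest, sources) tuples in program order. Each
    instruction takes one cycle and may start as soon as all of its source
    registers have been produced.
    """
    ready = {}    # register name -> cycle at which its latest value is ready
    longest = 0
    for dest, srcs in instrs:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + 1
        ready[dest] = finish
        longest = max(longest, finish)
    return longest


def ideal_ipc(instrs):
    """Upper bound on IPC imposed by dependencies alone."""
    return len(instrs) / critical_path(instrs)


# Four increments of the same register: each one waits for the previous one.
serial = [("a", ["a"])] * 4
# Four writes to different registers with no inputs: all can issue together.
independent = [("a", []), ("b", []), ("c", []), ("d", [])]

print(ideal_ipc(serial), ideal_ipc(independent))
```

In this model the serial chain is stuck at one instruction per cycle no matter how wide the core is, while the independent sequence scales with width.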

I would think the reverse of what you suggested might work. That is, have a very wide core and implement 4- or 8-way hyper-threading. That way you can have excellent ST as well as good MT.
 

Doug S

Platinum Member
Feb 8, 2020
2,002
3,087
106
Execution of both sides of a branch has been talked about (and run on simulators) forever, but the idea is always discarded because it increases power draw too much and has some negative side effects that hurt performance, like cache pollution from loading little-used code paths into i-cache (imagine having all your error handling and "should never happen" tests loaded into cache, evicted, re-loaded, ad infinitum...).
 
Jul 27, 2020
13,158
7,815
106
Let's suppose that the CPU is optimized for branch-y gaming code, at the expense of other types of general purpose application code. How does my idea look then, assuming gamers don't care about power draw? Is it possible to design a CPU that would run game engines much, much faster than a CPU with a balanced architecture? Kind of like a freak gaming CPU?
 

Thunder 57

Platinum Member
Aug 19, 2007
2,501
3,403
136

Why? A CPU is general purpose by definition. It sounds like you want something that already exists, "a CPU that would run game engines much, much faster...". That would be a GPU. Before video acceleration, games ran on CPUs.

Also, why? How many games are CPU bound? Not many. You hardly need a top of the line CPU to get excellent performance.

And then you have the problem of who would make it. They could never recover the cost of R&D plus production. There aren't nearly enough gaming-only people out there who would want a "gaming" CPU instead of a more balanced one.
 
Jul 27, 2020
13,158
7,815
106
There are gaming phones out there. I've yet to see someone in real life using one, which means they can't be high volume, but some companies are still making them. A few million performance-crazed gamers is not a small market.
 

DrMrLordX

Lifer
Apr 27, 2000
21,277
10,457
136

Gaming phones are a bigger deal in Asia. Plus if you buy one of those units in the United States, you may find that network support is spotty at best. ROG phone users have been coping with that for years.

fwiw I had an ROG Phone 2 and I had issues when moving away from 3G to VoLTE/Wifi calling since GSM carriers (notably AT&T) hadn't whitelisted the phone for VoLTE. Plus it had dodgy firmware that made it hard to change VoLTE settings in the first place. Anyway I had to go to a conventional Samsung S21 Ultra to avoid future problems with ROG Phones (ROG Phone 5/5s also apparently has some interesting network issues in the US). I might go back to ROG Phone eventually, but only once I'm convinced I won't have any more nasty surprises. It's sad because even if you don't want a "gamer" phone for gaming, the specs on those things can be fantastic.

Kind of off-topic, but since you brought it up . . .
 