- Jul 27, 2020
Imagine a cluster of two CPU cores, each with its own L2 cache, sharing an L3 V-cache of some optimal size (whatever that may be). Both cores in the cluster are SMT-capable, so the cluster has four hardware threads. What makes this cluster different is that it has two modes of operation. In multithread-optimized (MTO) mode, it executes four independent threads. In single-thread-optimized (STO) mode, it runs a single thread in four different stages of execution, one per hardware thread. Whenever a misprediction occurs, the focus switches to the hardware thread that has already taken the correct branch. This way, the misprediction penalty can be minimized.
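To put a rough number on "minimized", here is a back-of-the-envelope sketch using the usual cycles-per-instruction model, where mispredictions add `branch_freq * mispredict_rate * penalty` cycles per instruction. Every figure in it (base CPI, branch frequency, miss rate, the 16-cycle flush, the 2-cycle context switch) is a made-up assumption for illustration, not a measurement of any real core, and it optimistically assumes the correct path is always already running on another hardware thread.

```python
def cycles_per_instruction(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    """Simple CPI model: base cost plus the average misprediction stall per instruction."""
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles


# All values below are hypothetical, chosen only to show the shape of the trade-off.
BASE_CPI = 0.25          # assumed: 4 instructions per cycle with perfect prediction
BRANCH_FREQ = 0.20       # assumed: 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.03   # assumed: 3% of branches are mispredicted
FLUSH_PENALTY = 16       # assumed: cycles lost to a conventional pipeline flush
SWITCH_PENALTY = 2       # assumed: cycles to switch focus to the correct context

conventional_cpi = cycles_per_instruction(BASE_CPI, BRANCH_FREQ, MISPREDICT_RATE, FLUSH_PENALTY)
sto_cpi = cycles_per_instruction(BASE_CPI, BRANCH_FREQ, MISPREDICT_RATE, SWITCH_PENALTY)
sto_speedup = conventional_cpi / sto_cpi

print(f"conventional CPI: {conventional_cpi:.3f}")
print(f"STO CPI:          {sto_cpi:.3f}")
print(f"ST speedup:       {sto_speedup:.2f}x")
```

With these particular numbers the single-thread gain is roughly 1.3x, paid for with about four times the execution resources. A lower miss rate or a shorter flush shrinks the gain further, so the result is very sensitive to the assumptions.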
In terms of die space, this is an expensive way to increase ST throughput, but maybe it would become feasible in the future, once dozens of such dual-core clusters can be crammed economically into the available die space? Could this also sidestep Amdahl's law, by using double the resources to push each thread to maximum performance so that the ST-to-MT ratio improves much more? A 16-core CPU would have 8 such clusters. Is it possible that 8 threads in STO mode could get more work done, more quickly, than 32 hyperthreads in MTO mode?
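One way to frame the 8-versus-32 question is to plug both modes into Amdahl's law and see where they break even. The sketch below does that under loudly stated assumptions: the 1.3x STO per-thread speedup and the 1.25x SMT throughput gain per core are invented placeholders, and the model ignores memory bandwidth, cache contention, and power limits.

```python
def run_time(parallel_fraction, serial_speed, parallel_throughput):
    """Amdahl-style normalized run time: a serial part plus a perfectly parallel part."""
    serial_part = (1.0 - parallel_fraction) / serial_speed
    parallel_part = parallel_fraction / parallel_throughput
    return serial_part + parallel_part


# Hypothetical figures, not measurements.
CORES = 16
SMT_GAIN = 1.25      # assumed: two SMT threads together deliver 1.25x one core
STO_SPEEDUP = 1.30   # assumed: per-thread gain of a two-core cluster in STO mode


def mto_time(parallel_fraction):
    # Serial phases run on one core at speed 1.0; parallel phases use all 32 threads.
    return run_time(parallel_fraction, 1.0, CORES * SMT_GAIN)


def sto_time(parallel_fraction):
    # 8 clusters, each running one thread at STO_SPEEDUP.
    clusters = CORES // 2
    return run_time(parallel_fraction, STO_SPEEDUP, clusters * STO_SPEEDUP)


for p in (0.50, 0.80, 0.90, 0.95, 0.99):
    winner = "STO" if sto_time(p) < mto_time(p) else "MTO"
    print(f"parallel fraction {p:.2f}: MTO {mto_time(p):.4f}  STO {sto_time(p):.4f}  -> {winner}")
```

Under these assumptions, STO mode only wins while less than about five sixths of the work is parallel. Past that point the 32 hyperthreads pull ahead. So the scheme would not sidestep Amdahl's law so much as shrink its serial term, at the cost of parallel throughput.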
DISCLAIMER: I've no idea what I'm talking about. Feel free to ruthlessly drill holes in my idea, provided the reasons given are sound.