Question Superscalar vs SIMD vs MIMD

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The three aren't mutually exclusive I'm sure. Both the Apple A series and contemporary x86-64 CPUs can be considered superscalar and SIMD capable microprocessors. But it seems that Apple has focused more on the superscalar aspect, while Intel and AMD have focused more on the SIMD approach when designing their CPUs.

Also, due to the high core counts and SMT capabilities in Intel and AMD CPUs, wouldn't they also qualify as MIMD architectures?

Is it wrong to state this? The Apple A series uses NEON, which is a 128-bit SIMD extension, while Intel currently supports vector widths up to 512 bits, and AMD up to 256 bits with two 256-bit units per core plus two FMA units.
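
For concreteness, here's a minimal sketch of what that width difference means at the instruction level (illustrative only; the function name add16 is just a placeholder, and each branch assumes hardware with the matching extension): the same 16-float addition is a single 512-bit AVX-512 operation but four 128-bit NEON operations.

```cpp
// Minimal sketch: the same 16-float addition at two different vector widths.
// Illustrative only; compile whichever branch matches your hardware.
#include <cstddef>

#if defined(__AVX512F__)
#include <immintrin.h>
void add16(const float* a, const float* b, float* out) {
    // One 512-bit operation covers all 16 lanes.
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_add_ps(va, vb));
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
void add16(const float* a, const float* b, float* out) {
    // Four 128-bit operations are needed for the same 16 lanes.
    for (std::size_t i = 0; i < 16; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
}
#endif
```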

So to the informed, I am asking these questions:

1) Superscalar and SIMD/MIMD obviously all have a tremendous impact on the overall design of the CPU. However, does supporting wider vectors come at the expense of superscalar IPC, and vice versa?

2) Do you think it's a mistake for AMD and Intel to adopt wider vector extensions into their CPU designs in the long term, especially if it comes at the expense of superscalar IPC?

3) Does SIMD + multicore/multithread = MIMD?

I can only speak as a hardware enthusiast and PC gamer, but I am very pleased with the evolutionary trajectory of the x86 CPU over the past decade. The focus on more cores as well as wider vectors and SMT has enabled CPUs to do things that were simply not possible before. I hope they keep it up! :cool:
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,618
136
I think you just blew everybody's mind. :sweatsmile:

They are good questions, but I personally wouldn't know where to start. Let's try...
1) Going with AMD's improvements in Zen it seems both can happen at the same time. At least I interpret fused macro-ops as a form of superscalarity.
2) I personally think adopting wider vector extensions is a mistake (they take too much area and power relative to the benefits; better integration of GPUs would be both faster and more future-proof), but it doesn't appear to be at the expense of superscalar IPC (see previous answer).
3) Is that a trick question? :grinning: SIMD + multicore/multithread is not MIMD. But considering superscalarity, no idea if there's any approach that could manage to turn SIMD into effectively MIMD internally.

I'm sure there are others much more capable of shining a better light on the whole picture.
 
  • Like
Reactions: Carfax83

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
1) There is always going to be a limited and contended transistor budget for improving any aspect of a CPU design, whether it be superscalar IPC or SIMD/MIMD ...

2) Application design in the near future will show whether or not wider vectors were necessary ...

3) Yes according to the definition in Flynn's Taxonomy. Practically any CPU that has a number of processors that function asynchronously from each other can be defined as 'MIMD' ...
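
For example, a minimal standard-C++ sketch of what that definition covers (nothing vendor-specific assumed): two threads running different instruction streams on different data at the same time already satisfy MIMD, whether or not either stream also uses SIMD internally.

```cpp
// Minimal sketch of MIMD per Flynn's taxonomy, using only the standard library:
// two threads execute different instruction streams on different data concurrently.
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<float> a(1 << 20, 1.0f), b(1 << 20, 2.0f);
    float sum = 0.0f;
    float dot = 0.0f;

    // Stream 1: a plain reduction over a.
    std::thread t1([&] { sum = std::accumulate(a.begin(), a.end(), 0.0f); });
    // Stream 2: a dot product over a and b (a different instruction stream).
    std::thread t2([&] { dot = std::inner_product(a.begin(), a.end(), b.begin(), 0.0f); });

    t1.join();
    t2.join();
    return (sum + dot) > 0.0f ? 0 : 1;
}
```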
 
  • Like
Reactions: Carfax83

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I think you just blew everybody's mind. :sweatsmile:

Well, I have to say the reason I asked these questions is that I was wondering whether Intel's pursuit of wider SIMD vectors was also a factor in Apple being able to increase their superscalar IPC to such a high level as to surpass Intel in that arena. I'm not saying that's the only factor, of course, as there are others, e.g. clock speed, process node, etc.

But these technologies take up die space and it's conceivable that focusing on one could come at the expense of another.

2) I personally think adopting wider vector extensions is a mistake (they take too much area and power relative to the benefits; better integration of GPUs would be both faster and more future-proof), but it doesn't appear to be at the expense of superscalar IPC (see previous answer).

This seems to be a common refrain on these forums. But using GPUs instead would come with a lot more limitations. For instance, I remember many predictions from years ago that GPU-accelerated encoding was the future, but it never materialized. While the GPU is much faster, the quality per bitrate is significantly lower.

3) Is that a trick question? :grinning: SIMD + multicore/multithread is not MIMD. But considering superscalarity, no idea if there's any approach that could manage to turn SIMD into effectively MIMD internally.

I'm sure there are others much more capable of shining a better light on the whole picture.

Nope, not a trick question! It seems that Intel and AMD CPUs are becoming more and more throughput oriented with higher numbers of cores and SMT, plus wider vectors. To me that's more akin to MIMD if anything.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,618
136
Well, I have to say the reason I asked these questions is that I was wondering whether Intel's pursuit of wider SIMD vectors was also a factor in Apple being able to increase their superscalar IPC to such a high level as to surpass Intel in that arena.
Intel's focus on where to improve their tech has been oddly one-sided in the last decade: aside from relatively minor improvements, it was all SIMD and iGPU. Increasing superscalar IPC is an obvious area for improvement, especially in low-power usage scenarios. To me it seems Apple has just made the most effort in that area so far.

But using GPUs instead would come with a lot more limitations. For instance, I remember many predictions from years ago that GPU-accelerated encoding was the future, but it never materialized. While the GPU is much faster, the quality per bitrate is significantly lower.
That's a software support issue imo. Programming for the GPU is naturally not as flexible as for the CPU, so to replicate the latter's flexibility the former needs much more code and case-specific optimizations (but the payoff is huge). My old wish is to eventually have the CPU detect any embarrassingly parallel loops on its own and transparently execute them on the GPU. One can dream.
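
For contrast, a minimal sketch of the status quo being wished away here (assuming a C++17 toolchain that implements the parallel algorithms): the embarrassingly parallel loop only gets multithreaded and vectorized because the call site explicitly opts in; nothing detects or offloads it automatically.

```cpp
// Minimal sketch of today's explicit opt-in to parallelism (C++17 parallel algorithms).
#include <algorithm>
#include <execution>
#include <vector>

void scale(std::vector<float>& v, float k) {
    // par_unseq permits both multithreading (MIMD-style) and vectorization (SIMD),
    // but only because the call site asks for it via the execution policy.
    std::transform(std::execution::par_unseq, v.begin(), v.end(), v.begin(),
                   [k](float x) { return x * k; });
}
```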

To me that's more akin to MIMD if anything.
Akin, yes. To me this makes the term essentially meaningless, since technically it then applies as soon as more than one thread can run on a CPU concurrently. One could say all multithreading CPUs are "accidentally" MIMD then.
 
  • Like
Reactions: Carfax83

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
1) There is always going to be a limited and contended transistor budget for improving any aspect of a CPU design, whether it be superscalar IPC or SIMD/MIMD ...

That's what I thought. The design choices are primarily affected by the target platform and usage scenarios. It wouldn't be a good idea for Apple to implement a 512-bit vector unit in their mobile CPU for obvious reasons. And for Intel and AMD, making their CPUs too wide would likely come at the expense of clock speed.

2) Application design in the near future will show whether or not wider vectors were necessary ...

I'd say it's already been determined. More and more applications are being tuned for multithreading and SIMD, and Intel has been very aggressive in helping developers speed up their applications with their Intel® Implicit SPMD Program Compiler.

A really good example of this is their SVT codecs.

3) Yes according to the definition in Flynn's Taxonomy. Practically any CPU that has a number of processors that function asynchronously from each other can be defined as 'MIMD' ...

When I did my research I did come across that definition as well. So it would seem that any modern CPU is by definition MIMD, since they are all multicore and can function asynchronously.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Intel's focus on where to improve their tech has been oddly one-sided in the last decade: aside from relatively minor improvements, it was all SIMD and iGPU. Increasing superscalar IPC is an obvious area for improvement, especially in low-power usage scenarios. To me it seems Apple has just made the most effort in that area so far.

That's my observation as well. I think the reason is that they have been developing their Intel Implicit SPMD Program Compiler technology, which makes it much easier to auto-vectorize and parallelize applications, leading to huge performance gains. The implication is that their CPUs would become much faster over time and yield much larger performance increases than if they had focused on superscalar IPC like Apple did, since superscalar IPC is by all accounts much harder to increase.

I can't say they are wrong for doing this either. So much software is optimized for multithreading and SIMD these days, and the trend is going upward.

That's a software support issue imo. Programming for the GPU is naturally not as flexible as for the CPU, so to replicate the latter's flexibility the former needs much more code and case-specific optimizations (but the payoff is huge). My old wish is to eventually have the CPU detect any embarrassingly parallel loops on its own and transparently execute them on the GPU. One can dream.

I don't do encoding very often, but from reading doom9 and other forums, it seems that hardware-accelerated encoding (whether on the GPU or on dedicated ASICs like NVENC) sacrifices quality for speed. And if you want the same quality as software encoding, you will end up with HUGE file sizes, because these processors cannot handle the extreme complexity and the serial/branchy nature of the encoding algorithms that make software encoders so efficient.

Intel's development of their SVT codecs has managed to increase performance by leaps and bounds without sacrificing quality, due to multithreading and SIMD optimizations.

Akin, yes. To me this makes the term essentially meaningless, since technically it then applies as soon as more than one thread can run on a CPU concurrently. One could say all multithreading CPUs are "accidentally" MIMD then.

It would appear that way, yes. From what I have read, it seems that ALL modern CPUs are by definition MIMD, since there are no more single-core CPUs to my knowledge.
 
  • Like
Reactions: moinmoin

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
This seems to be a common refrain on these forums. But using GPUs instead would come with a lot more limitations. For instance, I remember many predictions from years ago that GPU-accelerated encoding was the future, but it never materialized. While the GPU is much faster, the quality per bitrate is significantly lower.

The reason why GPU compute isn't so popular has to do with the fact that there are extremely different programming models across different hardware vendors ...

You have to port your C++ kernels to a hardware-vendor-specific version of C++ for device acceleration. That means if you're on an AMD GPU, you need to port the C++ kernels into the HIP kernel language. If you're on an Nvidia GPU, you have to port the C++ kernels into the CUDA kernel language ...

By default, your GPU doesn't support standard C++. The world would be a very different place if GPUs somehow came with their own standard C++ compiler; then we wouldn't have to deal with the hardware-vendor-specific kernel languages we do now ...

That's what I thought. The design choices are primarily affected by the target platform and usage scenarios. It wouldn't be a good idea for Apple to implement a 512-bit vector unit in their mobile CPU for obvious reasons. And for Intel and AMD, making their CPUs too wide would likely come at the expense of clock speed.

Another just as likely reason is that the Android software ecosystem doesn't want to standardize on larger vector widths! Google, Samsung, and Qualcomm are not interested in wider vectors for mobile either, even if Apple wanted to beat them into submission to support them ...

I'd say it's already been determined. More and more applications are being tuned for multithreading and SIMD, and Intel has been very aggressive in helping developers speed up their applications with their Intel® Implicit SPMD Program Compiler.

A really good example of this is their SVT codecs.

Intel is no doubt a big contributor, but without AMD or Microsoft they can't truly solidify everything into a standard, just as observed with Thunderbolt technology until it was rolled into the USB standard. Intel, just like anyone else, needs industry consensus, because standardization doesn't come from a single corporation but from cooperation between different corporations ...
 
  • Like
Reactions: Carfax83