Any comparisons with PowerPC, SPARC, or MIPS?
One of the papers referenced in the thesis included MIPS. This was the conclusion:
We analyze measurements on seven platforms spanning three ISAs (MIPS, ARM, and x86) over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors’ performance and energy efficiency. We find that ARM, MIPS, and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.
Basically, they found that once you take into account the different workloads each CPU was designed for, the ISA made no appreciable difference; it was the microarchitectural design for the target workloads that made the difference.
Interesting that the other study shows the ISAs become equal when using an OoO engine, because this study says the exact opposite. The thing is, there is only one super-wide engine out there: the famous 6xALU Apple core, and it is based on ARM. So I tend to believe that the physical evidence of such a wide scalar core supports this study.
Apple's ALU count is a microarchitecture design decision, not an ISA comparison.
I don't want to turn this into a thread about the x86 vs. ARM war and who is going to win, because that's about economics. I'd like to keep it technical here. Later we can discuss ARMv9 pros and cons.
If you want to keep economics out of it, that's fine, but then it'll be a very short discussion. You still have to face the fact that the best hardware design in the world doesn't mean anything if there isn't software to run on it. I don't think anyone disagrees that ARM is a more modern and efficient ISA; the questions are by how much (not that significant, according to those who have done studies on it) and whether it actually matters for markets where x86 is dominant because of the software component. This obviously leads to the economic side of things, but we can just stop there.
IMHO there are no huge restrictions under the hood of an x86 CPU. x86 uses a CISC-to-RISC abstraction layer, so the difference in register count, for example, is overcome (ARM has 31 64-bit general-purpose registers while x86-64 has only 16) because internally the core uses much bigger resources. But here comes the problem: this abstraction layer dealing with old bottlenecks costs transistors and power, and it also leads to less efficient code (the compiler doesn't see the bigger resources of a modern CPU and must produce code that fits a resource-poor i386 or x86-64 Atom core). However, the most important question is how big the penalty is. Because if it's small, then who cares, and you get all that great compatibility. But this penalty grows over time, so while K8 and C2D were doing great, Ice Lake might start to struggle.
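To illustrate the register-count point, here's a minimal C sketch (the function name many_accumulators is just an illustration, not taken from any cited source): with 18 simultaneously live accumulators, a compiler targeting x86-64's 16 architectural GPRs may have to spill some of them to the stack, while AArch64's 31 GPRs give it more headroom; either way, the out-of-order core renames them onto a far larger physical register file at run time.

```c
/* Illustrative sketch only: a reduction with 18 simultaneously live
 * accumulators.  Targeting x86-64's 16 architectural GPRs, the compiler
 * will likely spill some of them to the stack; AArch64's 31 GPRs give it
 * more headroom.  In either case the OoO core renames these onto a much
 * larger physical register file at run time. */
#include <stddef.h>

long many_accumulators(const long *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0,
         s6 = 0, s7 = 0, s8 = 0, s9 = 0, s10 = 0, s11 = 0,
         s12 = 0, s13 = 0, s14 = 0, s15 = 0, s16 = 0, s17 = 0;
    size_t i = 0;

    for (; i + 18 <= n; i += 18) {
        s0  += a[i];      s1  += a[i + 1];  s2  += a[i + 2];
        s3  += a[i + 3];  s4  += a[i + 4];  s5  += a[i + 5];
        s6  += a[i + 6];  s7  += a[i + 7];  s8  += a[i + 8];
        s9  += a[i + 9];  s10 += a[i + 10]; s11 += a[i + 11];
        s12 += a[i + 12]; s13 += a[i + 13]; s14 += a[i + 14];
        s15 += a[i + 15]; s16 += a[i + 16]; s17 += a[i + 17];
    }

    long total = s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8 + s9
               + s10 + s11 + s12 + s13 + s14 + s15 + s16 + s17;
    for (; i < n; ++i)      /* scalar tail for the leftover elements */
        total += a[i];
    return total;
}
```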
Modern compilers are really, really good at extracting performance from x86. It's not like the compilers aren't aware of how modern x86 CPUs function. Compilers also allow you to target optimizations for different architectures (Zen, Haswell+, Atom, etc.) so I don't see having to support old/small x86 cores being a big deal from a compiler perspective.
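On the targeting point, here's a minimal sketch of GCC/Clang function multi-versioning (the function name scale and the chosen clone targets are assumptions for illustration; support varies by compiler version). The compiler emits one variant per target plus a resolver that picks the best match at load time, so one binary can use AVX2 on a modern core and still run on an old Atom:

```c
/* Hypothetical example of function multi-versioning via the GCC/Clang
 * target_clones attribute: several variants of the same function are
 * compiled and the right one is selected for the running CPU. */
#include <stddef.h>

__attribute__((target_clones("avx2", "arch=atom", "default")))
void scale(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= k;      /* the avx2 clone can be auto-vectorized more widely */
}
```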
This leads me to the reason why we don't see some super-wide 6xALU design on x86 similar to Apple's (and if Matterhorn is rumored to reach the IPC of the A12, then it will probably have 6 ALUs too). An x86 core with 6 ALUs is possible, but it would struggle much more than Ice Lake, so it would need more time to develop.
Because it's not as simple as just slapping a couple more ALUs on the chip and winning. There are a lot of trade-offs that are made when designing a CPU and Intel/AMD have very different design goals than Apple does.
That's probably why Intel and AMD focus much more on FPU/SIMD performance: these newer ISA extensions like AVX are modern and relatively bottleneck-free. That's why I'm starting to believe the Zen 3 performance rumors make sense (+17% integer IPC, +50% FPU).
AVX and its successors aren't purely about floating-point operations; there are integer instructions as well, though the focus is on floating point. Again though, design goals and trade-offs.
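For a concrete illustration of the integer side, AVX2 has instructions such as VPADDD, exposed through the _mm256_add_epi32 intrinsic, which adds eight 32-bit integers at once. The add_arrays function below is just a sketch (build with something like gcc -O2 -mavx2):

```c
/* Sketch of integer SIMD with AVX2 intrinsics: eight 32-bit adds per
 * instruction, plus a scalar tail for leftover elements. */
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```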
That article aside, how much advantage/disadvantage are we talking about here overall, in terms of ISA? Will that even matter that much in the big picture given the bazillion other variables involved?
It depends on what types of instructions are being executed, but from the paper referenced earlier, where they cycle-accurately simulate the difference between the two ISAs with OoO designs, it appears to be 10-15% or so on average.