It's hard to compare microarchitectures when AMD is behind on process by their own profiteering choice. But people will always do it anyway because that's the comparison we can make.
All I ask is that the ARM fans try to understand that AMD has more or less slapped all ARM server attempts back to the safety of the hyperscalers big enough to exploit ARM's discount IP, and why that might be. One contributing factor is that Apple is the exception to ARM implementations so far. Maybe Nuvia/Qualcomm can also pull it off, but they haven't yet. It seems just as much a matter of time as it did five years ago, which is odd for something inevitable.
My argument has always been that there is a lesson in why Apple is the exception to ARM implementations.
It's obviously not an intrinsic feature of the ARM ISA, or else all implementations would show it, but it's possibly an enabling feature. Nobody else is getting Apple's decode width, which may not be possible with x86 right now, and that favors single-core performance over AMD-scale SMP. And what are we considering in this comparison? Memory architecture? Asymmetric/heterogeneous cores?
I have argued that the reason Apple Silicon is the exception is more due to business model than engineering. For example, how did Apple get a 5x improvement on releasing memory in M1 over x86? Is that an inherent property of the design, or is it something they sought because it was more critical to performance than it would be on x86? And how did it come to be more critical?
There's been a debate raging in F1 regarding the Red Bull car performance difference between their #1 and #2 drivers. See, their #1 driver Max Verstappen does great in the car, and their #2 drivers have been generally terrible - like really terrible - averaging something like 10 positions behind Max. What does this say about the performance of the car relative to other teams? One driver can finish on the podium and the other can't even get in the points - and they've gone through 3 drivers like this. Nobody has ever seen anything like it. The leading theory is that the car is pretty mid, but if you set it up precisely for Max's driving style and talent, he, and only he, can get good results out of it. Essentially, the car is mid but has the capacity to be great, and only in a controlled environment.
So is Apple Silicon performant because Apple has so much control over the entire environment and can tune for it? And are the rest of ARM and x86 less performant because their business models don't allow for that, so they have to engineer for a broad set of applications where performance tradeoffs can't be well controlled? Because Apple better controls the nature of how the CPU performs, they can make tradeoffs in favor of what it is most likely to be doing and how. So it can accept worse performance in areas that it sees infrequently in exchange for better performance in areas it sees frequently. By comparison, the component suppliers don't have that agency and need to balance performance across all potential situations, which means they have to forgo that peak potential. They make up for some of that with their SKU spam by having variants that are better suited for some applications than others, but ultimately can't fully make up for it.
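To put rough numbers on that tradeoff, here's a toy sketch. Every figure in it is made up for illustration: the workload mixes, the per-category speedups, and the names `tuned` and `balanced` are all hypothetical, not measurements of any real chip. It just applies an Amdahl-style weighted sum to show how a design tuned for a known mix can beat a balanced design on that mix and still lose on a broader one.

```python
def overall_speedup(mix, speedups):
    """Amdahl-style overall speedup: 1 / sum(time_fraction / speedup)."""
    # mix maps workload category -> fraction of baseline run time (sums to 1).
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return 1.0 / sum(frac / speedups[kind] for kind, frac in mix.items())


# Hypothetical workload mixes (fractions of baseline run time).
controlled_mix = {"common": 0.9, "rare": 0.1}  # vendor controls the platform
broad_mix = {"common": 0.5, "rare": 0.5}       # supplier must serve everyone

# Hypothetical per-category speedups relative to some baseline core.
tuned = {"common": 2.0, "rare": 0.8}     # gives up the rare case for the common one
balanced = {"common": 1.4, "rare": 1.4}  # same gain everywhere

for mix_name, mix in [("controlled", controlled_mix), ("broad", broad_mix)]:
    for design_name, design in [("tuned", tuned), ("balanced", balanced)]:
        print(f"{mix_name:10s} {design_name:8s} {overall_speedup(mix, design):.2f}x")
```

With these invented numbers the tuned design comes out ahead on the controlled mix and behind on the broad one, which is the whole argument in miniature. Change the fractions and the conclusion moves with them, so treat it as a way to think about the question, not an answer to it.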
Microsoft figured out quite a while ago that they needed to follow Apple. So I think they by and large understand the benefits of Apple's model, and their arrangement with Qualcomm could get them there. But I'm not sure Microsoft is able to go as far as Apple has. They don't have the same degree of influence over their developers, and x86 is still the primary business. But we'll see.