Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


SarahKerrigan

Senior member
Oct 12, 2014
Do you realize that'd mean all legacy code would need recompilation to be fast on Zen5?

That's not how things work. That's why out-of-order execution (OoOE) is required for a fast CPU (until things change dramatically in the HW/SW space...): it lets a beefed-up core benefit from its extra resources without recompiling everything. And all modern CPUs are good at that.

I can't remember the last time recompiling with instruction scheduling for the native CPU brought any significant speedup (>2%, i.e. beyond measurement noise) for code I run. Of course, native instruction selection is a different story, but that doesn't apply to clang.

This.

Uarch-specific machine models buy effectively nothing on modern general-purpose CPUs. They don't actively hurt, and sometimes help a tiny bit, but it's generally insignificant.

Anyone hoping for non-trivial gains on Zen5 from compilers getting a machine model is likely to be disappointed.
 
Jul 27, 2020
Anyone hoping for non-trivial gains on Zen5 from compilers getting a machine model is likely to be disappointed.
Not generally but some workloads could see massive gains such as the one here: https://www.phoronix.com/review/amd-aocc-4/2


I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.
 

Nothingness

Diamond Member
Jul 3, 2013
Not generally but some workloads could see massive gains such as the one here: https://www.phoronix.com/review/amd-aocc-4/2


I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.
You either didn't read my reply or didn't understand the problem.
 
Jul 27, 2020
You either didn't read my reply or didn't understand the problem.
Somewhat understood you but my stance is that AMD no longer cares about legacy code that much and I kinda understand if that's the truth because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiling source code.
 

SarahKerrigan

Senior member
Oct 12, 2014
Not generally but some workloads could see massive gains such as the one here: https://www.phoronix.com/review/amd-aocc-4/2


I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.

I said specifically from a machine model. What's happening there with AOCC looks likely to be that it found a vectorization opportunity that upstream LLVM didn't.

The "sharpen" benchmark is heavily vectorizable. That's not "just recompile it and lots of things will magically go faster." That test also shows significant differences between GCC and LLVM for the same reason.


Somewhat understood you but my stance is that AMD no longer cares about legacy code that much and I kinda understand if that's the truth because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiling source code.

Hey, I've seen this one!
 

Nothingness

Diamond Member
Jul 3, 2013

W10, AVX-512 on/off: hardly any difference. Ignore the comparison to the previous W11 scores; W10 seems to score slightly better in ST.
Here is an example of 5950X vs 7950X:

Notice how Object Remover stands out as an outlier. This is due to the use of AVX-512.
 

Nothingness

Diamond Member
Jul 3, 2013
Somewhat understood you but my stance is that AMD no longer cares about legacy code that much and I kinda understand if that's the truth because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiling source code.
Speedups will mostly come from new instructions that enable vectorization, not from uarch details. And I doubt any new Zen5 instruction will benefit the clang test.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
Yeah, I double-checked on an AVX-512 machine and indeed GB always shows AVX2. But see my post above comparing the 5950X vs 7950X. It seems to show that AVX-512 benefits at least Object Remover.

The correct scores. I ran each one twice to make sure I didn't get an erroneously slow run due to background processes, then linked both AVX-512-disabled scores for comparison.
 

Nothingness

Diamond Member
Jul 3, 2013
Well, then my conclusion is that AVX-512 disablement in the UEFI is broken and doesn't actually disable AVX-512. Or it only prevents certain AVX-512 instructions from running.
Or perhaps the GB AVX-512 path is faster only on Intel machines due to the 256-bit datapaths of AMD Zen4. But I find it strange.
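A rough sketch of the datapath argument, assuming (as is widely reported for Zen 4) that a 512-bit operation is executed as two passes over 256-bit hardware. The unit count below is hypothetical, not an actual Zen 4 or Intel figure. Under that assumption peak FP64 throughput is the same for AVX2 and AVX-512 code on 256-bit datapaths, so any AVX-512 gain there would have to come from masking, the extra architectural registers or new instructions rather than raw vector width.

```python
def fp64_lanes_per_cycle(vector_bits: int, datapath_bits: int, units: int) -> float:
    """Peak FP64 elements per cycle for a stream of independent vector ops.

    A vector wider than the datapath is split into several passes, so each
    unit never processes more than datapath_bits / 64 elements per cycle.
    """
    passes = -(-vector_bits // datapath_bits)  # ceiling division
    elements_per_op = vector_bits // 64        # FP64 lanes in one vector
    return units * elements_per_op / passes


# Hypothetical core with two vector FP units (illustrative only).
UNITS = 2
print("AVX2,    256-bit datapath:", fp64_lanes_per_cycle(256, 256, UNITS))
print("AVX-512, 256-bit datapath:", fp64_lanes_per_cycle(512, 256, UNITS))
print("AVX-512, 512-bit datapath:", fp64_lanes_per_cycle(512, 512, UNITS))
```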
 

CouncilorIrissa

Senior member
Jul 28, 2023
So I calculated the integer and FP geomean for the 9900X and compared it to an average performing 7900X and this is what I came up with.
So around 15-16% in INT and 17-18% in FP (not an iso-clock comparison; that would be pointless anyway since there are too many unknowns about test setups. The 9900X is clocking around 5.65 GHz, whereas the 7900X runs at around 5.45 GHz).

GB versions are also slightly different, 6.2.2 for the 9900X and 6.3.0 for the 7900X, but the results should be comparable according to Primate Labs: "For systems without SME instructions, Geekbench 6.3 CPU Benchmark scores are comparable with Geekbench 6.1 and Geekbench 6.2 scores."
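A minimal sketch of the geomean comparison described in this post, assuming the per-workload subscores for each section were collected by hand from the two result pages. The numbers in the example are placeholders, not the actual 9900X or 7900X results, and Geekbench's own section scores may weight workloads differently, so this only approximates them. A geometric mean is used because the ratio of two geomeans equals the geomean of the per-workload ratios, so no single large-scoring test dominates.

```python
import math


def geomean(values):
    """Geometric mean of positive subscores, computed as exp(mean(log(x)))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive scores")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def uplift(new_scores, old_scores):
    """Fractional gain of the new CPU's geomean over the old CPU's geomean."""
    return geomean(new_scores) / geomean(old_scores) - 1.0


# Placeholder subscores, NOT real Geekbench results for either CPU.
new_int = [2100.0, 3300.0, 2900.0]
old_int = [1800.0, 2850.0, 2500.0]
print(f"INT uplift: {uplift(new_int, old_int):.1%}")
```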
 

Fjodor2001

Diamond Member
Feb 6, 2010
I'm wondering whether the (to some people) disappointing next-gen perf increase for both AMD and Intel is due to a different focus?

Focus is not on gaining 30-40% instead of 10-20% compared to previous CPU gen, but to provide roughly the same perf as the competitor at the lowest price. Not worth spending a lot more to only gain a few extra percent perf. The world has been hit hard by inflation, and people's budgets are stretched. A few percent extra perf for a CPU is not a priority on their shopping list.

Initial MSRP leaks seem to suggest Zen5 will be priced lower at launch than Zen4, hinting in the above-mentioned direction. The question, however, is whether Zen4 will perhaps be an even better option in that case, since the perf difference compared to Zen5 is not great.

Final question is whether AMD could have had time to adjust the Zen5 arch accordingly compared to the original plan. Could the plan originally have been a 30-40% IPC increase, and that was what was communicated (as indicated by some leakers), but then they changed their strategy and went for a lower perf increase and lower price? Or would it be too late for them to change in such a way, from the time they communicated the original intended perf increase until launch?
 

CouncilorIrissa

Senior member
Jul 28, 2023
Final question is whether AMD could have had time to adjust the Zen5 arch accordingly compared to the original plan. Could the plan originally have been a 30-40% IPC increase, and that was what was communicated (as indicated by some leakers), but then they changed their strategy and went for a lower perf increase and lower price? Or would it be too late for them to change in such a way, from the time they communicated the original intended perf increase until launch?
The thirty-something percent IPC figures were based on a single thing: the 96C Turin sample scoring 50% higher in SIR (SPECint rate) nT at +25% power over 96C Genoa. That's it. AMD's own roadmaps (leaked by yours truly MLID) never mentioned anything of the sort.
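For scale, the rumoured figure (+50% SIR nT at +25% power, itself unverified) corresponds to roughly +20% perf/W. A trivial sketch of that arithmetic, using only the two percentages quoted in the thread:

```python
def perf_per_watt_gain(perf_gain: float, power_gain: float) -> float:
    """Relative perf/W change from fractional performance and power changes."""
    return (1.0 + perf_gain) / (1.0 + power_gain) - 1.0


# Figures as quoted in the thread for 96C Turin vs 96C Genoa (unverified).
print(f"perf/W change: {perf_per_watt_gain(0.50, 0.25):+.0%}")
```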
 

Fjodor2001

Diamond Member
Feb 6, 2010
The thirty-something percent IPC figures were based on a single thing: the 96C Turin sample scoring 50% higher in SIR nT at +25% power. That's it. AMD's own roadmaps (leaked by yours truly MLID) never mentioned anything of the sort.
So are you suggesting the Zen5 server core looks completely different from the Zen5 DT core? Otherwise, why would the former score 50% higher, and the latter only 10-20% higher IPC vs Zen4? Higher TDP for the former alone does not explain the huge difference in IPC increase.

Also, are you saying that AMD's plan for Zen5 DT was always 10-20% IPC (or whatever it ends up to be)? I.e. no change of plan from the time of the 30-40% leaks until now due to e.g. focusing on perf/price for consumers that globally are stretched financially due to inflation?
 

CouncilorIrissa

Senior member
Jul 28, 2023
So are you suggesting the Zen5 server core looks completely different from the Zen5 DT core? Otherwise, why would the former score 50% higher, and the latter only 10-20% higher IPC vs Zen4? Higher TDP for the former alone does not explain the huge difference in IPC increase.
No, I expect it to look exactly the same.
It's just that you can arrive at the +50% (if it's real anyway) perf figure with more than just IPC increase, namely higher sustained clocks, SMT yield, etc.
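A back-of-the-envelope sketch of that decomposition, assuming throughput gains from IPC, sustained clocks and SMT yield are independent and simply multiply (a simplification: in practice SMT yield and IPC interact, and nT clocks depend on the power budget). The IPC values are illustrative, not measurements; for example, a 16% IPC gain would leave about 29% to come from clocks and SMT combined to reach the rumoured +50%.

```python
def combined_gain(ipc_gain=0.0, clock_gain=0.0, smt_gain=0.0):
    """Multiply independent fractional gains into one throughput gain."""
    return (1 + ipc_gain) * (1 + clock_gain) * (1 + smt_gain) - 1


def remaining_gain(target_gain, ipc_gain):
    """Gain that clocks and SMT yield must supply together to hit target_gain."""
    return (1 + target_gain) / (1 + ipc_gain) - 1


# Hypothetical IPC gains; the +50% target is the rumoured Turin figure.
for ipc in (0.10, 0.16, 0.20):
    print(f"IPC +{ipc:.0%} -> clocks x SMT must add {remaining_gain(0.50, ipc):+.1%}")
```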

Also, are you saying that AMD's plan for Zen5 DT was always 10-20% IPC (or whatever it ends up to be)?
This admittedly old roadmap appears to suggest so.
 

Fjodor2001

Diamond Member
Feb 6, 2010
No, I expect it to look exactly the same.
It's just that you can arrive at the +50% (if it's real anyway) perf figure with more than just IPC increase, namely higher sustained clocks, SMT yield, etc.
How would you achieve the higher sustained clocks necessary to reach a +50% IPC increase with only 25% extra power consumption vs Zen4 (especially when results show that the last few extra watts usually gain very little extra perf)? Also, what do you mean by "SMT yield", and how much of the +50% IPC increase would that account for, and why?
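A first-order sketch of why +25% power may not buy much clock, assuming dynamic power P ≈ C·V²·f and that voltage rises roughly linearly with frequency near the top of the V/f curve, so P scales with about f³. Under that assumption, +25% power buys only about 8% more clock. This is a rule of thumb, not a Zen 5 measurement: leakage, the real shape of the V/f curve and the fact that nT server parts run well below peak voltage (where V/f is flatter and the exponent is closer to 1) all change the answer.

```python
def clock_gain_from_power(power_gain: float, exponent: float = 3.0) -> float:
    """Clock gain affordable from a power increase, assuming P ~ f**exponent.

    exponent=3 models dynamic power C*V^2*f with V rising linearly with f;
    exponent=1 models a fixed voltage (P ~ f). Leakage is ignored.
    """
    return (1.0 + power_gain) ** (1.0 / exponent) - 1.0


# The +25% power figure is the one quoted in the thread.
for exp in (3.0, 2.0, 1.0):
    print(f"P ~ f^{exp:g}: +25% power -> {clock_gain_from_power(0.25, exp):+.1%} clock")
```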

Finally, why the huge difference in IPC increase between Zen5 server and desktop CPUs, compared with the corresponding difference for Zen4?
 

SarahKerrigan

Senior member
Oct 12, 2014
How would you achieve the higher sustained clocks necessary to reach a +50% IPC increase with only 25% extra power consumption vs Zen4? Also, what do you mean by "SMT yield", and how much of the IPC increase would that account for, and why?

Could be improved perf from SMT (thread-private frontends); could be improved power gating or other power-oriented microarchitectural changes allowing for a significantly higher clock; could be optimistic pre-release slideware; could be a hallucination among those claiming it exists.
 

Fjodor2001

Diamond Member
Feb 6, 2010
Could be improved perf from SMT (thread-private frontends); could be improved power gating or other power-oriented microarchitectural changes allowing for a significantly higher clock; could be optimistic pre-release slideware; could be a hallucination among those claiming it exists.
But then is it still the same core on server as on DT?
 

poke01

Diamond Member
Mar 8, 2022
Zen 5 doesn’t need patches for clang. It should just show the performance improvements if core improvements are large enough.