Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

static shock · Jul 8, 2024

Ryzen 9 9950X: 50% IPC increase over 7950X.

SarahKerrigan · Jul 8, 2024

Nothingness said:
Do you realize that'd mean all legacy code would need recompilation to be fast on Zen5?

That's not how things work. That's exactly why OoOE (out-of-order execution) is required for a fast CPU (until things change dramatically in the HW/SW space...): it lets a beefed-up core benefit from its extra resources without everything having to be recompiled. And all modern CPUs are good at that.

I can't remember the last time that recompiling with instruction scheduling for the native CPU brought any significant speedup (>2%, i.e. beyond measurement noise) for code I run. Of course, native instruction selection is a different story, but that doesn't apply to clang.

This.

Uarch-specific machine models buy effectively nothing on modern general-purpose CPUs. They don't actively hurt, and sometimes help a tiny bit, but the effect is generally insignificant.

Anyone hoping for non-trivial gains on Zen5 from compilers getting a machine model is likely to be disappointed.

igor_kavinski · Jul 8, 2024

SarahKerrigan said:
Anyone hoping for non-trivial gains on Zen5 from compilers getting a machine model is likely to be disappointed.

Not generally but some workloads could see massive gains such as the one here: https://www.phoronix.com/review/amd-aocc-4/2

I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.

Nothingness · Jul 8, 2024

igor_kavinski said:
Not generally but some workloads could see massive gains such as the one here: https://www.phoronix.com/review/amd-aocc-4/2


I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.

You either didn't read my reply or didn't understand the problem.

igor_kavinski · Jul 8, 2024

Nothingness said:
You either didn't read my reply or didn't understand the problem.

Somewhat understood you but my stance is that AMD no longer cares about legacy code that much and I kinda understand if that's the truth because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiling source code.

SarahKerrigan · Jul 8, 2024

igor_kavinski said:
Not generally but some workloads could see massive gains such as the one here: https://www.phoronix.com/review/amd-aocc-4/2


I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.

I said specifically from a machine model. What's happening there with AOCC looks likely to be that it found a vectorization opportunity that upstream LLVM didn't.

The "sharpen" benchmark is heavily vectorizable. That's not "just recompile it and lots of things will magically go faster." That test also has significant differences between GCC and LLVM for the same reason:

110062 – missed vectorization in graphicsmagick (gcc.gnu.org)

igor_kavinski said:
Somewhat understood you but my stance is that AMD no longer cares about legacy code that much and I kinda understand if that's the truth because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiling source code.

Hey, I've seen this one!

Nothingness · Jul 8, 2024

Hail The Brain Slug said:
Gigabyte Technology Co., Ltd. X670E AORUS XTREME vs Gigabyte Technology Co., Ltd. X670E AORUS XTREME - Geekbench

W10, AVX-512 on/off: hardly any difference. Ignore the comparison to previous W11 scores; W10 seems to score slightly better in ST.

Here is an example of 5950x vs 7950x:

ASUS System Product Name vs System manufacturer System Product Name - Geekbench (browser.geekbench.com)

Notice how Object Remover stands out as an outlier. This is due to the use of AVX-512.
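A quick way to spot an outlier like Object Remover is to compute each subtest's new/old score ratio and compare it against the geometric mean of all the ratios. This is a hypothetical sketch: the scores below are made-up placeholders, not the linked Geekbench results, and the 25% tolerance and function names are arbitrary choices for illustration.

```python
from math import exp, log


def geomean(values):
    """Geometric mean of positive numbers."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def subtest_ratios(new, old):
    """Per-subtest speedup (new / old) for subtests present in both runs."""
    return {name: new[name] / old[name] for name in new if name in old}


def flag_outliers(ratios, tolerance=0.25):
    """Names of subtests whose speedup differs from the geomean speedup
    by more than `tolerance` (as a fraction, 0.25 = 25%)."""
    g = geomean(ratios.values())
    return sorted(name for name, r in ratios.items() if abs(r / g - 1) > tolerance)


# Hypothetical placeholder scores, NOT the linked Geekbench results.
old = {"Clang": 1000, "HTML5 Browser": 1000, "Object Remover": 1000}
new = {"Clang": 1150, "HTML5 Browser": 1120, "Object Remover": 1700}

ratios = subtest_ratios(new, old)
print(f"geomean speedup: {geomean(ratios.values()):.2f}")
print(flag_outliers(ratios))  # ['Object Remover']
```

With only a handful of subtests, one big outlier also pulls the geomean up; comparing against the median ratio would be more robust.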

Hail The Brain Slug · Jul 8, 2024

Nothingness said:
Both your scores show AVX2. Is that expected?

No idea. I just ran it normal, then disabled AVX512 in the UEFI and ran it again.

Edit: I linked the wrong ones

Nothingness · Jul 8, 2024

igor_kavinski said:
Somewhat understood you but my stance is that AMD no longer cares about legacy code that much and I kinda understand if that's the truth because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiling source code.

Faster code will mostly come from new instructions that enable vectorization, not from uarch details. And I doubt any new Zen5 instruction will benefit the clang test.

igor_kavinski · Jul 8, 2024

Nothingness said:
Here is an example of 5950x vs 7950x:

ASUS System Product Name vs System manufacturer System Product Name - Geekbench (browser.geekbench.com)

Good, but we would have to run both CPUs at the same clock speed to get a better idea of the gain.

Nothingness · Jul 8, 2024

Hail The Brain Slug said:
No idea. I just ran it normal, then disabled AVX512 in the UEFI and ran it again.

Yeah I double checked on an AVX-512 machine and indeed GB always shows AVX2. But see my post above comparing 5950x vs 7950x. It seems to show AVX-512 benefits at least Object Remover.

Hail The Brain Slug · Jul 8, 2024

Nothingness said:
Yeah I double checked on an AVX-512 machine and indeed GB always shows AVX2. But see my post above comparing 5950x vs 7950x. It seems to show AVX-512 benefits at least Object Remover.

Gigabyte Technology Co., Ltd. X670E AORUS XTREME vs Gigabyte Technology Co., Ltd. X670E AORUS XTREME - Geekbench

The correct scores. I ran each one twice to make sure I didn't get an erroneously slow run due to background processes, then I linked both AVX512 disabled scores in comparison.

igor_kavinski · Jul 8, 2024

Nothingness said:
It seems to show AVX-512 benefits at least Object Remover.

Well, then my conclusion is that the AVX-512 disable option in the UEFI is broken and doesn't actually disable AVX-512, or it only prevents certain AVX-512 instructions from running.
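One way to check whether the UEFI toggle really hides AVX-512 is to look at what the CPU reports to the OS with each setting. A minimal sketch, assuming a Linux boot where the kernel lists CPUID feature flags in /proc/cpuinfo (the runs above were on Windows 10, where a CPUID utility would be needed instead). If the option works as advertised, the avx512* flags should disappear when it is disabled. The function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_flags(cpuinfo_text):
    """Sorted AVX-512 feature flags (avx512f, avx512bw, ...) reported by the kernel."""
    return sorted(f for f in cpu_flags(cpuinfo_text) if f.startswith("avx512"))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only assumption
    if cpuinfo.exists():
        found = avx512_flags(cpuinfo.read_text())
        print("AVX-512 flags:", " ".join(found) if found else "none reported")
    else:
        print("/proc/cpuinfo not found; use a CPUID tool on this OS instead")
```

This only shows what is advertised; software that skips the feature check and executes AVX-512 instructions anyway would need a separate test.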

Nothingness · Jul 8, 2024

igor_kavinski said:
Well, then my conclusion is that the AVX-512 disable option in the UEFI is broken and doesn't actually disable AVX-512, or it only prevents certain AVX-512 instructions from running.

Or perhaps GB's AVX-512 path is faster only on Intel machines, due to the 256-bit datapaths of AMD Zen4 (512-bit ops are double-pumped). But I find that strange.

eek2121 · Jul 8, 2024

There is a non-AVX binary, but it disables all AVX.

CouncilorIrissa · Jul 8, 2024

So I calculated the integer and FP geomeans for the 9900X and compared them to an average-performing 7900X, and this is what I came up with.

So around 15-16% in INT and 17-18% in FP (not an iso-clock comparison; that would be pointless anyway since there are too many unknowns about the test setups. The 9900X is clocking at around 5.65 GHz, whereas the 7900X runs at around 5.45 GHz).
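For a rough iso-clock estimate from these figures, the score ratio can be divided by the clock ratio. This sketch takes the midpoints of the quoted ranges (15.5% INT, 17.5% FP) and the approximate clocks (5.65 vs 5.45 GHz), and assumes scores scale linearly with frequency; given the unknowns about the test setups, treat the result as a ballpark only.

```python
def per_clock_gain(score_ratio, new_ghz, old_ghz):
    """Naive iso-clock estimate: divide the score ratio by the clock ratio.

    Assumes scores scale linearly with frequency, which is only an approximation.
    """
    return score_ratio / (new_ghz / old_ghz) - 1.0


# Midpoints of the quoted ranges; clocks are the approximate observed values.
for label, gain in (("INT", 0.155), ("FP", 0.175)):
    print(f"{label}: {per_clock_gain(1 + gain, 5.65, 5.45):.1%} per clock")
# INT: 11.4% per clock
# FP: 13.3% per clock
```

Since many workloads (memory-bound ones especially) scale less than linearly with frequency, the ~3.7% clock difference probably explains less than its full share, so the true per-clock gain would be somewhat higher than this naive estimate.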

GB versions are also slightly different, 6.2.2 for the 9900X and 6.3.0 for the 7900X, but the results should be comparable according to Primate Labs: "For systems without SME instructions, Geekbench 6.3 CPU Benchmark scores are comparable with Geekbench 6.1 and Geekbench 6.2 scores."

Fjodor2001 · Jul 8, 2024

I'm wondering whether the (to some people) disappointing next-gen perf increase for both AMD and Intel is due to a different focus?

The focus is not on gaining 30-40% instead of 10-20% over the previous CPU generation, but on providing roughly the same perf as the competitor at the lowest price. It's not worth spending a lot more to gain only a few extra percent of perf. The world has been hit hard by inflation, and people's budgets are stretched. A few percent of extra CPU perf is not a priority on their shopping list.

Initial MSRP leaks seem to suggest Zen5 will be priced lower at launch than Zen4 was, hinting in the direction mentioned above. The question, however, is whether Zen4 might be an even better option in that case, since the perf difference versus Zen5 is not great.

A final question is whether AMD could have had time to adjust the Zen5 arch relative to the original plan. Could the plan originally have been a 30-40% IPC increase, and that was what was communicated (as indicated by some leakers), but then they changed their strategy and went for a lower perf increase and a lower price? Or would it be too late for them to make such a change between the time they communicated the original intended perf increase and launch?

CouncilorIrissa · Jul 8, 2024

Fjodor2001 said:
A final question is whether AMD could have had time to adjust the Zen5 arch relative to the original plan. Could the plan originally have been a 30-40% IPC increase, and that was what was communicated (as indicated by some leakers), but then they changed their strategy and went for a lower perf increase and a lower price? Or would it be too late for them to make such a change between the time they communicated the original intended perf increase and launch?

The thirty-something percent IPC figures were based on a single thing: the 96C Turin sample scoring 50% higher in SIR (SPECint rate) nT at 25% more power than 96C Genoa. That's it. AMD's own roadmaps (leaked by yours truly MLID) never mentioned anything of the sort.
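Taken at face value, the rumoured Turin figure (+50% SIR nT at +25% power, same 96 cores) works out to about 20% better throughput per watt. That number by itself says nothing about how the gain splits between IPC, clocks and SMT. A minimal arithmetic sketch (the function name is illustrative):

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Relative change in performance per watt between two configurations."""
    return perf_ratio / power_ratio - 1.0


# Rumoured 96C Turin vs 96C Genoa: +50% SIR nT throughput at +25% power.
print(f"{perf_per_watt_gain(1.50, 1.25):.0%} better perf/W")  # 20% better perf/W
```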

Fjodor2001 · Jul 8, 2024

CouncilorIrissa said:
The thirty-something percent IPC figures were based on a single thing: the 96C Turin sample scoring 50% in SIR nT at +25% of the power. That's it. AMD's own roadmaps (leaked by yours truly MLID) never mentioned anything of the sort.

So are you suggesting the Zen5 server core looks completely different from the Zen5 DT core? Otherwise, why would the former score +50% and the latter only 10-20% higher IPC vs Zen4? Taking only the higher TDP of the former into account does not explain the huge difference in IPC increase.

Also, are you saying that AMD's plan for Zen5 DT was always 10-20% IPC (or whatever it ends up being)? I.e., no change of plan from the time of the 30-40% leaks until now, e.g. to focus on perf/price for consumers who are financially stretched worldwide by inflation?

CouncilorIrissa · Jul 8, 2024

Fjodor2001 said:
So are you suggesting Zen5 Server core looks completely different compared to Zen5 DT core? Or why would the former score 50%, and the latter 10-20% higher IPC vs Zen4? Only taking higher TDP for the former into account does not explain the huge difference in IPC increase.

No, I expect it to look exactly the same.
It's just that you can arrive at the +50% perf figure (if it's real anyway) with more than just an IPC increase: higher sustained clocks, SMT yield, etc.
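The point about sources other than IPC can be made concrete with a simple multiplicative model, assumed here: nT throughput ratio = per-thread IPC ratio x sustained-clock ratio x SMT-yield ratio, where the SMT-yield ratio compares how much the second thread adds on Zen 5 versus Zen 4. The +16% IPC and +10% clock inputs below are hypothetical, chosen for illustration, not measurements.

```python
def remaining_factor(total_ratio, *known_ratios):
    """Gain left to explain after dividing out known multiplicative factors.

    Assumed model: nT throughput ratio = IPC ratio * sustained-clock ratio * SMT-yield ratio.
    """
    factor = total_ratio
    for ratio in known_ratios:
        factor /= ratio
    return factor


# Hypothetical inputs: +50% nT (the rumoured figure, if real), +16% per-thread IPC.
print(f"clocks x SMT yield must supply {remaining_factor(1.50, 1.16):.3f}x")  # 1.293x
# Adding a hypothetical +10% sustained clock leaves this much for SMT yield:
print(f"SMT yield must supply {remaining_factor(1.50, 1.16, 1.10):.3f}x")  # 1.176x
```

Under this model, a modest IPC gain can coexist with a large nT gain only if clocks and SMT yield together contribute a sizeable factor.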

Fjodor2001 said:
Also, are you saying that AMD's plan for Zen5 DT was always 10-20% IPC (or whatever it ends up to be)?

This admittedly old roadmap appears to suggest so.

H433x0n · Jul 8, 2024

edit: moving to Geekbench specific thread.

Fjodor2001 · Jul 8, 2024

CouncilorIrissa said:
No, I expect it to look exactly the same.
It's just that you can arrive at the +50% (if it's real anyway) perf figure with more than just IPC increase, namely higher sustained clocks, SMT yield, etc.

How would you achieve the higher sustained clocks necessary to reach a +50% IPC increase with only 25% extra power consumption vs Zen4 (especially when results show that the last few extra watts usually gain very little extra perf)? Also, what do you mean by "SMT yield", how much of the +50% IPC increase would that account for, and why?

Finally, why the huge difference in IPC increase between Zen5 server and desktop CPUs, compared to the corresponding difference for Zen4?

SarahKerrigan · Jul 8, 2024

Fjodor2001 said:
How would you achieve the higher sustained clocks necessary to reach +50% IPC increase with only 25% extra power consumption vs Zen4? Also, what do you mean with "SMT yield" and how much of the IPC increase would that account for, and why?

Could be improved perf from SMT (thread-private frontends); could be improved power gating or other power-oriented microarchitectural changes allowing for a significantly higher clock; could be optimistic pre-release slideware; could be a hallucination among those claiming it exists.

Fjodor2001 · Jul 8, 2024

SarahKerrigan said:
Could be improved perf from SMT (thread-private frontends); could be improved power gating or other power-oriented microarchitectural changes allowing for a significantly higher clock; could be optimistic pre-release slideware; could be a hallucination among those claiming it exists.

But then is it still the same core on server as on DT?

poke01 · Jul 8, 2024

Zen 5 doesn’t need patches for clang. It should just show the performance improvements if core improvements are large enough.
