Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

igor_kavinski · Jul 27, 2024

Josh128 said:
Igor, assuming the ES user that provided the leaks earlier still has it, is there any way it would be possible to get him to run a locked 4 or 5 GHz ST run...

Address him as "ES user" in your post and say please and maybe he will?

igor_kavinski · Jul 27, 2024

Philste said:
They claimed an 11-22% MT uplift in Blender (11% for 9700X, 16% for 9600X, 17% for 9900X and 22% for 9950X). Blender shows a 23% IPC gain according to them. So for R23 at 17% IPC you get roughly an 8-16% uplift: 8% for 9700X, ~12% for 9600X and 9900X, and 16% for 9950X.

https://www.amd.com/en/technologies/zen-core.html

There. Matter settled. It's roughly 16% average IPC lift. I don't see why we should keep beating this IPC dead horse over and over. Not gonna make the dead horse run any faster
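For anyone who wants to check the arithmetic in the quoted post, here is a rough sketch. It assumes the MT uplift scales linearly with the IPC gain, which is a simplification and not something AMD states; the function name `scale_uplift` is made up for illustration.

```python
# Claimed Blender MT uplift per SKU, in percent (figures from the quoted post).
blender_mt_uplift = {"9700X": 11, "9600X": 16, "9900X": 17, "9950X": 22}

BLENDER_IPC = 23  # Blender IPC gain in percent, as quoted
R23_IPC = 17      # Cinebench R23 IPC gain in percent, as quoted


def scale_uplift(uplift_pct, from_ipc_pct, to_ipc_pct):
    """Assumption: uplift scales linearly with the IPC gain."""
    return uplift_pct * to_ipc_pct / from_ipc_pct


# Estimated R23 MT uplift per SKU, rounded to one decimal place.
r23_estimate = {
    sku: round(scale_uplift(uplift, BLENDER_IPC, R23_IPC), 1)
    for sku, uplift in blender_mt_uplift.items()
}

for sku, estimate in r23_estimate.items():
    print(f"{sku}: ~{estimate}%")
```

This lands on roughly 8%, 12%, 13% and 16%, which is in line with the 8-16% range in the quote.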

AMD footnote for Zen 5 IPC calculation:

Testing as of May 2024 by AMD Performance labs. "Zen 5" system configured with: Ryzen 9 9950X GIGABYTE X670E AORUS MASTER motherboard, Balanced, DDR5-6000, Radeon RX 7900 XTX, VBS=ON, SAM=ON, KRACKENX63 vs. "Zen 4" system configured with: Ryzen 7 7700X, ASUS ROG Crosshair X670E motherboard, Balanced, DDR5-6000, Radeon RX 7900 XTX, VBS=ON, SAM=ON, KRAKENX62 {FixedFrequency=4.0 GHz}. Applications tested include: Handbrake, League of Legends, FarCry 6, Puget Adobe Premiere Pro, 3DMark Physics, Kraken, Blender, Cinebench (n-thread), Geekbench, Octane, Speedometer, and WebXPRT. System manufacturers may vary configurations, yielding different results. GNR-03

SO

Turn off VBS, push RAM all the way up to 7200 or 8000 MT/s for Zen 5 (maybe do that for Zen 4 too if possible), change power plan to High Performance, run all benchmarks at same fixed frequency above 5 GHz and there's a good chance of improvement in the average IPC uplift.
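A minimal sketch of how a per-clock comparison like this could be computed, in case anyone wants to redo it with their own settings. The function names are hypothetical, and averaging with a geometric mean is my assumption; the footnote does not say how AMD combines the applications.

```python
from math import prod


def per_clock_uplift(score_new, score_old, ghz_new, ghz_old):
    """Relative gain in score per GHz; higher score is assumed to be better."""
    return (score_new / ghz_new) / (score_old / ghz_old) - 1.0


def geomean_uplift(uplifts):
    """Geometric mean of several relative uplifts (0.16 means +16%)."""
    ratios = [1.0 + u for u in uplifts]
    return prod(ratios) ** (1.0 / len(ratios)) - 1.0
```

With both chips locked to the same clock the frequency terms cancel, so the uplift is just the score ratio minus one. That is why the fixed-frequency setting in the footnote matters.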

AMD IS SANDBAGGING!!!!

But here's something new to predict for you folks:

What will be the rough average IPC uplift for Turin?

I predict ~17%

SarahKerrigan · Jul 27, 2024

Nothingness said:
And you'd have to recompile your whole Linux distro from scratch with the proper Zen5 compiler flags. I have never done that, and never felt the need in my 30 years of using Linux, because my new CPU always performed better by default than the previous one.

If a CPU requires recompilation of existing applications to perform well, I call that a failure. [...]

TLDR: A modern successful CPU should not require recompilation of applications to perform well.

Amen.

"Developers just need to optimize for it more!" is a bad excuse made by vendors of subpar microarchitectures for significantly longer than I've been alive.

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
AMD Ryzen 9 7950X "Zen 4" Rocks On Intel's Clear Linux - Phoronix

www.phoronix.com

Even Zen 4 sees a performance improvement when running on an Intel optimized (and thus recompiled) distro.

The article makes it clear that the majority of the difference between vanilla Ubuntu 22.10 and Clear Linux is... using a different power management mode.

Hardly a compelling argument for "recompiling to get acceptable perf is normal."

igor_kavinski · Jul 27, 2024

SarahKerrigan said:
"Developers just need to optimize for it more!" is a bad excuse made by vendors of subpar microarchitectures for significantly longer than I've been alive.

Yeah but there will be developers who WILL optimize for Zen 5.

Think Unreal Engine.

Think AAA games.

Think Windows itself (once they release an optimized Visual Studio).

And obviously Linux developers will optimize too coz that's their passion.

And many, many others.

If you are competing in a tough market, you will do whatever is necessary to maintain your edge.

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
Yeah but there will be developers who WILL optimize for Zen 5.

Think Unreal Engine.

Think AAA games.

Think Windows itself (once they release an optimized Visual Studio).

And obviously Linux developers will optimize too coz that's their passion.

And many, many others.

If you are competing in a tough market, you will do whatever is necessary to maintain your edge.

Okay. Elaborate. What kind of optimizations do you propose? Be specific.

igor_kavinski · Jul 27, 2024

SarahKerrigan said:
Hardly a compelling argument for "recompiling to get acceptable perf is normal."

Because that's the quickest one I could find. My reasoning is that if an Intel optimized distro is giving Zen 4 an uplift, imagine what a distro focused on Zen 4/5 performance would do.

If I had the time and resources, I would find open source versions of software before Zen 4 announcement and one year after announcement and build them both with the latest compiler and maybe then we would have more data on what I'm claiming.

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
Because that's the quickest one I could find. My reasoning is that if an Intel optimized distro is giving Zen 4 an uplift, imagine what a distro focused on Zen 4/5 performance would do.

If I had the time and resources, I would find open source versions of software before Zen 4 announcement and one year after announcement and build them both with the latest compiler and maybe then we would have more data on what I'm claiming.

Except that doesn't say what you think it does, because there are generalized compiler improvements over time - that affect all microarchitectures.

What sorts of optimizations are you proposing?

igor_kavinski · Jul 27, 2024

SarahKerrigan said:
Okay. Elaborate. What kind of optimizations do you propose? Be specific.

You are asking a code monkey who dropped out of the 42 main program coz I couldn't figure out in one month how to write my own secure malloc() function that passed strict Valgrind checks

I have no clue TBH. But my brain says it's possible.

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
You are asking a code monkey who dropped out of the 42 main program coz I couldn't figure out in one month how to write my own secure malloc() function that passed strict Valgrind checks

I have no clue TBH. But my brain says it's possible.

It's really not, for most stuff. The finer points of instruction scheduling are critical when you're on small or highly static cores. On massive OoO cores, the kinds of optimizations that matter tend to be the ones that compilers try to do anyway, and that affect most implementations of the instruction set rather than a specific one - i.e., good instruction selection, minimizing unnecessary fills/spills, autovectorization, initiating loads early, etc.

There's no goldmine of potential performance upside from optimization for a specific core. Most "hand optimization" wins in Free software tend to be manual vectorization, which isn't specific to any given microarchitecture.

igor_kavinski · Jul 27, 2024

SarahKerrigan said:
What sorts of optimizations are you proposing?

Here's one idea:

https://www.amd.com/content/dam/amd/en/documents/developer/version-4-2-documents/uprof/uprof-user-guide-v4.2.pdf

Suppose a developer has done extensive profiling for Zen 4 and made changes to his application so that when Zen 4 is detected, he uses specific hot functions that matter a lot to his application's core performance.

Now suppose he uses the same tool to profile Zen 5 and sees some big differences. Some of his assumptions about Zen 4 no longer hold true with Zen 5. So he creates specific functions for Zen 5 again to make sure that his application can get the most out of the new architecture.
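A minimal sketch of what that kind of per-architecture dispatch could look like. Everything here is hypothetical: the kernel names are made up, the tuned variants are placeholders that just call the generic code, and the `uarch` string is assumed to come from some CPU detection step that is not shown.

```python
def sum_squares_generic(values):
    """Portable fallback used when no tuned variant exists."""
    return sum(v * v for v in values)


def sum_squares_zen4(values):
    # Placeholder: a real variant would be shaped by Zen 4 profiling data.
    return sum_squares_generic(values)


def sum_squares_zen5(values):
    # Placeholder: a real variant would be shaped by Zen 5 profiling data.
    return sum_squares_generic(values)


# Dispatch table from detected microarchitecture name to hot-function variant.
KERNELS = {"zen4": sum_squares_zen4, "zen5": sum_squares_zen5}


def select_kernel(uarch):
    """Pick the tuned variant for this CPU, or fall back to the generic one."""
    return KERNELS.get(uarch, sum_squares_generic)


kernel = select_kernel("zen5")
print(kernel([1, 2, 3]))
```

The selection happens once at startup, so the hot path pays no per-call detection cost. Whether a tuned variant would actually beat the generic one is exactly the open question here.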

This isn't something unheard of in the software world. Yes, most monkey programmers won't go to all this trouble. But Linux gurus, benchmark writers, game engine developers and authors of widely used software like 7-zip and WinRAR may do that. Can't say anything about the latter since it's closed source but maybe someone can look at 7-zip source and see if there are architecture specific optimizations in that.

Hackers love to hack. If someone like that thinks there is more performance to be squeezed out of a new architecture, you bet they will love to tackle that challenge. Because that's the real fun of hacking. The feeling of satisfaction when you crack a hard problem.

gdansk · Jul 27, 2024

Zen 5 is expected to offer around a 16% IPC uplift in non-tuned code.
But with its weird front end and big FPU I do expect hand-tuned assembly could be much better. But no one outside of Oak Ridge and Lawrence Livermore will even consider that.

stayfrosty · Jul 27, 2024

igor_kavinski said:
What will be the rough average IPC uplift for Turin?

I predict ~17%

With how starved the uncore / memory of Granite Ridge is, I'm gonna go out on a limb and say 19% IPC for Turin (at 5600MT/s).
It's already starved on Raphael and I can only see the gap widening between Desktop and Epyc.

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
Here's one idea:

https://www.amd.com/content/dam/amd/en/documents/developer/version-4-2-documents/uprof/uprof-user-guide-v4.2.pdf

Suppose a developer has done extensive profiling for Zen 4 and made changes to his application so that when Zen 4 is detected, he uses specific hot functions that matter a lot to his application's core performance.

Now suppose he uses the same tool to profile Zen 5 and sees some big differences. Some of his assumptions about Zen 4 no longer hold true with Zen 5. So he creates specific functions for Zen 5 again to make sure that his application can get the most out of the new architecture.

This isn't something unheard of in the software world. Yes, most monkey programmers won't go to all this trouble. But Linux gurus, benchmark writers, game engine developers and authors of widely used software like 7-zip and WinRAR may do that. Can't say anything about the latter since it's closed source but maybe someone can look at 7-zip source and see if there are architecture specific optimizations in that.

Hackers love to hack. If someone like that thinks there is more performance to be squeezed out of a new architecture, you bet they will love to tackle that challenge. Because that's the real fun of hacking. The feeling of satisfaction when you crack a hard problem.

Oh. Well, "runtime performance bottlenecks" are totally a specific answer.

7zip has x86 optimizations. It doesn't have optimizations for any specific microarchitecture AFAIK. See for yourself: https://github.com/mcmilk/7-Zip/tree/master/Asm/x86

You continue to show yourself totally incapable of naming a specific optimization Zen5 would benefit from, and you keep leaning on magical thinking.

igor_kavinski · Jul 27, 2024

gdansk said:
But with its weird front end and big FPU I do expect hand-tuned assembly could be much better. But no one outside of Oak Ridge and Lawrence Livermore will even consider that.

Don't forget John Carmack (his personal pet projects that he doesn't release to the world), Tim Sweeney, all the engine developers of AAA studios, Adobe and other workstation software developers.

igor_kavinski · Jul 27, 2024

stayfrosty said:
With how starved the uncore / memory of Granite Ridge is, I'm gonna go out on a limb and say 19% IPC for Turin (at 5600MT/s).
It's already starved on Raphael and I can only see the gap widening between Desktop and Epyc.

Good point. Yeah, twelve channels of DDR5-8800 could really make that baby rock!

igor_kavinski · Jul 27, 2024

SarahKerrigan said:
You continue to show yourself totally incapable of naming a specific optimization Zen5 would benefit from, and you keep leaning on magical thinking.

Because there is no other developer here to help me out.

@NTMBK @eek2121 @quikah @repoman0 , HELP!!!!

gdansk · Jul 27, 2024

igor_kavinski said:
Don't forget John Carmack (his personal pet projects that he doesn't release to the world), Tim Sweeney, all the engine developers of AAA studios, Adobe and other workstation software developers.

I don't know but it is possible. Epic/Rad did write extremely optimized Zen 2 specific code for Unreal. They even had an article about it but I can't find it now.

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
Because there is no other developer here to help me out.

@NTMBK @eek2121 @quikah @repoman0 , HELP!!!!

And remember, you weren't just talking about optimized assembly - which is a thing but not necessarily a highly profitable one most of the time - but improvements just from recompiling with Zen5 specific compiler support.

soresu · Jul 27, 2024

igor_kavinski said:
Don't forget John Carmack (his personal pet projects that he doesn't release to the world), Tim Sweeney, all the engine developers of AAA studios, Adobe and other workstation software developers.

Don't forget about Frostbite engine's former lead dev Johan Andersson who spearheaded the Mantle work with AMD all those years ago.

He and some of the former DICE people left and formed a new company called Embark Studios, which is now a subsidiary of Nexon.

maddie · Jul 27, 2024

SarahKerrigan said:
And remember, you weren't just talking about optimized assembly - which is a thing but not necessarily a highly profitable one most of the time - but improvements just from recompiling with Zen5 specific compiler support.

Twisting the knife, are you?

SarahKerrigan · Jul 27, 2024

maddie said:
Twisting the knife, are you?

What can I say? "It'll work great with our magic future compiler" is a pet peeve of mine.

For some reason.

yottabit · Jul 27, 2024

SarahKerrigan said:
Okay. Elaborate. What kind of optimizations do you propose? Be specific.

Clearly he’ll need to add some nested if statements and nested for loops to take advantage of the 2-ahead branch predictor

igor_kavinski · Jul 27, 2024

yottabit said:
Clearly he’ll need to add some nested if statements and nested for loops to take advantage of the 2-ahead branch predictor

Thank you!

SarahKerrigan · Jul 27, 2024

igor_kavinski said:
Thank you!

For... what?
