Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


MS_AT

Senior member
Jul 15, 2024
786
1,594
96
His geomean results put the 9980X at +30% over the 7980X. Very interesting, as it should have similar clocks to the 7980X. This holds true for the 9970X vs the 7970X as well (+28%). What's accounting for the additional +14% overall perf over the claimed +16% IPC?
Sustained clocks should be higher. Just compare the all-core boost speeds on EPYC: https://www.amd.com/en/products/pro...ation-9004-and-8004-series/amd-epyc-9554.html vs https://www.amd.com/en/products/processors/server/epyc/9005-series/amd-epyc-9555.html (this is not specified for Threadripper parts). This is also what they were talking about around the Zen 5 release: Zen 5 parts don't boost higher, but they are able to hold higher clocks under heavier loads than Zen 4. I mean, it's a few hundred MHz, but if you add the IPC increase and the memory bandwidth increase, it all adds up.
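To put a number on "it all adds up": independent gains compound multiplicatively, so the part left over after the claimed IPC uplift is a bit smaller than the +14% you get by plain subtraction. A quick sketch using only the two figures quoted above. The 4.0 GHz baseline is a made-up round number for illustration, not a measured all-core clock.

```python
# Gains from independent sources multiply rather than add.
geomean_uplift = 1.30  # 9980X vs 7980X, geomean quoted above
ipc_uplift = 1.16      # claimed Zen 5 IPC gain quoted above


def residual_factor(total: float, known: float) -> float:
    """Factor still unexplained after dividing out a known factor."""
    return total / known


residual = residual_factor(geomean_uplift, ipc_uplift)
print(f"Left for clocks + memory bandwidth: +{(residual - 1) * 100:.1f}%")

# Hypothetical baseline, only to show the scale of clock delta involved.
assumed_sustained_ghz = 4.0
extra_mhz = assumed_sustained_ghz * (residual - 1) * 1000
print(f"At an assumed {assumed_sustained_ghz} GHz that is about {extra_mhz:.0f} MHz")
```

So roughly 12% has to come from sustained clocks and memory bandwidth combined, which is in "few hundred MHz plus faster memory" territory.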
 

StefanR5R

Elite Member
Dec 10, 2016
6,593
10,392
136

That Strix Halo vs. 9950X3D comparison also shows some weirdly great multicore wins for the Halo chip and I think membw is the only explanation.
STX Halo has a faster and more expensive inter-CCD connect.
The write direction is wider than in Granite Ridge, and serialization/deserialization is eliminated; otherwise they left it the same:
Granite Ridge ......... 32 B/cycle read, 16 B/cycle write per CCX
Strix Point .............. 32 B/cycle read, 32 B/cycle write per CCX
Strix Halo ............... 32 B/cycle read, 32 B/cycle write per CCX
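
For a feel of what those widths mean in GB/s, multiply bytes per cycle by the fabric clock. A rough sketch: the 2.0 GHz fabric clock below is an assumption picked for round numbers, not a figure from the sources, and real parts run different fabric clocks.

```python
# Peak link bandwidth = bytes per fabric cycle * fabric clock.
ASSUMED_FCLK_GHZ = 2.0  # assumption for illustration; not from the articles

# (read B/cycle, write B/cycle) per CCX, as listed above
LINKS = {
    "Granite Ridge": (32, 16),
    "Strix Point": (32, 32),
    "Strix Halo": (32, 32),
}


def peak_gb_per_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in GB/s (bytes/cycle * 1e9 cycles/s per GHz)."""
    return bytes_per_cycle * clock_ghz


for name, (read_bpc, write_bpc) in LINKS.items():
    read_bw = peak_gb_per_s(read_bpc, ASSUMED_FCLK_GHZ)
    write_bw = peak_gb_per_s(write_bpc, ASSUMED_FCLK_GHZ)
    print(f"{name:14s} read {read_bw:5.1f} GB/s, write {write_bw:5.1f} GB/s per CCX")
```

Whatever the actual clock is, the ratio is what matters: the write path per CCX is twice as wide on the Strix parts as on Granite Ridge.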

Sources:
https://chipsandcheese.com/p/amds-ryzen-9950x-zen-5-on-desktop
https://chipsandcheese.com/p/amds-strix-halo-under-the-hood
 

fastandfurious6

Senior member
Jun 1, 2024
634
809
96
That's what the PS6 handheld is.

I just described a few days ago the future is handheld / pocket devices 😂 my third eye needs to go to sleep...

great decision by AMD and Sony, gonna be killer and just makes sense

(the proper thing to do will be for the Dock to offer extra cooling to unlock the full potential of the special Medusa Halo variant. something like a slot to connect to a huge heatsink. if they haven't got that ready I guess they'll do 2 versions, handheld and Pro desktop for 4k gaming)
 
Reactions: poke01

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,063
3,558
126
After watching the reviews, and given the honest odds of even getting a 9960X, I think I will keep my 7960X and not undergo the future buyer's remorse, as there will be none unless they come out with an X3D variant.


Screenshot 2025-07-31 154246.jpg

Is plenty more than enough for me.
 

mostwanted002

Member
Jun 16, 2023
63
125
86
mostwanted002.page
From Wendell's Linux video,

The launch of the TR 9000s is not just a performance uplift; it also brought platform updates that unlock some really cool features. E-SMI is one example: it can monitor real-time PCIe bandwidth from the platform itself, instead of querying the PCIe device.


RTX Pro 6000 cards in P2P were maxing out the available bandwidth. That makes this and the TR PROs ideal platforms for builds incorporating high-performance PCIe peripherals.

Another was the availability of CXL configuration.
Fully unlocked BIOSes are available on ASRock and Gigabyte boards.
ASUS is there as well, but doesn't have CXL options.
 
Jul 27, 2020
26,612
18,313
146
What's so big about it? The LLVM results? Everything seemed normal there.
The cache is helping unexpectedly in some workloads like CB 2024.

Makes you wonder where else it could be working its magic. Phoronix needs to pit these laptops against each other in hundreds of benchmarks to give us the full picture.

This is also very impressive:

1754113058409.png
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,063
3,558
126
Idle power drain was also lowered. @aigomorla, I wonder if TR 7000 benefits in this regard from latest firmware too.

If you have a Threadripper and you're concerned about idle power draw... I think you chose the wrong processor. lol.
It's like me being concerned that my 4090 is drawing too much current just from the fans spinning all the time and not going to sleep..
(well, I don't have fans... it's watercooled... but I think you get the point)

But I don't think the firmware will apply to me.
I'm sort of scared of flashing the BIOS on my board, as everything is working great.
And the last time I decided to be smart and started playing with PBO and other stuff, I had to spend about an hour resetting and debugging the BIOS settings, so I swore I wouldn't do it again.
 
Jul 27, 2020
26,612
18,313
146
And the last time I decided to be smart and started playing with PBO and other stuff, I had to spend about an hour resetting and debugging the BIOS settings, so I swore I wouldn't do it again.
Well, hopefully in 3 or 4 years, when you think you've gotten more out of the CPU than you paid for it, you can go ballistic on it with tweaking. And if it dies, all the better an excuse to upgrade to a 9960X.
 

yuri69

Senior member
Jul 16, 2013
672
1,202
136
Apart from Zen 5's already known bottlenecks, such as register file capacity and weird frontend overrides, it seems interesting that the Op cache bandwidth is not being utilized properly. Is that maximum throughput also oriented at SMT or something?

ICache size being a limiting factor can potentially see a fix by Zen 7.
 

MS_AT

Senior member
Jul 15, 2024
786
1,594
96
The most important chart from there, in my opinion, and the corresponding 285K chart
And now we will get people saying that if the 285K gets higher IPC than Zen 5 in games, it was unjustly bashed for its gaming performance ;) You could have at least added the system specs for both tests. The Intel system was running memory at DDR5-6000 28-36-36-96 and had E-cores disabled in the BIOS. The AMD system had memory running at DDR5-5600 36-36-36-89. And the author also notes:
I’ll be using the same games as in the Lion Cove gaming article, namely Palworld, COD Cold War, and Cyberpunk 2077. However, the data is not directly comparable; I’ve built up my Palworld base since then, COD Cold War multiplayer sessions are inherently unpredictable, and Cyberpunk 2077 received an update which annoyingly forces a 60 FPS cap, regardless of VSYNC or FPS cap settings. My goal here is to look for broad trends rather than do a like-for-like performance comparison.
I think for clarity you could have mentioned these things, to avoid misunderstandings.
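To show why the memory settings matter for a comparison like this, here is a quick conversion of the two CAS timings above into nanoseconds. It uses the standard relation for DDR (two transfers per clock), and only looks at the first timing (CL); the full access latency seen by the core depends on much more than this.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds.

    DDR transfers twice per clock, so the memory clock in MHz is half the
    data rate in MT/s. Latency = cycles / clock.
    """
    clock_mhz = data_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


intel_cl_ns = cas_latency_ns(6000, 28)  # DDR5-6000 CL28 (Intel system)
amd_cl_ns = cas_latency_ns(5600, 36)    # DDR5-5600 CL36 (AMD system)

print(f"DDR5-6000 CL28: {intel_cl_ns:.2f} ns")
print(f"DDR5-5600 CL36: {amd_cl_ns:.2f} ns")
print(f"Difference:     {amd_cl_ns - intel_cl_ns:.2f} ns")
```

That is roughly 9.3 ns vs 12.9 ns on CAS alone, so the Intel system had the tighter memory configuration in these tests.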
it seems interesting that the Op cache bandwidth is not being utilized properly. Is that maximum throughput also oriented at SMT or something?
The article attributes that to the high ratio of branches to other instructions. Since games are low-IPC workloads, I wouldn't draw conclusions about the uop cache based on them. In their other article, when testing with 8B NOPs, a single thread is able to achieve 8 inst/cycle (remember the rename is capped at 8), which would suggest it's able to draw more than 6 inst/cycle from the uop cache. The uop cache also has various limitations on what each entry is able to hold; you would need to read the Software Manual from AMD for details.
 

Doug S

Diamond Member
Feb 8, 2020
3,376
5,938
136
The most important chart from there, in my opinion, and the corresponding 285K chart:

The interesting thing about those charts for me isn't the gaming stuff, but in the SPEC benchmarks where there is a big divergence between AMD and Intel. I don't know to what extent compilers are now able to generate AVX-512 instructions for which SPEC subtests, but I'm assuming that's likely to be the reason for why there are certain benchmarks where AMD dominates Intel.

So what I'm really curious about are results that go the other way, because that lacks the easy explanation - why is AMD's povray result so much worse than Intel's? Is it just a matter of Intel having internal structures like ROB or number of rename registers that's a little bigger and that's just enough to make a big difference in povray? Or maybe (though I doubt it) it is particularly sensitive to branch prediction, and Intel's branch prediction is guessing better for that benchmark? Or perhaps differences in prefetch?

You know, the kind of stuff Anandtech used to dig into back in the day and write about. Chips and Cheese digs deeper into that stuff than just about anyone else these days, but I'd love to see them do a direct comparison of AMD's and Intel's cores and dig into the places where their performance differs and the reasons why. I guess the problem is most people reading reviews just want to know which one runs games better.
 

CouncilorIrissa

Senior member
Jul 28, 2023
668
2,592
106
The interesting thing about those charts for me isn't the gaming stuff, but in the SPEC benchmarks where there is a big divergence between AMD and Intel. I don't know to what extent compilers are now able to generate AVX-512 instructions for which SPEC subtests, but I'm assuming that's likely to be the reason for why there are certain benchmarks where AMD dominates Intel.
Somewhat ironically, the int suite (at least when compiled with gcc) runs better compiled without AVX-512 support.
1754251551457.png
 

MS_AT

Senior member
Jul 15, 2024
786
1,594
96
I don't know to what extent compilers are now able to generate AVX-512 instructions for which SPEC subtests, but I'm assuming that's likely to be the reason for why there are certain benchmarks where AMD dominates Intel.
According to this https://chipsandcheese.com/p/runnin...s-and-cheese?utm_campaign=post&utm_medium=web they are using the -mcpu flag, which is a deprecated synonym for -mtune. In theory it tunes the code for the native architecture but does not enable instruction set extensions beyond the generic set for a given architecture (x86-64 in this case): https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mtune-17 In other words, I guess at most it's using SSE2 (the x86-64 baseline), so no AVX2 if I got the defaults right, and for sure no AVX-512.
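
If anyone wants to check this locally rather than trust my reading of the docs, one way is to ask GCC which ISA feature macros it predefines for a given flag. A minimal sketch, assuming gcc is on PATH and a POSIX system (it reads /dev/null); the helper names are my own, and znver4 is just an example target that your GCC may or may not know. I'm not quoting any output here, since it depends on your compiler build.

```python
import re
import shutil
import subprocess
from typing import Optional, Set

# Feature macros GCC predefines when the corresponding ISA extension is enabled.
ISA_MACROS = ("__SSE2__", "__SSE4_2__", "__AVX2__", "__AVX512F__")


def parse_isa_macros(preprocessor_dump: str) -> Set[str]:
    """Pick the ISA feature macros out of `gcc -dM -E` output."""
    defined = set()
    for line in preprocessor_dump.splitlines():
        match = re.match(r"#define\s+(\w+)", line)
        if match and match.group(1) in ISA_MACROS:
            defined.add(match.group(1))
    return defined


def isa_macros_for(flag: str, compiler: str = "gcc") -> Optional[Set[str]]:
    """Return enabled ISA macros for one flag, or None if it cannot be checked."""
    if shutil.which(compiler) is None:
        return None
    try:
        result = subprocess.run(
            [compiler, flag, "-dM", "-E", "-x", "c", "/dev/null"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:  # e.g. the target name is unknown to this GCC
        return None
    return parse_isa_macros(result.stdout)


if __name__ == "__main__":
    for flag in ("-mtune=znver4", "-march=znver4"):
        macros = isa_macros_for(flag)
        shown = "could not check" if macros is None else sorted(macros)
        print(f"{flag}: {shown}")
```

The point of comparing the two flags: -march changes which instructions the compiler may emit, while -mtune should only change scheduling and cost decisions.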

Somewhat ironically, the int suite (at least when compiled with gcc) runs better compiled without AVX-512 support.
There was a bug in GCC related to that. Not sure if he means after the bug was fixed or still refers to the bug itself.
 
Reactions: lightmanek

fastandfurious6

Senior member
Jun 1, 2024
634
809
96
any minor intel ipc advantage in gaming gets massively overshadowed by amd X3D. 1% lows are the most important metric: eradication of the most annoying thing, which is stutters/lag/freezes

gaming on x3d is always fluid it's insane, makes everything else obsolete

I think the most important chart is the first showing bound workloads

AMD has a frontend bottleneck which should be fixable, looks like something they didn't have time to adjust
and minimal core bound

Intel is 40% backend bound and 20% core bound which seems like the issue is more inherent?
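
For anyone unfamiliar with how those "bound" percentages are derived, here is a rough sketch of generic level-1 top-down slot accounting. The counter names and every number below are hypothetical, made up so the arithmetic is easy to follow; real events differ per vendor and this is not the method used in the article. As far as I understand it, "core bound" is normally a sub-bucket of backend bound (next to memory bound), not a separate top-level category.

```python
def top_down_level1(cycles, width, undelivered_slots,
                    issued_uops, retired_uops, recovery_cycles):
    """Split pipeline slots into the four level-1 top-down buckets.

    A slot is one issue opportunity: slots = cycles * pipeline width.
    All inputs are hypothetical counter values, not real event names.
    """
    slots = cycles * width
    frontend = undelivered_slots / slots   # slots the frontend failed to fill
    retiring = retired_uops / slots        # slots doing useful work
    bad_spec = (issued_uops - retired_uops + width * recovery_cycles) / slots
    backend = 1.0 - frontend - retiring - bad_spec  # whatever is left
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Made-up counter values for an 8-wide core over one million cycles.
breakdown = top_down_level1(
    cycles=1_000_000, width=8, undelivered_slots=2_400_000,
    issued_uops=3_000_000, retired_uops=2_800_000, recovery_cycles=25_000,
)
for bucket, share in breakdown.items():
    print(f"{bucket:16s} {share * 100:5.1f}%")
```

The four buckets always sum to 100%, which is why a big frontend share on one core and a big backend share on another are both just descriptions of where the empty slots come from.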
 

Geddagod

Golden Member
Dec 28, 2021
1,456
1,557
106
The interesting thing about those charts for me isn't the gaming stuff, but in the SPEC benchmarks where there is a big divergence between AMD and Intel. I don't know to what extent compilers are now able to generate AVX-512 instructions for which SPEC subtests, but I'm assuming that's likely to be the reason for why there are certain benchmarks where AMD dominates Intel.

So what I'm really curious about are results that go the other way, because that lacks the easy explanation - why is AMD's povray result so much worse than Intel's? Is it just a matter of Intel having internal structures like ROB or number of rename registers that's a little bigger and that's just enough to make a big difference in povray? Or maybe (though I doubt it) it is particularly sensitive to branch prediction, and Intel's branch prediction is guessing better for that benchmark? Or perhaps differences in prefetch?

You know, the kind of stuff Anandtech used to dig into back in the day and write about. Chips and Cheese digs deeper into that stuff than just about anyone else these days, but I'd love to see them do a direct comparison of AMD's and Intel's cores and dig into the places where their performance differs and the reasons why. I guess the problem is most people reading reviews just want to know which one runs games better.
It's unfortunate because a lot of their articles that look at this don't do direct comparisons. In this one it's due to their updated testing methodology; in the Lion Cove one it's because the game selection was different from their Zen 4 article.

David Huang looked into the causes of some of the differences in the SPEC subtests though, in his Zen 5 article, for Zen 5 vs Zen 4. There are also numerous papers characterizing the memory, branch prediction, and other bottlenecks of the SPEC2017 subtests.
As for povray specifically, both AMD and Intel have very similar, and relatively high, branch prediction accuracy, though Intel being better here than AMD in BP accuracy is against the norm.
AMD has a frontend bottleneck which should be fixable, looks like something they didn't have time to adjust
and minimal core bound

Intel is 40% backend bound and 20% core bound which seems like the issue is more inherent?
I would imagine fixing a front end bottleneck is far harder than fixing a backend bottleneck.
 
Reactions: lightmanek

Geddagod

Golden Member
Dec 28, 2021
1,456
1,557
106
The cache is helping unexpectedly in some workloads like CB 2024.

Makes you wonder where else it could be working its magic. Phoronix needs to pit these laptops against each other in hundreds of benchmarks to give us the full picture.

This is also very impressive:

View attachment 128078
TBH I don't think V-cache in laptops is all that impressive.
Checking 1080p for these laptops doesn't make much sense IMO. High-end laptops are being equipped with 1440p screens essentially at minimum, other than 1080p ultra-high-refresh options, which aren't usually standard. Alienware does it, I think, as well as some MSI models IIRC?
And a 10% lead in 1440p 1% lows is undoubtedly nice, but not all that game changing.
Apart from Zen 5's already known bottlenecks, such as register file capacity and weird frontend overrides, it seems interesting that the Op cache bandwidth is not being utilized properly. Is that maximum throughput also oriented at SMT or something?

ICache size being a limiting factor can potentially see a fix by Zen 7.
The Zen 5 BPU is weird. Wonder if more people will look into it.
 

511

Diamond Member
Jul 12, 2024
3,360
3,257
106
TBH I don't think V-cache in laptops is all that impressive.
Checking 1080p for these laptops doesn't make much sense IMO. High-end laptops are being equipped with 1440p screens essentially at minimum, other than 1080p ultra-high-refresh options, which aren't usually standard. Alienware does it, I think, as well as some MSI models IIRC?
And a 10% lead in 1440p 1% lows is undoubtedly nice, but not all that game changing.
I think we missed an important part: with V-Cache we can lower CPU power, so there is more power for the GPU, and if that's already maxed, lower total system power means less heat.
 
Reactions: lightmanek