Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 988 · AnandTech community forums

Josh128

Golden Member
Oct 14, 2022
They will probably do both, consumer and Epyc 4005 models.
Hmm, now that you mention it, nothing about this leak says Ryzen... seeing as AMD likes to disappoint the enthusiasts and all, I bet the dual-X3D chip is merely that: an AM5-compatible EPYC SKU. Probably not going to be for consumers at all.
 

gdansk

Diamond Member
Feb 8, 2011
Hmm, now that you mention it... seeing as AMD likes to disappoint the enthusiasts and all, I bet the dual-X3D chip is merely that: an AM5-compatible EPYC SKU. Probably not going to be for consumers at all.
Doesn't matter. It's AM5. It would be nice to launch it as an Epyc to avoid gamers thinking they should buy them.
 
Jul 27, 2020
Could the additional 30W be to feed the extra V-cache die?

I'm hoping it's a newer, more refined V-cache die with slightly more bandwidth.
 

StefanR5R

Elite Member
Dec 10, 2016
The most important chart from there, in my opinion, and the corresponding 285K chart: [...]
For those who haven't read the article yet: as far as I recall, it contains some good discussion of how Zen 5 and Lion Cove have similar yet different bottlenecks in game workloads, puts game-workload traces in contrast with SPEC traces, and has some IMO enlightening bits about games versus inter-CCX traffic. Finally somebody who, rather than merely speculating about the latter effects, did some actual performance counter based monitoring.
 

ToTTenTranz

Senior member
Feb 4, 2021
That 4% advantage example is a super dubious claim for the benefits of X3D in transcoding. That's margin-of-error stuff. I'd bet adding a second V-cache die would not add another 4%, if anything at all. Certainly nothing worth the extra $100-$200 (15-30%) that they will surely be asking.


What about putting 2x V-Cache dies underneath one of the CCDs? Can't they stack?
Give one CCD access to a whopping 160MB of cache.
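The 160MB figure follows from the known Zen 5 cache sizes (32MB of on-die L3 per CCD, 64MB per stacked V-Cache die); a quick sanity check of the hypothetical dual-stack, which is speculation and not a confirmed product:

```python
# L3 capacity visible to one CCD with N stacked V-Cache dies.
# Known figures: 32 MB on-die L3 per Zen 5 CCD, 64 MB per V-Cache die.
ON_DIE_L3_MB = 32
VCACHE_DIE_MB = 64

def ccd_l3_capacity(stacked_dies: int) -> int:
    """Total L3 (MB) one CCD would see with the given number of stacked dies."""
    return ON_DIE_L3_MB + stacked_dies * VCACHE_DIE_MB

print(ccd_l3_capacity(1))  # 96  -- today's X3D cache CCD
print(ccd_l3_capacity(2))  # 160 -- the hypothetical dual-stack CCD
```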
 

MS_AT

Senior member
Jul 15, 2024
IMO enlightening bits about games versus inter-CCX traffic. Finally somebody who, rather than merely speculating about the latter effects, did some actual performance counter based monitoring.
Advocates of dual-X3D chips might not like the conclusions. On the other hand, I wonder how the author handled the internal game settings for SMT, as the game by default, in theory, should use one worker per core on 2-CCD chips. It should now be possible to modify that in the options, but I never really tried to see what those settings are actually doing. I still wonder if anything could interfere with manually setting the affinity, plus whatever the special chipset driver is doing. Especially since the very last plot suggests that, by default, almost all the load is contained on one CCD, within SMT threads.
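Manually setting affinity, as discussed above, can be sketched on Linux with `os.sched_setaffinity`. The core-to-CCD mapping below (cores 0-7 = CCD0) is an assumption; the real mapping varies by SKU and OS, so check `lscpu -e` or HWiNFO on your own machine first:

```python
import os

# Hypothetical mapping for a two-CCD part: cores 0-7 = CCD0.
# Verify the real layout with `lscpu -e` before relying on this.
CCD0_CORES = set(range(8))

def pin_to_ccd0(pid: int = 0) -> set:
    """Restrict a process (default: the current one) to CCD0's cores (Linux only)."""
    allowed = os.sched_getaffinity(pid)
    # Intersect with the currently allowed set so this also works on
    # machines with fewer cores or an already-restricted cpuset.
    target = (CCD0_CORES & allowed) or allowed
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)

if hasattr(os, "sched_setaffinity"):  # present on Linux; absent on macOS/Windows
    print(pin_to_ccd0())
```

The same effect can be had from a shell with `taskset -c 0-7 <command>`; whether the chipset driver then fights that pinning is exactly the open question above.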
 

fastandfurious6

Senior member
Jun 1, 2024
What is easily observable during any bench + HWiNFO is that the boost core changes every second:

t1: core 13 @ 5 GHz
t2: core 6 @ 5 GHz
t3: cores 7 + 10 @ 5 GHz
...etc. Not efficient.

The assumption is that the scheduler throws too much work at random cores, which is wasteful.
 

Josh128

Golden Member
Oct 14, 2022
What is easily observable during any bench + HWiNFO is that the boost core changes every second:

t1: core 13 @ 5 GHz
t2: core 6 @ 5 GHz
t3: cores 7 + 10 @ 5 GHz
...etc. Not efficient.

The assumption is that the scheduler throws too much work at random cores, which is wasteful.
That's a function of the CPU sustaining high 1T boost frequencies. The CPU "hands off" the load between preferred cores to let some cores rest and cool while others take up the load; max frequency is sustained much longer by doing this. The algorithm is efficient enough that there is basically no perf loss from doing it. This has been the case since at least Zen 2 on the AMD side.
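The hand-off pattern described above is easy to spot in logged data. A toy sketch, with synthetic per-second samples standing in for real HWiNFO logs, that reports which core(s) hold the boost in each sample:

```python
# Identify the top-boosting core(s) in each one-second sample, mimicking the
# HWiNFO observation quoted above. The frequency samples are synthetic stand-ins.
samples = [
    {13: 5000, 6: 4200, 7: 4100, 10: 4000},  # t1: core 13 boosts
    {13: 4100, 6: 5000, 7: 4200, 10: 4000},  # t2: core 6 boosts
    {13: 4000, 6: 4100, 7: 5000, 10: 5000},  # t3: cores 7 and 10 boost
]

def boost_cores(sample: dict) -> list:
    """Return all cores sitting at the sample's peak frequency (MHz)."""
    peak = max(sample.values())
    return sorted(core for core, mhz in sample.items() if mhz == peak)

for t, s in enumerate(samples, 1):
    print(f"t{t}: cores {boost_cores(s)} at {max(s.values())} MHz")
```

Whether the rotation costs performance is then a matter of comparing throughput with and without pinning, not just watching the boost core move.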
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
A few very interesting benchmarks here


Besides the obvious efficiency numbers, Halo has better absolute CPU performance in quite a few ML/AI workloads, Gromacs, Palabos, GPAW and other stuff.
I'd like to have seen the Granite Ridge parts tested in eco mode.

In the testing I've been doing against a Strix Halo mini PC I have, once I enable Eco Mode on the 9950X3D it performs similarly (these are workloads that don't benefit from Strix Halo's design) and draws similar power from the wall, even with the big desktop idle overhead.

I imagine that outside of the workloads that benefit significantly from Strix Halo's unique configuration, it actually isn't significantly more efficient than Granite Ridge at lower power.
 

LightningZ71

Platinum Member
Mar 10, 2017
From what we know, it's the same cores and the same node. There shouldn't be much of a difference in non-bandwidth-limited scenarios.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
From what we know, it's the same cores and the same node. There shouldn't be much of a difference in non-bandwidth-limited scenarios.
All the proclamations of a big efficiency advantage really seem like, in a lot of cases, they may just boil down to comparing something pushed way out of the pocket of the V/F curve against the same thing right in the pocket.
 

MS_AT

Senior member
Jul 15, 2024
draws similar power from the wall, even with the big desktop idle overhead.
Do you mean that, at idle, the Strix Halo draws a similar amount of power from the wall as a desktop part despite the SoC reporting lower power consumption? Or are you saying that if you enable Eco Mode on the 9950X3D, it will draw a similar amount of power under load as the Strix Halo does under load?
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
Do you mean that, at idle, the Strix Halo draws a similar amount of power from the wall as a desktop part despite the SoC reporting lower power consumption? Or are you saying that if you enable Eco Mode on the 9950X3D, it will draw a similar amount of power under load as the Strix Halo does under load?
I have seen workloads where my 9950X3D with an 88W PPT draws 190 watts at the wall and completes work slightly faster than Strix Halo in 120W mode, also drawing 190 watts at the wall.

The power per core in this specific case was only 3.1 watts for the 9950X3D, but 6.5 watts on Strix Halo. I assume it's at a disadvantage from the memory latency, unless something else is going on with it.

9950X3D was sustaining ~3.8 GHz while the Strix Halo sustained ~4.3 GHz. It's memory intensive, but not bandwidth intensive, hence the thinking that maybe the incredibly high memory latency from the LPDDR5X is at fault.

Edit: the workload was compiling Unreal Engine. It doesn't benefit very much from V$ on the 9950X3D, maybe 3.25%. Nothing significant. Other related workloads varied back and forth a bit, but in my line of work there was nowhere I found a compelling performance or efficiency advantage for Strix Halo.
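Equal wall power with different completion times still means different energy per finished job. A quick sketch of that comparison; the 190 W figure is from the post, but the runtimes below are illustrative assumptions, since the post only says "slightly faster":

```python
# Energy to finish one compile job: E = wall power (W) x runtime (s).
# Both machines drew ~190 W at the wall (from the post); the runtimes are
# illustrative assumptions -- the post only says "slightly faster".
WALL_W = 190.0
runtime_9950x3d_s = 570.0  # assumed
runtime_halo_s = 600.0     # assumed ~5% slower

def energy_kj(power_w: float, runtime_s: float) -> float:
    """Energy consumed over the run, in kilojoules."""
    return power_w * runtime_s / 1000.0

print(f"9950X3D (88W PPT): {energy_kj(WALL_W, runtime_9950x3d_s):.1f} kJ")
print(f"Strix Halo (120W): {energy_kj(WALL_W, runtime_halo_s):.1f} kJ")
# Equal wall power + shorter runtime => less energy per finished job.
```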
 

Geddagod

Golden Member
Dec 28, 2021
Has anyone checked Zen 5 L3 latency vs Zen 4, btw? I remember AMD saying in some slide that it stayed the same or improved; however, I remember some AIDA64 screenshot showing that it got worse.
Looking at the impact of switching to mesh vs ring is bound to be somewhat interesting for Zen 5.
There should also be a way to compare power between ring and mesh too, perhaps by subtracting core-only power from CCD power? IIRC AMD does have software power reporting for both of those metrics.
 

Geddagod

Golden Member
Dec 28, 2021
47 cycles (i.e. it's the same).
Does the mesh run at the same frequency as the ring did in previous gens? The desktop ring boost algorithm is, IIRC, ring frequency = frequency of the highest-boosting core, while on mobile it's some fixed ratio of the highest-boosting core's frequency?
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005

Good review. The Framework-included cooler isn't that good, other than that it looks good.
Kinda weird: my mini PC with the same cooler that most of the other designs have never throttles. It looked like Framework's heatsink was going to be better than the cooler that's making the rounds in the other designs.
 
Jul 27, 2020
maybe the incredibly high memory latency from the LPDDR5X is at fault.
This is a fatal flaw with LPDDR5X. Sure, when on battery it should run as-is, but when on AC power it should switch to a low-latency, power-burning mode. Always conserving energy was just a bad engineering decision. It badly needs a turbo mode.