Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 13

moinmoin

Diamond Member
Jun 1, 2017
BTW shouldn't Strix Point and/or RDNA3+ get their own thread(s)?
Strix Point would be the successor to the thread for Phoenix Point, so yes.

RDNA3+ probably depends on how it will be handled officially (to make matters more complicated, we already have a thread for RDNA4, which currently sees activity for CDNA3...).
 

Exist50

Platinum Member
Aug 18, 2016
So with the dual-CCD Zen 4 X3D parts being wonky designs, what are the chances AMD will do 8c Zen 5 with V-Cache and 16c Zen 5c without V-Cache as their top-tier 8950X(3D?) part to help get back the undisputed multithreaded crown?
I think something like that is all but inevitable. Realistically, the multi-die chips exist for productivity use cases and benchmark victory, and as we've seen with Alder/Raptor Lake, those have no qualms with a hybrid core arrangement.
 

CakeMonster

Golden Member
Nov 22, 2012
Sounds to me like the existing schedulers would have an easier time with that kind of Z5 combination compared to the recent Z4 X3D combination that doesn't fit into a big/little model.
 

deasd

Senior member
Dec 31, 2013
So with the dual-CCD Zen 4 X3D parts being wonky designs, what are the chances AMD will do 8c Zen 5 with V-Cache and 16c Zen 5c without V-Cache as their top-tier 8950X(3D?) part to help get back the undisputed multithreaded crown?
I think something like that is all but inevitable. Realistically, the multi-die chips exist for productivity use cases and benchmark victory, and as we've seen with Alder/Raptor Lake, those have no qualms with a hybrid core arrangement.
Sounds to me like the existing schedulers would have an easier time with that kind of Z5 combination compared to the recent Z4 X3D combination that doesn't fit into a big/little model.

AMD said it would also make use of the higher-frequency cores on the non-V-Cache CCD in gaming. This makes sense because Intel is still throwing all its resources at keeping its gaming lead, which puts pressure on AMD to catch up.

But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD already has good MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its workstation series, where it's easier to spam cores.

It seems to me that AMD is going a different way to improve the user (gamer) experience rather than spamming little cores. That makes ZenXC cores less likely to be implemented in DT, since these low-frequency cores make no sense in gaming.

The last thing is that there is still doubt over whether the IO die could handle more than 16 cores.

Also, CakeMonster is right: the Thread Director scheduling in Windows 11 doesn't work with the X3D concept. But I think gamers would be clever enough to use something like Process Lasso.
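Process Lasso is a Windows tool; on Linux, a comparable manual workaround can be sketched with the standard library alone. The sketch below assumes the usual Linux sysfs cache layout (`/sys/devices/system/cpu/cpuN/cache/indexM/` with `level`, `size` and `shared_cpu_list` files) and treats each distinct L3 as one CCD, which holds for Zen 4 desktop parts where every CCD has its own L3. The helper names (`parse_cpu_list`, `size_to_kib`, `l3_domains`, `pin_to_largest_l3`) are hypothetical, and picking the largest L3 as the V-Cache CCD is an assumption, not AMD's scheduling logic.

```python
import glob
import os

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-7,16-23" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def size_to_kib(text):
    """Convert a sysfs cache size string such as "32768K" to KiB."""
    text = text.strip()
    multipliers = {"K": 1, "M": 1024, "G": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text) // 1024  # fall back to a plain byte count


def _read(cache_dir, name):
    with open(os.path.join(cache_dir, name)) as f:
        return f.read().strip()


def l3_domains(sysfs_root=SYSFS_CPU):
    """Map each distinct L3 (a frozenset of CPU ids) to its size in KiB."""
    domains = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for cache_dir in glob.glob(pattern):
        if _read(cache_dir, "level") != "3":
            continue  # skip L1/L2 entries
        cpus = frozenset(parse_cpu_list(_read(cache_dir, "shared_cpu_list")))
        domains[cpus] = size_to_kib(_read(cache_dir, "size"))
    return domains


def pin_to_largest_l3(pid=0):
    """Restrict a process (0 = the caller) to the CPUs sharing the biggest L3."""
    domains = l3_domains()
    if not domains:
        raise RuntimeError("no L3 cache information found in sysfs")
    cpus = max(domains, key=domains.get)
    os.sched_setaffinity(pid, cpus)  # Linux-only standard library call
    return sorted(cpus)
```

Calling `pin_to_largest_l3(pid)` with a game's PID does roughly what a Process Lasso affinity rule does; on a chip whose L3s are all the same size it just picks one L3 domain.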
 

Timorous

Golden Member
Oct 27, 2008
AMD said it would also make use of the higher-frequency cores on the non-V-Cache CCD in gaming. This makes sense because Intel is still throwing all its resources at keeping its gaming lead, which puts pressure on AMD to catch up.

But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD already has good MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its workstation series, where it's easier to spam cores.

It seems to me that AMD is going a different way to improve the user (gamer) experience rather than spamming little cores. That makes ZenXC cores less likely to be implemented in DT, since these low-frequency cores make no sense in gaming.

The last thing is that there is still doubt over whether the IO die could handle more than 16 cores.

Also, CakeMonster is right: the Thread Director scheduling in Windows 11 doesn't work with the X3D concept. But I think gamers would be clever enough to use something like Process Lasso.

The thing about the Zen4c CCD is that each CCX looks like Phoenix at a high level, and what is the APU but a density- and power-optimised die? They are not exactly low frequency either, are they?

If you combine that with stacking the cache at the bottom (or having the entire IO die be the bottom layer, with extra L3 included), like AMD have done with MI300, then that can help with thermals and voltage, meaning AMD can have their cake and eat it too, and that would ease the scheduler issues.

So Zen5 could be 2 CCDs stacked on an IO/cache die. AMD have so many packaging options here that there are a lot of ways they can go with it.
 

lopri

Elite Member
Jul 27, 2002
I hope AMD prices Zen4X3D right. Sometimes it is a good idea to go for market share rather than quarterly profit.
 

Bigos

Senior member
Jun 2, 2019
The thing about the Zen4c CCD is that each CCX looks like Phoenix at a high level, and what is the APU but a density- and power-optimised die? They are not exactly low frequency either, are they?

There is no way the APU is density-optimized with such high clocks. I expect Zen4c to clock at least 25% lower than Zen4.
 

Anhiel

Member
May 12, 2022
There is no way the APU is density-optimized with such high clocks. I expect Zen4c to clock at least 25% lower than Zen4.

AMD already said, or at least hinted, that there would be no difference in performance between the two. Before RPL, things were different, with less cache, but it seems that decision was reversed. So now they have the same performance at lower power.

But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD already has good MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its workstation series, where it's easier to spam cores.

There is a good reason to have ZenXc. Currently, Intel's E-cores afford them lower power consumption in lower-TDP parts, and AMD's full cores are less efficient there. So a mix of CCDs, with full cores and cloud cores, would serve mobile well.

Also, Zen3 and Zen4 have a memory bandwidth problem: they are too fast and get starved. You can see that in the benchmarks for Zen4 and in this recent article: https://www.anandtech.com/show/17641/lighter-touch-cpu-power-scaling-13900k-7950x
Despite increasing power, the performance gains fall off a cliff at the higher power levels. That's because the cores are starved, so putting in more power won't get more work done. This might also be why there's no point in Zen4X3D having more than one SRAM die (aside from games not using more cores).

Infinity Fabric bandwidth needs to double to better utilize 3D V-Cache, or DRAM needs to go from 2 to 4 memory channels. I doubt consumer CPUs will ever get 4 memory channels, though. So better PCIe is the only solution, which is why we'll see a quick move up to PCIe 7 in the coming years. Welp, I've been saying and predicting all this for the past 2-3 years... so nothing is really surprising.
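The bandwidth side of this argument is simple arithmetic: peak DRAM bandwidth is channels × bytes per transfer × transfer rate. The sketch assumes 64-bit consumer "channels" (one DDR5 DIMM's two 32-bit subchannels counted together) and DDR5-6000 as an example speed. The Infinity Fabric figure assumes 32 bytes per FCLK cycle of read bandwidth per CCD at a 2000 MHz FCLK; that is an assumption about Zen 4's link, not a number from this thread. The function names are hypothetical.

```python
def dram_peak_gbs(channels, mt_per_s, bus_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_bits / 8
    return channels * bytes_per_transfer * mt_per_s / 1000


def fabric_read_gbs(bytes_per_clock, fclk_mhz):
    """Theoretical per-CCD fabric read bandwidth in GB/s (decimal units)."""
    return bytes_per_clock * fclk_mhz / 1000


if __name__ == "__main__":
    for ch in (2, 4):
        print(f"{ch} channels of DDR5-6000: {dram_peak_gbs(ch, 6000):.1f} GB/s")
    # Assumed link parameters, not official figures from the thread
    print(f"CCD read link (32 B/clk @ 2000 MHz): {fabric_read_gbs(32, 2000):.1f} GB/s")
```

Both results are theoretical peaks; sustained bandwidth is lower, and neither number by itself settles whether the cores are actually starved.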
 

Khanan

Senior member
Aug 27, 2017
It should have either:

- big/little, but then AVX512 is a question mark, like with Intel
- or 24 cores via an additional CCX (unlikely) or a CCX that's 50% larger (somewhat unlikely, due to ring bus problems as well)
- always higher IPC
- possibly more features
- higher peak or average clocks
- better efficiency, of course

One of those features could be a dedicated AI accelerator, like the one in the just-presented Ryzen 7000 mobile parts, called Ryzen AI.
 

Exist50

Platinum Member
Aug 18, 2016
But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD already has good MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its workstation series, where it's easier to spam cores.
AMD and Intel are rather close in MT, and that's with a node deficit for Intel. What happens with Arrow Lake, when they should be comparable, if not with Intel having an edge? I think it makes tons of sense for AMD to plan for such a scenario. And why wouldn't they? It's basically free extra performance. Extra MT margin certainly wouldn't hurt to have.
AMD already said, or at least hinted, that there would be no difference in performance between the two.
AMD never said Zen 4c would have the same performance. It's the same architecture / IPC, but they've surely taken a hit to clock speeds.
Also, Zen3 and Zen4 have a memory bandwidth problem: they are too fast and get starved. You can see that in the benchmarks for Zen4 and in this recent article:
Is that a memory bottleneck, or a power scaling limitation? Seems more like the latter.
 

Hitman928

Diamond Member
Apr 15, 2012
AMD and Intel are rather close in MT, and that's with a node deficit for Intel. What happens with Arrow Lake, when they should be comparable, if not with Intel having an edge? I think it makes tons of sense for AMD to plan for such a scenario. And why wouldn't they? It's basically free extra performance. Extra MT margin certainly wouldn't hurt to have.

AMD never said Zen 4c would have the same performance. It's the same architecture / IPC, but they've surely taken a hit to clock speeds.

Is that a memory bottleneck, or a power scaling limitation? Seems more like the latter.

Power scaling limitation. Just look at the Cinebench results. If it were due to a lack of memory bandwidth, Cinebench would stick out as a large exception, as it doesn't care about memory bandwidth; all modern CPUs have plenty of on-die cache to make memory bandwidth meaningless in that test. However, the Cinebench results show the same scaling behavior. Zen4 just can't scale frequency effectively with higher power past ~125-145 W or so.
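Why extra package power buys so little frequency can be illustrated with the textbook dynamic-power approximation P ≈ C·V²·f. If voltage has to rise roughly in proportion to frequency, power grows with about f³, so frequency (and, in a compute-bound test like Cinebench, throughput) grows only with the cube root of power. This is a simplified toy model, not a fit to the AnandTech data; real V/f curves steepen near the top of the range, so measured scaling past ~125-145 W is worse than this. The function name is hypothetical.

```python
def relative_perf(power_ratio, exponent=1 / 3):
    """Relative throughput after scaling package power by power_ratio.

    Toy model: P ~ C * V^2 * f with V roughly proportional to f, so
    P ~ f^3 and f ~ P^(1/3). Throughput is assumed to track frequency.
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return power_ratio ** exponent


if __name__ == "__main__":
    baseline_w = 125
    for watts in (65, 105, 125, 145, 230):
        print(f"{watts:>3} W -> {relative_perf(watts / baseline_w):.3f}x of {baseline_w} W")
```

Even in this optimistic model, going from 125 W to a 230 W limit yields only about 1.23x the throughput for 1.84x the power.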
 

Anhiel

Member
May 12, 2022
AMD never said Zen 4c would have the same performance. It's the same architecture / IPC, but they've surely taken a hit to clock speeds.

Is that a memory bottleneck, or a power scaling limitation? Seems more like the latter.

What I'm saying is that, unless it's for servers, there's no reason to lower clock speeds for consumer parts, which makes them 1:1 replacements for mobile.

Power scaling limitation. Just look at the Cinebench results. If it were due to a lack of memory bandwidth, Cinebench would stick out as a large exception, as it doesn't care about memory bandwidth; all modern CPUs have plenty of on-die cache to make memory bandwidth meaningless in that test. However, the Cinebench results show the same scaling behavior. Zen4 just can't scale frequency effectively with higher power past ~125-145 W or so.

The proof is simple: look at the power consumption and raw performance of the 8-core and 16-core versions.
Igor's Lab has them ready for comparison:

If you look at the power draw, regardless of screen size, the ratio between the 8-core and 16-core remains at 0.694... and only differs by a fraction of a point when back-calculating from the raw watt numbers. Therefore, the cores are powered by the same amount.

But if you look at performance (AutoCAD 2021), the ratio is 77.4/92.9 = 0.833..., so there's over a 10% loss.
 

Khanan

Senior member
Aug 27, 2017
Zen4C is a smaller version with less dark silicon, which means it will have clearly lower clocks.
 

Hitman928

Diamond Member
Apr 15, 2012
What I'm saying is that, unless it's for servers, there's no reason to lower clock speeds for consumer parts, which makes them 1:1 replacements for mobile.

The proof is simple: look at the power consumption and raw performance of the 8-core and 16-core versions.
Igor's Lab has them ready for comparison:

If you look at the power draw, regardless of screen size, the ratio between the 8-core and 16-core remains at 0.694... and only differs by a fraction of a point when back-calculating from the raw watt numbers. Therefore, the cores are powered by the same amount.

But if you look at performance (AutoCAD 2021), the ratio is 77.4/92.9 = 0.833..., so there's over a 10% loss.

AutoCAD only uses a few cores, and performance depends on IPC and boost clocks, which is why there is such a small gap between the 7950X and 7700X. It has nothing to do with memory bandwidth.
 

Timorous

Golden Member
Oct 27, 2008
There is no way the APU is density-optimized with such high clocks. I expect Zen4c to clock at least 25% lower than Zen4.

The APU is at 140M xtors per mm² despite having logic in the core.

Zen4 is about 45mm² for cores+links and 25mm² for L3 cache, at 94M xtors per mm². That means 16 cores + 32MB cache should be around 11B transistors. At 140M per mm² that would be a CCD that is ~79mm², smaller than Zen3.

So at APU density, 16c and 32MB of cache seems to fit in a typical CCD-sized package. I see no reason, aside from socket TDP limits (which will impact base and all-core clocks), why Zen4c necessarily has low clocks; lower, sure, but still capable of pretty high clocks.
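The transistor-budget arithmetic in this post can be written out explicitly. The inputs are the post's own estimates (about 45 mm² of Zen 4 cores plus links and 25 mm² of L3 per 8-core, 32 MB CCD at ~94 MTr/mm², and ~140 MTr/mm² for the APU). The sketch doubles the core area, keeps the L3, and assumes logic and SRAM would both pack at the APU's average density, which is a simplification. The function and parameter names are hypothetical.

```python
def dense_ccd_estimate(cores, l3_mb,
                       core_area_per_8=45.0,   # mm^2, Zen 4 cores + links
                       l3_area_per_32mb=25.0,  # mm^2, Zen 4 L3
                       src_density=94.0,       # MTr/mm^2, Zen 4 CCD
                       dst_density=140.0):     # MTr/mm^2, Phoenix APU
    """Return (transistors in billions, estimated area in mm^2 at dst_density)."""
    src_area = core_area_per_8 * cores / 8 + l3_area_per_32mb * l3_mb / 32
    transistors_m = src_area * src_density  # millions of transistors
    return transistors_m / 1000, transistors_m / dst_density


if __name__ == "__main__":
    xtors_b, area = dense_ccd_estimate(cores=16, l3_mb=32)
    print(f"~{xtors_b:.1f}B transistors, ~{area:.0f} mm^2 at APU density")
```

Without rounding up to 11B transistors, the estimate lands nearer 77 mm² than 79 mm², which does not change the post's conclusion.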
 

Exist50

Platinum Member
Aug 18, 2016
What I'm saying is that, unless it's for servers, there's no reason to lower clock speeds for consumer parts, which makes them 1:1 replacements for mobile.
Zen 4c wouldn't have lower clocks just because. It's the only reasonable tradeoff to explain how they got half the area with the same uarch.
 

Anhiel

Member
May 12, 2022
Zen4C is a smaller version with less dark silicon, which means it will have clearly lower clocks.
Zen 4c wouldn't have lower clocks just because. It's the only reasonable tradeoff to explain how they got half the area with the same uarch.

Before Zen4 was revealed, we assumed the cloud variant's shrink would come from the N7-to-N5 transition's 1.8x density. So squeezing 16 cores into the same space seemed doable.
Now that the cloud variant is expected to be an N4 shrink...
N5 to N4 gives 1.3x density, so ~70% needs to come from design reduction. Cache is certainly one thing that frees up a lot of area: just halving L3 frees nearly 25% of the real estate on the CCD, roughly worth 4 full cores. So that's 50% taken care of without a design change. Then, since there's CDNA, there's no need for AVX512, although that's not much, maybe ~5%. So we are left with ~15%.
That doesn't seem like a lot.
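The same area budget can also be treated multiplicatively rather than additively. Taking the post's figures at face value (a 2x density target for 16 cores in an 8-core footprint, 1.3x from the node, ~25% of CCD area freed by halving L3 and ~5% by dropping AVX512), the sketch computes how much area design changes must remove in total and how much is still left after the listed cuts. Treating the cuts as fractions of the same post-shrink area is a simplifying assumption, and the function name is hypothetical.

```python
def area_budget(target_gain=2.0, node_gain=1.3, design_cuts=(0.25, 0.05)):
    """Return (total design cut needed, cut still needed after design_cuts).

    Areas are relative to the original design. The node shrink divides area by
    node_gain; the design cuts are summed as fractions of the post-shrink area.
    """
    target_area = 1.0 / target_gain                   # e.g. 0.5 for half the area
    total_cut_needed = 1.0 - target_area * node_gain  # share of post-shrink area
    area_after = (1.0 / node_gain) * (1.0 - sum(design_cuts))
    still_needed = max(0.0, 1.0 - target_area / area_after)
    return total_cut_needed, still_needed


if __name__ == "__main__":
    total, left = area_budget()
    print(f"design must remove ~{total:.0%} of the area; ~{left:.0%} still to find")
```

Under this reading, design changes need to remove about 35% of the post-shrink area, and the listed cuts leave roughly 7% still to find, in line with the post's point that the remainder isn't much.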

AutoCAD only uses a few cores, and performance depends on IPC and boost clocks, which is why there is such a small gap between the 7950X and 7700X. It has nothing to do with memory bandwidth.

Limited core usage might be a problem here, but everything else you listed would be ideal. Anyhow, regardless of benchmark or app, as long as the tests are consistent within the benchmark, the difference is all a matter of ratios.
 

BorisTheBlade82

Senior member
May 1, 2020
Oh, and while we are at it: We already have proof that Bergamo supports AVX512.
 

Kaluan

Senior member
Jan 4, 2022
The biggest mystery of Zen 5 so far imo
We'll probably get hints once Bergamo successor and Strix Point info starts dropping.

Knowing Turin's core count would also give insight on Granite Ridge. At least for the regular Zen5 cores.
 

Geddagod

Golden Member
Dec 28, 2021
The APU is at 140M xtors per mm² despite having logic in the core.

Zen4 is about 45mm² for cores+links and 25mm² for L3 cache, at 94M xtors per mm². That means 16 cores + 32MB cache should be around 11B transistors. At 140M per mm² that would be a CCD that is ~79mm², smaller than Zen3.

So at APU density, 16c and 32MB of cache seems to fit in a typical CCD-sized package. I see no reason, aside from socket TDP limits (which will impact base and all-core clocks), why Zen4c necessarily has low clocks; lower, sure, but still capable of pretty high clocks.
I'm pretty sure the APU has a way higher logic/cache ratio than what a Zen 4D CCX should have... Idk if it's really comparable in that way.
 

nicalandia

Diamond Member
Jan 10, 2019
@Anhiel
Unless stated officially otherwise, I fully believe in AVX512 support with Zen4c. We have AMD on record regarding identical ISA support.
I support the theory that it will not clock that high because it is density and efficiency optimized.
I posted proof of that yesterday...!

[Attached images: 1673218093385.png, 1673218102529.png]