Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

moinmoin · Jan 6, 2023

Kaluan said:
BTW shouldn't Strix Point and/or RDNA3+ get their own thread(s)?

Strix Point would be the successor to the thread for Phoenix Point, so yes.

RDNA3+ probably depends on how it will be handled officially (to make matters more complicated we already have a thread for RDNA4, which currently sees activity for CDNA3...).

Exist50 · Jan 6, 2023

Timorous said:
So with the dual-CCD Zen 4 X3D parts being wonky designs, what are the chances AMD will do 8c Zen 5 with V-Cache and 16c Zen 5c without V-Cache as their top-tier 8950X(3D?) part, to help get back the undisputed multithreaded crown?

I think something like that is all but inevitable. Realistically, the multi-die chips exist for productivity use cases and benchmark victory, and as we've seen with Alder/Raptor Lake, those have no qualms with a hybrid core arrangement.

CakeMonster · Jan 6, 2023

Sounds to me like the existing schedulers would have an easier time with that kind of Z5 combination compared to the recent Z4 X3D combination that doesn't fit into a big/little model.

deasd · Jan 7, 2023

Timorous said:
So with the dual-CCD Zen 4 X3D parts being wonky designs, what are the chances AMD will do 8c Zen 5 with V-Cache and 16c Zen 5c without V-Cache as their top-tier 8950X(3D?) part, to help get back the undisputed multithreaded crown?

Exist50 said:
I think something like that is all but inevitable. Realistically, the multi-die chips exist for productivity use cases and benchmark victory, and as we've seen with Alder/Raptor Lake, those have no qualms with a hybrid core arrangement.

CakeMonster said:
Sounds to me like the existing schedulers would have an easier time with that kind of Z5 combination compared to the recent Z4 X3D combination that doesn't fit into a big/little model.

AMD said it would also make use of the higher-frequency cores on the non-V-Cache CCD in gaming. This makes sense because Intel is still pouring all its resources into keeping its lead in gaming, while AMD is under pressure to catch up.

But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD still has nice MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its Workstation series, where spamming cores is easier.

It seems to me that AMD is going a different way to improve the user (gamer) experience rather than spamming little cores. This just makes ZenXC cores less likely to be implemented in DT, since these low-frequency cores make no sense in gaming.

The last thing is that there is still doubt about whether the IO die can handle more than 16 cores.

Also, CakeMonster is right: the thread detector inside Windows 11 doesn't work with the X3D concept. But I think gamers would be clever enough to use something like Process Lasso.
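
For readers unfamiliar with what a tool like Process Lasso does here, the idea is to restrict a game to the cores of one CCD by setting its CPU affinity. Below is a minimal sketch of that idea, not Process Lasso's actual mechanism. It uses Python's os.sched_setaffinity, which exists only on Linux (Windows needs a different API), and the helper names and the contiguous per-CCD CPU numbering are assumptions made purely for illustration. Real numbering differs between operating systems and must be checked on the machine.

```python
import os


def ccd_logical_cpus(ccd_index, cores_per_ccd=8, smt=True):
    """Return the logical CPU IDs of one CCD.

    ASSUMPTION (hypothetical layout): logical CPUs are numbered
    contiguously per CCD, with SMT siblings next to each other.
    Verify the real topology before relying on this.
    """
    threads_per_core = 2 if smt else 1
    per_ccd = cores_per_ccd * threads_per_core
    start = ccd_index * per_ccd
    return set(range(start, start + per_ccd))


def pin_current_process(cpus):
    """Restrict the current process to the given logical CPUs (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this OS")
    # Only keep CPUs this process is actually allowed to run on.
    allowed = set(cpus) & os.sched_getaffinity(0)
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, allowed)
    return allowed
```

Under the assumed numbering, pin_current_process(ccd_logical_cpus(0)) would keep the process on the first CCD, which is roughly what a gamer would set up by hand for the V-Cache die.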

Timorous · Jan 7, 2023

deasd said:
AMD said it would also make use of the higher-frequency cores on the non-V-Cache CCD in gaming. This makes sense because Intel is still pouring all its resources into keeping its lead in gaming, while AMD is under pressure to catch up.

But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD still has nice MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its Workstation series, where spamming cores is easier.

It seems to me that AMD is going a different way to improve the user (gamer) experience rather than spamming little cores. This just makes ZenXC cores less likely to be implemented in DT, since these low-frequency cores make no sense in gaming.

The last thing is that there is still doubt about whether the IO die can handle more than 16 cores.

Also, CakeMonster is right: the thread detector inside Windows 11 doesn't work with the X3D concept. But I think gamers would be clever enough to use something like Process Lasso.

The thing about the Zen4c CCD is that each CCX looks like Phoenix at a high level, and what is the APU but a density- and power-optimised die? They are not exactly low frequency either, are they?

If you combine that with stacking the cache at the bottom (or having the entire IO die be the bottom layer with extra L3 included), like AMD have done with MI300, then that can help with the thermals and voltage, meaning AMD can have their cake and eat it too, and that would ease the scheduler issues.

So Zen5 could be 2 CCDs stacked on an io/cache die. AMD have so many packaging options here that there are a lot of ways they can go with it.

lopri · Jan 7, 2023

Hope AMD prices Zen4X3D right. Sometimes it is a better idea to go for market share than for a quarterly profit.

Bigos · Jan 7, 2023

Timorous said:
The thing about the Zen4c CCD is that each CCX looks like Phoenix at a high level, and what is the APU but a density- and power-optimised die? They are not exactly low frequency either, are they?

There is no way the APU is density optimized with such high clocks. I expect at least 25% lower clocks on Zen4c compared to Zen4.

Anhiel · Jan 7, 2023

Bigos said:
There is no way the APU is density optimized with such high clocks. I expect at least 25% lower clocks on Zen4c compared to Zen4.

AMD already said or hinted there would be no difference in performance between the two. Before RPL things were different, with less cache, but it seems the decision was reversed. So now they have the same performance at lower power.

deasd said:
But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD still has nice MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its Workstation series, where spamming cores is easier.

There is a good reason to have ZenXc. Currently, Intel's E-cores afford them lower power consumption for lower-TDP parts. AMD's full cores are less efficient here. So having a mix of CCDs with full cores and cloud cores would serve mobile well.

Also, Zen3 and Zen4 have a memory bandwidth problem: they are too fast and starve. You can see that in the benchmarks for Zen4 and in this recent article: https://www.anandtech.com/show/17641/lighter-touch-cpu-power-scaling-13900k-7950x
Despite increasing power, the performance gains drop off a cliff at the higher power levels. It's because the cores starve, so putting in more power won't lead to more work done. This might also be the reason why there's no point in Zen4X3D having more than one SRAM die (aside from games not using more cores).

Infinity Fabric bandwidth needs to double to better utilize 3D V-Cache, or the DRAM memory channels need to double to 4. I doubt consumer CPUs will ever get 4 memory channels, though. So better PCIe is the only solution, which is why we'll see a quick move up to PCIe 7 in the coming years. Welp, I've been saying and predicting all this for the past 2-3 years... so nothing is surprising, really.

Khanan · Jan 7, 2023

Should have either

- big/little, but then AVX512 is a question mark, like with Intel
- or 24 cores via an additional CCX (unlikely) or a 50% larger CCX (somewhat unlikely due to ring bus problems as well)
- always higher IPC
- more features, possibly
- higher clocks or average clocks
- better efficiency, of course

One of those features could be a dedicated AI accelerator like the one in Ryzen 7000 mobile, which was just presented, called Ryzen AI.

Exist50 · Jan 7, 2023

deasd said:
But the cloud-based, density-optimized ZenXC core still makes no sense for AMD on desktop, since AMD still has nice MT performance and efficiency compared to Intel, especially with Intel stuck at 16 E-cores for the foreseeable future. Not to mention AMD still has its Workstation series, where spamming cores is easier.

AMD and Intel are rather close in MT, and that's with a node deficit for Intel. What happens with Arrow Lake when they should be comparable, if not Intel having an edge? I think it makes tons of sense for AMD to plan for such a scenario. And why wouldn't they? It's basically free extra performance. Extra MT margin certainly wouldn't hurt to have.

Anhiel said:
AMD already said or hinted there would be no difference in performance between the two.

AMD never said Zen 4c would have the same performance. It's the same architecture / IPC, but they've surely taken a hit to clock speeds.

Anhiel said:
Also, Zen3 and Zen4 have a memory bandwidth problem: they are too fast and starve. You can see that in the benchmarks for Zen4 and in this recent article:

Is that a memory bottleneck, or a power scaling limitation? Seems more like the latter.

Hitman928 · Jan 7, 2023

Exist50 said:
AMD and Intel are rather close in MT, and that's with a node deficit for Intel. What happens with Arrow Lake when they should be comparable, if not Intel having an edge? I think it makes tons of sense for AMD to plan for such a scenario. And why wouldn't they? It's basically free extra performance. Extra MT margin certainly wouldn't hurt to have.

AMD never said Zen 4c would have the same performance. It's the same architecture / IPC, but they've surely taken a hit to clock speeds.

Is that a memory bottleneck, or a power scaling limitation? Seems more like the latter.

Power scaling limitation. Just look at the Cinebench results. If it were due to a lack of memory bandwidth, Cinebench would stick out as a large exception, as it doesn't care about memory bandwidth; all modern CPUs have plenty of on-die cache to make memory bandwidth meaningless in that test. However, the Cinebench results show the same scaling behavior. Zen4 just can't scale frequency effectively with higher power past ~125 W - 145 W or so.
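
To illustrate why extra power buys so little frequency near the top of the curve, here is a rough sketch using the common rule of thumb that dynamic power scales with frequency times voltage squared, and that voltage has to rise roughly with frequency, giving power proportional to frequency cubed. The cubic exponent and the wattage steps above 125 W are assumptions for illustration, not measured Zen4 data, and the function name is made up for this example.

```python
def relative_frequency(p_old_w, p_new_w, exponent=3.0):
    """Frequency ratio when power goes from p_old_w to p_new_w.

    ASSUMPTION: power is proportional to frequency ** exponent
    (exponent=3 is a textbook approximation, not a Zen4 measurement).
    """
    return (p_new_w / p_old_w) ** (1.0 / exponent)


baseline_w = 125  # lower end of the range mentioned in the post
for power_w in (145, 185, 230):  # illustrative power limits
    gain = relative_frequency(baseline_w, power_w) - 1.0
    print(f"{baseline_w} W -> {power_w} W: ~{gain * 100:.0f}% more frequency")
```

Under this simplified model, even doubling power from 125 W to 250 W would give only about 26% more frequency, with no memory bandwidth limit involved at all.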

Anhiel · Jan 7, 2023

Exist50 said:
AMD never said Zen 4c would have the same performance. It's the same architecture / IPC, but they've surely taken a hit to clock speeds.

Is that a memory bottleneck, or a power scaling limitation? Seems more like the latter.

What I'm saying is that unless it's for servers, there's no reason to lower clock speeds for consumer parts, hence making them 1:1 replaceable for mobile.

Hitman928 said:
Power scaling limitation. Just look at the Cinebench results. If it were due to a lack of memory bandwidth, Cinebench would stick out as a large exception, as it doesn't care about memory bandwidth; all modern CPUs have plenty of on-die cache to make memory bandwidth meaningless in that test. However, the Cinebench results show the same scaling behavior. Zen4 just can't scale frequency effectively with higher power past ~125 W - 145 W or so.

The proof is simple: look at the power consumption and raw performance of the 8-core and 16-core versions.
Igor's lab has them ready for comparison:

AMD Ryzen 9 7950X and Ryzen 7 7700X Review with gaming and workstation benchmarks- A new era begins with Zen 4 an the new Socket AM5 | Page 10 | igor´sLAB

AMD let us release tests of the four new Zen 4 CPUs today in the form of the Ryzen 9 7950X, Ryzen 9 7900X, Ryzen 7 7700X and Ryzen 5 7600X. We have already reported about it for a long time…

www.igorslab.de

If you look at the power draw, regardless of screen size, the ratio between the 8-core and 16-core remains at 0.694..., differing only by a fraction of a point when calculated back from the raw watt numbers. Therefore, the cores are powered by the same amount.

But if you look at the performance (AutoCAD 2021), the ratio is 77.4/92.9 = 0.833...; there's over 10% loss.

Khanan · Jan 7, 2023

Zen4C is a smaller version with less dark silicon, which means it will clearly have lower clocks.

Hitman928 · Jan 7, 2023

Anhiel said:
What I'm saying is that unless it's for servers, there's no reason to lower clock speeds for consumer parts, hence making them 1:1 replaceable for mobile.

The proof is simple: look at the power consumption and raw performance of the 8-core and 16-core versions.
Igor's lab has them ready for comparison:

AMD Ryzen 9 7950X and Ryzen 7 7700X Review with gaming and workstation benchmarks- A new era begins with Zen 4 an the new Socket AM5 | Page 10 | igor´sLAB

AMD let us release tests of the four new Zen 4 CPUs today in the form of the Ryzen 9 7950X, Ryzen 9 7900X, Ryzen 7 7700X and Ryzen 5 7600X. We have already reported about it for a long time…

www.igorslab.de

If you look at the power draw, regardless of screen size, the ratio between the 8-core and 16-core remains at 0.694..., differing only by a fraction of a point when calculated back from the raw watt numbers. Therefore, the cores are powered by the same amount.

But if you look at the performance (AutoCAD 2021), the ratio is 77.4/92.9 = 0.833...; there's over 10% loss.

AutoCAD only uses a few cores, and its performance depends on IPC and boost clocks, which is why there is such a small gap between the 7950X and 7700X. It has nothing to do with memory bandwidth.

Timorous · Jan 7, 2023

Bigos said:
There is no way the APU is density optimized with such high clocks. I expect at least 25% lower clocks on Zen4c compared to Zen4.

The APU is 140M xtors per mm² despite having logic in the core.

Zen4 is about 45mm² of cores+links and 25mm² of L3 cache at 94M xtors per mm². That means 16 cores + 32MB cache should be around 11B transistors. At 140M per mm² that would be a CCD that is ~79mm², smaller than Zen3.

So at APU density 16c and 32MB cache seems to fit in a typical CCD sized package. I see no reason aside from socket TDP limits (which will impact base and all core clock) why Zen4c necessarily has low clocks, lower sure but still capable of pretty high clocks.
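
The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: the areas and densities are the round figures quoted in the post, the variable and function names are made up for this example, and the assumption that a 16-core die is simply two 8-core blocks plus one 32MB L3 block is the post's own simplification.

```python
# Round figures taken from the post (not official die measurements).
CORES_AREA_MM2 = 45.0       # 8 Zen4 cores + links
L3_AREA_MM2 = 25.0          # 32MB of L3
ZEN4_DENSITY = 94e6         # transistors per mm^2 on the Zen4 CCD
APU_DENSITY = 140e6         # transistors per mm^2 on the APU


def estimate_dense_ccd(core_blocks=2, l3_blocks=1):
    """Return (transistor count, area in mm^2 at APU density).

    ASSUMPTION: transistor count scales linearly with the Zen4 block
    areas, and the whole die reaches the APU's average density.
    """
    area_at_zen4_density = core_blocks * CORES_AREA_MM2 + l3_blocks * L3_AREA_MM2
    transistors = area_at_zen4_density * ZEN4_DENSITY
    return transistors, transistors / APU_DENSITY


transistors, area_mm2 = estimate_dense_ccd()
print(f"~{transistors / 1e9:.1f}B transistors, ~{area_mm2:.0f} mm^2 at APU density")
```

Without intermediate rounding this gives roughly 10.8B transistors and about 77mm². Rounding up to 11B first, as in the post, gives the ~79mm² figure.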

Exist50 · Jan 7, 2023

Anhiel said:
What I'm saying is that unless it's for servers, there's no reason to lower clock speeds for consumer parts, hence making them 1:1 replaceable for mobile.

Zen 4c wouldn't have lower clocks just because. It's the only reasonable tradeoff to explain how they got half the area with the same uarch.

Anhiel · Jan 8, 2023

Khanan said:
Zen4C is a smaller version with less dark silicon, which means it will clearly have lower clocks.

Exist50 said:
Zen 4c wouldn't have lower clocks just because. It's the only reasonable tradeoff to explain how they got half the area with the same uarch.

Before Zen4 was revealed, we assumed the cloud variant would be shrunk via the N7 to N5 transition, which gives 1.8x density. So squeezing 16 cores into the same space seemed doable.
Now that the cloud variant is expected to be an N4 shrink...
N4 to N5 gives 1.3x density. So ~70% needs to come from design reduction. Cache is certainly one area that frees up a lot of space. Just halving L3 frees nearly 25% of the real estate on the CCD, roughly worth 4 full cores. So that's 50% taken care of without a design change. Then, since there's CDNA, there's no need for AVX512, although that's not much, ~5% maybe. So we are left with ~15%.
Doesn't seem like a lot.

Hitman928 said:
AutoCAD only uses a few cores, and its performance depends on IPC and boost clocks, which is why there is such a small gap between the 7950X and 7700X. It has nothing to do with memory bandwidth.

Limited cores might be a problem here, but everything else you listed would be ideal. Anyhow, regardless of benchmark or app, as long as the tests are consistent within the benchmark, the difference is all a matter of ratio.

BorisTheBlade82 · Jan 8, 2023

@Anhiel
Unless stated officially otherwise, I fully believe in AVX512 support with Zen4c. We have AMD on record regarding identical ISA support.
I support the theory that it will not clock that high because it is density and efficiency optimized.

BorisTheBlade82 · Jan 8, 2023

Oh, and while we are at it: We already have proof that Bergamo supports AVX512.

128 Core AMD Epyc Bergamo Process Spotted: 3.1GHz Boost Clock and a TDP of 360W? | Hardware Times

AMD’s cloud-oriented Epyc Bergamo processors are planned for an early 2023 launch. Featuring the stripped-down variant of the Zen 4 core (Zen 4c), they’ll offer up to 128 cores per socket for compute-intensive cloud servers. The emergence of Arm-based designs such as Graviton and Ampere nudged...

www.hardwaretimes.com

Khanan · Jan 8, 2023

Anhiel said:
N4 to N5 gives 1.3x density.

According to Wikipedia (5nm article) it’s barely better, and nowhere near 30%.

https://en.m.wikipedia.org/wiki/5_nm_process

Geddagod · Jan 8, 2023

uzzi38 said:
They did not specify what Zen 5 designs will utilise N3 at all.

The biggest mystery of Zen 5 so far imo

Kaluan · Jan 8, 2023

Geddagod said:
The biggest mystery of Zen 5 so far imo

We'll probably get hints once Bergamo successor and Strix Point info starts dropping.

Knowing Turin's core count would also give insight on Granite Ridge. At least for the regular Zen5 cores.

Geddagod · Jan 8, 2023

Timorous said:
The APU is 140M xtors per mm² despite having logic in the core.

Zen4 is about 45mm² of cores+links and 25mm² of L3 cache at 94M xtors per mm². That means 16 cores + 32MB cache should be around 11B transistors. At 140M per mm² that would be a CCD that is ~79mm², smaller than Zen3.

So at APU density 16c and 32MB cache seems to fit in a typical CCD sized package. I see no reason aside from socket TDP limits (which will impact base and all core clock) why Zen4c necessarily has low clocks, lower sure but still capable of pretty high clocks.

I'm pretty sure the APU has a way higher logic/cache ratio than what a Zen 4D CCX should have... Idk if it's really comparable in that way.

Timorous · Jan 8, 2023

Geddagod said:
I'm pretty sure the APU has a way higher logic/cache ratio than what a Zen 4D CCX should have... Idk if it's really comparable in that way.

Not entirely but it should serve as an upper bound for Zen4C die size.

nicalandia · Jan 8, 2023

BorisTheBlade82 said:
@Anhiel
Unless stated officially otherwise, I fully believe in AVX512 support with Zen4c. We have AMD on record regarding identical ISA support.
I support the theory that it will not clock that high because it is density and efficiency optimized.

I posted proof of that yesterday...!

Page 468 - Discussion - Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Page 468 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com
