That's the claim AMD themselves made. Probably talking about core + L2.
Again, where are all of these numbers you keep throwing out coming from? No, by all current indications Zen 4 in Phoenix is basically identical to Zen 4 elsewhere.
> A single Bergamo CPU is worth $15,000 (selling for $22,000 right now in China), so as you can see they will be exceedingly profitable, enough to warrant a different design.

My question remains: what is the business case for AMD to design yet another 4N Zen 4 CCX at >140M xtors/mm² using a different library? Why do that work twice only to not fit more cores in?
> There is not enough space on the package to fit 16 of those on the CPU.

Sure, and a CCD designed around two Phoenix CCXs will do the job nicely.
> Question is also how much CDNA3 will sell for, that 150B-transistor APU.

A single Bergamo CPU is worth $15,000 (selling for $22,000 right now in China), so as you can see they will be exceedingly profitable, enough to warrant a different design.
> You're comparing apples and oranges. The density of a high-speed, cache-heavy compute die will be vastly different from an SoC with a GPU and everything else included. I'm sure if you actually look at the Zen 4 cluster in isolation, the density will be very similar to the desktop Zen 4.

The APU is 25B xtors in 176 mm², which is about 140M xtors per mm² despite all the IO in a monolithic design. Compared to the 94M xtors per mm² of the normal Zen 4 CCD, it represents a 49% increase in density.
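For what it's worth, the division checks out. A quick sanity check of the figures above (all values are the post's claims, not official specs):

```python
# Sanity-check the density figures quoted above (claimed values, not official).
apu_transistors = 25e9   # claimed Phoenix APU transistor count
apu_area_mm2 = 176       # claimed Phoenix die area, mm^2
ccd_density = 94e6       # claimed desktop Zen 4 CCD density, xtors per mm^2

apu_density = apu_transistors / apu_area_mm2
print(f"APU density: {apu_density / 1e6:.0f}M xtors/mm^2")          # ~142M
print(f"Increase over CCD: {(apu_density / ccd_density - 1) * 100:.0f}%")  # ~51%
```

Using the unrounded numbers gives ~142M xtors/mm² and a ~51% increase; the post's 49% comes from rounding down to 140M first. Either way the gap over the CCD is real.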
> I’m gonna go out on a limb and make a guess: $30K ea.

Yes. It'll sell.
> Yes, it was.

That was not the case for Renoir or Cezanne.
> Just compared from a die shot, the Zen 3 core + L2 cache on Cezanne is the same size as the desktop one.

That was not the case for Renoir or Cezanne. Further, Navi 31 is not 140M xtors per mm², although getting the density of the 5nm portion of the die alone is tricky, because I don't think we have a transistor count for the GCD on its own, just the whole thing.

Anyway, we will find out when we get more details, but I don't see AMD doing the same thing twice for no benefit (like, say, squeezing 3 CCXs into a CCD).
I just went and checked and you are right. Locuza did some analysis and the difference between the Cezanne die size and Vermeer die size was purely the L3 cache and lack of TSVs. The core itself was exactly the same.
So given that, my entire argument is nonsense, since AMD already copy/pasted the existing CCX into the APU. So yes, AMD will need to design a higher-density version for Zen 4c, and that will come with clock-speed tradeoffs.

I still find it curious how AMD manages to have such high transistor density for the APUs when the CCXs and the compute units seem to be the same density as their desktop counterparts and there is a lot of IO. I guess the power-gating circuitry can be really dense.
> RDNA3 with Navi 31 has extreme density; that's mentioned by AMD in the slides. Could, however, be marketing wash.

IDK, I believe that part is true.
The MI300 infographic disclosure did raise even more questions regarding the tech for the upcoming AMD products. It seems to me the CPUs are sharing the IC with the GPU, so a possible SLC could trickle down to consumer parts. The CPU L3, as well, is not on the top die; this could lend some credibility to the rumor of CPU cores stacked on top of cache, or at least on top of an SLC. The GCDs appearing as one single GPU to the SW would also mean they have sorted out the multiple-GCD concept, but that's for another thread.

For me, however, the interconnects are the most interesting part of the MI300, and how that would carry over to Zen 5. Lots of interesting design choices would have to be made for Zen 5, and it is fairly exciting for the folks working on such things, considering the tape-out would be within a quarter or so if not done already.
In case there really is stacking throughout the Zen 5 DT/server product lines:
- One drawback is that they would need to limit Tjmax to something like 85 °C to keep the hybrid bond from degrading; they might need a big IPC jump to recover the lost ST perf (for instance, the 7800X3D tops out at 5 GHz). But the thought of having the big L3 chunks off the leading-edge nodes would make the bean counters extremely happy.
- Infinity fabric links via the package would be gone if all dies are stacked on the base die. This would be a necessity if the supposed 64 Gbps links (from LinkedIn) are to be implemented. Current gen Ryzen is bottlenecked by the IF clock to some extent.
- More cores per CCX (alluded to by Mike Clark) sharing an L3 would need some stacking; that would make routing each core to the L3 possible with uniform distance, not bloat the core area too much, and leave the RDL on the base-die side instead of the core die. This could in fact shave off a non-trivial amount of traces if the distance to the L3 from the core is cut.
- Base die can host the IO and other PCIe stuff. They might even get away with N6 for the base die on the desktop.
If the CCDs are not stacked, they would need something like the interconnects on RDNA3 to replace the current Z4 GMI. Then they have FinFlex to play with.
> The big question is, would it bring down the cost for consumer CPUs? AMD may separate desktop CPU and server CPU package design.

I agree. I continue to expect AMD to use more complex packages only for higher-performance, high-margin products. The FAD reference to "AMD chiplet technology" for Phoenix appears to have turned out to be a false flag. At this rate Intel may well be first to chiplet-based lower-performance mobile parts with Meteor Lake. It will be very interesting to see if that is a competitive cost advantage or disadvantage.
> It seems to me the CPUs are sharing the IC with the GPU, so a possible SLC could trickle down to consumer parts. The CPU L3 as well is not on the top die; this could lend some credibility to the rumor of CPU cores stacked on top of cache, or at least on top of an SLC.

I might have missed info, but have they shown that the L3 is moved to the base die? Actually, what (if anything) is confirmed for it?
> More cores per CCX (alluded to by Mike Clark) sharing an L3 would need some stacking; that would make routing each core to the L3 possible with uniform distance, not bloat the core area too much, and leave the RDL on the base-die side instead of the core die.

They moved to a ring bus with the move to an 8c CCX, so I don't think stacking is a necessity for them to scale per-CCX core count. Nor do I think 2-layer hybrid bonding alone would be enough to solve that routing issue, so they'll probably keep the ring.
> One drawback is that they would need to limit Tjmax to something like 85 °C to keep the hybrid bond from degrading; they might need a big IPC jump to recover the lost ST perf (for instance, the 7800X3D tops out at 5 GHz).

Putting the cache on the bottom will probably help a lot. No longer need to go through the top die for cooling. And maybe future gens of hybrid bonding can get the thermal resilience up to where it's a non-issue, if that's even the limiting factor today. Might just be a lack of telemetry on the cache die driving a more conservative approach for now.
Model | Zen 5 | Zen 4c | IGP | IGP gaming frequency | Last Level Cache | Power Limit -> gaming
---|---|---|---|---|---|---
6C12T Model | 3 Cores; L2: 6MB | 3 Cores; L2: 1.5MB | 12 CU; 48TMU; 24 ROP | 2400 MHz | Total: 21MB CPU: 9 + IGP: 12MB | 30W CPU: 15W + IGP: 15W
6C12T Model | 3 Cores; L2: 6MB | 3 Cores; L2: 1.5MB | 16 CU; 64TMU; 32 ROP | 2400 MHz | Total: 33MB CPU: 9 + IGP: 24MB | 35W CPU: 15W + IGP: 20W
8C16T Model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 16 CU; 64TMU; 32 ROP | 2400 MHz | Total: 36MB CPU: 12 + IGP: 24MB | 35W CPU: 15W + IGP: 20W |
8C16T Model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 20 CU; 80TMU; 40 ROP | 2400 MHz | Total: 48MB CPU: 12 + IGP: 36MB | 40W CPU: 15W + IGP: 25W |
8C16T Model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 24 CU; 96TMU; 48 ROP | 2400 MHz | Total: 60MB CPU: 12 + IGP: 48MB | 45W CPU: 15W + IGP: 30W |
10C20T model | 4 Cores; L2: 4MB | 6 Cores; L2: 3MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 46 MB CPU: 14 + IGP: 32MB | 50W CPU: 20W + IGP: 30W |
10C20T model | 4 Cores; L2: 4MB | 6 Cores; L2: 3MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 60MB CPU: 14 + IGP: 46MB | 57W CPU: 20W + IGP: 37W |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 48MB CPU: 16 + IGP: 32MB | 55W CPU: 25W + IGP: 30W |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 62MB CPU: 16 + IGP: 46MB | 62W CPU: 25W + IGP: 37W |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 24 CU; 96TMU; 48 ROP | 2800 MHz | Total: 76MB CPU: 16 + IGP: 60MB | 70W CPU: 25W + IGP: 45W |
14C28T model | 6 Cores; L2: 6MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 52MB CPU: 20 + IGP: 32MB | 60W CPU: 30W + IGP: 30W |
14C28T model | 6 Cores; L2: 6MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 66MB CPU: 20 + IGP: 46MB | 67W CPU: 30W + IGP: 37W |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 3200 MHz | Total: 64MB CPU: 24 + IGP: 40MB | 75W CPU: 35W + IGP: 40W |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 3200 MHz | Total: 80MB CPU: 24 + IGP: 56MB | 85W CPU: 35W + IGP: 50W |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 24 CU; 96TMU; 48 ROP | 3200 MHz | Total: 96MB CPU: 24 + IGP: 72MB | 95W CPU: 35W + IGP: 60W |
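Every row of the speculative table above is built from the same invariants: total cores = Zen 5 cores + Zen 4c cores, total LLC = CPU share + IGP share, and package power = CPU budget + IGP budget. A quick check of the last row, purely as arithmetic on the numbers above (the table itself is a forum guess, not a leak):

```python
# Cross-check the internal arithmetic of the 16C32T / 24 CU row of the
# speculative table above (values copied from the table, which is a guess).
zen5_cores, zen4c_cores = 8, 8
cpu_l3_mb, igp_l3_mb = 24, 72
cpu_w, igp_w = 35, 60

assert zen5_cores + zen4c_cores == 16  # matches "16C32T"
assert cpu_l3_mb + igp_l3_mb == 96     # matches "Total: 96MB"
assert cpu_w + igp_w == 95             # matches "95W"
print("row is internally consistent")
```

The other rows follow the same pattern, so the table is at least self-consistent, whatever its predictive value.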