Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Also
Before Zen4 was revealed we assumed the cloud variant was shrunk with regard to the N7 to N5 transition having 1.8x density. So squeezing 16-cores into the same space seemed do-able.
Now that the cloud variant is expected to be a N4 shrink...
N4 to N5 gives 1.3x density.

Zen4c (Cloud) is a TSMC N5 product, but it uses TSMC high-density libraries.

1673220841102.png
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Zen4c (Cloud) is a TSMC N5 product, but it uses TSMC high-density libraries.
Again, there's no current evidence that they're using different libraries. "Dense" more likely refers to "dense compute", i.e. the market Bergamo is targeting. They wouldn't tie the naming to library choice like that.
 
  • Like
Reactions: Geddagod

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
Again, there's no current evidence that they're using different libraries. "Dense" more likely refers to "dense compute", i.e. the market Bergamo is targeting. They wouldn't tie the naming to library choice like that.

They need to fit around 11B transistors in an ~80mm² CCD. If the CCD is larger, then 8 of them + 2 IO dies won't fit.

That is around 137.5M transistors per mm².
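For anyone who wants to check that figure, here is a minimal Python sketch of the arithmetic. The 11B transistor count and the ~80mm² area are the estimates from this post, not AMD-confirmed numbers, and the function name is purely illustrative.

```python
def density_m_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


# Estimates from the post: ~11B transistors in an ~80 mm^2 CCD.
required = density_m_per_mm2(11e9, 80.0)
print(f"Required density: {required:.1f}M transistors per mm^2")
```

A smaller assumed CCD area pushes the required density up, and a lower transistor count pulls it down, so the result is only as good as those two guesses.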
 

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
Where're those numbers coming from?

Zen 4 is 6.58B transistors. Zen4c is double the cores but the same amount of cache, so 13B - 2B for 32MB cache is 11B or thereabouts.
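Spelled out as a rough sketch in Python: double the whole Zen 4 CCD, then take one 32MB L3 back out. Both inputs are the figures quoted in this post (the ~2B allowance for 32MB of L3 in particular is a ballpark), and the sketch assumes the cores keep the same transistor count, which is not confirmed.

```python
ZEN4_CCD_TRANSISTORS = 6.58e9  # Zen 4 CCD figure quoted in the post
L3_32MB_TRANSISTORS = 2e9      # poster's rough allowance for 32MB of L3


def zen4c_ccd_estimate(base_ccd: float = ZEN4_CCD_TRANSISTORS,
                       l3_block: float = L3_32MB_TRANSISTORS) -> float:
    """Double the CCD (2x cores, 2x L3), then remove one L3 block."""
    return 2 * base_ccd - l3_block


estimate = zen4c_ccd_estimate()
print(f"Estimated Zen4c CCD: {estimate / 1e9:.2f}B transistors")
```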

Max CCD size is an estimate, but 2 IO dies and 8 chiplets, with routing for a link per CCX to the IO die (presuming they are using 2 of the Siena ones here), do not leave a ton of space to go much larger. So you have some boundaries.

Won't be exact of course and actuals will be different, but I highly doubt Zen4c is 94M xtors per mm², because then it would need around 117mm², and 8 of them + IO dies that combine to be bigger than the one on Genoa is too big.

We also see AMD/TSMC can hit 140M per mm² and hit 5.2GHz core speeds and 3GHz GPU speeds in a low-power design. Laptop and server requirements are very similar. So why wouldn't AMD use the same library as Phoenix for Zen4c? It hits good clocks at low power, which is exactly what AMD need for the 128-core server chip.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Zen 4 is 6.58B transistors. Zen4c is double the cores but the same amount of cache, so 13B - 2B for 32MB cache is 11B or thereabouts.
You're assuming the cores need the same amount of transistors for the same uarch. I wouldn't make that assumption.
Laptop and server requirements are very similar. So why wouldn't AMD use the same library as Phoenix for Zen4c?
They are quite different. Laptops highly value bursty, single core speed, hence similar design priorities to the desktop designs. For the market Bergamo is targeting, it's all about throughput.
 

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
You're assuming the cores need the same amount of transistors for the same uarch. I wouldn't make that assumption.

They are quite different. Laptops highly value bursty, single core speed, hence similar design priorities to the desktop designs. For the market Bergamo is targeting, it's all about throughput.

Yes because I am trying to get to a 'is this doable or is something else different' position, not this is the Zen4c transistor count and die size to +/- 1%.

So take the APU transistor density and the estimate for the Zen4c transistor count, and you have a CCD that is sub-80mm² based on an 11B transistor CCD. Totally in range of what will fit with 2 IO dies on the Genoa package. The Zen4 94M xtors per mm² library, though, would mean a 117mm² die size, which is getting to the point, when paired with 2x IO dies, of being too big to fit.
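The two die sizes fall out of a simple division, sketched here in Python. The 11B count is the estimate from this thread and the two densities are the ones quoted above, so treat the output as a sanity check rather than a prediction.

```python
def die_area_mm2(transistors: float, density_m_per_mm2: float) -> float:
    """Die area in mm^2 for a transistor count and a density in M/mm^2."""
    return transistors / (density_m_per_mm2 * 1e6)


CCD_TRANSISTORS = 11e9  # estimated Zen4c CCD count, not a confirmed figure

area_zen4_lib = die_area_mm2(CCD_TRANSISTORS, 94.0)   # Zen 4 CCD density
area_apu_lib = die_area_mm2(CCD_TRANSISTORS, 140.0)   # APU-like density

print(f"At  94M/mm^2: {area_zen4_lib:.0f} mm^2")
print(f"At 140M/mm^2: {area_apu_lib:.1f} mm^2")
```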

In addition there is Siena, which also uses Zen4c to make the package smaller. This will support 64 Zen4c cores. Given AMD like to re-use stuff, why not design the Siena IO die with enough links that it could talk to another one, and boom: we have Zen4c used in 2 designs, low-end Siena and high-end Bergamo, as well as the IO die being used in both segments. It would be so typical for AMD to do this because this is precisely why AMD went to chiplets. Again, though, 117mm² per CCD with 4 of them would be a tight fit, whereas 4x 80mm² or less is a much easier fit.

What else is obvious for AMD to do? Well, the APU CCXs have always been density- and power-optimised, because they are usually built on the best node AMD are currently using and because they are designed for laptops, where power efficiency and battery life are key metrics. Power efficiency being a key server metric and density being key for Zen4c makes this a perfect candidate to design the 2-CCX Zen4c part. It would save on R&D to re-use that CCX block for Zen4c, which is AMD SOP.

I may be wrong of course but it makes sense strategically and it fits with how AMD have operated for the last 5 years.

The interesting thing to me is the impact for Zen5 and Zen5c. It would mean the desktop Zen 5 IO die would need 3 links so it can have 1 per CCX. Has there been a die shot of the Zen 4 IO die yet? Looking at its size vs Genoa (just under 1/3, 122mm² vs 397mm²), it would make sense if the Zen 4 IO die has 3 links, like a quarter of the Genoa die would, plus a bit extra for the iGPU. So it looks like the current IO die may have this already, but if not, it would need a new one for Zen 5 if the plan is to go with a mix of Zen5 CCDs and Zen5c CCDs.
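The IO die comparison can be sketched the same way. The two areas are the ones quoted above; splitting the Genoa die into quarters and treating the remainder as iGPU space is only the assumption made in this post, not something a die shot has confirmed.

```python
DESKTOP_IOD_MM2 = 122.0  # Zen 4 desktop IO die area quoted in the post
GENOA_IOD_MM2 = 397.0    # Genoa IO die area quoted in the post

ratio = DESKTOP_IOD_MM2 / GENOA_IOD_MM2
quarter_genoa = GENOA_IOD_MM2 / 4            # assumed "quarter" with 3 links
left_over = DESKTOP_IOD_MM2 - quarter_genoa  # space the post assigns to the iGPU

print(f"Desktop IOD is {ratio:.2f} of the Genoa IOD")
print(f"A quarter of Genoa is {quarter_genoa:.2f} mm^2, leaving {left_over:.2f} mm^2")
```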
 
  • Like
Reactions: BorisTheBlade82

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
I’ve mentioned that Zen4c is using HD libraries, and certain folks tried to tell me I was wrong. I suspect it is because they hope Zen4c will come to desktop. A chip using HD libraries is going to top out well under 4GHz, which, of course, goes against that narrative. Time will tell who ends up being right.

We will also see if AMD takes the hybrid approach with Zen 5, or if they simply double down on core counts. (8700x -> 16 cores, 8900x -> 24 cores, 8950x -> 32 cores)

If they need to save die area, they will go the hybrid approach. If they can fit twice as many cores into a similar die area, possibly by redesigning the L3 cache into a hybrid L3/L4 stacked cache, I could see them doing that. Halve the L3 cache, add a 32-64MB shared L4 cache to the IO die. Profit.

32 Zen 5 cores would likely beat any 8+16 design Intel could come up with. A hybrid approach might not.
 

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
The APU tops out at 5.2GHz on the cores and 3GHz on the GPU at 140M xtors per mm².

That seems dense enough and fast enough to be the basis for Zen4c and I see no reason why AMD would use a different library. The CCD will have a minimum size to fit the physical connections required to wire up the CCXs to the IO die.

Obviously when you put 128 cores onto a chip, clock speeds are going to be lower due to the power per core (360W / 128c is 2.8W per core before you factor in what the IO die uses).
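A minimal Python sketch of that power budget, for reference. The 360W and 128 cores are the figures from this post; the `io_die_watts` parameter is a hypothetical hook for subtracting the IO die's share and defaults to zero because no figure for it is given here.

```python
def watts_per_core(package_watts: float, cores: int,
                   io_die_watts: float = 0.0) -> float:
    """Average power available per core after removing the IO die's share."""
    return (package_watts - io_die_watts) / cores


# 360 W across 128 cores, before accounting for the IO die.
server_budget = watts_per_core(360.0, 128)
print(f"Per-core budget: {server_budget:.2f} W")
```

Passing any non-zero `io_die_watts` only lowers the per-core number, so 2.8W is an upper bound on the average.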

If that CCD were to be used in a desktop Zen4 hybrid product then with about 3x the power per core I don't see why it can't clock like the APU can.

I expect the same to be true for Zen 5.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
You are assuming the APU uses Zen4c. I think that may be a flawed assumption. Has AMD confirmed this info?
 

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
You are assuming the APU uses Zen4c. I think that may be a flawed assumption. Has AMD confirmed this info?

Why would AMD design a density / power optimised CCX twice when AMD's modus operandi is to copy-paste stuff everywhere they can?

EDIT: there may be some slight differences between the Phoenix CCX and the one in Zen4c but I expect they are probably related to integrating them into the overall design rather than anything else. I expect the bulk of it to be the same because I see no reason why they would do the same work twice.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Why would AMD design a density / power optimised CCX twice when AMD's modus operandi is to copy-paste stuff everywhere they can?
That would make sense, but I am afraid that Zen4 on APUs is just Zen4 with half the L3.

Here is the info page, I suspect that Bergamo will say Zen4c
1673277150631.png
 

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
That would make sense, but I am afraid that Zen4 on APUs is just Zen4 with half the L3.

Here is the info page, I suspect that Bergamo will say Zen4c
View attachment 74282

I suspect that may be correct but I also suspect there will be no functional difference.

EDIT: Just checked and AMD don't state the architecture for their EPYC systems (https://www.amd.com/en/product/12191), so chances are Bergamo won't either; it will just have a different L3 amount and maybe a different family than EPYC 9004.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Why would AMD design a density / power optimised CCX twice when AMD's modus operandi is to copy-paste stuff everywhere they can?

EDIT: there may be some slight differences between the Phoenix CCX and the one in Zen4c but I expect they are probably related to integrating them into the overall design rather than anything else. I expect the bulk of it to be the same because I see no reason why they would do the same work twice.

Because high density chips don’t scale in frequency. Also, laptop and cloud computing are AMD’s biggest potential market opportunities. It makes sense to optimize for each. AMD has stated a few different times that Zen4c is cloud only. The c literally stands for cloud.

AMD has done mobile specific SKUs for every generation, so this isn’t something new. They built a new chip for cloud providers because they will sell bucket loads of them.
 
  • Like
Reactions: lightmanek

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
Really? So the 7040 range does not exist then? Because they increased density vs the Zen4 CCD by 49% in a power-limited design while maintaining up to 5.4GHz boost clocks and 3GHz GPU clocks. Or is 140M xtors / mm² not high density?
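The 49% comes straight from the two density figures used earlier in the thread, as this short Python sketch shows. Both densities are forum estimates (roughly 94M/mm² for the Zen 4 CCD and 140M/mm² for the APU), not official AMD numbers.

```python
ZEN4_CCD_DENSITY = 94.0   # M transistors per mm^2, estimate from this thread
APU_DENSITY = 140.0       # M transistors per mm^2, estimate from this thread

increase_pct = (APU_DENSITY / ZEN4_CCD_DENSITY - 1) * 100
print(f"Density increase: {increase_pct:.0f}%")
```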

Seems perfectly fine frequency-wise to me. So I ask again: if AMD already have this CCX design lying around, why design another one for Zen4c? What is the business case to do so? Also, what makes you think Zen4c is anything more than the moniker AMD have given to the 16c CCD to differentiate it from the 8c CCD?

I agree that Zen4c is nothing new in terms of CCX design. Renoir was a Zen2c CCX, Cezanne was a Zen3c CCX. The difference is purely in taking that CCX design, putting 2 of them into a CCD and creating a product to sell them. Really 2 products, because the 64c 4-CCD Siena is going to be using Zen4c as well, and that is probably why it exists. The top core count parts are nowhere near as popular as the lower core count ones, so Zen4c was never just a high-end play. It was a play to have a 64c Milan-beating product in a cheaper package. That is where the volume and the business case for the Zen4c CCD will lie IMO.

EDIT to add: Now, will Zen4c CCDs stay in cloud? Probably, but with Zen 5 and the e-core spam from Intel I think AMD will go for a hybrid setup with a Zen5 CCD and a Zen5c CCD. This gives them 24 cores from 2 CCDs, maxes out single-core clock speeds with the Zen5 CCD and allows for strong nT performance.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
You will note it has roughly the same density as the M2.
 

yuri69

Senior member
Jul 16, 2013
389
624
136
  • Like
  • Wow
Reactions: Exist50 and Kaluan

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Yes because I am trying to get to a 'is this doable or is something else different' position, not this is the Zen4c transistor count and die size to +/- 1%.
There's a difference between theorizing that a dense library is used and/or contributes, vs proclaiming that that must be what they're doing. In particular, I'm mentioning the constant transistor assumption as one to break.
In addition there is Siena which also uses Zen4c to make the package smaller.
Where did you see that Siena uses Zen 4c?
Well the APU CCXs have always been density and power optimised because they are usually built on the best node AMD are currently using and because they are designed for laptops where power efficiency and battery life are key metrics.
The APU CCXs have been near identical to the desktop ones, being ultimately reused from server with similar power profiles to mobile. A throughput core with half the area is a different beast entirely.
 

BorisTheBlade82

Senior member
May 1, 2020
664
1,015
106
Where did you see that Siena uses Zen 4c?
Was about to ask the same, but found out in the meantime that this comes from MLID - he sounded very certain, whatever that means.

I always thought that Siena was an SMB server platform as well as workstation or HEDT. So Zen4c with maybe only 3.x GHz would not make much sense to me. But maybe this is just confirmation bias on my side.
 
  • Like
Reactions: Vattila

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
I think it will be capable of clocking higher in lower core count configurations. 140M xtors per mm² will allow 16 cores to fit in less than 80mm² and higher clocks will provide more viable use cases.
 

Timorous

Golden Member
Oct 27, 2008
1,626
2,794
136
There's a difference between theorizing that a dense library is used and/or contributes, vs proclaiming that that must be what they're doing. In particular, I'm mentioning the constant transistor assumption as one to break.

Where did you see that Siena uses Zen 4c?

The APU CCXs have been near identical to the desktop ones, being ultimately reused from server with similar power profiles to mobile. A throughput core with half the area is a different beast entirely.

Where is the evidence it has half the area? A 50% density increase + half the L3 cache vs a standard Zen 4 CCD gets you almost all of the way there, and AMD have already designed that CCX for that node and library, so I will ask you: what is the business case to spend millions on another design that is almost identical and won't even allow for >128c designs?
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Where is the evidence it has half the area?
That's the claim AMD themselves made. Probably talking about core + L2.
A 50% density increase + half the L3 cache Vs a standard Zen 4 CCD gets you almost all of the way there and AMD have already designed that CCX for that node and library
Again, where are all of these numbers you keep throwing out coming from? No, Zen 4 in Phoenix is basically identical to Zen 4 elsewhere, by all current indications.