Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


DisEnchantment

Golden Member
Mar 3, 2017
1,774
6,757
136
big.BIG is confusing. I know what you mean, but suppose Zen 5 is heterogeneous: the big cores from Zen 4 become the little cores in Zen 5, and even bigger cores arrive with Zen 5. It's either big.LITTLE or hybrid.
It was only meant to denote the difference between the various CPU designers' approaches to heterogeneous cores.

AMD's manuals call it heterogeneous, so that is the term that should apply to their designs. Indications are that the cores differ only in performance and efficiency, not in ISA (unlike Intel's P/E cores, which do have ISA differences), a la Zen 4 vs Zen 4c.
As many have pointed out, Zen 5 officially being described as designed concurrently for N3 and N4 suggests that density- and efficiency-optimized Zen 5c CCDs on N3 may find their way into the same package as frequency- and performance-optimized Zen 5 CCDs on N4. (There may of course be other N3-based CCDs as well, and N3-based mobile Zen 5 would likely reuse the Zen 5c physical implementation, much as Zen 4 mobile uses the Zen 4c physical implementation.)

That makes sense considering that the performance CCDs may end up clocking higher than Zen 4, and with heat dissipation and frequency in tension there would be real trade-offs in moving those to N3.
So a secondary, compact and efficient, server-derived Zen 5c CCD on N3 would seem a very probable way to add MT performance instead of triple-CCD designs on the AM5 desktop.

Remains to be seen, but seeing how my 7950X behaves in MT loads, it makes too much sense not to happen.
On AM5, when thermally and power limited, all-core frequency on the 7950X is around 4.7-4.9 GHz.
If they have an 8-core Zen 5 CCD at 6+ GHz on N4 and another 16-core Zen 5c CCD at 4.5 GHz on N3, you can have a 24-core Ryzen 8000 part that could run all-core at 4.5+ GHz.
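Quick back-of-envelope on the aggregate core-clock throughput (purely illustrative sketch: it assumes perfectly linear MT scaling and identical per-clock throughput for Zen 5 and Zen 5c, which of course won't hold exactly):

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical hybrid part: 8 Zen 5 cores at 6.0 GHz plus 16 Zen 5c cores
       at 4.5 GHz, versus a 7950X power-limited to ~4.85 GHz all-core. */
    double hybrid_24c  = 8 * 6.0 + 16 * 4.5;  /* 120 core-GHz   */
    double raphael_16c = 16 * 4.85;           /* ~77.6 core-GHz */

    printf("hybrid 24c : %.1f core-GHz\n", hybrid_24c);
    printf("7950X 16c  : %.1f core-GHz\n", raphael_16c);
    printf("naive MT uplift: %.0f%%\n", (hybrid_24c / raphael_16c - 1) * 100);
    return 0;
}
```

Even with a generous discount for memory bandwidth and scaling losses, that is a large MT gap.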

You could even argue that such a part could be built with Zen 4 already: an 8-core N5 Zen 4 CCD plus a 16-core N4 Zen 4c CCD. They are exactly instruction-compatible but have different performance and efficiency characteristics. If Raptor Lake were applying lots of MT pressure, I wouldn't be surprised to hear of new 24-core parts from AMD once Bergamo is out.
Evidence supporting this concept: Raphael CCDs have dual GMI links, but only one is used. So two CCXs could have been put inside one CCD, and the wiring is effectively already in place for more than two CCXs on AM5 Raphael.
And 7950X/7900X users are already aware of the CCD0 vs CCD1 frequency difference; it can be 300+ MHz on my sample. I hear Windows is not managing this properly, but it is already covered by the ACPI CPPC preferred-cores spec.
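On Linux you can read the per-core ranking directly; a quick sketch, assuming the kernel exposes the acpi_cppc sysfs interface (present when CPPC is enabled):

```c
#include <stdio.h>

int main(void)
{
    /* Dump per-core CPPC highest_perf, which encodes the preferred-core
       ranking (higher value = better core). */
    char path[128];
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/acpi_cppc/highest_perf", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                /* no more CPUs (or interface missing) */
        unsigned long perf = 0;
        if (fscanf(f, "%lu", &perf) == 1)
            printf("cpu%-3d highest_perf = %lu\n", cpu, perf);
        fclose(f);
    }
    return 0;
}
```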

Besides that, I am not sure whether more than 32 high-performance cores can be fed by the four independent DDR5 sub-channels on AM5 (though that is at least far better than two channels of DDR4).
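For scale, a rough per-core bandwidth figure - assuming AM5's officially supported DDR5-5200 and ignoring caches, so purely illustrative:

```c
#include <stdio.h>

int main(void)
{
    /* Dual-channel DDR5 on AM5 = 4 x 32-bit sub-channels = 128 bits total.
       DDR5-5200 is assumed here; real kits and real efficiency will differ. */
    double mt_s   = 5200e6;        /* transfers per second */
    double bytes  = 128.0 / 8.0;   /* bus width in bytes   */
    double bw_gbs = mt_s * bytes / 1e9;

    printf("peak DRAM bandwidth: %.1f GB/s\n", bw_gbs);       /* ~83 GB/s  */
    printf("per core @ 16 cores: %.1f GB/s\n", bw_gbs / 16);  /* ~5.2 GB/s */
    printf("per core @ 32 cores: %.1f GB/s\n", bw_gbs / 32);  /* ~2.6 GB/s */
    return 0;
}
```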
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,774
6,757
136
You could even argue that such a part could be built with Zen 4 already: an 8-core N5 Zen 4 CCD plus a 16-core N4 Zen 4c CCD. They are exactly instruction-compatible but have different performance and efficiency characteristics. If Raptor Lake were applying lots of MT pressure, I wouldn't be surprised to hear of new 24-core parts from AMD once Bergamo is out.
Well... This is interesting.
[attached screenshots: CPUID register output from the 7950X]
Looks like this CPUID leaf is already supported on Zen 4 Raphael (tried it on my 7950X), and EDX gives the APIC ID of the core the code is running on. I tried on Zen 3 (5950X) and it is not supported.
Of course the CPU returns reserved values at level 0; according to the manual, detection can stop at that point.
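If anyone wants to reproduce the check, something like this works with GCC/Clang. Note the screenshots don't show the leaf number, so Fn8000_0026 (AMD's extended CPU topology leaf, new with Zen 4) is my assumption for what is being queried:

```c
#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang __get_cpuid / __get_cpuid_count */

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    const unsigned leaf = 0x80000026;   /* assumed: AMD extended CPU topology */

    /* Check that the extended leaf range reaches far enough first. */
    __get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx);
    if (eax < leaf) {
        puts("leaf not supported (e.g. Zen 3 and older)");
        return 0;
    }

    /* Level 0 query; EDX reports the APIC ID of the executing core. */
    __get_cpuid_count(leaf, 0, &eax, &ebx, &ecx, &edx);
    printf("EAX=%08x EBX=%08x ECX=%08x EDX(APIC ID)=%08x\n", eax, ebx, ecx, edx);
    return 0;
}
```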
 

Joe NYC

Diamond Member
Jun 26, 2021
3,065
4,472
106
Last edited:
  • Like
Reactions: Kaluan

RnR_au

Platinum Member
Jun 6, 2021
2,476
5,856
136
There was another image shared by TSMC in 2021 that came with the one you posted;

[attached image: TSMC 2021 stacking qualification roadmap]

It's clear about which qualifications are available and the expected timing for each. At the time, Andreas Schilling shared this image on Twitter and was directly asked about N7/N6 on N5:


I guess anyone's guess is as good as anyone else's. MLID is saying N6 on N5... which to me is enough to say that mixed configs don't exist atm :p
 

Joe NYC

Diamond Member
Jun 26, 2021
3,065
4,472
106
I guess anyone's guess is as good as anyone else's. MLID is saying N6 on N5... which to me is enough to say that mixed configs don't exist atm :p

If mixed stacking is not available, it would explain the RDNA3 approach of putting the Infinity Cache on a separate die.

Or mixed stacking was just not ready in time for launch.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,637
146
I guess anyone's guess is as good as anyone else's. MLID is saying N6 on N5... which to me is enough to say that mixed configs don't exist atm :p
Mixed configurations are definitely possible and are coming as far as I'm aware.

No comment on which product I'm referring to specifically :)

(Although realistically you could probably guess because if it's not RDNA3...)
 

RnR_au

Platinum Member
Jun 6, 2021
2,476
5,856
136
Mixed configurations are definitely possible and are coming as far as I'm aware.

No comment on which product I'm referring to specifically :)

(Although realistically you could probably guess because if it's not RDNA3...)
Fair enough. Perhaps I've read too much into the roadmaps - I just find it odd that they wouldn't mention N6-on-N5 when they do mention N5-on-N3.
 

Timorous

Golden Member
Oct 27, 2008
1,944
3,784
136
Mixed configurations are definitely possible and are coming as far as I'm aware.

No comment on which product I'm referring to specifically :)

(Although realistically you could probably guess because if it's not RDNA3...)

It is obvious. RDNA3 stacking is N6 cache on N6 MCD so the only other part is Zen 4 3D with N6 cache on N5 CCD.
 
  • Like
Reactions: Tlh97 and Joe NYC

BorisTheBlade82

Senior member
May 1, 2020
688
1,085
136
I admit that I belong to the 'replace the IFOP interconnect' camp.
I had already gone over this in the Zen 4 thread, and I had been hoping it would be replaced even before Zen 3 launched.
Since then I have been asking myself what an EFB-based solution could look like from a layout perspective, and whether some rather crazy geometries would be needed. So I finally decided to generate some layouts based on the following assumptions:
  • AMD has to keep the AM5 socket and might want to keep the SP5 socket as well, so there are restrictions on the allowed area. For AM5 I assumed that a square of 21.4x21.4 mm² should be realistic - by reordering some of the "bird food". For SP5 I am sticking to the area used by the 12-CCD Genoa SKU, which is well known from the Gigabyte leaks.
  • AMD will want to stick with using the same CCD(s) for server as well as client, and maybe even mobile in the future.
  • AMD will want to minimize the number of different dies - so a solution like SPR, where 2 different dies are needed for 4 tiles, might not be acceptable to them.
  • There will be no daisy-chaining of CCDs. There will be no tunneling an EFB under a CCD to reach another CCD. So all CCDs will have to share a common border to the IOD.
  • As a guideline I assumed that the IOD area of server and client Zen 5 will be in the same ballpark as Zen 4. Of course this could be totally wrong - for example, I would assume that EFB takes less area than an IFOP of the same bandwidth, but who knows.
I am basing my mockups on the measurements of this article https://wccftech.com/amd-epyc-genoa...-sp5-lga-6096-server-platform-details-leaked/ and this picture:
[attached image: desktop Zen 4 two-CCD package]

So, assuming only a moderate increase in CCD size, the following layouts could be possible for the speculated 128c EPYC Turin and for client Zen 5 with up to three CCDs.

Zen 5 Client
[mockup: Zen 5 client layout]
There is still some area left to increase the height of the CCD.
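As a rough sanity check on the AM5 area budget (the CCD and IOD figures below are my own ballpark numbers from public die-shot measurements, not taken from the mockup):

```c
#include <stdio.h>

int main(void)
{
    /* All figures are rough assumptions for illustration only; this ignores
       die spacing, keep-out zones and routing area entirely. */
    double am5_area = 21.4 * 21.4;  /* assumed usable placement square, mm^2     */
    double ccd_area = 72.0;         /* ~Zen 4 CCD size, mm^2 (ballpark)          */
    double iod_area = 125.0;        /* ~Raphael client IOD size, mm^2 (ballpark) */

    double used = 3 * ccd_area + iod_area;
    printf("placement area: %.0f mm^2\n", am5_area);       /* ~458 mm^2 */
    printf("3 CCDs + IOD  : %.0f mm^2\n", used);            /* ~341 mm^2 */
    printf("headroom      : %.0f mm^2\n", am5_area - used); /* ~117 mm^2 */
    return 0;
}
```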

EPYC Turin 128c 16 CCD
[mockup: EPYC Turin 128c, 16-CCD layout]
  • I decided to cut the IOD in half and reconnect it with EFB as well. As the IOD is basically a crossbar, I do not know whether EFB could provide enough bandwidth at <7 mm width.
  • Here as well there is about 0.5 mm of height increase possible for the CCD, taking it to a bit over 80 mm².

Of course this is all just armchair engineering, but somehow it fits quite nicely.
Excitedly awaiting your comments...
 

SteinFG

Senior member
Dec 29, 2021
713
845
106
Excitedly awaiting your comments...

nice rectangles 👍 that's all I can say about this.

Neatly fitting chiplets is the last thing that comes up when designing a substrate, so idk why people keep making mockups like these.

EFB is very costly; they're most likely going to use fan-out instead, if they even want to use it.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136
I admit that I belong to the 'replace the IFOP interconnect' camp. [...] So, assuming only a moderate increase in CCD size, the following layouts could be possible for the speculated 128c EPYC Turin and for client Zen 5 with up to three CCDs.
So just a very quick observation. Given your die size assumptions, your desktop layout might be plausible, but server definitely not. Or at least not without a billion package layers. You need a lot more shoreline for DDR and PCIe, and can't really route those under the compute dies.
 

BorisTheBlade82

Senior member
May 1, 2020
688
1,085
136
EFB is very costly; they're most likely going to use fan-out instead, if they even want to use it.
Which kind of fan out would you guess? I mean EFB is also a fan out variation - IIRC based on TSMC InFo-L(SI). Do you mean InFo-R(DL)? If so: I have so far failed to really understand the concept. Is the RDL silicon-based? If not, what is the significant difference to the current IFOP?

So just a very quick observation. Given your die size assumptions, your desktop layout might be plausible, but server definitely not. Or at least not without a billion package layers. You need a lot more shoreline for DDR and PCIe, and can't really route those under the compute dies.
Please be aware that I did not use the total package area, only the area where dies are already placed in this generation. So wouldn't the factors you mention already apply to this generation?
 

DisEnchantment

Golden Member
Mar 3, 2017
1,774
6,757
136
I admit that I belong to the 'replace the IFOP interconnect' camp.
[attached screenshot]

From LinkedIn, somewhere in the past (now nuked).
Seems they are still planning a SerDes PHY for the 3 nm nodes, but at least double the speed of Zen 4's GMI3 (32 Gbps); GMI2 ran at 25 Gbps.

But the core layout would be different compared to Zen 4, I suppose - they could 3D-stack cores on L3 within one CCD, for instance, meaning more cores per CCD/CCX. Mike Clark has already alluded to this:
We do see core counts growing, and we will continue to increase the number of cores in our core complex that are shared under an L3. As you point out, communicating through that has both latency problems, and coherency problems, but though that's what architecture is, and that's what we signed up for. It’s what we live for - solving those problems. So I'll just say that the team is already looking at what it takes to grow to a complex far beyond where we are today, and how to deliver that in the future.

There will be no tunneling an EFB under a CCD to reach another CCD. So all CCDs will have to share a common border to the IOD.
BTW, they have already routed IFOP signal lines underneath the CCD - EPYC Milan has 14 substrate layers to handle all of this.
[attached image: EPYC Milan package substrate routing]

Which kind of fan out would you guess? I mean EFB is also a fan out variation - IIRC based on TSMC InFo-L(SI). Do you mean InFo-R(DL)? If so: I have so far failed to really understand the concept. Is the RDL silicon-based? If not, what is the significant difference to the current IFOP?
LSI has active interconnect together with RDL - by active I mean repeaters, transceivers, routers, switches, mux/demux, etc., i.e. transistors switching on and off.
RDL is passive: just Cu interconnect, similar to the metal layers on a die, but nowhere near as dense - though far denser than routing in the package substrate. You can fab RDL with old litho tech.

I think it should be something that AMD can manufacture in house without relying on TSMC for assembly. They have a massive new plant in Malaysia dedicated to advanced packaging which will come online in 1H23.
RDNA3 and MI300 could probably give some hints.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,065
4,472
106
I admit that I belong to the 'replace the IFOP interconnect' camp. [...]

With that many EFBs, it may be more economical to go with an interposer, or something like an SoIC_H base die, which could add some additional functionality.

Also, AMD has a new socket, I think called SH5. I wonder if the highest-end Turin chips may end up in that socket. It is apparently quite big - bigger than SP5 - and is set to accommodate a full MI300 on a socket, plugged into a motherboard.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,065
4,472
106
I think it should be something that AMD can manufacture in house without relying on TSMC for assembly. [...] RDNA3 and MI300 could probably give some hints.

RDNA3 really lends itself to some of these new technologies. Amazing that AMD managed to keep it under wraps this long. We are 9 days from launch.
 

SteinFG

Senior member
Dec 29, 2021
713
845
106
Which kind of fan out would you guess? I mean EFB is also a fan out variation - IIRC based on TSMC InFo-L(SI). Do you mean InFo-R(DL)? If so: I have so far failed to really understand the concept. Is the RDL silicon-based? If not, what is the significant difference to the current IFOP?


Please be aware that I did not use the total package area, only the area where dies are already placed in this generation. So wouldn't the factors you mention already apply to this generation?
RDL is not silicon-based. It's like an advanced substrate placed on top of the standard substrate, made for routing lines between the usual substrate and the die.
Die-to-standard-substrate connection pitch is around 130 micrometers (the standard thing).
Die-to-RDL connection pitch is around 40 micrometers (what some think RDNA3 will use).
Die-to-silicon-bridge pitch is around 25 micrometers (M1 Ultra uses this).
So roughly 10 times less area per connection for an RDL-based interconnect between dies (40²/130² ≈ 0.095), and all without using silicon bridge dies.
Info taken from
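Putting those pitches into relative connection density (connections per unit area scale with 1/pitch², using the numbers above):

```c
#include <stdio.h>

int main(void)
{
    /* Approximate connection pitches from the post, in micrometers. */
    double substrate = 130.0, rdl = 40.0, si_bridge = 25.0;

    /* Area per connection scales with pitch squared, so relative density
       is the inverse ratio of pitch squared. */
    printf("RDL vs substrate      : %.1fx denser\n",
           (substrate * substrate) / (rdl * rdl));           /* ~10.6x */
    printf("Si bridge vs substrate: %.1fx denser\n",
           (substrate * substrate) / (si_bridge * si_bridge)); /* ~27.0x */
    printf("Si bridge vs RDL      : %.1fx denser\n",
           (rdl * rdl) / (si_bridge * si_bridge));             /* ~2.6x  */
    return 0;
}
```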
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136
Please be aware that I did not use the total package area. I only used the area where dies are placed on already in this generation. So wouldn't the factors mentioned already apply to this generation?
I'm going to copy @DisEnchantment's image above to use as reference.
[image: EPYC Milan package substrate routing, reposted from above]
See how AMD routes the PCIe in that central column between the CCDs, and the DDR channels into the open space on the sides? That's the issue I'm talking about. Basically, high speed IO (mainly PCIe/CXL and DDR) is both demanding on physical routing area in the package and very sensitive to noise. And the biggest sources of noise are from power delivery and other high speed signals. In practice, this means you need unobstructed area from the edge of your die (shoreline) to the package pins where the IO is ultimately broken out. Honestly surprised it was viable for them to route IFOP under CCD.

Now, you can add extra layers to the package to help with this issue (to a point), but that can get shockingly expensive, so there's a lot of incentive to keep things simple.
 

RTX

Member
Nov 5, 2020
166
121
116
I admit that I belong to the 'replace the IFOP interconnect' camp. [...]
AMD could just save costs by using an 8-core P chip plus a 16-core E chip in the form of Zen 5 + Zen 4c - down-bins from Turin and Bergamo sold to consumers to save money. TSMC is no longer offering discounts even to its largest customers.
 

Abwx

Lifer
Apr 2, 2011
11,818
4,740
136
Slight is the Zen 1+ increase (just ~3%). Anything above 10% is not a small jump in IPC.

Zen+ actually has 5% better IPC than Zen.

If we exclude games and use Zen 1 as the baseline, the progression is 21%, 34% and 51% for Zen 2/3/4 respectively - so Zen 4 brought a slightly larger IPC improvement over Zen 3 (1.51/1.34 ≈ +13%) than Zen 3 did over Zen 2 (1.34/1.21 ≈ +11%).

Btw, Zen 4 has 2% better MT IPC than ADL/RPL on an 8-core comparison basis.

 
  • Like
Reactions: Tlh97

DrMrLordX

Lifer
Apr 27, 2000
22,586
12,467
136
AMD could just save costs by using an 8-core P chip plus a 16-core E chip in the form of Zen 5 + Zen 4c - down-bins from Turin and Bergamo sold to consumers to save money. TSMC is no longer offering discounts even to its largest customers.

Going from 16 Zen4 cores on the desktop to 8 Zen5 cores would not be a great look for AMD.