Zen 5 Discussion (EPYC Turin and Strix Point/Granite Ridge - Ryzen 8000)


DisEnchantment

Golden Member
Mar 3, 2017
Well, since many folks have already got their hands on Zen 4 CPUs (or are at least about to), it's time to discuss Zen 5 (Zen 4 is already old news :D)

We already have roadmaps and key technologies like AIE
Some things we already knew
  • Dr. Lisa Su and Forrest Norrod already mentioned at FAD 2022 on May 9th, during the Q&A, that Zen 5 will come in N3 and N4/5 variants, so it will be on multiple nodes.
  • Mark Papermaster highlighted that it will be a ground-up architecture; also mentioned in the last paragraph here.
  • Mike Clark mentioned that they already started work on Zen 5 in 2018. This means that by the time it launches, Zen 5 will have been in conception, planning and development for much longer than the original Zen program.
For a CPU architecture launching in early 2024 in the form of Strix Point for the OEM notebook refresh, tape-out should be happening in the next few months.
Share your thoughts


"I just wanted to close my eyes, go to sleep, and then wake up and buy this thing. I want to be in the future, this thing is awesome and it's going be so great - I can't wait for it." - Mike Clark
 

BorisTheBlade82

Senior member
May 1, 2020
This is a continuation of what @Glo. posted here about Strix Point possibly having 24 CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for it, but let's imagine it's real.
I will use LLC to refer to the L3 and Infinity Cache, and it will have its own chiplet.

There could be 10 different chiplets and 5 chiplets per APU:
1.) Zen 5 chiplet in 2 versions, 4-core and 8-core (3nm)
2.) Zen 4c chiplet in 2 versions, 4-core and 8-core (3nm)
3.) IGP chiplet in 2 versions, 16 CU and 24 CU (3nm)
4.) Cache chiplet in 3 versions, 36MB, 64MB and 96MB (6nm)
5.) A single IO chiplet with 2×64-bit DDR5 (4×32-bit LPDDR5), video blocks, PHYs etc. (5-6nm)

This is how different models could look in real life; the sizes shown are just for illustration.
Width would stay the same and only height would change depending on which chiplet you use.

You could make a lot of combinations with different chiplets while not cutting them down much, excluding the LLC chiplet. Power limit and LLC were based on the IGP's needs.
| Model | Zen 5 | Zen 4c | IGP | IGP gaming frequency | Last Level Cache | Power limit (gaming) |
| --- | --- | --- | --- | --- | --- | --- |
| 6C12T | 3 cores; L2: 6MB | 3 cores; L2: 1.5MB | 12 CU; 48 TMU; 24 ROP | 2400 MHz | 21MB (CPU: 9 + IGP: 12) | 30W (CPU: 15 + IGP: 15) |
| 6C12T | 3 cores; L2: 6MB | 3 cores; L2: 1.5MB | 16 CU; 64 TMU; 32 ROP | 2400 MHz | 33MB (CPU: 9 + IGP: 24) | 35W (CPU: 15 + IGP: 20) |
| 8C16T | 4 cores; L2: 4MB | 4 cores; L2: 2MB | 16 CU; 64 TMU; 32 ROP | 2400 MHz | 36MB (CPU: 12 + IGP: 24) | 35W (CPU: 15 + IGP: 20) |
| 8C16T | 4 cores; L2: 4MB | 4 cores; L2: 2MB | 20 CU; 80 TMU; 40 ROP | 2400 MHz | 48MB (CPU: 12 + IGP: 36) | 40W (CPU: 15 + IGP: 25) |
| 8C16T | 4 cores; L2: 4MB | 4 cores; L2: 2MB | 24 CU; 96 TMU; 48 ROP | 2400 MHz | 60MB (CPU: 12 + IGP: 48) | 45W (CPU: 15 + IGP: 30) |
| 10C20T | 4 cores; L2: 4MB | 6 cores; L2: 3MB | 16 CU; 64 TMU; 32 ROP | 2800 MHz | 46MB (CPU: 14 + IGP: 32) | 50W (CPU: 20 + IGP: 30) |
| 10C20T | 4 cores; L2: 4MB | 6 cores; L2: 3MB | 20 CU; 80 TMU; 40 ROP | 2800 MHz | 60MB (CPU: 14 + IGP: 46) | 57W (CPU: 20 + IGP: 37) |
| 12C24T | 4 cores; L2: 4MB | 8 cores; L2: 4MB | 16 CU; 64 TMU; 32 ROP | 2800 MHz | 48MB (CPU: 16 + IGP: 32) | 55W (CPU: 25 + IGP: 30) |
| 12C24T | 4 cores; L2: 4MB | 8 cores; L2: 4MB | 20 CU; 80 TMU; 40 ROP | 2800 MHz | 62MB (CPU: 16 + IGP: 46) | 62W (CPU: 25 + IGP: 37) |
| 12C24T | 4 cores; L2: 4MB | 8 cores; L2: 4MB | 24 CU; 96 TMU; 48 ROP | 2800 MHz | 76MB (CPU: 16 + IGP: 60) | 70W (CPU: 25 + IGP: 45) |
| 14C28T | 6 cores; L2: 6MB | 8 cores; L2: 4MB | 16 CU; 64 TMU; 32 ROP | 2800 MHz | 52MB (CPU: 20 + IGP: 32) | 60W (CPU: 30 + IGP: 30) |
| 14C28T | 6 cores; L2: 6MB | 8 cores; L2: 4MB | 20 CU; 80 TMU; 40 ROP | 2800 MHz | 66MB (CPU: 20 + IGP: 46) | 67W (CPU: 30 + IGP: 37) |
| 16C32T | 8 cores; L2: 8MB | 8 cores; L2: 4MB | 16 CU; 64 TMU; 32 ROP | 3200 MHz | 64MB (CPU: 24 + IGP: 40) | 75W (CPU: 35 + IGP: 40) |
| 16C32T | 8 cores; L2: 8MB | 8 cores; L2: 4MB | 20 CU; 80 TMU; 40 ROP | 3200 MHz | 80MB (CPU: 24 + IGP: 56) | 85W (CPU: 35 + IGP: 50) |
| 16C32T | 8 cores; L2: 8MB | 8 cores; L2: 4MB | 24 CU; 96 TMU; 48 ROP | 3200 MHz | 96MB (CPU: 24 + IGP: 72) | 95W (CPU: 35 + IGP: 60) |

Don't want to shoot down your line of thinking entirely, but from all we know, AMD tries to minimize the number of individual chips across the entirety of their markets as much as possible. Bring-up costs for a single 7nm chip are said to be in the ballpark of $50-75 million - see the linked article.
No way will AMD create that many dies for just a small part of their portfolio.
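To put the bring-up figure in perspective, here is a tiny amortization sketch. The $50-75M range is the figure quoted above; the unit volumes are invented purely for illustration.

```python
# NRE (design/bring-up cost) amortization sketch.
# $50M-$75M is the bring-up range quoted above; volumes are hypothetical.

def nre_per_unit(nre_dollars: float, units_sold: int) -> float:
    """Share of the one-off design cost carried by each chip sold."""
    return nre_dollars / units_sold

for units in (1_000_000, 5_000_000, 20_000_000):
    lo = nre_per_unit(50e6, units)
    hi = nre_per_unit(75e6, units)
    print(f"{units:>10,} units -> ${lo:.2f} to ${hi:.2f} of NRE per chip")
```

At a million units, a niche die carries $50+ of design cost per chip before any silicon is paid for, which is why a die serving only a small slice of the portfolio is hard to justify.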

 

Tigerick

Senior member
Apr 1, 2022
This is a continuation of what @Glo. posted here about Strix Point possibly having 24 CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for it, but let's imagine it's real.
I will use LLC to refer to the L3 and Infinity Cache, and it will have its own chiplet.

There could be 10 different chiplets and 5 chiplets per APU:
1.) Zen 5 chiplet in 2 versions, 4-core and 8-core (3nm)
2.) Zen 4c chiplet in 2 versions, 4-core and 8-core (3nm)
3.) IGP chiplet in 2 versions, 16 CU and 24 CU (3nm)
4.) Cache chiplet in 3 versions, 36MB, 64MB and 96MB (6nm)
5.) A single IO chiplet with 2×64-bit DDR5 (4×32-bit LPDDR5), video blocks, PHYs etc. (5-6nm)

Wow, that is a lot of combinations. I have two theories to discuss:

1. STX2 should be similar to PHX2 but made on N3E. So it could still keep a monolithic design in order to target lower TDPs. STX2 is mainly targeting the upcoming Intel Lunar Lake, which might be a monolithic design too.

2. I also don't believe in such a big iGPU; most likely STX1 will use an N32-style chiplet design with 768 SP and 1536 ALU, which provides around 9 TFLOPS. The RDNA3+ most likely refers to the N32 graphics engine, whereas RDNA3 in PP uses the N33 graphics engine.

My point is AMD splits the design in two: STX1 uses a chiplet design and STX2 uses a monolithic design. Intel might be doing the same thing by splitting the design between Lunar Lake and Panther Lake... we will see, too many unknowns atm. If you guys have any different ideas, you are welcome to pitch in. :p
 

Glo.

Diamond Member
Apr 25, 2015
Every APU is monolithic.

Only the caches are chiplet-based. This is AMD's approach, not Intel's.

Remember this for every analysis.

It's absolutely obvious WHY this would be.

And no, I don't believe there wouldn't be a big enough market, especially considering that EVERYBODY will integrate systems on chips for future tech: wearables, car infotainment, AR/VR, laptops, desktops, SFF computing.

That's the reason why INTEL is developing GPUs that they can integrate into their own CPUs. THAT'S the reason why Nvidia tried to buy ARM. AMD cannot be behind Intel on this front.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
That's essentially where I'd like to see AMD go in the future and really is the ultimate end-game of a chiplet based approach.

There's probably always going to be some market for a monolithic design, but I could see that being relegated to niche markets over time.

I suspect that we potentially get some dual-GPU designs as well. As long as the physical size doesn't interfere, there's no reason not to offer a 36/48 CU option for a gaming APU that provides pretty good performance without the need to add a discrete card.

I also could imagine AMD doing something like Intel where it designs a smaller core that's built around providing more throughput for CPU compute. The Zen core is already considerably smaller than Intel's performance core so there's not as much pressure for them to do this. You could also say that the more densely packed Zen 4c is already them doing this, but I wonder how much further they could take it.
I didn't consider making a dual IGP (GPU) chiplet because of low bandwidth. Even with this I had to use 96MB of LLC. If they put a single HBM stack there, this would be very possible.

Don't want to shoot down your line of thinking entirely, but from all we know, AMD tries to minimize the number of individual chips across the entirety of their markets as much as possible. Bring-up costs for a single 7nm chip are said to be in the ballpark of $50-75 million - see the linked article.
No way will AMD create that many dies for just a small part of their portfolio.

This could be used for the whole portfolio. You want a 32-core CPU without a large IGP?
Put in 4 CPU chiplets and 96MB LLC and remove the IGP chiplet. The IO chiplet could have a very small 2 CU IGP inside, which would be deactivated if you also have a separate IGP (GPU) chiplet.

This was just an example, so I didn't try to minimize it as much as possible.
I could combine the 4C8T Zen 5 and 4C8T Zen 4c into one chiplet and just add more CPU chiplets for higher core counts if needed.
I could use only the 24 CU GPU chiplet.
The LLC chiplet could also be reduced to just 2 versions -> 48MB and 96MB.
Now you only have 5 different chiplets to design.
The problem with this approach is that you have to cut (deactivate) a lot of die space for some models.
 

BorisTheBlade82

Senior member
May 1, 2020
It should be "Every APU in the Phoenix, Strix Point lineup is monolithic".

Is it clear now, for you?
I personally would not rule out Strix Point being chiplet-based (this, by the way, is how to state an opinion). I am pretty sure that AMD will either react to MTL or has already planned to go chiplet in the mobile space without being as vocal about it as Intel.
 

Glo.

Diamond Member
Apr 25, 2015
I personally would not rule out Strix Point to be chiplet-based (this is by the way how to state an opinion). I am pretty sure that AMD will either react to MTL or already planned going chiplet in the Mobile space without being as vocal about it as Intel.
AMD doesn't need to react to Intel and go chiplet, especially when they won't see a benefit from it.

Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for 4nm Zen 5, 4nm Zen 4c, a 4nm RDNA3+ iGPU, 4nm AI engines, and a 6nm cache + memory controller, instead of designing ONE monolithic APU, with 6nm cache chiplets, for a specific purpose. Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?

Intel has two separate process nodes to work with, their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
AMD doesn't need to react to Intel and go chiplet, especially when they won't see a benefit from it.

Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for 4nm Zen 5, 4nm Zen 4c, a 4nm RDNA3+ iGPU, 4nm AI engines, and a 6nm cache + memory controller, instead of designing ONE monolithic APU, with 6nm cache chiplets, for a specific purpose. Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?

Intel has two separate process nodes to work with, their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.
What I wrote is an extreme example; it's not like it's based on leaks or anything. It doesn't even need to be Strix Point but a future APU, and I already reduced the number of different chiplets.

If Strix Point has cache + memory controller chiplets like N31 or N32, then it's not a monolith.
Why should only one Strix Point be designed? If they don't increase the core count, then I can understand that. But if there are more cores, which I expect, then a single design is not such a good option, especially with a big 24 CU IGP. 2 designs should be enough.

P.S. You didn't need to make a separate chiplet for everything just to make your point. ;)
 

BorisTheBlade82

Senior member
May 1, 2020
AMD doesn't need to react to Intel and go chiplet, especially when they won't see a benefit from it.

Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for 4nm Zen 5, 4nm Zen 4c, a 4nm RDNA3+ iGPU, 4nm AI engines, and a 6nm cache + memory controller, instead of designing ONE monolithic APU, with 6nm cache chiplets, for a specific purpose. Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?

Intel has two separate process nodes to work with, their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.
Please do not put words in my mouth. I argued against that many chiplets. I could imagine them using one ZenX and one ZenXc chiplet for all markets. These could be coupled to a few IODs incl. iGPU, again for all markets. And maybe somewhere some cache (although, just like you, I still do not see a wide market for powerful iGPUs).
 

Glo.

Diamond Member
Apr 25, 2015
What I wrote is an extreme example; it's not like it's based on leaks or anything. It doesn't even need to be Strix Point but a future APU.

If Strix Point has cache + memory controller chiplets like N31 or N32, then it's not a monolith.
Why should only one Strix Point be designed? If they don't increase the core count, then I can understand that. But if there are more cores, then a single design is not such a good option, especially with a big 24 CU IGP.

P.S. You didn't need to make a separate chiplet for everything just to make your point. ;)
Please do not put words in my mouth. I argued against that many chiplets. I could imagine them to use one ZenX and one ZenXc chiplet for all markets. This could be coupled to few IODs incl. iGPU again for all markets. And maybe somewhere some Cache (although I still do not see a wide market for powerful iGPUs just as you do).
I thought we were having a discussion about SP :)

On smaller nodes like N3, which SP will be on, it's more beneficial for sub-200mm2 die sizes to remain monolithic because of design costs. Yield losses for IP on 3nm will be eye-watering, especially if we break a very small die into even smaller dies.

The ONLY place where it remains beneficial to break monolithic designs into smaller pieces is when you have a larger die: you lower the design costs and manufacturing costs by breaking apart each IP. So larger, more performant/powerful designs.

So IF there is a larger Strix Point on the table, like 8P/16E/48 CU - then yes, it's much more beneficial to break this design into pieces, because those pieces can then be used in other places, like, for example, a smaller/normal Strix Point APU that is supposed to be lower cost. You scale your design from the top and work your way down (assuming you are able to make two GPU chiplets be seen as ONE GPU).

In this scenario - yes, it's possible that AMD could break each IP into separate chiplets. If there is no scenario like this, there is zero financial incentive, because of astronomical design costs, and then manufacturing.
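The size argument above can be sketched numerically. A minimal Poisson yield model with the often-quoted 0.1 defects/cm² - the model, the die sizes, and the four-way split are all illustrative assumptions, and interconnect/packaging area overhead is ignored:

```python
import math

# Silicon spent per good chip, monolithic vs. split into equal chiplets.
# Poisson yield model Y = exp(-A * D0), with D0 = 0.1 defects/cm^2 (assumed).

D0_PER_MM2 = 0.1 / 100  # defects per mm^2

def yield_rate(area_mm2: float) -> float:
    return math.exp(-area_mm2 * D0_PER_MM2)

def silicon_per_good_chip(area_mm2: float, pieces: int = 1) -> float:
    """Wafer area consumed per good chip when the design is split into
    `pieces` equal chiplets (no area overhead assumed for the split)."""
    piece = area_mm2 / pieces
    return pieces * piece / yield_rate(piece)

for total in (100, 200, 400, 800):
    mono = silicon_per_good_chip(total)
    split = silicon_per_good_chip(total, pieces=4)
    saved = 100 * (mono - split) / mono
    print(f"{total:>3} mm^2: {mono:6.0f} vs {split:6.0f} mm^2/good chip "
          f"({saved:.0f}% saved by splitting)")
```

The savings are marginal around 100mm² (and easily eaten by packaging costs) but large at 400-800mm², which is exactly the "only large dies benefit from splitting" point.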
 

DisEnchantment

Golden Member
Mar 3, 2017
The market is there for both to coexist, say a monolithic Strix Point for 10W-25W and a DT-derived chiplet-based APU for 25W-65W. Just like with Zen 4, for instance: Dragon Range, Phoenix and Phoenix2.
AMD cannot ignore the low-power but price-sensitive segment, wherein the likes of Qualcomm SoCs will play, and they will blow their trumpet hard.
For low-power mobile it may not even be a clear-cut advantage to go with chiplets if you consider the fact that TSMC is now offering FinFlex, unless AMD is planning 220mm2+ dies for ultra low power, and we know that is not happening.

With FinFlex they could use 2-2 fins for the P cores and 2-1 fins for the E cores, cache and CUs, since those can be clocked lower. That should help with density and efficiency greatly.
If PHX2 is an indicator of future direction, the efficiency cores would likely be something like 8C Zen 5 with half the L3 using the 2-1 fins, plus 4C performance cores on 2-2 fins with full L3, for a total of 12C/24T Zen 5 cores. Or some arrangement like this. Putting Zen 4c in a Zen 5 SoC would land AMD in a similar situation to AVX512 support in ADL, since we know AMD hinted at new AI instructions coming for Zen 5.
N3E 2-1 fins don't have a performance regression vs N5 2-2 fins. A good fit for CUs and E cores. And the density gains are very significant vs N5 2-2 fins: 1.56x for logic, vs 1.38x for N3 2-2 fins, and there is a lot of logic within the CUs.

The high-end mobile segment, though, would be covered by the mainstream DT APUs; it is apparent that AMD needs to address this, following in Intel's footsteps.

But let's see if the new interconnects can be energy-frugal enough that AMD can address the 10W-150W market with a single unified chiplet approach. If so, then Zen 5 would be formidable across the entire TDP range and provide AMD with amazing product flexibility. This is why I feel the interconnects and packaging are the most interesting part of MI300 and can be a key lever for AMD to scale performance and efficiency across all segments.
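For what those 2-1 vs 2-2 fin density figures imply in area terms, a quick blended-shrink sketch. The logic-only scaling factors are the ones quoted above; the block areas and library mix are invented, and SRAM/analog (which scale far worse) are ignored:

```python
# Blended N5 -> N3E logic shrink for a FinFlex-style library mix.
# 1.56x density for 2-1 fin cells, 1.38x for 2-2 fin (figures quoted above).
# Block areas and fin-mix fractions are purely illustrative.

def shrunk_area(n5_logic_mm2: float, frac_2_1_fin: float) -> float:
    """N3E area of an N5 logic block with the given 2-1 fin fraction."""
    return n5_logic_mm2 * (frac_2_1_fin / 1.56 + (1 - frac_2_1_fin) / 1.38)

# e.g. a P-core block kept on 2-2 fin vs a CU/E-core block mostly on 2-1 fin
print(f"all 2-2 fin:  100 mm2 on N5 -> {shrunk_area(100, 0.0):.1f} mm2 on N3E")
print(f"90% 2-1 fin: 100 mm2 on N5 -> {shrunk_area(100, 0.9):.1f} mm2 on N3E")
```

The gap between the two mixes is roughly 8mm² per 100mm² of logic, which is the density-and-efficiency lever the post describes for E cores, cache-adjacent logic and CUs.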
 

BorisTheBlade82

Senior member
May 1, 2020
The market is there for both to coexist, say a monolithic Strix Point for 10W-25W and a DT-derived chiplet-based APU for 25W-65W. Just like with Zen 4, for instance: Dragon Range, Phoenix and Phoenix2.
AMD cannot ignore the low-power but price-sensitive segment, wherein the likes of Qualcomm SoCs will play, and they will blow their trumpet hard.
I am totally aware of everything you wrote and I absolutely agree with you. I am especially excited for MI300 for the very same reasons as well and I hopefully pointed that out in the specific thread 😊
 
  • Like
Reactions: Joe NYC

moinmoin

Diamond Member
Jun 1, 2017
I wonder if one solution combining both monolithic and chiplet approaches could be a power-optimized, self-sufficient IOD, essentially improving on Mendocino, with 2 memory channels, newer dense Zen cores, an AI engine and sufficient IO to be expandable with additional CCDs and/or IGPs. This could also be the kind of hybrid design enabling the choice between maximum possible efficiency and maximum performance.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I thought we were having a discussion about SP :)

On smaller nodes like N3, which SP will be on, it's more beneficial for sub-200mm2 die sizes to remain monolithic because of design costs. Yield losses for IP on 3nm will be eye-watering, especially if we break a very small die into even smaller dies.
Yes, we are talking about Strix Point, but I made a lego APU, which not everyone liked, so I added that it could be a next-gen APU. :D
It was a mistake to make separate Zen 5 and Zen 4c chiplets, because they would be ridiculously small.
Now there are only 5 different chiplets, but still 4-5 chiplets per APU.

Assumptions: $22,000 N3 wafer; $15,000 N5 wafer; $7,000 N6 wafer; defect density 0.1/sq. cm.

| | CPU chiplet (3nm) | GPU chiplet (3nm) | LLC chiplet 40MB (6nm) | LLC chiplet 72MB (6nm) | IO chiplet (5nm) | ~175mm2 3nm monolith (8P/8E/24CU/72MB) |
| --- | --- | --- | --- | --- | --- | --- |
| Good dies | 1648 | 1432 | 1934 | 1130 | 785 | 271 |
| Price per die | $13.35 | $15.36 | $3.62 | $6.20 | $19.11 | $81.18 |

CPU chiplet (3nm): 4P with 8MB L2 / 4E with 4MB L2, plus 16MB L3
GPU chiplet (3nm): 24 CU; 96 TMU; 48 ROP
LLC chiplet (6nm): 40MB or 72MB
IO chiplet (5nm): 2 CU IGP, memory PHY, PCIe, media etc.
| Config | CPU chiplets (3nm) | GPU chiplet (3nm) | LLC chiplet (6nm) | IO chiplet (5nm) | Total die size | Cost for dies |
| --- | --- | --- | --- | --- | --- | --- |
| 4P/4E/16CU/24MB LLC | 35 mm2 | 40 mm2 | 30 mm2 | 70 mm2 | 175 mm2 | $51.44 |
| 4P/8E/16CU/32MB LLC | 2 × 35 mm2 | 40 mm2 | 30 mm2 | 70 mm2 | 210 mm2 | $64.79 |
| 8P/8E/16CU/40MB LLC | 2 × 35 mm2 | 40 mm2 | 30 mm2 | 70 mm2 | 210 mm2 | $64.79 |
| 4P/4E/24CU/48MB LLC | 35 mm2 | 40 mm2 | 50 mm2 | 70 mm2 | 195 mm2 | $54.02 |
| 4P/8E/24CU/60MB LLC | 2 × 35 mm2 | 40 mm2 | 50 mm2 | 70 mm2 | 230 mm2 | $67.37 |
| 8P/8E/24CU/72MB LLC | 2 × 35 mm2 | 40 mm2 | 50 mm2 | 70 mm2 | 230 mm2 | $67.37 |
| 4P/4E/2CU/24MB LLC | 35 mm2 | 0 | 30 mm2 | 70 mm2 | 135 mm2 | $38.66 |
| 8P/8E/2CU/48MB LLC | 2 × 35 mm2 | 0 | 50 mm2 | 70 mm2 | 170 mm2 | $52.01 |
| 12P/12E/2CU/72MB LLC | 3 × 35 mm2 | 0 | 50 mm2 | 70 mm2 | 225 mm2 | $65.36 |

A monolithic 8P/8E/24CU/72MB would be $81.18 + $10 packaging = $91.18.
The chiplet 8P/8E/24CU/72MB would be $67.37 + $30 packaging = $97.37.
The monolith should be cheaper, but you would still need 2 extra 3nm monolithic designs, or the chiplet APU ends up cheaper overall.
One would be a 140mm2 4P/4E/24CU/48MB and the other a 170mm2 12P/12E/2CU/72MB.

Chiplets have a pretty big advantage over a monolith: you can make 1432 triplets (CPU + CPU + GPU chiplet) from three 3nm wafers and still have 432 CPU dies left over.
From a monolith, you can get 3 × 271 = 813 complete dies.
This would help a lot in making more APUs if there are not enough 3nm wafers.

I think designing 5 chiplets for an APU should be easier and cheaper than designing 3 monolithic APUs.
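The per-die economics above can be approximated with a standard dies-per-wafer formula plus the same Poisson yield assumption (0.1 defects/cm²). Wafer prices are the ones assumed above; this simple edge-loss formula lands near, though not exactly on, the quoted die counts, since die-per-wafer calculators differ in scribe and edge-exclusion handling:

```python
import math

# Good dies and cost per die for each chiplet vs. the 175mm2 monolith.
# Wafer prices and D0 = 0.1 defects/cm^2 are the assumptions stated above.

WAFER_DIA_MM = 300.0
D0_PER_MM2 = 0.1 / 100

def dies_per_wafer(area_mm2: float) -> int:
    r = WAFER_DIA_MM / 2
    gross = math.pi * r * r / area_mm2                      # area term
    edge_loss = math.pi * WAFER_DIA_MM / math.sqrt(2 * area_mm2)
    return int(gross - edge_loss)

def good_dies(area_mm2: float) -> int:
    return int(dies_per_wafer(area_mm2) * math.exp(-area_mm2 * D0_PER_MM2))

def cost_per_good_die(area_mm2: float, wafer_price: float) -> float:
    return wafer_price / good_dies(area_mm2)

dies = [  # (name, area mm^2, wafer price $)
    ("CPU chiplet, 3nm", 35, 22_000),
    ("GPU chiplet, 3nm", 40, 22_000),
    ("72MB LLC chiplet, 6nm", 50, 7_000),
    ("IO chiplet, 5nm", 70, 15_000),
    ("monolith, 3nm", 175, 22_000),
]
for name, area, price in dies:
    print(f"{name:>22}: {good_dies(area):4d} good dies/wafer, "
          f"${cost_per_good_die(area, price):6.2f} per die")
```

Whatever the exact calculator, the shape of the result is the same as in the table: the small chiplets are each a fraction of the monolith's per-die cost, and the monolith pays both the yield penalty and the 3nm wafer price for its cache and IO area.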
 

Mopetar

Diamond Member
Jan 31, 2011
They easily can and will, but the consumer parts might be rather limited. Look at what AMD, Nvidia, and Intel charge for the data center parts they make and tell me that a $20,000 wafer prevents them from making money.
 

HurleyBird

Platinum Member
Apr 22, 2003
This is a continuation of what @Glo. posted here about Strix Point possibly having 24 CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for it, but let's imagine it's real.

Assuming we're still at least 2 generations out from stacked everything, I think that what really makes sense is merging the mobile parts and the IODs.

For instance, on the next gen normal IOD, add 4 dense, low clockspeed, low power cores to get competitive on idle power consumption (and the bonus is that you get more compute) and pair that with 4 CUs. Manufacture it on an affordable node.

Then for the "super IOD," have 8 dense, low clockspeed, low power cores paired with, say, 24 CUs. Manufacture it on an advanced node.

Then, just by mixing and matching (and before even disabling anything for yields), you get this combination of parts:

4-core 4 CU
12-core 4 CU
20-core 4 CU
8-core 24 CU
16-core 24 CU
24-core 24 CU

And all of these combinations work for both mobile and desktop. You could easily pull 100 SKUs from just three chips, although you probably wouldn't. With so much flexibility, you can counter pretty much anything Intel can come up with. We're not even talking about V-cache yet either.

Of course, there's a lot of room between 4 CUs and 24 CUs, but these specs are just for the purpose of illustration. You might also want to create a specific chip for the ultra low end that wastes less space on I/O than the normal IOD would (although there's still probably a niche for specific embedded applications that need a lot of I/O relative to compute).
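The mix-and-match above enumerates mechanically. A sketch using the post's illustrative numbers (4-core/4 CU normal IOD, 8-core/24 CU super IOD, and a hypothetical 8-core CCD, up to two per package):

```python
from itertools import product

# Enumerate SKUs from two IODs with on-die cores plus 0-2 compute dies.
# Core/CU counts are the post's illustrative figures, not real parts.

iods = {"normal IOD": (4, 4), "super IOD": (8, 24)}  # name: (cores, CUs)
CCD_CORES = 8

skus = []
for (name, (iod_cores, cus)), ccds in product(iods.items(), range(3)):
    skus.append((iod_cores + ccds * CCD_CORES, cus, name, ccds))

for cores, cus, name, ccds in sorted(skus):
    print(f"{cores:>2}-core {cus:>2} CU  ({name} + {ccds}x CCD)")
```

This reproduces the six base combinations in the list above; binning (disabled cores/CUs) and V-cache variants would multiply those six into the "100 SKUs from three chips" flexibility the post describes.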
 

Exist50

Platinum Member
Aug 18, 2016
Likely for Low Power Mobile it may not even be a clear cut advantage to go with chiplets if you consider the fact that TSMC is now offering FinFlex
I wouldn't lean so heavily on TSMC's talk of FinFlex. The reality is that while it makes for some convenient libraries at times, it's a real PIA to use, perhaps more than it's worth in many cases. Would hate to make a full SoC out of it.
 

Exist50

Platinum Member
Aug 18, 2016
All this "super IOD" talk sounds a lot like what Intel's doing with MTL, tbh. Maybe with different tech and a different balance between the pieces, but it does make some sense. I like the idea of something like 4x Zen 5c on the super-IOD. They could sell that as-is for the low-end market where they've been struggling to provide options, and for many day-to-day scenarios (battery life), it would let them turn off the compute die entirely. That might free them up to use lower-cost interconnect chiplet tech without trashing battery life. That said, having the memory controller on a separate die, and (presumably) an older node at that, would be less than ideal.

GPU, I think, also makes sense to split off. They're trying to do one-size-fits-all right now, and it's a compromise. For stuff like office/home machines, the current APU iGPUs are overkill, but gaming and creative could demand even more. Even just two different dies would probably go a long way.

Just throwing a crazy idea out for the sake of discussion, but what about a split super-IOD? I'm thinking the most important stuff (mostly memory controller) and any CPU cores or accelerators on the leading node, and a separate die (with a cheaper, lower-bandwidth interconnect) for all the slower stuff that's fine with N-1. So 4 dies in total: IOD-fast (N), IOD-slow (N-1), compute (optional, N), GPU (optional?, N). Then that IOD-slow could basically be reused for the desktop chipsets. Might also be possible to reuse the IOD-fast between mobile and desktop as well, but that's a stretch.
 

BorisTheBlade82

Senior member
May 1, 2020
All this "super IOD" talk sounds a lot like what Intel's doing with MTL, tbh. Maybe with different tech and a different balance between the pieces, but it does make some sense. I like the idea of something like 4x Zen 5c on the super-IOD. They could sell that as-is for the low-end market where they've been struggling to provide options, and for many day-to-day scenarios (battery life), it would let them turn off the compute die entirely. That might free them up to use lower-cost interconnect chiplet tech without trashing battery life. That said, having the memory controller on a separate die, and (presumably) an older node at that, would be less than ideal.

GPU, I think, also makes sense to split off. They're trying to do one-size-fits-all right now, and it's a compromise. For stuff like office/home machines, the current APU iGPUs are overkill, but gaming and creative could demand even more. Even just two different dies would probably go a long way.

Just throwing a crazy idea out for the sake of discussion, but what about a split super-IOD? I'm thinking the most important stuff (mostly memory controller) and any CPU cores or accelerators on the leading node, and a separate die (with a cheaper, lower-bandwidth interconnect) for all the slower stuff that's fine with N-1. So 4 dies in total: IOD-fast (N), IOD-slow (N-1), compute (optional, N), GPU (optional?, N). Then that IOD-slow could basically be reused for the desktop chipsets. Might also be possible to reuse the IOD-fast between mobile and desktop as well, but that's a stretch.
While I find that idea interesting, I see one big problem: as the IOD works internally as a very fat crossbar, a huge amount of bandwidth is needed to join two of them. With a cheap interconnect you might shoot yourself in the foot on power consumption, while an efficient interconnect like EFB might be rather expensive.
 

Exist50

Platinum Member
Aug 18, 2016
While I find that idea interesting, I see one big problem: as the IOD works internally as a very fat crossbar, a huge amount of bandwidth is needed to join two of them. With a cheap interconnect you might shoot yourself in the foot on power consumption, while an efficient interconnect like EFB might be rather expensive.
So my thought was that the only die-to-die connections are from the IOD-fast to each other die, so that would take care of the big-ticket items like feeding the CPU and GPU. Then the IOD-slow would only have to support enough bandwidth to run whatever IO you put on there — namely, stuff like PCIe, USB4, and a few other bits and pieces.

I think the big question would actually be the GPU. It's likely to be the most bandwidth-intensive die, so that link might need special treatment. Perhaps IFOP for the CPU/IOD-slow and "Infinity Fanout Links" or EFB for the GPU?
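The cheap-vs-efficient interconnect trade-off here is essentially energy-per-bit times bandwidth. A back-of-the-envelope sketch; the pJ/bit figures are rough public ballparks (organic-substrate SerDes like IFOP around 2 pJ/bit, fanout/bridge links well under 1 pJ/bit), not datasheet values, and the 100 GB/s feed is an assumed figure:

```python
# Die-to-die link power: P = (energy per bit) x (bits per second).
# pJ/bit figures below are ballpark assumptions, not vendor specs.

def link_power_w(pj_per_bit: float, bandwidth_gb_s: float) -> float:
    """Power in watts for a link moving bandwidth_gb_s gigabytes/s."""
    return pj_per_bit * 1e-12 * bandwidth_gb_s * 8e9

# ~100 GB/s: roughly a fat LPDDR5X-class feed for an iGPU
for name, pj in (("IFOP-style SerDes", 2.0),
                 ("fanout link", 0.5),
                 ("EFB/silicon bridge", 0.3)):
    print(f"{name:>18}: {link_power_w(pj, 100):.2f} W at 100 GB/s")
```

A watt or two just to move memory traffic is a big deal inside a 15W budget, which is why the GPU link in particular would need the efficient (and pricier) option while slower IO can live with the cheap one.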
 