Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

BorisTheBlade82 · Jan 16, 2023

TESKATLIPOKA said:
This is a continuation to what @Glo. posted here about Strix Point possibly having 24CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for It, but let's imagine It's real.
I will use LLC instead of L3 and IC and It will have Its own chiplet.

There could be 10 different chiplets and 5 chiplets per APU:
1.) Zen5 chiplet in 2 versions, 4core and 8core (3nm)
2.) Zen4c chiplet in 2 versions, 4core and 8core (3nm)
3.) IGP chiplet in 2 versions, 16CU and 24CU (3nm)
4.) Cache chiplet in 3 versions, 36MB, 64MB and 96MB (6nm)
5.) A single IO chiplet with 2*64bit DDR5(4x32bit LPDDR5), video stuff, PHY etc. (5-6nm)

This is how different models could look like in real life, but these sizes are just for show.
Width would stay the same and only height would change depending on which chiplet you use.
View attachment 74727 View attachment 74728 View attachment 74729 View attachment 74730 View attachment 74731

You could make a lot of combinations with different chiplets while not cutting them down much excluding LLC chiplet. Power limit and LLC was based on the IGP's needs.

	Zen 5	Zen 4c	IGP	IGP gaming frequency	Last Level Cache	Power Limit -> gaming
6C16T Model	3 Cores; L2: 6MB	3 Cores; L2: 1.5MB	12 CU; 48TMU; 24 ROP	2400 MHz	Total: 21MB (CPU: 9 + IGP: 12MB)	30W (CPU: 15W + IGP: 15W)
6C16T Model	3 Cores; L2: 6MB	3 Cores; L2: 1.5MB	16 CU; 64TMU; 32 ROP	2400 MHz	Total: 33MB (CPU: 9 + IGP: 24MB)	35W (CPU: 15W + IGP: 20W)
8C16T Model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	16 CU; 64TMU; 32 ROP	2400 MHz	Total: 36MB (CPU: 12 + IGP: 24MB)	35W (CPU: 15W + IGP: 20W)
8C16T Model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	20 CU; 80TMU; 40 ROP	2400 MHz	Total: 48MB (CPU: 12 + IGP: 36MB)	40W (CPU: 15W + IGP: 25W)
8C16T Model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	24 CU; 96TMU; 48 ROP	2400 MHz	Total: 60MB (CPU: 12 + IGP: 48MB)	45W (CPU: 15W + IGP: 30W)
10C20T Model	4 Cores; L2: 4MB	6 Cores; L2: 3MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 46MB (CPU: 14 + IGP: 32MB)	50W (CPU: 20W + IGP: 30W)
10C20T Model	4 Cores; L2: 4MB	6 Cores; L2: 3MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 60MB (CPU: 14 + IGP: 46MB)	57W (CPU: 20W + IGP: 37W)
12C24T Model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 48MB (CPU: 16 + IGP: 32MB)	55W (CPU: 25W + IGP: 30W)
12C24T Model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 62MB (CPU: 16 + IGP: 46MB)	62W (CPU: 25W + IGP: 37W)
12C24T Model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	24 CU; 96TMU; 48 ROP	2800 MHz	Total: 76MB (CPU: 16 + IGP: 60MB)	70W (CPU: 25W + IGP: 45W)
14C28T Model	6 Cores; L2: 6MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 52MB (CPU: 20 + IGP: 32MB)	60W (CPU: 30W + IGP: 30W)
14C28T Model	6 Cores; L2: 6MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 66MB (CPU: 20 + IGP: 46MB)	67W (CPU: 30W + IGP: 37W)
16C32T Model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	3200 MHz	Total: 64MB (CPU: 24 + IGP: 40MB)	75W (CPU: 35W + IGP: 40W)
16C32T Model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	3200 MHz	Total: 80MB (CPU: 24 + IGP: 56MB)	85W (CPU: 35W + IGP: 50W)
16C32T Model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	24 CU; 96TMU; 48 ROP	3200 MHz	Total: 96MB (CPU: 24 + IGP: 72MB)	95W (CPU: 35W + IGP: 60W)

Don't want to shoot down your line of thinking entirely, but from all we know AMD tries to minimize the number of individual chips across the entirety of their markets as much as possible. Bring-up costs for just one 7nm chip are said to be in the ballpark of 50-75 million USD - see the linked article.
No way that AMD will create that many dies for just a small part of their portfolio.

The Dark Side Of The Semiconductor Design Renaissance – Fixed Costs Soaring Due To Photomask Sets, Verification, and Validation

As the semiconductor design renaissance flourishes, fixed costs and risk soar. Semiconductors will march forward, but how many dead bodies will litter the ground?

www.semianalysis.com

Tigerick · Jan 16, 2023

TESKATLIPOKA said:
This is a continuation to what @Glo. posted here about Strix Point possibly having 24CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for It, but let's imagine It's real.
I will use LLC instead of L3 and IC and It will have Its own chiplet.

There could be 10 different chiplets and 5 chiplets per APU:
1.) Zen5 chiplet in 2 versions, 4core and 8core (3nm)
2.) Zen4c chiplet in 2 versions, 4core and 8core (3nm)
3.) IGP chiplet in 2 versions, 16CU and 24CU (3nm)
4.) Cache chiplet in 3 versions, 36MB, 64MB and 96MB (6nm)
5.) A single IO chiplet with 2*64bit DDR5(4x32bit LPDDR5), video stuff, PHY etc. (5-6nm)

This is how different models could look like in real life. Width would stay the same and only height would change depending on which chiplet you use.
View attachment 74727 View attachment 74728 View attachment 74729 View attachment 74730 View attachment 74731

You could make a lot of combinations with different chiplets while not cutting them down much excluding LLC chiplet. Power limit and LLC was based on the IGP's needs.

	Zen 5	Zen 4c	IGP	IGP gaming frequency	Last Level Cache	Power Limit -> gaming
6C16T Model	3 Cores; L2: 6MB	3 Cores; L2: 1.5MB	12 CU; 48TMU; 24 ROP	2400 MHz	Total: 21MB (CPU: 9 + IGP: 12MB)	30W (CPU: 15W + IGP: 15W)
6C16T Model	3 Cores; L2: 6MB	3 Cores; L2: 1.5MB	16 CU; 64TMU; 32 ROP	2400 MHz	Total: 33MB (CPU: 9 + IGP: 24MB)	35W (CPU: 15W + IGP: 20W)
8C16T Model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	16 CU; 64TMU; 32 ROP	2400 MHz	Total: 36MB (CPU: 12 + IGP: 24MB)	35W (CPU: 15W + IGP: 20W)
8C16T Model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	20 CU; 80TMU; 40 ROP	2400 MHz	Total: 48MB (CPU: 12 + IGP: 36MB)	40W (CPU: 15W + IGP: 25W)
8C16T Model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	24 CU; 96TMU; 48 ROP	2400 MHz	Total: 60MB (CPU: 12 + IGP: 48MB)	45W (CPU: 15W + IGP: 30W)
10C20T Model	4 Cores; L2: 4MB	6 Cores; L2: 3MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 46MB (CPU: 14 + IGP: 32MB)	50W (CPU: 20W + IGP: 30W)
10C20T Model	4 Cores; L2: 4MB	6 Cores; L2: 3MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 60MB (CPU: 14 + IGP: 46MB)	57W (CPU: 20W + IGP: 37W)
12C24T Model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 48MB (CPU: 16 + IGP: 32MB)	55W (CPU: 25W + IGP: 30W)
12C24T Model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 62MB (CPU: 16 + IGP: 46MB)	62W (CPU: 25W + IGP: 37W)
12C24T Model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	24 CU; 96TMU; 48 ROP	2800 MHz	Total: 76MB (CPU: 16 + IGP: 60MB)	70W (CPU: 25W + IGP: 45W)
14C28T Model	6 Cores; L2: 6MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 52MB (CPU: 20 + IGP: 32MB)	60W (CPU: 30W + IGP: 30W)
14C28T Model	6 Cores; L2: 6MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 66MB (CPU: 20 + IGP: 46MB)	67W (CPU: 30W + IGP: 37W)
16C32T Model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	3200 MHz	Total: 64MB (CPU: 24 + IGP: 40MB)	75W (CPU: 35W + IGP: 40W)
16C32T Model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	3200 MHz	Total: 80MB (CPU: 24 + IGP: 56MB)	85W (CPU: 35W + IGP: 50W)
16C32T Model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	24 CU; 96TMU; 48 ROP	3200 MHz	Total: 96MB (CPU: 24 + IGP: 72MB)	95W (CPU: 35W + IGP: 60W)

Wow, that is a lot of combinations. I have two theories to discuss:

1. STX2 should be similar to PHX2 but made on N3E. So it could still keep a monolithic design in order to target lower TDPs. STX2 would mainly target the upcoming Intel Lunar Lake, which might be a monolithic design.

2. I also don't believe in such a big iGPU. Most likely STX1 will use an N32 chiplet design with 768SP and 1536ALU, which would provide around 9 TF. RDNA3+ most likely refers to the N32 graphics engine, whereas RDNA3 in PP uses the N33 graphics engine.

My point is that AMD would split the design into two: STX1 uses a chiplet design and STX2 uses a monolithic design. Intel might be doing the same thing by splitting its design between Lunar Lake and Panther Lake... we will see, too many unknowns atm. If you guys have any different ideas, you are welcome to chime in.

Glo. · Jan 16, 2023

Every APU is monolithic.

What is chiplet based are only the caches. This is AMD's approach, not Intel's.

Remember this for every analysis.

It's absolutely obvious WHY this would be.

And no, I don't believe there would not be a big enough market, especially considering that EVERYBODY will integrate systems on chips for future tech: wearables, car infotainment, AR/VR, laptops, desktops, SFF computing.

That's the reason why INTEL is developing GPUs that they can integrate into their own CPUs. THAT'S the reason why Nvidia tried to buy ARM. AMD cannot be behind Intel on this front.

BorisTheBlade82 · Jan 16, 2023

Glo. said:
Every APU is monolithic.

What is chiplet based are only the caches. This is AMD's approach, not Intel's.

Remember this for every analysis.

Again you are stating your own humble opinion as a fact. What is Raphael to you, if not an APU? What is Meteor Lake to you, if not an APU? What is MI300?

Glo. · Jan 16, 2023

BorisTheBlade82 said:
Again you are stating your own humble opinion as a fact. What is Raphael to you, if not an APU? What is Meteor Lake to you, if not an APU? What is MI300?

It should be "Every APU in Phoenix, Strix Point lineup is monolithic".

Is it clear now, for you?

TESKATLIPOKA · Jan 16, 2023

Mopetar said:
That's essentially where I'd like to see AMD go in the future and really is the ultimate end-game of a chiplet based approach.

There's probably always going to be some market for a monolithic design, but I could see that being relegated to niche markets over time.

I suspect that we potentially get some dual-GPU designs as well. As long as the physical size doesn't interfere, there's no reason not to offer a 36/48 CU option for a gaming APU that provides pretty good performance without the need to add in a discrete card.

I also could imagine AMD doing something like Intel where it designs a smaller core that's built around providing more throughput for CPU compute. The Zen core is already considerably smaller than Intel's performance core so there's not as much pressure for them to do this. You could also say that the more densely packed Zen 4c is already them doing this, but I wonder how much further they could take it.

I didn't consider making a Dual IGP(GPU) chiplet, because of low BW. Even with this I had to use 96MB LLC. If they put a single HBM there, this would be very possible.

BorisTheBlade82 said:
Don't want to shoot down your line of thinking entirely, but from all we know AMD tries to minimize the number of individual chips across the entirety of their markets as much as possible. Bring-up costs for just one 7nm chip are said to be in the ballpark of 50-75 million USD - see the linked article.
No way that AMD will create that many dies for just a small part of their portfolio.

The Dark Side Of The Semiconductor Design Renaissance – Fixed Costs Soaring Due To Photomask Sets, Verification, and Validation

As the semiconductor design renaissance flourishes, fixed costs and risk soar. Semiconductors will march forward, but how many dead bodies will litter the ground?

www.semianalysis.com

This could be used for the whole portfolio. You want a 32 core CPU without a large IGP?
Put 4 CPU chiplets, 96MB LLC and remove IGP chiplet. IO chiplet could have a very small 2CU IGP inside, which would be deactivated If you also have a separate IGP(GPU) chiplet.

This was just an example, so I didn't try to minimize It as much as possible.
I can combine 4C8T Zen5 and 4C8T Zen4c into one and just add more CPU chiplets for higher core count If needed.
I can use only the 24CU GPU chiplet.
LLC chiplet could be also reduced to just 2 versions -> 48MB and 96MB.
Now you only have 5 different chiplets to design.
The problem with this approach is you will have to cut(deactivate) a lot of die space for some models.

BorisTheBlade82 · Jan 16, 2023

Glo. said:
It should be "Every APU in Phoenix, Strix Point lineup is monolithic".

Is it clear now, for you?

I personally would not rule out Strix Point to be chiplet-based (this is by the way how to state an opinion). I am pretty sure that AMD will either react to MTL or already planned going chiplet in the Mobile space without being as vocal about it as Intel.

Glo. · Jan 16, 2023

BorisTheBlade82 said:
I personally would not rule out Strix Point to be chiplet-based (this is by the way how to state an opinion). I am pretty sure that AMD will either react to MTL or already planned going chiplet in the Mobile space without being as vocal about it as Intel.

AMD doesn't need to react to Intel and go chiplet, especially when there is no benefit in it for them.

Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for: 4 nm Zen 5, 4 nm Zen 4C, 4 nm RDNA3+ iGPU, 4 nm AI engines, and 6 nm Cache+memory controller, instead of designing ONE monolithic APU with 6 nm cache chiplets for a specific purpose? Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?

Intel has two separate process nodes to work with. Their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.

TESKATLIPOKA · Jan 16, 2023

Glo. said:
AMD doesn't need to react to Intel and go chiplet, especially when there is no benefit in it for them.

Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for: 4 nm Zen 5, 4 nm Zen 4C, 4 nm RDNA3+ iGPU, 4 nm AI engines, and 6 nm Cache+memory controller, instead of designing ONE monolithic APU with 6 nm cache chiplets for a specific purpose? Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?

Intel has two separate process nodes to work with. Their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.

What I wrote is an extreme example, It's not like It's based on leaks or anything. It doesn't even need to be Strix Point but a future APU and I already reduced the amount of different chiplets.

If Strix Point has cache+memory controller chiplets like N31 or N32 then It's not a monolith.
Why should only one Strix Point be designed? If they don't increase the core count, then I can understand that. If there are more cores, which I expect, then a single design is not such a good option, especially with a big 24CU IGP. 2 designs should be enough.

P.S. You didn't need to make a separate chiplet for everything just to make your point.

BorisTheBlade82 · Jan 16, 2023

Glo. said:
AMD doesn't need to react to Intel and go chiplet, especially when there is no benefit in it for them.

Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for: 4 nm Zen 5, 4 nm Zen 4C, 4 nm RDNA3+ iGPU, 4 nm AI engines, and 6 nm Cache+memory controller, instead of designing ONE monolithic APU with 6 nm cache chiplets for a specific purpose? Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?

Intel has two separate process nodes to work with. Their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.

Please do not put words in my mouth. I argued against that many chiplets. I could imagine them to use one ZenX and one ZenXc chiplet for all markets. This could be coupled to few IODs incl. iGPU again for all markets. And maybe somewhere some Cache (although I still do not see a wide market for powerful iGPUs just as you do).

Glo. · Jan 16, 2023

TESKATLIPOKA said:
What I wrote is an extreme example, It's not like It's based on leaks or anything. It doesn't even need to be Strix Point but a future APU.

If Strix Point has cache+memory controller chiplets like N31 or N32 then It's not a monolith.
Why should only one Strix Point be designed? If they don't increase the core count, then I can understand that. If there are more cores, then a single design is not such a good option, especially with a big 24CU IGP.

P.S. You didn't need to make a separate chiplet for everything just to make your point.

BorisTheBlade82 said:
Please do not put words in my mouth. I argued against that many chiplets. I could imagine them to use one ZenX and one ZenXc chiplet for all markets. This could be coupled to few IODs incl. iGPU again for all markets. And maybe somewhere some Cache (although I still do not see a wide market for powerful iGPUs just as you do).

I thought we were having a discussion about SP.

On smaller nodes like N3, which SP will be on, it's more beneficial for sub-200mm2 dies to remain monolithic because of design costs. Yielding IP on 3 nm will be eye-watering, especially if we break a very small die into even smaller dies.

The ONLY place where it remains beneficial to break monolithic designs into smaller pieces is when you get a larger die: you lower the design costs and manufacturing costs by breaking apart each IP. So larger, more performant/powerful designs.

So IF there is a larger Strix Point on the table, like 8P/16E/48 CU, then yes, it's much more beneficial to break this design into pieces, because you can use those pieces in other places, for example in a smaller/normal Strix Point APU that is supposed to be lower cost. You scale your design from the top and go lower down (assuming you are able to make two GPU chiplets be seen as ONE GPU).

In this scenario, yes, it's possible that AMD could break each IP into separate chiplets. If there is no scenario like this, there is zero financial incentive, because of astronomical design costs, and then manufacturing.

DisEnchantment · Jan 16, 2023

The market is there for both to coexist, say a monolithic Strix Point for 10W-25W and DT derived chiplet based APU for 25W-65W. Just like with Zen 4 for instance; Dragon Range, Phoenix and Phoenix2.
AMD cannot ignore the low power but price sensitive segment, where the likes of Qualcomm SoCs will play, and they will blow their trumpet hard.
Likely for Low Power Mobile it may not even be a clear cut advantage to go with chiplets if you consider the fact that TSMC is now offering FinFlex, unless AMD is planning 220mm2+ dies for the ultra low power and we know that is not happening.

With FinFlex they could use 2-2 Fins for the P cores and 2-1 Fins for E Core, cache and CUs since they can clock them lower. Should help with density and efficiency greatly.
If PHX2 is an indicator of future direction, likely the efficiency cores would be something like 8C Zen 5 with half the L3 using the 2-1 Fins and 4C performance cores on 2-2 Fins with full L3, for a total of 12C/24T Zen 5 cores. Or something like this arrangement. Putting Zen 4c in a Zen 5 SoC would land AMD in a similar situation of AVX512 support in ADL, since we know AMD hinted of new AI instructions coming for Zen 5.
N3E 2-1 fins don't have a performance regression vs N5 2-2 fins. Good fit for CUs and E Cores. And the density gains vs N5 2-2 fins are very significant, 1.56x vs 1.38x for N3 2-2 Fins for logic, of which there is a lot within the CUs.

The high end mobile though would be covered by the mainstream DT APUs; it is apparent that AMD needs to address this, following in the footsteps of Intel.

But let's see if the new interconnects can be energy frugal enough that AMD can address the 10W-150W market with a single unified chiplet approach. If this is the case, then Zen 5 would be formidable across the entire TDP range and provide AMD with amazing product flexibility. This is why I feel the interconnects and packaging are the most interesting part of MI300 and can be a key lever for AMD to scale performance and efficiency across all segments.

BorisTheBlade82 · Jan 16, 2023

DisEnchantment said:
The market is there for both to coexist, say a monolithic Strix Point for 10W-25W and DT derived chiplet based APU for 25W-65W. Just like with Zen 4 for instance; Dragon Range, Phoenix and Phoenix2.
AMD cannot ignore the low power but price sensitive segment, where the likes of Qualcomm SoCs will play, and they will blow their trumpet hard.
Likely for Low Power Mobile it may not even be a clear cut advantage to go with chiplets if you consider the fact that TSMC is now offering FinFlex, unless AMD is planning 220mm2+ dies for the ultra low power and we know that is not happening.

With FinFlex they could use 2-2 Fins for the P cores and 2-1 Fins for E Core, cache and CUs since they can clock them lower. Should help with density and efficiency greatly.
If PHX2 is an indicator of future direction, likely the efficiency cores would be something like 8C Zen 5 with half the L3 using the 2-1 Fins and 4C performance cores on 2-2 Fins with full L3, for a total of 12C/24T Zen 5 cores. Or something like this arrangement. Putting Zen 4c in a Zen 5 SoC would land AMD in a similar situation of AVX512 support in ADL, since we know AMD hinted of new AI instructions coming for Zen 5.
N3E 2-1 fins don't have a performance regression vs N5 2-2 fins. Good fit for CUs and E Cores. And the density gains vs N5 2-2 fins are very significant, 1.56x vs 1.38x for N3 2-2 Fins for logic, of which there is a lot within the CUs.

The high end mobile though would be covered by the mainstream DT APUs; it is apparent that AMD needs to address this, following in the footsteps of Intel.

But let's see if the new interconnects can be energy frugal enough that AMD can address the 10W-150W market with a single unified chiplet approach. If this is the case, then Zen 5 would be formidable across the entire TDP range and provide AMD with amazing product flexibility. This is why I feel the interconnects and packaging are the most interesting part of MI300 and can be a key lever for AMD to scale performance and efficiency across all segments.

I am totally aware of everything you wrote and I absolutely agree with you. I am especially excited for MI300 for the very same reasons as well and I hopefully pointed that out in the specific thread 😊

moinmoin · Jan 16, 2023

I wonder if one solution combining both monolithic and chiplet approaches could be having a power optimized self sufficient IOD essentially improving on Mendocino with 2 channels, newer dense Zen cores, AI engine and sufficient IO to be expandable with additional CCDs and/or IGPs. This could also be the kind of hybrid design enabling the choice between maximum possible efficiency and maximum performance.

BorisTheBlade82 · Jan 16, 2023

I would think so. IIRC this is something that Intel wants to do with MTL as well.

TESKATLIPOKA · Jan 16, 2023

Glo. said:
I thought we were having a discussion about SP.

On smaller nodes like N3, which SP will be on, it's more beneficial for sub-200mm2 dies to remain monolithic because of design costs. Yielding IP on 3 nm will be eye-watering, especially if we break a very small die into even smaller dies.

The ONLY place where it remains beneficial to break monolithic designs into smaller pieces is when you get a larger die: you lower the design costs and manufacturing costs by breaking apart each IP. So larger, more performant/powerful designs.

So IF there is a larger Strix Point on the table, like 8P/16E/48 CU, then yes, it's much more beneficial to break this design into pieces, because you can use those pieces in other places, for example in a smaller/normal Strix Point APU that is supposed to be lower cost. You scale your design from the top and go lower down (assuming you are able to make two GPU chiplets be seen as ONE GPU).

In this scenario, yes, it's possible that AMD could break each IP into separate chiplets. If there is no scenario like this, there is zero financial incentive, because of astronomical design costs, and then manufacturing.

Yes, we are talking about Strix Point, but I made a lego APU, which not everyone liked, so I added that It could be a next gen APU.

It was a mistake to make separate Zen5 and Zen4c chiplets, because they would be ridiculously small.
Now there are only 5 different chiplets, but still 4-5 chiplets per APU.

$22,000 N3 wafer; $15,000 N5 wafer; $7,000 N6 wafer; defect density 0.1 defects per sq. cm

	CPU chiplet 3nm	GPU chiplet 3nm	LLC chiplet: 40MB 6nm	LLC chiplet: 72MB 6nm	IO chiplet 5nm	3nm ~175mm2 monolith 8P/8E/24CU/72MB
Good dies	1648	1432	1934	1130	785	271
Prices per die	$13.35	$15.36	$3.62	$6.2	$19.11	$81.18

	CPU chiplet 3nm: 4P 8MB L2 /4E 4MB L2 + 16MB L3	GPU chiplet 3nm: 24 CU; 96TMU; 48 ROP	LLC chiplet 6nm: 40MB or 72MB	IO chiplet 5nm: 2CU IGP, memory PHY, PCI-x, media etc.	Total die size	Cost for dies
4P/4E/16CU/24MB LLC	35 mm2	40 mm2	30 mm2	70 mm2	175 mm2	$51.44
4P/8E/16CU/32MB LLC	2 * 35 mm2	40 mm2	30 mm2	70 mm2	210 mm2	$64.79
8P/8E/16CU/40MB LLC	2 * 35 mm2	40 mm2	30 mm2	70 mm2	210 mm2	$64.79
4P/4E/24CU/48MB LLC	35 mm2	40 mm2	50 mm2	70 mm2	195 mm2	$54.02
4P/8E/24CU/60MB LLC	2 * 35 mm2	40 mm2	50 mm2	70 mm2	230 mm2	$67.37
8P/8E/24CU/72MB LLC	2 * 35 mm2	40 mm2	50 mm2	70 mm2	230 mm2	$67.37
4P/4E/2CU/24MB LLC	35 mm2	0	30 mm2	70 mm2	135 mm2	$36.08
8P/8E/2CU/48MB LLC	2 * 35 mm2	0	50 mm2	70 mm2	190 mm2	$52.01
12P/12E/2CU/72MB LLC	3 * 35 mm2	0	50 mm2	70 mm2	225 mm2	$65.36

Monolith 8P/8E/24CU/72MB would be $81.18 + $10 packaging = $91.18
Chiplet 8P/8E/24CU/72MB would be $67.37 + $30 packaging = $97.37
Monolith should be cheaper, but you still need 2 extra 3nm monoliths, or chiplet APU would be cheaper.
One would be 140mm2 4P/4E/24CU/48MB and the other 170mm2 12P/12E/2CU/72MB.
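
The per-die prices and the two package totals above can be re-derived with a short Python sketch. It only divides the quoted wafer prices by the quoted good-die counts and adds the quoted packaging cost; the dictionary keys and function names are made up for illustration, and nothing here models yield or die-per-wafer itself.

```python
# Wafer prices and good-die counts exactly as quoted in the post.
WAFER_PRICE = {"N3": 22_000, "N5": 15_000, "N6": 7_000}

# die name -> (node, good dies per wafer); names are illustrative labels only
GOOD_DIES = {
    "cpu": ("N3", 1648),
    "gpu": ("N3", 1432),
    "llc40": ("N6", 1934),
    "llc72": ("N6", 1130),
    "io": ("N5", 785),
    "monolith": ("N3", 271),
}


def die_cost(name):
    """Cost of one good die: wafer price divided by good dies per wafer."""
    node, good = GOOD_DIES[name]
    return WAFER_PRICE[node] / good


def package_cost(dies, packaging):
    """Sum the die costs (dies maps die name -> count) and add packaging."""
    return sum(die_cost(name) * count for name, count in dies.items()) + packaging


# 8P/8E/24CU/72MB built both ways, with the packaging costs assumed in the post.
chiplet = package_cost({"cpu": 2, "gpu": 1, "llc72": 1, "io": 1}, packaging=30)
monolith = package_cost({"monolith": 1}, packaging=10)

print(f"chiplet : ${chiplet:.2f}")
print(f"monolith: ${monolith:.2f}")
```

Run as-is, this should land on the $97.37 and $91.18 figures quoted above, so the gap between the two builds is roughly $6 per unit before any design cost is considered.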

Chiplet has a pretty big advantage compared to monolith, you can make 1432 triplets(CPU+CPU+GPU chiplet) from three 3nm wafers and still 432 CPU dies will be left.
From a monolith, you can do 3*271 = 813 complete dies.
This would help a lot in making more APUs If there are not enough 3nm wafers.
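
A rough Python sketch of that wafer arithmetic, using only the good-die counts from the table. The function name is made up, and it deliberately ignores the N6 cache and N5 IO wafers that a chiplet APU would also need, so it compares 3nm wafer usage only.

```python
# Good dies per 3nm wafer, as quoted in the table above.
CPU_GOOD, GPU_GOOD, MONO_GOOD = 1648, 1432, 271


def chiplet_apus(cpu_wafers, gpu_wafers, cpu_per_apu=2, gpu_per_apu=1):
    """Return (complete APUs, leftover CPU dies, leftover GPU dies)."""
    cpu = cpu_wafers * CPU_GOOD
    gpu = gpu_wafers * GPU_GOOD
    apus = min(cpu // cpu_per_apu, gpu // gpu_per_apu)
    return apus, cpu - apus * cpu_per_apu, gpu - apus * gpu_per_apu


# Three 3nm wafers spent as 2 CPU wafers + 1 GPU wafer ...
apus, spare_cpu, spare_gpu = chiplet_apus(cpu_wafers=2, gpu_wafers=1)
# ... versus the same three wafers spent on the ~175mm2 monolith.
monoliths = 3 * MONO_GOOD

print(apus, spare_cpu, spare_gpu, monoliths)
```

With these inputs the GPU wafer is the limiting one, which is where the 432 spare CPU dies come from.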

I think designing 5 chiplets for APU should be easier and cheaper than designing 3 monolithic APUs.
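
One way to frame that trade-off is a break-even volume: how many units it takes before the monolith's lower per-unit cost outweighs whatever the chiplet route saves in design cost. The sketch below is only an illustration. The $50M saving is a placeholder borrowed from the low end of the 50-75 million USD bring-up figure quoted earlier for a single 7nm chip, not an estimate for N3, and the function name is hypothetical.

```python
def break_even_units(design_savings_usd, chiplet_unit_cost, monolith_unit_cost):
    """Units after which the monolith's per-unit saving has paid back
    the extra design cost it required. Returns 0.0 if the chiplet
    build is not more expensive per unit in the first place."""
    premium = chiplet_unit_cost - monolith_unit_cost
    if premium <= 0:
        return 0.0
    return design_savings_usd / premium


# Per-unit costs are the 8P/8E/24CU/72MB totals from the post;
# the 50e6 design saving is an assumed placeholder, not a sourced number.
units = break_even_units(50e6, chiplet_unit_cost=97.37, monolith_unit_cost=91.18)
print(f"break-even at roughly {units / 1e6:.1f} million units")
```

It only looks at the top configuration, so it says nothing about the smaller SKUs, where the cost gap between the two approaches would be different.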

Kepler_L2 · Jan 16, 2023

Glo. said:
It should be "Every APU in Phoenix, Strix Point lineup is monolithic".

Is it clear now, for you?

I don't think STX lineup is monolithic.

Kaluan · Jan 16, 2023

Something is telling me AMD (or any of the big players for that matter) WON'T be paying $20,000 for an N3E wafer, let alone $22,000...

Mopetar · Jan 16, 2023

They easily can and will, but the consumer parts might be rather limited. Look at what AMD, NVidia, and Intel charge for the data center parts they make and tell me that a $20,000 wafer prevents them from making money.

Khanan · Jan 16, 2023

BorisTheBlade82 said:
What is Raphael to you, if not an APU?

AMD explicitly stated not to call that weak IGP an APU (in an interview, not long ago); on everything else you're spot on. APUs = CPUs combined with a GPU that has some relatively good performance.

HurleyBird · Jan 16, 2023

TESKATLIPOKA said:
This is a continuation to what @Glo. posted here about Strix Point possibly having 24CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for It, but let's imagine It's real.

Assuming we're still at least 2 generations out from stacked everything, I think that what really makes sense is merging the mobile parts and the IODs.

For instance, on the next gen normal IOD, add 4 dense, low clockspeed, low power cores to get competitive on idle power consumption (and the bonus is that you get more compute) and pair that with 4 CUs. Manufacture it on an affordable node.

Then for the "super IOD," have 8 dense, low clockspeed, low power cores paired with, say, 24 CUs. Manufacture it on an advanced node.

Then, just by mixing and matching (and before even disabling anything for yields), you get this combination of parts:

4-core 4 CU
12-core 4 CU
20-core 4 CU
8-core 24 CU
16-core 24 CU
24-core 24 CU

And all of these combinations work for both mobile and desktop. You could easily pull 100 SKUs from just three chips, although you probably wouldn't. With so much flexibility, you can counter pretty much anything Intel can come up with. We're not even talking about V-cache yet either.

Of course, there's a lot of room between 4 CUs and 24 CUs, but these specs are just for the purpose of illustration. You might also want to create a specific chip for the ultra low end that wastes less space on I/O than the normal IOD would (although there's still probably a niche for specific embedded applications that need a lot of I/O relative to compute).
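
The mix-and-match list above can be enumerated with a few lines of Python. The 8-core CCD and the limit of two CCDs per package are not stated in the post; they are inferred from the listed core counts (12 - 4 = 8, 20 - 4 = 16) and are marked as assumptions in the code.

```python
from itertools import product

# (cores, CUs) on each IOD, as described in the post.
IODS = {"normal IOD": (4, 4), "super IOD": (8, 24)}

CCD_CORES = 8  # assumption: inferred from the listed core counts
MAX_CCDS = 2   # assumption: inferred from the listed core counts


def lineup():
    """Every IOD paired with 0..MAX_CCDS compute dies, before any die harvesting."""
    parts = []
    for (name, (cores, cus)), ccds in product(IODS.items(), range(MAX_CCDS + 1)):
        parts.append((cores + ccds * CCD_CORES, cus, name, ccds))
    return parts


for cores, cus, iod, ccds in lineup():
    print(f"{cores}-core {cus} CU  ({iod} + {ccds} CCD)")
```

Disabling cores or CUs for yields, or adding V-cache, would multiply the list further, which is where the "100 SKUs" remark comes from.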

Exist50 · Jan 17, 2023

DisEnchantment said:
Likely for Low Power Mobile it may not even be a clear cut advantage to go with chiplets if you consider the fact that TSMC is now offering FinFlex

I wouldn't lean so heavily on TSMC's talk of FinFlex. The reality is that while it makes for some convenient libraries at times, it's a real PIA to use, perhaps more than it's worth in many cases. Would hate to make a full SoC out of it.

Exist50 · Jan 17, 2023

All this "super IOD" talk sounds a lot like what Intel's doing with MTL, tbh. Maybe with different tech and balance between the pieces, but it does make some sense. I like the idea of something like 4x Zen 5c on the super-IOD. They could sell that as-is for the low end market where they've been struggling to provide options, and for many day to day scenarios (battery life), it would let them turn off the compute die entirely. That might free them up to use lower cost interconnect chiplet tech without trashing battery life. That said, having the memory controller on a separate die, and (presumably) an older node at that, would be less than ideal.

GPU, I think, also makes sense to split off. They're trying to do one-size-fits-all right now, and it's a compromise. For stuff like office/home machines, the current APU iGPUs are overkill, but gaming and creative could demand even more. Even just two different dies would probably go a long way.

Just throwing a crazy idea out for the sake of discussion, but what about a split super-IOD? I'm thinking the most important stuff (mostly memory controller) and any CPU cores or accelerators on the leading node, and a separate die (with a cheaper, lower-bandwidth interconnect) for all the slower stuff that's fine with N-1. So 4 dies in total: IOD-fast (N), IOD-slow (N-1), compute (optional, N), GPU (optional?, N). Then that IOD-slow could basically be reused for the desktop chipsets. Might also be possible to reuse the IOD-fast between mobile and desktop as well, but that's a stretch.

BorisTheBlade82 · Jan 17, 2023

Exist50 said:
All this "super IOD" talk sounds a lot like what Intel's doing with MTL, tbh. Maybe with different tech and balance between the pieces, but it does make some sense. I like the idea of something like 4x Zen 5c on the super-IOD. They could sell that as-is for the low end market where they've been struggling to provide options, and for many day to day scenarios (battery life), it would let them turn off the compute die entirely. That might free them up to use lower cost interconnect chiplet tech without trashing battery life. That said, having the memory controller on a separate die, and (presumably) an older node at that, would be less than ideal.

GPU, I think, also makes sense to split off. They're trying to do one-size-fits-all right now, and it's a compromise. For stuff like office/home machines, the current APU iGPUs are overkill, but gaming and creative could demand even more. Even just two different dies would probably go a long way.

Just throwing a crazy idea out for the sake of discussion, but what about a split super-IOD? I'm thinking the most important stuff (mostly memory controller) and any CPU cores or accelerators on the leading node, and a separate die (with a cheaper, lower-bandwidth interconnect) for all the slower stuff that's fine with N-1. So 4 dies in total: IOD-fast (N), IOD-slow (N-1), compute (optional, N), GPU (optional?, N). Then that IOD-slow could basically be reused for the desktop chipsets. Might also be possible to reuse the IOD-fast between mobile and desktop as well, but that's a stretch.

While I find that idea interesting, I see one big problem: as the IOD works as a very fat crossbar internally, a huge amount of bandwidth is needed to join two of them. With a cheap interconnect you might shoot yourself in the foot on power consumption. An efficient interconnect like EFB might be rather expensive.

Exist50 · Jan 17, 2023

BorisTheBlade82 said:
While I find that idea interesting, I see one big problem: as the IOD works as a very fat crossbar internally, a huge amount of bandwidth is needed to join two of them. With a cheap interconnect you might shoot yourself in the foot on power consumption. An efficient interconnect like EFB might be rather expensive.

So my thought was that the only die-to-die connections would run from the IOD-fast to each of the other dies, which takes care of the big-ticket items like feeding the CPU and GPU. The IOD-slow would then only need enough bandwidth to run whatever IO you put on it: stuff like PCIe, USB4, and a few other bits and pieces.

I think the big question would actually be the GPU. It's likely to be the most bandwidth-intensive die, so that link might need special treatment. Perhaps IFOP for the CPU/IOD-slow and "Infinity Fanout Links" or EFB for the GPU?
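To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It only uses the identity power = bandwidth x energy per bit. The link names follow the four-die layout above, but every bandwidth and pJ/bit figure is a made-up placeholder for illustration, not a measured or published value for IFOP, EFB or any other real interconnect.

```python
# Link power model: power (W) = bandwidth (bit/s) * energy per bit (J/bit).
# All bandwidth and pJ/bit figures below are hypothetical placeholders.

def link_power_watts(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power drawn by one die-to-die link running at full bandwidth."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8  # GB/s -> bit/s
    return bits_per_second * energy_pj_per_bit * 1e-12  # pJ -> J


# Hypothetical star topology: every other die connects only to IOD-fast.
LINKS = {
    "compute": {"bandwidth_gb_per_s": 64.0, "energy_pj_per_bit": 2.0},
    "gpu": {"bandwidth_gb_per_s": 128.0, "energy_pj_per_bit": 0.5},
    "iod_slow": {"bandwidth_gb_per_s": 32.0, "energy_pj_per_bit": 2.0},
}


def total_link_power(links: dict) -> float:
    """Sum of the per-link power over all links attached to IOD-fast."""
    return sum(
        link_power_watts(cfg["bandwidth_gb_per_s"], cfg["energy_pj_per_bit"])
        for cfg in links.values()
    )


if __name__ == "__main__":
    for name, cfg in LINKS.items():
        watts = link_power_watts(cfg["bandwidth_gb_per_s"], cfg["energy_pj_per_bit"])
        print(f"{name}: {watts:.3f} W")
    print(f"total: {total_link_power(LINKS):.3f} W")
```

With these placeholder inputs the GPU link carries twice the bandwidth of the compute link yet draws half the power, which is the case for giving the widest link the more efficient (and pricier) packaging while the narrower links stay on the cheap interconnect.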

Model	Zen 5	Zen 4c	IGP	IGP gaming frequency	Last Level Cache	Power Limit -> gaming
6C16T model	3 Cores; L2: 6MB	3 Cores; L2: 1.5MB	12 CU; 48TMU; 24 ROP	2400 MHz	Total: 21MB (CPU: 9MB + IGP: 12MB)	30W (CPU: 15W + IGP: 15W)
6C16T model	3 Cores; L2: 6MB	3 Cores; L2: 1.5MB	16 CU; 64TMU; 32 ROP	2400 MHz	Total: 33MB (CPU: 9MB + IGP: 24MB)	35W (CPU: 15W + IGP: 20W)
8C16T model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	16 CU; 64TMU; 32 ROP	2400 MHz	Total: 36MB (CPU: 12MB + IGP: 24MB)	35W (CPU: 15W + IGP: 20W)
8C16T model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	20 CU; 80TMU; 40 ROP	2400 MHz	Total: 48MB (CPU: 12MB + IGP: 36MB)	40W (CPU: 15W + IGP: 25W)
8C16T model	4 Cores; L2: 4MB	4 Cores; L2: 2MB	24 CU; 96TMU; 48 ROP	2400 MHz	Total: 60MB (CPU: 12MB + IGP: 48MB)	45W (CPU: 15W + IGP: 30W)
10C20T model	4 Cores; L2: 4MB	6 Cores; L2: 3MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 46MB (CPU: 14MB + IGP: 32MB)	50W (CPU: 20W + IGP: 30W)
10C20T model	4 Cores; L2: 4MB	6 Cores; L2: 3MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 60MB (CPU: 14MB + IGP: 46MB)	57W (CPU: 20W + IGP: 37W)
12C24T model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 48MB (CPU: 16MB + IGP: 32MB)	55W (CPU: 25W + IGP: 30W)
12C24T model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 62MB (CPU: 16MB + IGP: 46MB)	62W (CPU: 25W + IGP: 37W)
12C24T model	4 Cores; L2: 4MB	8 Cores; L2: 4MB	24 CU; 96TMU; 48 ROP	2800 MHz	Total: 76MB (CPU: 16MB + IGP: 60MB)	70W (CPU: 25W + IGP: 45W)
14C28T model	6 Cores; L2: 6MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	2800 MHz	Total: 52MB (CPU: 20MB + IGP: 32MB)	60W (CPU: 30W + IGP: 30W)
14C28T model	6 Cores; L2: 6MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	2800 MHz	Total: 66MB (CPU: 20MB + IGP: 46MB)	67W (CPU: 30W + IGP: 37W)
16C32T model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	16 CU; 64TMU; 32 ROP	3200 MHz	Total: 64MB (CPU: 24MB + IGP: 40MB)	75W (CPU: 35W + IGP: 40W)
16C32T model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	20 CU; 80TMU; 40 ROP	3200 MHz	Total: 80MB (CPU: 24MB + IGP: 56MB)	85W (CPU: 35W + IGP: 50W)
16C32T model	8 Cores; L2: 8MB	8 Cores; L2: 4MB	24 CU; 96TMU; 48 ROP	3200 MHz	Total: 96MB (CPU: 24MB + IGP: 72MB)	95W (CPU: 35W + IGP: 60W)
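
For anyone wanting to double-check the table, the sketch below verifies that each LLC total and each power limit equals the sum of its CPU and IGP shares. The figures are copied from the rows above; the tuple layout and the function name `inconsistent_rows` are just illustrative choices.

```python
# Each row: (model, CUs, LLC total, LLC CPU, LLC IGP, power total, power CPU, power IGP)
# LLC values are in MB and power values in W, copied from the table above.
ROWS = [
    ("6C16T", 12, 21, 9, 12, 30, 15, 15),
    ("6C16T", 16, 33, 9, 24, 35, 15, 20),
    ("8C16T", 16, 36, 12, 24, 35, 15, 20),
    ("8C16T", 20, 48, 12, 36, 40, 15, 25),
    ("8C16T", 24, 60, 12, 48, 45, 15, 30),
    ("10C20T", 16, 46, 14, 32, 50, 20, 30),
    ("10C20T", 20, 60, 14, 46, 57, 20, 37),
    ("12C24T", 16, 48, 16, 32, 55, 25, 30),
    ("12C24T", 20, 62, 16, 46, 62, 25, 37),
    ("12C24T", 24, 76, 16, 60, 70, 25, 45),
    ("14C28T", 16, 52, 20, 32, 60, 30, 30),
    ("14C28T", 20, 66, 20, 46, 67, 30, 37),
    ("16C32T", 16, 64, 24, 40, 75, 35, 40),
    ("16C32T", 20, 80, 24, 56, 85, 35, 50),
    ("16C32T", 24, 96, 24, 72, 95, 35, 60),
]


def inconsistent_rows(rows):
    """Return (model, CUs) for every row whose totals do not match their parts."""
    bad = []
    for model, cus, llc, llc_cpu, llc_igp, pwr, pwr_cpu, pwr_igp in rows:
        if llc != llc_cpu + llc_igp or pwr != pwr_cpu + pwr_igp:
            bad.append((model, cus))
    return bad


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))
```

An empty list means every row adds up; any entry names the model and CU count of a row that needs a second look.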
