This is a continuation of what @Glo. posted here about Strix Point possibly having 24CU.
I don't really believe in such a big IGP, because I don't think there would be a big enough market for it, but let's imagine it's real.
I will use LLC in place of L3 and IC, and it will have its own chiplet.
There could be 10 different chiplets and 5 chiplets per APU:
1.) Zen5 chiplet in 2 versions, 4core and 8core (3nm)
2.) Zen4c chiplet in 2 versions, 4core and 8core (3nm)
3.) IGP chiplet in 2 versions, 16CU and 24CU (3nm)
4.) Cache chiplet in 3 versions, 36MB, 64MB and 96MB (6nm)
5.) A single IO chiplet with 2x64-bit DDR5 (4x32-bit LPDDR5), video stuff, PHY etc. (5-6nm)
This is what different models could look like in real life, but these sizes are just for show.
Width would stay the same and only height would change depending on which chiplet you use.
[Attachments 74727, 74728, 74729, 74730, 74731]
You could make a lot of combinations with different chiplets without cutting them down much, except for the LLC chiplet. The power limit and LLC size were based on the IGP's needs.
Model | Zen 5 | Zen 4c | IGP | IGP gaming frequency | Last Level Cache | Power Limit -> gaming |
6C12T model | 3 Cores; L2: 6MB | 3 Cores; L2: 1.5MB | 12 CU; 48TMU; 24 ROP | 2400 MHz | Total: 21MB (CPU: 9 + IGP: 12MB) | 30W (CPU: 15W + IGP: 15W) |
6C12T model | 3 Cores; L2: 6MB | 3 Cores; L2: 1.5MB | 16 CU; 64TMU; 32 ROP | 2400 MHz | Total: 33MB (CPU: 9 + IGP: 24MB) | 35W (CPU: 15W + IGP: 20W) |
8C16T model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 16 CU; 64TMU; 32 ROP | 2400 MHz | Total: 36MB (CPU: 12 + IGP: 24MB) | 35W (CPU: 15W + IGP: 20W) |
8C16T model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 20 CU; 80TMU; 40 ROP | 2400 MHz | Total: 48MB (CPU: 12 + IGP: 36MB) | 40W (CPU: 15W + IGP: 25W) |
8C16T model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 24 CU; 96TMU; 48 ROP | 2400 MHz | Total: 60MB (CPU: 12 + IGP: 48MB) | 45W (CPU: 15W + IGP: 30W) |
10C20T model | 4 Cores; L2: 4MB | 6 Cores; L2: 3MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 46MB (CPU: 14 + IGP: 32MB) | 50W (CPU: 20W + IGP: 30W) |
10C20T model | 4 Cores; L2: 4MB | 6 Cores; L2: 3MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 60MB (CPU: 14 + IGP: 46MB) | 57W (CPU: 20W + IGP: 37W) |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 48MB (CPU: 16 + IGP: 32MB) | 55W (CPU: 25W + IGP: 30W) |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 62MB (CPU: 16 + IGP: 46MB) | 62W (CPU: 25W + IGP: 37W) |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 24 CU; 96TMU; 48 ROP | 2800 MHz | Total: 76MB (CPU: 16 + IGP: 60MB) | 70W (CPU: 25W + IGP: 45W) |
14C28T model | 6 Cores; L2: 6MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 52MB (CPU: 20 + IGP: 32MB) | 60W (CPU: 30W + IGP: 30W) |
14C28T model | 6 Cores; L2: 6MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 66MB (CPU: 20 + IGP: 46MB) | 67W (CPU: 30W + IGP: 37W) |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 3200 MHz | Total: 64MB (CPU: 24 + IGP: 40MB) | 75W (CPU: 35W + IGP: 40W) |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 3200 MHz | Total: 80MB (CPU: 24 + IGP: 56MB) | 85W (CPU: 35W + IGP: 50W) |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 24 CU; 96TMU; 48 ROP | 3200 MHz | Total: 96MB (CPU: 24 + IGP: 72MB) | 95W (CPU: 35W + IGP: 60W) |
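The pattern behind the table can be checked mechanically. The sketch below is a minimal Python check over a few rows transcribed from the table. It assumes 4 TMUs and 2 ROPs per CU (a ratio inferred from the listed figures, not something the post states) and verifies that each LLC and power total is the sum of its CPU and IGP shares. The `Row` type, the labels and the helper name are made up for this illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    label: str        # hypothetical label, not the poster's model naming
    cu: int
    tmu: int
    rop: int
    llc_total: int    # MB
    llc_cpu: int      # MB
    llc_igp: int      # MB
    power_total: int  # W
    power_cpu: int    # W
    power_igp: int    # W


# A handful of rows copied from the table (not the full list).
ROWS = [
    Row("3+3 cores, 12CU", 12, 48, 24, 21, 9, 12, 30, 15, 15),
    Row("4+4 cores, 24CU", 24, 96, 48, 60, 12, 48, 45, 15, 30),
    Row("4+6 cores, 20CU", 20, 80, 40, 60, 14, 46, 57, 20, 37),
    Row("6+8 cores, 16CU", 16, 64, 32, 52, 20, 32, 60, 30, 30),
    Row("8+8 cores, 24CU", 24, 96, 48, 96, 24, 72, 95, 35, 60),
]

# Assumed ratios, inferred from the table itself.
TMU_PER_CU = 4
ROP_PER_CU = 2


def is_consistent(row: Row) -> bool:
    """True if the derived figures match the listed ones."""
    return (
        row.tmu == row.cu * TMU_PER_CU
        and row.rop == row.cu * ROP_PER_CU
        and row.llc_total == row.llc_cpu + row.llc_igp
        and row.power_total == row.power_cpu + row.power_igp
    )


for row in ROWS:
    status = "consistent" if is_consistent(row) else "INCONSISTENT"
    print(f"{row.label}: {status}")
```

The same check can be extended to the remaining rows by adding them to `ROWS`.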
Wow, that is a lot of combinations. I have two theories to discuss:
Again, you are stating your own humble opinion as a fact. What is Raphael to you, if not an APU? What is Meteor Lake to you, if not an APU? What is MI300?
Every APU is monolithic.
Only the caches are chiplet-based. This is AMD's approach, not Intel's.
Remember this for every analysis.
It should be "Every APU in Phoenix, Strix Point lineup is monolithic".
I didn't consider making a dual IGP (GPU) chiplet because of low BW. Even with this I had to use 96MB LLC. If they put a single HBM there, this would be very possible.
That's essentially where I'd like to see AMD go in the future, and it really is the ultimate end-game of a chiplet-based approach.
There's probably always going to be some market for a monolithic design, but I could see that being relegated to niche markets over time.
I suspect that we will potentially get some dual-GPU designs as well. As long as the physical size doesn't interfere, there's no reason not to offer a 36/48 CU option for a gaming APU that provides pretty good performance without the need to add in a discrete card.
I also could imagine AMD doing something like Intel where it designs a smaller core that's built around providing more throughput for CPU compute. The Zen core is already considerably smaller than Intel's performance core so there's not as much pressure for them to do this. You could also say that the more densely packed Zen 4c is already them doing this, but I wonder how much further they could take it.
This could be used for the whole portfolio. You want a 32 core CPU without a large IGP?
I don't want to shoot down your line of thinking entirely, but from all we know, AMD tries to minimize the number of individual chips across all of its markets as much as possible. Bring-up costs for just one 7nm chip are said to be in the ballpark of 50-75 million USD; see the linked article.
No way that AMD will create that many dies for just a small part of their portfolio.
The Dark Side Of The Semiconductor Design Renaissance – Fixed Costs Soaring Due To Photomask Sets, Verification, and Validation
As the semiconductor design renaissance flourishes, fixed costs and risk soar. Semiconductors will march forward, but how many dead bodies will litter the ground?
www.semianalysis.com
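To put the 50-75 million USD bring-up figure in perspective, here is a rough Python sketch of how many units an extra die would have to ship before its fixed cost is recovered. The per-unit saving of $10 is a purely hypothetical placeholder, not a number from the article or from this thread, and the function name is made up for the example.

```python
import math


def break_even_units(bring_up_cost_usd: float, saving_per_unit_usd: float) -> int:
    """Units needed before a per-unit saving pays back a fixed bring-up cost."""
    if saving_per_unit_usd <= 0:
        raise ValueError("saving_per_unit_usd must be positive")
    return math.ceil(bring_up_cost_usd / saving_per_unit_usd)


# Bring-up range quoted in the post; the saving is a hypothetical placeholder.
HYPOTHETICAL_SAVING_USD = 10.0

for bring_up in (50e6, 75e6):
    units = break_even_units(bring_up, HYPOTHETICAL_SAVING_USD)
    print(f"${bring_up / 1e6:.0f}M bring-up -> {units:,} units to break even")
```

Under that assumed saving, each additional die design needs millions of units just to pay for itself, which is the core of the objection above.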
I personally would not rule out Strix Point being chiplet-based (this is, by the way, how to state an opinion). I am pretty sure that AMD will either react to MTL or has already planned to go chiplet in the mobile space without being as vocal about it as Intel.
It should be "Every APU in Phoenix, Strix Point lineup is monolithic".
Is it clear now, for you?
AMD doesn't need to react to Intel and go chiplet, especially when they won't benefit from it.
What I wrote is an extreme example; it's not like it's based on leaks or anything. It doesn't even need to be Strix Point, it could be a future APU, and I already reduced the number of different chiplets.
AMD doesn't need to react to Intel and go chiplet, especially when they won't benefit from it.
Go ahead and explain to everybody why it's beneficial for AMD to design separate chiplets for 4nm Zen 5, 4nm Zen 4c, 4nm RDNA3+ iGPU, 4nm AI engines, and 6nm cache + memory controller, instead of designing ONE monolithic APU with 6nm cache chiplets for a specific purpose. Why is it more beneficial for AMD to design something that costs more up front than to design ONE part that costs less from the start?
Intel has two separate process nodes to work with. Their own fabs, and their own CPUs and the GPU adventure they went on. AMD doesn't.
Please do not put words in my mouth. I argued against that many chiplets. I could imagine them using one ZenX and one ZenXc chiplet for all markets. This could be coupled with a few IODs incl. iGPU, again for all markets. And maybe some cache somewhere (although I still do not see a wide market for powerful iGPUs just as you do).
What I wrote is an extreme example; it's not like it's based on leaks or anything. It doesn't even need to be Strix Point, it could be a future APU.
If Strix Point has cache + memory controller chiplets like N31 or N32, then it's not a monolith.
Why should only one Strix Point be designed? If they don't increase the core count, then I can understand that. If there are more cores, then a single design is not such a good option, especially with a big 24CU IGP.
P.S. You didn't need to make a separate chiplet for everything just to make your point.
I thought we were having a discussion about SP.
I am totally aware of everything you wrote and I absolutely agree with you. I am especially excited for MI300 for the very same reasons, and I hopefully pointed that out in the specific thread 😊
The market is there for both to coexist: say, a monolithic Strix Point for 10W-25W and a DT-derived chiplet-based APU for 25W-65W. Just like with Zen 4, for instance: Dragon Range, Phoenix and Phoenix2.
AMD cannot ignore the low-power but price-sensitive segment, where the likes of Qualcomm SoCs will play, and they will blow their trumpet hard.
For low-power mobile it may not even be a clear-cut advantage to go with chiplets if you consider that TSMC is now offering FinFlex, unless AMD is planning 220mm2+ dies for the ultra-low-power segment, and we know that is not happening.
With FinFlex they could use 2-2 fins for the P cores and 2-1 fins for the E cores, cache and CUs, since they can clock them lower. That should help greatly with density and efficiency.
If PHX2 is an indicator of future direction, the efficiency cores would likely be something like 8C Zen 5 with half the L3 using 2-1 fins, plus 4C performance cores on 2-2 fins with full L3, for a total of 12C/24T Zen 5 cores. Or something like this arrangement. Putting Zen 4c in a Zen 5 SoC would land AMD in a situation similar to AVX512 support in ADL, since we know AMD hinted at new AI instructions coming with Zen 5.
N3E 2-1 fins don't regress in performance vs N5 2-2 fins, so they are a good fit for CUs and E cores. And the density gains vs N5 2-2 fins are very significant: 1.56x vs 1.38x for N3 2-2 fins for logic, of which there is a lot within the CUs.
The high-end mobile segment, though, would be covered by the mainstream DT APUs; it is apparent that AMD needs to address this, following in the footsteps of Intel.
But let's see if the new interconnects can be energy-frugal enough for AMD to address the 10W-150W market with a single unified chiplet approach. If so, Zen 5 would be formidable across the entire TDP range and give AMD amazing product flexibility. This is why I feel the interconnects and packaging are the most interesting part of MI300 and can be a key lever for AMD to scale performance and efficiency across all segments.
Yes, we are talking about Strix Point, but I made a lego APU, which not everyone liked, so I added that it could be a next-gen APU.
On smaller nodes like N3, which SP will be on, it's more beneficial for sub-200mm2 dies to remain monolithic because of design costs. Yielding IP on 3nm will be eye-watering, especially if we break a very small die into even smaller dies.
The ONLY place where it remains beneficial to break monolithic designs into smaller pieces is with a larger die, where you lower the design and manufacturing costs by breaking apart each IP. So: larger, more performant/powerful designs.
So IF there is a larger Strix Point on the table, like 8P/16E/48 CU, then yes, it's much more beneficial to break this design into pieces, because you can reuse those pieces in other places, for example in a smaller/normal Strix Point APU that is supposed to be lower cost. You scale your design from the top and go down (assuming you are able to make two GPU chiplets be seen as ONE GPU).
In this scenario, yes, it's possible that AMD could break each IP into separate chiplets. If there is no scenario like this, there is zero financial incentive, because of astronomical design costs, and then manufacturing costs.
Metric | CPU chiplet 3nm | GPU chiplet 3nm | LLC chiplet: 40MB 6nm | LLC chiplet: 72MB 6nm | IO chiplet 5nm | 3nm ~175mm2 monolith 8P/8E/24CU/72MB |
Good dies | 1648 | 1432 | 1934 | 1130 | 785 | 271 |
Price per die | $13.35 | $15.36 | $3.62 | $6.20 | $19.11 | $81.18 |
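The table gives good dies and price per die but not the wafer prices behind them. Multiplying the two columns gives an implied wafer cost, sketched below in Python. This is an inference from the listed numbers, not something the poster stated, and the dictionary keys and function name are made up for the example.

```python
# (good dies, price per die in USD), copied from the table above.
DIES = {
    "CPU chiplet 3nm": (1648, 13.35),
    "GPU chiplet 3nm": (1432, 15.36),
    "LLC chiplet 40MB 6nm": (1934, 3.62),
    "LLC chiplet 72MB 6nm": (1130, 6.20),
    "IO chiplet 5nm": (785, 19.11),
    "Monolith 3nm": (271, 81.18),
}


def implied_wafer_cost(good_dies: int, price_per_die: float) -> float:
    """Wafer cost implied by price per die = wafer cost / good dies."""
    return good_dies * price_per_die


for name, (good, price) in DIES.items():
    print(f"{name}: ~${implied_wafer_cost(good, price):,.0f} per wafer")
```

The products cluster around three values, which suggests one assumed wafer price per node (3nm, 5nm, 6nm) rather than per-die estimates.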
Configuration | CPU chiplet 3nm: 4P 8MB L2 / 4E 4MB L2 + 16MB L3 | GPU chiplet 3nm: 24 CU; 96TMU; 48 ROP | LLC chiplet 6nm: 40MB or 72MB | IO chiplet 5nm: 2CU IGP, memory PHY, PCIe, media etc. | Total die size | Cost for dies |
4P/4E/16CU/24MB LLC | 35 mm2 | 40 mm2 | 30 mm2 | 70 mm2 | 175 mm2 | $51.44 |
4P/8E/16CU/32MB LLC | 2 * 35 mm2 | 40 mm2 | 30 mm2 | 70 mm2 | 210 mm2 | $64.79 |
8P/8E/16CU/40MB LLC | 2 * 35 mm2 | 40 mm2 | 30 mm2 | 70 mm2 | 210 mm2 | $64.79 |
4P/4E/24CU/48MB LLC | 35 mm2 | 40 mm2 | 50 mm2 | 70 mm2 | 195 mm2 | $54.02 |
4P/8E/24CU/60MB LLC | 2 * 35 mm2 | 40 mm2 | 50 mm2 | 70 mm2 | 230 mm2 | $67.37 |
8P/8E/24CU/72MB LLC | 2 * 35 mm2 | 40 mm2 | 50 mm2 | 70 mm2 | 230 mm2 | $67.37 |
4P/4E/2CU/24MB LLC | 35 mm2 | 0 | 30 mm2 | 70 mm2 | 135 mm2 | $36.08 |
8P/8E/2CU/48MB LLC | 2 * 35 mm2 | 0 | 50 mm2 | 70 mm2 | 170 mm2 | $52.01 |
12P/12E/2CU/72MB LLC | 3 * 35 mm2 | 0 | 50 mm2 | 70 mm2 | 225 mm2 | $65.36 |
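The "Cost for dies" column appears to be a plain sum of the per-die prices listed earlier, with no packaging or assembly cost included. A minimal Python sketch of that sum follows; the function and key names are made up for the example, and the mapping of the 30 mm2 LLC to the 40MB price and the 50 mm2 LLC to the 72MB price is an assumption based on which rows the totals match.

```python
# Per-die prices in USD, taken from the price table in this post.
PRICE = {
    "cpu": 13.35,    # 3nm CPU chiplet (4P + 4E)
    "gpu": 15.36,    # 3nm GPU chiplet
    "llc40": 3.62,   # 6nm LLC chiplet, 40MB (assumed to be the 30 mm2 die)
    "llc72": 6.20,   # 6nm LLC chiplet, 72MB (assumed to be the 50 mm2 die)
    "io": 19.11,     # 5nm IO chiplet
}
MONOLITH_PRICE = 81.18  # 3nm ~175mm2 monolith, 8P/8E/24CU/72MB


def package_cost(cpu_chiplets: int, gpu_chiplets: int, llc: str) -> float:
    """Sum of die prices for one APU; packaging cost is not modelled."""
    total = (
        cpu_chiplets * PRICE["cpu"]
        + gpu_chiplets * PRICE["gpu"]
        + PRICE[llc]
        + PRICE["io"]
    )
    return round(total, 2)


# Chiplet 8P/8E/24CU/72MB versus the monolithic equivalent.
chiplet = package_cost(cpu_chiplets=2, gpu_chiplets=1, llc="llc72")
print(f"chiplet: ${chiplet:.2f}, monolith: ${MONOLITH_PRICE:.2f}")
print(f"difference: ${MONOLITH_PRICE - chiplet:.2f} before packaging costs")
```

On die cost alone the chiplet version comes out cheaper, but that gap still has to cover packaging and the extra design and bring-up cost of each additional die.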
I don't think the STX lineup is monolithic.
It should be "Every APU in Phoenix, Strix Point lineup is monolithic".
Is it clear now, for you?
AMD explicitly stated (in an interview, not long ago) not to call that weak IGP an APU; on everything else you're spot on. APU = a CPU combined with a GPU that has relatively good performance.
What is Raphael to you, if not an APU?
I wouldn't lean so heavily on TSMC's talk of FinFlex. The reality is that while it makes for some convenient libraries at times, it's a real PIA to use, perhaps more than it's worth in many cases. Would hate to make a full SoC out of it.
Likely for Low Power Mobile it may not even be a clear cut advantage to go with chiplets if you consider the fact that TSMC is now offering FinFlex
While I find that idea interesting, I see one big problem: as the IOD works as a very fat crossbar internally, a huge amount of bandwidth is needed in order to join two of them. With a cheap interconnect you might shoot yourself in the foot regarding power consumption. An efficient interconnect like EFB might be rather expensive.
All this "super IOD" talk sounds a lot like what Intel's doing with MTL, tbh. Maybe with different tech and balance between the pieces, but it does make some sense. I like the idea of something like 4x Zen 5c on the super-IOD. They could sell that as-is for the low end market where they've been struggling to provide options, and for many day to day scenarios (battery life), it would let them turn off the compute die entirely. That might free them up to use lower cost interconnect chiplet tech without trashing battery life. That said, having the memory controller on a separate die, and (presumably) an older node at that, would be less than ideal.
GPU, I think, also makes sense to split off. They're trying to do one-size-fits-all right now, and it's a compromise. For stuff like office/home machines, the current APU iGPUs are overkill, but gaming and creative could demand even more. Even just two different dies would probably go a long way.
Just throwing a crazy idea out for the sake of discussion, but what about a split super-IOD? I'm thinking the most important stuff (mostly memory controller) and any CPU cores or accelerators on the leading node, and a separate die (with a cheaper, lower-bandwidth interconnect) for all the slower stuff that's fine with N-1. So 4 dies in total: IOD-fast (N), IOD-slow (N-1), compute (optional, N), GPU (optional?, N). Then that IOD-slow could basically be reused for the desktop chipsets. Might also be possible to reuse the IOD-fast between mobile and desktop as well, but that's a stretch.
So my thought was that the only die-to-die connections are from the IOD-fast to each other die, so that would take care of the big-ticket items like feeding the CPU and GPU. Then the IOD-slow would only have to support enough bandwidth to run whatever IO you put on there, namely stuff like PCIe, USB4, and a few other bits and pieces.