Question Zen 6 Speculation Thread


basix

Senior member
Oct 4, 2024
It's relevant for performance because each Intel core gets a dedicated AMX unit, whereas what you are saying about AMD is that they will have one unit per cluster (of multiple cores). That will run like a dog; maybe it's OK in a 2-core cluster, but who would build that for economy cores?
We are talking about Zen 6 LP cores, so one AMX unit per 2C/4C cluster sounds very reasonable to me. I assume the big cores will have dedicated AMX units per core (or at least more than one AMX unit per cluster). Kepler_L2 just said one unit per two cores. OK for me.
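For a rough sense of what sharing one matrix unit per cluster means, here is a back-of-envelope sketch; the 8 TOPS figure and the assumption that every core issues tile work at once are placeholders, not anything from the leaks.

```python
# Back-of-envelope model of sharing one AMX-style matrix unit per core cluster.
# The throughput number is an illustrative assumption, not a real Zen 6 spec.

UNIT_TOPS = 8.0  # assumed dense INT8 throughput of one matrix unit, in TOPS

def per_core_tops(cores_per_cluster: int, units_per_cluster: int = 1) -> float:
    """Average matrix throughput per core when all cores in the cluster
    issue tile work at the same time (the worst case for sharing)."""
    return UNIT_TOPS * units_per_cluster / cores_per_cluster

for cores in (1, 2, 4):
    print(f"{cores}C cluster, 1 shared unit: {per_core_tops(cores):.1f} TOPS/core")
# 1C: 8.0, 2C: 4.0, 4C: 2.0 -> sharing across two cores halves per-core peak,
# across four cores it quarters it, which is exactly the trade-off argued above.
```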
 

adroc_thurston

Diamond Member
Jul 2, 2023
Does "they" mean AMD, so we will only get APUs, or does "they" mean Nvidia, which will admit that spending time on gamers is a waste of time? ;)
It means the discrete presence is whatever dies they have on hand from other markets.
RDNA5 has 4 parts, but 6 can have 1. Or 3. And none are built to intercept any Nvidia configs.
 

basix

Senior member
Oct 4, 2024
For AMD it makes sense to build GPUs like AT3 and AT4. They are dual-use for mobile and desktop (and for small-form-factor ML/AI machines, thanks to large memory pool support), and that dual use gives AMD much better economies of scale than historic dGPUs. The large memory pool additionally opens up other market opportunities where memory capacity is more important than peak FLOPS or memory bandwidth (professional dGPUs for scientific simulations, EDA, ML/AI and maybe also rendering).

AT2 and AT0 seem to be leveraging dual-use as well (Microsoft's Xbox Next and game streaming). That makes sense at only 10-20% dGPU market share, and it de-risks the chip design project.

So from a HW and chip-design perspective, AMD seems to be acting very reasonably with RDNA5.

Looking at RDNA6, AMD could double down on that philosophy by finally designing a fully chiplet-based architecture: one GPU chiplet for all market needs, scaled up according to market conditions (1...N chiplets). With that you do not compete directly with Nvidia's GPU SKUs, but you could still scale your GPUs accordingly if necessary. And you get very nice economies of scale.
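To illustrate the economies-of-scale point, here is a toy Poisson yield model comparing one monolithic die against N copies of a reusable chiplet of the same total area; the defect density and die area are made-up assumptions, and it deliberately ignores packaging cost, which is the counterargument raised further down.

```python
import math

# Toy Poisson yield model: yield = exp(-area * defect_density).
# Defect density and chiplet area are placeholder assumptions for illustration.
D0 = 0.1            # defects per cm^2
CHIPLET_AREA = 1.5  # cm^2 per GPU chiplet

def die_yield(area_cm2: float, d0: float = D0) -> float:
    return math.exp(-area_cm2 * d0)

for n in range(1, 5):
    mono_yield = die_yield(n * CHIPLET_AREA)   # one big die of the same total area
    chiplet_yield = die_yield(CHIPLET_AREA)    # each small die yields independently
    print(f"{n}x compute: monolithic yield {mono_yield:.2f} "
          f"vs per-chiplet yield {chiplet_yield:.2f}")
# The per-chiplet yield stays constant while the monolithic yield drops with area,
# which is where the "one chiplet for all markets" economics come from.
```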

One important thing is nearly completely independent of HW design: FSR and ROCm support. Besides designing good HW, AMD needs to put enough effort into its gaming-GPU SW portfolio. Otherwise AMD has good consumer/prosumer HW that underperforms due to lacking SW.
 

adroc_thurston

Diamond Member
Jul 2, 2023
How can you say that?
Anyone who knows 2026/2027 server comp positioning would tell you that.
Do you have any info on DMR for example?
yes it sucks a fat one.
Anything to back up your "underestimation of the decade" claim?
well, pay me, and I'll tell you.
Looking at RDNA6, AMD could double down on that philosophy by finally designing a fully chiplet-based architecture: one GPU chiplet for all market needs, scaled up according to market conditions (1...N chiplets).
uh, no, GPU tiling is a win-more scenario.
It's not cost-effective to build GPUs with tiling since SoIC-X d2w costs + AID per config make it unviable.
One important thing is nearly completely independent of HW design: FSR and ROCm support. Besides designing good HW, AMD needs to put enough effort into its gaming-GPU SW portfolio. Otherwise AMD has good consumer/prosumer HW that underperforms due to lacking SW.
they need the shills. and the choppa. they have neither; thus they're dead.
 

basix

Senior member
Oct 4, 2024
uh, no, GPU tiling is a win-more scenario.
It's not cost-effective to build GPUs with tiling since SoIC-X d2w costs + AID per config make it unviable.
Not sure SoIC is required. Regular 2.5D could also be an option, and it might be more cost-effective.

So no AID + stacked die, just one GPU chiplet for 2.5D integration.
 

basix

Senior member
Oct 4, 2024
Look at Nvidia's B200 and Rubin chips. Do you see any 3D stacking there? I do not. Same as the M3 Max: just two chips side by side. The chips contain all you need (SMs, command processor, LLC, memory interface). Glue multiple of them together with 2.5D packaging, but instead of using only two dies, use N dies.

This is not much different from AT3 and AT4 packaging with a host die, except that you could now chain multiple GPU chiplets to build a bigger GPU. Maybe 1...4x GPU dies (e.g. 32/64/96/128 CU in total).
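Writing out the scaling arithmetic of that 1...4x example; only the 32 CU/die figure comes from the post, while the per-die LLC and memory-interface numbers are hypothetical placeholders.

```python
# Enumerate hypothetical configs built from one self-contained GPU chiplet.
# Only CU_PER_DIE reflects the example above; the other per-die numbers are
# made up purely to show how every resource scales together.
CU_PER_DIE = 32
LLC_MB_PER_DIE = 16     # assumed LLC slice per die
MEM_BITS_PER_DIE = 64   # assumed memory interface width per die

for dies in range(1, 5):
    print(f"{dies} die(s): {dies * CU_PER_DIE} CU, "
          f"{dies * LLC_MB_PER_DIE} MB LLC, "
          f"{dies * MEM_BITS_PER_DIE}-bit memory bus")
# -> 32 / 64 / 96 / 128 CU, matching the 1...4x scaling above.
```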
 

marees

Platinum Member
Apr 28, 2024
Look at Nvidia's B200 and Rubin chips. Do you see any 3D stacking there? I do not. Same as the M3 Max: just two chips side by side. The chips contain all you need (SMs, command processor, LLC, memory interface). Glue multiple of them together with 2.5D packaging, but instead of using only two dies, use N dies.

This is not much different from AT3 and AT4 packaging with a host die, except that you could now chain multiple GPU chiplets to build a bigger GPU. Maybe 1...4x GPU dies (e.g. 32/64/96/128 CU in total).
RDNA 6 will be a tick.

For anything major, you are looking at RDNA 7.
 

adroc_thurston

Diamond Member
Jul 2, 2023
Look at Nvidia's B200 and Rubin chips
compute.
Same as the M3 Max.
a stinky mess (and TBDR, not comparable at all).
The chips contain all you need (SMs, command processor, LLC, memory interface). Glue multiple of them together with 2.5D packaging, but instead of using only two dies, use N dies.
Doesn't work for modern IMR GPUs doing modern engines.
Split LLCs in particular would be catastrophic.
This is not much different from AT3 and AT4 packaging with a host die.
There is no "host die"; the GPU is self-sufficient.
CCD attach just gives you an APU config.
 

basix

Senior member
Oct 4, 2024
N4C would have already featured split LLCs, my friend. Multiple AIDs connected with silicon bridges.

The difference between my idea and N4C would simply be that you do not have separate SEDs and AIDs; the two would be merged into one die, and therefore 3D stacking is not required. The base concept of splitting the GPU into multiple parts is the very same. If N4C had worked, my idea would work as well.
 

adroc_thurston

Diamond Member
Jul 2, 2023
N4C would have already featured split LLCs, my friend. Multiple AIDs connected with silicon bridges.
No, MALL was striped (by the nature of being a memory-side cache).
The base concept of splitting the GPU into multiple parts is the very same. If N4C had worked, my idea would work as well.
It's not; N4C tile-to-tile was SoIC everywhere, with no 2.5D present.
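For readers following along: a memory-side cache like MALL is striped by physical address, so each slice owns a fixed stripe of memory regardless of which die it sits on. A minimal sketch of that mapping, with an arbitrary slice count and line size:

```python
# Minimal sketch of address striping across memory-side cache slices.
# Slice count and line size are arbitrary illustrative values.
NUM_SLICES = 6     # e.g. RDNA3 splits its MALL across six slices
LINE_BYTES = 128

def mall_slice(phys_addr: int) -> int:
    """Each cache line maps to exactly one slice based on its address,
    so the 'split' follows the address space, not the compute topology."""
    return (phys_addr // LINE_BYTES) % NUM_SLICES

for addr in range(0, 8 * LINE_BYTES, LINE_BYTES):
    print(f"addr 0x{addr:05x} -> slice {mall_slice(addr)}")
```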
 

basix

Senior member
Oct 4, 2024
What about this picture? Can you see the CoWoS-L silicon bridges? ;)
[attached image]

 

basix

Senior member
Oct 4, 2024
That would be a very weird packaging procedure:
No hybrid bonding of one chip overlapping two other chips has ever been shown (not to my knowledge).

EMIB/EFB-like packaging would make much more sense to me, especially for N4C, considering it would have featured MALL. RDNA3 could even afford to split its MALL into six slices with organic 2.5D packaging. Using silicon bridges would be technologically more advanced than that.

A split L2 cache would be more demanding, I agree on that one. But I see hope there. AMD is revamping its L0/L1 caching with RDNA5. That should reduce bandwidth requirements towards the L2$. And the local SE/WGP scheduling paradigm should make each shader engine work on its own as much as possible, mostly out of its locally attached L2$ slice. Data-locality paradigms should minimize global L2$ accesses across multiple dies anyway. Yes, L2$ splitting is more difficult, but such problems can be solved by smart ideas and good engineering ;)
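Whether that works out hinges entirely on the locality number, so here is a toy model of how much L2 traffic would have to cross the die-to-die links for a given local-hit fraction; both the aggregate bandwidth figure and the locality values are made-up assumptions.

```python
# Toy model: cross-die traffic for a split L2, as a function of data locality.
# All figures are made-up assumptions, purely to show how sensitive this is.
TOTAL_L2_BW_GBPS = 8000   # assumed aggregate L2 bandwidth demand of the GPU

for local_fraction in (0.70, 0.90, 0.99):
    cross_die = TOTAL_L2_BW_GBPS * (1.0 - local_fraction)
    print(f"{local_fraction:.0%} local hits -> "
          f"{cross_die:.0f} GB/s must cross the die-to-die links")
# Even at 90% locality, hundreds of GB/s of L2 traffic would still have to
# cross the package links, which is why split L2s are so contentious here.
```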
 

adroc_thurston

Diamond Member
Jul 2, 2023
That would be a very weird packaging procedure:
No hybrid bonding of one chip overlapping two other chips has ever been shown (not to my knowledge).
That's the whole point.
N4C was a novel packaging exercise plus a system-design innovation.
RDNA3 could even afford to split its MALL into six slices with organic 2.5D packaging.
Again, MALL is striped across the address space.
A split L2 cache would be more demanding, I agree on that one
Not "demanding"; unworkable for client.
That should reduce bandwidth requirements towards the L2$
Higher than ever.