Question Zen 6 Speculation Thread

511 · Jan 25, 2026

poke01 said:
nobody cares about that, local AI is dead on Windows

Yeah no one but Intel sells desktop silicon in Notebooks maybe NPU is for that

jdubs03 · Jan 25, 2026

511 said:
Yeah no one but Intel sells desktop silicon in Notebooks maybe NPU is for that

Huh? 9955HX(3D) and co would like to have a word.

marees · Jan 25, 2026

511 said:
Yeah no one but Intel sells desktop silicon in Notebooks maybe NPU is for that

medusa point-1, medusa point-3 (bumblebee), medusa premium all have to be copilot+ compliant

basix · Jan 25, 2026

Win2012R2 said:
It's relevant for performance because each Intel core got dedicated AMX unit, where as what you are trying to say about AMD is that they will have 1 per cluster (of multiple cores) - this will run like a dog, maybe ok in 2 core cluster, but who will make that for economy-cores.

We are talking about Zen 6 LP cores. So one AMX unit per 2C/4C cluster sounds very reasonable to me. I assume the big ones will have dedicated AMX units per core (or at least more AMX units per cluster than just 1). Kepler_L2 just said one unit per 2 cores. OK for me.

adroc_thurston · Jan 25, 2026

Claudiovict said:
This means AMD/Radeon will *actually* try to compete with nvidia?

No, they exited client discrete graphics altogether.

MS_AT · Jan 25, 2026

511 said:
don't want silicon wasted on NPU

I wonder if people were saying the same thing about FPUs in the 80s 😉

adroc_thurston said:
No, they exited client discrete graphics altogether.

They means AMD so we will get only APUs or they means nVidia which will admit that spending time on gamers is a waste of time?😉

adroc_thurston · Jan 25, 2026

MS_AT said:
They means AMD so we will get only APUs or they means nVidia which will admit that spending time on gamers is a waste of time?😉

Means the discrete presence is whatever dies from other markets they have on hand.
RDNA5 has 4 parts, but 6 can have 1. Or 3. And none are built to intercept any Nvidia configs.

Kaffeekenan · Jan 25, 2026

adroc_thurston said:
underestimation of the decade man.
we're entering the pain zone, summon the cenobites.

How can you say that? Do you have any info on DMR for example? Anything to back up your "underestimation of the decade" claim? Because as an AMD shareholder I would appreciate that outcome a lot.

basix · Jan 25, 2026

For AMD it makes sense to build GPUs like AT3 and AT4. Dual-Use for Mobile and Desktop (and small form factor ML/AI machines due to large memory pool support). And due to dual-use, AMD gets much better economies of scale compared to historic dGPUs. The large memory pool additionally opens up other market opportunities, where memory capacity is more important than peak FLOPS or memory bandwidth (professional-use dGPUs for scientific simulations, EDA, ML/AI and maybe also rendering).

AT2 and AT0 seem to be leveraging dual-use as well (Microsoft's XBox Next and game streaming). Makes sense as well at only 10-20% dGPU market share and de-risks your chip design project.

So from a HW- and chip-design perspective, AMD seems to act very reasonable for RDNA5.

Looking at RDNA6, AMD could double-down on that philosophy by finally designing a fully chiplet based architecture. One GPU chiplet for all market needs. Scale-up according to market conditions (1...N chiplets). With that you do not directly compete with Nvidia GPU SKUs. But you could scale your GPUs accordingly if necessary. And you get very nice economies of scale.

One important thing is nearly completely independent from HW design: FSR and ROCm support. Besides of designing good HW, AMD needs to put enough effort into its gaming GPU related SW portfolio. Otherwise AMD has good consumer/prosumer HW but due to lacking SW it underperforms.

adroc_thurston · Jan 25, 2026

Kaffeekenan said:
How can you say that?

Anyone who knows 2026/2027 server comp positioning would tell you that.

Kaffeekenan said:
Do you have any info on DMR for example?

yes it sucks a fat one.

Kaffeekenan said:
Anything to back up your "underestimation of the decade" claim?

well, pay me, and I'll tell you.

basix said:
Looking at RDNA6, AMD could double-down on that philosophy by finally designing a fully chiplet based architecture. One GPU chiplet for all market needs. Scale-up according to market conditions (1...N chiplets).

uh, no, GPU tiling is a win-more scenario.
It's not cost-effective to build GPUs with tiling since SoIC-X d2w costs + AID per config make it unviable.

basix said:
One important thing is nearly completely independent from HW design: FSR and ROCm support. Besides of designing good HW, AMD needs to put enough effort into its gaming GPU related SW portfolio. Otherwise AMD has good consumer/prosumer HW but due to lacking SW it underperforms.

they need the shills. and the choppa. they have neither; thus they're dead.

basix · Jan 25, 2026

adroc_thurston said:
uh, no, GPU tiling is a win-more scenario.
It's not cost-effective to build GPUs with tiling since SoIC-X d2w costs + AID per config make it unviable.

Not sure if SoIC is required. Regular 2.5D could also be an option which could be more cost effective.

So no AID + stacked Die, only one GPU chiplet for 2.5D integration.

adroc_thurston · Jan 25, 2026

basix said:
Not sure if SoIC is required

It is.

basix said:
Regular 2.5D could also be an option which could be more cost effective.

a) how would that work
b) where the pJ/b be at

basix said:
So no AID + stacked Die, only one GPU chiplet for 2.5D integration.

How would that work?

basix · Jan 25, 2026

Look at Nvidias B200 and Rubin chips. Do you see any 3D-Stacking there? I do not. Same as M3 Max. Just two chips side by side. The chips contain all you need (SMs, command processor, LLC, memory interface). Glue multiple of them together with 2.5D packaging. But instead of using only 2 Die, use N Die.

This is not much different compared to AT3 and AT4 packaging with a host Die. Just that you now could chain multiple GPU chiplets to build a bigger GPU. Maybe 1...4x GPU Die (e.g. 32/64/96/128 CU in total)

marees · Jan 25, 2026

basix said:
Look at Nvidias B200 and Rubin chips. Do you see any 3D-Stacking there? I do not. Same as M3 Max. Just two chips side by side. The chips contain all you need (SMs, command processor, LLC, memory interface). Glue multiple of them together with 2.5D packaging. But instead of using only 2 Die, use N Die.

This is not much different compared to AT3 and AT4 packaging with a host Die. Just that you now could chain multiple GPU chiplets to build a bigger GPU. Maybe 1...4x GPU Die (e.g. 32/64/96/128 CU in total)

rdna 6 will be a tick.

anything major then you are looking at rdna 7

basix · Jan 25, 2026

Might be, let's see. AMD did conceptualise chiplet GPUs already with N4C.

adroc_thurston · Jan 25, 2026

basix said:
AMD did conceptualise chiplet GPUs already with N4C.

Extremely extensive SoIC-X tour de force.
Tiled gfx is always a win-more solution.

adroc_thurston · Jan 25, 2026

basix said:
Look at Nvidias B200 and Rubin chips

compute.

basix said:
Same as M3 Max.

a stinky mess (and TBDR, not comparable at all).

basix said:
The chips contain all you need (SMs, command processor, LLC, memory interface). Glue multiple of them together with 2.5D packaging. But instead of using only 2 Die, use N Die.

Doesn't work for modern IMR GPUs doing modern engines.
Split LLCs in particular would be catastrophic.

basix said:
This is not much different compared to AT3 and AT4 packaging with a host Die.

There is no 'host die' the GPU is self-sufficient.
CCD attach just gives you an APU config.

basix · Jan 25, 2026

N4C would have already featured split LLCs, my friend. Multiple AID connected with silicon bridges.

The difference between my idea and N4C would simply be, that you do not have separated SED and AID. These two would be merged into one Die and therefore 3D-Stacking is not required. The base concept regarding splitting the GPU in multiple parts is the very same. If N4C would have worked, my idea would work as well.

adroc_thurston · Jan 25, 2026

basix said:
N4C would have already featured split LLCs, my friend. Multiple AID connected with silicon bridges.

No, MALL was striped (by the nature of being a memside cache).

basix said:
The base concept regarding splitting the GPU in multiple parts is the very same. If N4C would have worked, my idea would work as well.

It's not, N4c tile to tile was SoIC everywhere. No 2.5D present.

basix · Jan 25, 2026

What about this picture? Can you see the CoWoS-L silicon bridges? 😉

AMD "Navi 4C" GPU Detailed: Shader Engines are their own Chiplets

"Navi 4C" is a future high-end GPU from AMD that will likely not see the light of day, as the company is pivoting away from the high-end GPU segment with its next RDNA4 generation. For AMD to continue investing in the development of this GPU, the gaming graphics card segment should have posted...

www.techpowerup.com

adroc_thurston · Jan 25, 2026

basix said:
Can you see the CoWoS-L silicon bridges? 😉

well, not oS, there's no 2.5D slab.
It's bridges hybrid bonded to the AIDs on top.
You know, pkg diagrams are trivial to read.

basix · Jan 25, 2026

That would be a very weird packaging procedure:
No hybrid bonding of one overlapping chip to two other chips has ever been shown (not to my knowledge).

EMIB / EFB alike packaging would make much more sense to me. Especially for N4C if considering it would have featured MALL. RDNA3 even could afford to split its MALL up into 6 slices with organic 2.5D packaging. Using silicon bridges would be technologically more advanced than that.

A split L2-Cache would be more demanding, I agree on that one. But I see hope there. AMD does revamp its L0/L1 caching with RDNA5. That should lead to reduced bandwidth requirements towards L2$. And the local SE / WGP scheduling paradigm should lead to the fact that each shader engine works on its own as much as possible, mostly working with the locally attached L2$ slice. Data locality paradigms should minimize global L2$ accesses across multiple Die anyways. Yes, L2$ splitting is more difficult. But such problems can be solved by smart ideas and good engineering 😉

adroc_thurston · Jan 25, 2026

basix said:
That would be a very weird packaging procedure:
No hybrid bonding of one overlapping chip to two other chips has ever been shown (not to my knowledge).

That's the whole point.
N4c was a novel packaging exercise + system design innovation.

basix said:
RDNA3 even could afford to split its MALL up into 6 slices with organic 2.5D packaging.

Again, MALL is striped across the address space.

basix said:
A split L2-Cache would be more demanding, I agree on that one

Not "demanding", unworkable for client.

basix said:
That should lead to reduced bandwidth requirements towards L2$

Higher than ever.

Kepler_L2 · Jan 25, 2026

adroc_thurston said:
Doesn't work for modern IMR GPUs doing modern engines.
Split LLCs in particular would be catastrophic.

Well gfx13+ are TBIMR

adroc_thurston · Jan 25, 2026

Kepler_L2 said:
Well gfx13+ are TBIMR

As is everything Nvidia since Maxwell.
You're still have no real tile storage (or a real tiler) to distribute screenspace across.

Question Zen 6 Speculation Thread

Diamond Member

Golden Member

Platinum Member

Senior member

Diamond Member

Senior member

Diamond Member

Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Platinum Member

Senior member

Diamond Member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Golden Member

Diamond Member