Question Zen 6 Speculation Thread

adroc_thurston · Sep 30, 2025

Joe NYC said:
More microbumps, wider bus?

I mean yeah moar SDPs is a go.

Joe NYC said:
Another guess would be creating connections to other CCDs, which would lead to a question "what for?"

you guys need to stop with this meme.

Joe NYC · Sep 30, 2025

511 said:
L3<-> L3 transfer perhaps?

I wonder if that ever gets solved in a way that fits satisfactorily into how AMD handles L3...

It would be a "nice to have" in a desktop, but on server, it would only make sense if the entire CPU worked as a unit, rather than as various virtualized subsets. So probably too few big use cases in the overall picture.

adroc_thurston · Sep 30, 2025

Joe NYC said:
I wonder if that ever gets solved in a way that fits satisfactorily into how AMD handles L3...

It would be a "nice to have" in a desktop, but on server, it would only make sense if the entire CPU worked as a unit, rather than as various virtualized subsets. So probably too few big use cases in the overall picture.

NUCA is nice and elegant.
Forget about that.

itsmydamnation · Sep 30, 2025

yeah i dont get the obsession with making your L3 crappy and making coherency a nightmare.

marees · Sep 30, 2025

Clarifications by high yield

https://twitter.com/x/status/1972835889912136119

https://twitter.com/x/status/1972894903203192902

adroc_thurston · Sep 30, 2025

marees said:
Clarifications by high yield

https://twitter.com/x/status/1972835889912136119

https://twitter.com/x/status/1972894903203192902

They do have the same substrate sizes available for both -R and -L.
The difference is down to bump pitch and line spacing.

Joe NYC · Sep 30, 2025

itsmydamnation said:
yeah i dont get the obsession with making your L3 crappy and making coherency a nightmare.

Also, it is becoming a moot point after AMD moved:
- from 16 MB pool of L3 in Bergamo
- to 32 MB pool of L3 in Turin
- to 128MB pool of L3 in Venice
- to maybe > 200 MB pool of L3 in Florence

LightningZ71 · Sep 30, 2025

More bumps = more connections. Either wider data pathways for existing layouts, or layouts are changing and they need more wires to connect more chiplets. Maybe there will be a 4 chiplet package on desktop with a separate iGPU chiplet?

Joe NYC · Sep 30, 2025

LightningZ71 said:
More bumps = more connections. Either wider data pathways for existing layouts, or layouts are changing and they need more wires to connect more chiplets. Maybe there will be a 4 chiplet package on desktop with a separate iGPU chiplet?

Or a separate NPU chiplet.

BorisTheBlade82 · Sep 30, 2025

Or daisy-chaining of CCDs </s>

But yes, they tripled RAM bandwidth to around 1.6 TByte/s for the Top-End. With 8 CCDs you'd need an interconnect of at least 200 GByte/s per CCD in order to saturate this. And that is with each CCD demanding an equal share. Current GMI-Wide delivers 128 GByte/s (read) IIRC.
So 256 GByte/s/CCD or even more doesn't seem like overkill to me.
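
For anyone who wants to redo the arithmetic above, here is a minimal sketch. It only uses the numbers quoted in this post (about 1.6 TByte/s total, 8 CCDs, 128 GByte/s GMI-Wide read "IIRC"). The function and variable names are made up for illustration, and none of the inputs are confirmed specs.

```python
# Back-of-the-envelope check of the per-CCD bandwidth figures quoted above.
# All inputs are the thread's numbers; nothing here is a measured or official spec.

def per_ccd_bandwidth_gbs(total_dram_gbs: float, ccd_count: int) -> float:
    """Bandwidth each CCD link needs if every CCD takes an equal share."""
    if ccd_count <= 0:
        raise ValueError("ccd_count must be positive")
    return total_dram_gbs / ccd_count

TOTAL_DRAM_GBS = 1600.0      # ~1.6 TByte/s top-end figure from the post
CCD_COUNT = 8                # CCD count used in the post
GMI_WIDE_READ_GBS = 128.0    # "IIRC" figure for current GMI-Wide reads

needed_gbs = per_ccd_bandwidth_gbs(TOTAL_DRAM_GBS, CCD_COUNT)
shortfall_ratio = needed_gbs / GMI_WIDE_READ_GBS

print(f"Needed per CCD: {needed_gbs:.0f} GByte/s")
print(f"That is {shortfall_ratio:.2f}x the quoted GMI-Wide read bandwidth")
```

The equal-share assumption is the optimistic case. If a few CCDs pull more than their share, the per-link requirement goes up, which is the argument for 256 GByte/s/CCD or more.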

marees · Sep 30, 2025

marees said:
Clarifications by high yield

https://twitter.com/x/status/1972835889912136119

https://twitter.com/x/status/1972894903203192902

One more comment by high yield

https://x.com/highyieldYT/status/1973049669749248150

adroc_thurston · Sep 30, 2025

marees said:
One more comment by high yield

https://x.com/highyieldYT/status/1973049669749248150

*loud incorrect buzzer noise*
WRONG, EMIB is embedded into the ABF laminate.

511 · Sep 30, 2025

CoWoS-L ain't though

Josh128 · Sep 30, 2025

As long as it doesnt use CoWpAt-Y its cool.

adroc_thurston · Sep 30, 2025

LightningZ71 said:
with a separate iGPU chiplet?

who is this for

mmaenpaa · Sep 30, 2025

basix said:
Afaik NPUs are also better regarding time to first token, or in other words execution latency. They work better with small batch sizes. For many applications and single-customer use cases this is helpful. But for big number crunching it should be better to move towards the GPU in the long term. The GPU also has massive support from a big and wide memory system. Replicating that for an NPU is a waste of sand.

But funnily enough, doesn't Qualcomm add a better link between GPU and NPU to move matrix computations to the NPU (the GPU does not support such acceleration)?

Regarding software:
HW differences could be abstracted away by HALs and APIs.

I have been thinking about getting a Copilot+ laptop, but frankly, are there any "real" uses/programs for the NPU yet? So far I have not spotted anything useful. For example, I would like the Copilot app on Windows to actually use the NPU. Or the Copilot add-ons in Office.

marees · Sep 30, 2025

mmaenpaa said:
I have been thinking about getting a Copilot+ laptop, but frankly, are there any "real" uses/programs for the NPU yet? So far I have not spotted anything useful. For example, I would like the Copilot app on Windows to actually use the NPU. Or the Copilot add-ons in Office.

Spellcheck

Josh128 · Sep 30, 2025

marees said:
Spellcheck

So the answer is still no. No real uses as of yet.

mmaenpaa · Sep 30, 2025

marees said:
Spellcheck

Spellcheck as in "To use Microsoft's AI spellcheck in office"? It actually uses NPU?

Doug S · Sep 30, 2025

Spellcheck doesn't need any sort of AI. Checking grammar needs to be a bit smarter (though nowhere near needing 50 TOPS that Copilot requires) but spellcheck has been around since before CPUs went 32 bits.

adroc_thurston · Sep 30, 2025

Doug S said:
Spellcheck doesn't need any sort of AI. Checking grammar needs to be a bit smarter (though nowhere near needing 50 TOPS that Copilot requires) but spellcheck has been around since before CPUs went 32 bits.

yeah but they're gonna do spellcheck using a hugeass xformer eating 4GB of your DRAM just for that.
welcome to the future, gramps.

marees · Sep 30, 2025

adroc_thurston said:
yeah but they're gonna do spellcheck using a hugeass xformer eating 4GB of your DRAM just for that.
welcome to the future, gramps.

What do you mean 4 GB

It can easily gobble up more

Edit: I believe some MacBook ran out of RAM due to spellcheck

basix · Oct 1, 2025

511 said:
L3<-> L3 transfer perhaps?

Compared to IFOP, you can do that now through a wider, faster and lower latency interface to the IOD 😉

BorisTheBlade82 said:
Or daisy-chaining of CCDs </s>

But yes, they tripled RAM bandwidth to around 1.6 TByte/s for the Top-End. With 8 CCDs you'd need an interconnect of at least 200 GByte/s per CCD in order to saturate this. And that is with each CCD demanding an equal share. Current GMI-Wide delivers 128 GByte/s (read) IIRC.
So 256 GByte/s/CCD or even more doesn't seem like overkill to me.

That is a very interesting idea, indeed. For Zen 6 I do not expect something like that to happen. For Zen 7 I don't think so either (16/33C CCDs, bigger L3$ and simply faster cores are already a decent enough update). But Zen 7 could still introduce it (core count mania). Would be sick to see a 512C Zen 7 SKU 😉

As the beachfront of the IOD is limited, daisy-chaining makes a lot of sense in the mid to long term. It is just a few hundred GByte/s when putting 2x CCDs in series. Such a concept opens the door to very large core-count scaling without adding too much cost (much bigger CCDs, much more IOD area, ...).

- Even with 512 GByte/s it is not an issue, the power draw is still much lower than 128 GByte/s of an existing IFOP interface (~10x less power required)
- RDNA3 MCDs already delivered ~900 GByte/s per chiplet
- Zen 7 will probably introduce an outsourced L3$ on a bottom 3D-stacked die. Adding 2x IF-PHY on two sides of this base die (for daisy-chaining), which gets manufactured in an older node like N4, would not hurt regarding costs.
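
A rough sketch of the daisy-chain numbers, for what it's worth. Two things here are my own reading and not stated in the thread: that the 512 GByte/s figure is 2 CCDs in series at 256 GByte/s each, and that "~10x less power" means roughly one tenth of the energy per bit compared to IFOP. The function names are hypothetical.

```python
# Sketch of the daisy-chain bandwidth and relative link power discussed above.
# Assumptions (not confirmed anywhere in the thread):
#   - the link nearest the IOD carries the traffic of every CCD in the chain
#   - "~10x less power" is read as ~0.1x the energy per bit of IFOP

def upstream_link_gbs(per_ccd_gbs: float, ccds_in_chain: int) -> float:
    """Bandwidth the IOD-facing link must carry for a chain of CCDs."""
    return per_ccd_gbs * ccds_in_chain

def relative_link_power(new_gbs: float, old_gbs: float,
                        energy_per_bit_ratio: float) -> float:
    """Link power relative to the old link: bandwidth ratio times energy/bit ratio."""
    return (new_gbs / old_gbs) * energy_per_bit_ratio

chain_gbs = upstream_link_gbs(per_ccd_gbs=256.0, ccds_in_chain=2)
power_vs_ifop = relative_link_power(chain_gbs, old_gbs=128.0,
                                    energy_per_bit_ratio=0.1)

print(f"IOD-facing link for 2 CCDs in series: {chain_gbs:.0f} GByte/s")
print(f"Power relative to a 128 GByte/s IFOP link: {power_vs_ifop:.1f}x")
```

Under those assumptions, 4x the bandwidth at a tenth of the energy per bit still lands below the power of today's link, which is the point being made above.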

511 · Oct 1, 2025

basix said:
That is a very interesting idea, indeed. For Zen 6 I do not expect something like that to happen. For Zen 7 I don't think so either (16/33C CCDs, bigger L3$ and simply faster cores are already a decent enough update). But Zen 7 could still introduce it (core count mania). Would be sick to see a 512C Zen 7 SKU 😉

512C? I totally doubt this. With the meager density gains, they would have to make the package significantly larger. 384C seems possible.

ToTTenTranz · Oct 1, 2025

mmaenpaa said:
I have been thinking about getting a Copilot+ laptop but frankly are there any "real" uses/programs for NPU yet?

You have AMD's GAIA and Intel's OpenVINO that can integrate Ollama. Both can get you to use an NPU for running LLMs on Windows, with which you can also make agents.

If you're willing to dedicate a couple of hours to set this up, you can get an NPU to run LLMs for you locally with lower power consumption compared to running them on the iGPU.

EDIT: even easier than using OpenVINO, Intel has the AI Playground app that also makes use of its NPUs:

Introducing AI Playground | Intel Gaming Access (game.intel.com)

Check out a demo of the app here, at the timestamp:

mmaenpaa said:
For example, I would like the Copilot app on Windows to actually use the NPU. Or the Copilot add-ons in Office.

You could try to set up a locally running LLM to work with Outlook and Word, it's supposedly possible... but IIRC it's not easy. Microsoft isn't super interested in letting people off the hook on paying $30/month for the full Copilot M365 experience.
