[Question] Zen 6 Speculation Thread

Page 228 | AnandTech Forums

marees

Golden Member
Apr 28, 2024
1,446
2,031
96
The low-end to mid-range dGPUs would use LPDDR instead of GDDR.

And a professional / AI version of the same card could use the biggest LPDDR chips that are currently on the market.

I find it quite intriguing that AMD is able to contain so much of the bandwidth requirement using on-die L2s, to the point that it can get away with LPDDR memory.
Imagine the RDNA 5 version of the 3060 12 GB being a 10050 XT 64 GB!!
 

Joe NYC

Diamond Member
Jun 26, 2021
3,428
5,024
136
Imagine the RDNA 5 version of the 3060 12 GB being a 10050 XT 64 GB!!

AMD could be trolling NVidia on the low end with big LPDDR5 memory sizes.

What I wonder, though, is why not do the same throughout the stack?

If NVidia can go up to a 512-bit memory bus (8 channels), why not go to 6 LPDDR6 channels on a high-end card, which would be 576 bits?

Because then, if the biggest LPDDR6 package is 64 GB, the high-end professional / AI card could have 384 GB, which would be maximum trolling.
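To make the arithmetic concrete, here's a toy sketch. The 96-bit-per-package width (which would make 6 packages come out to 576 bits) and the one-package-per-channel capacity rule are my assumptions for illustration, not confirmed LPDDR6 specs:

```python
# Toy bus-width and capacity math for the hypothetical cards above.
# Channel widths and the 64 GB package size are assumptions.

def total_bus_bits(channels: int, bits_per_channel: int) -> int:
    """Total memory bus width in bits."""
    return channels * bits_per_channel

def max_capacity_gb(channels: int, gb_per_package: int) -> int:
    """Max capacity if each channel is fed by one package."""
    return channels * gb_per_package

# NVidia-style GDDR bus: 8 x 64-bit channels
print(total_bus_bits(8, 64))    # 512

# Hypothetical 6-channel LPDDR6 card with 96-bit packages
print(total_bus_bits(6, 96))    # 576

# With assumed 64 GB packages, one per channel
print(max_capacity_gb(6, 64))   # 384
```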

Or maybe split the stack: high-end gaming on GDDR7, high-end professional / AI on LPDDR6.

But it's also good to keep in mind that NVidia is doing a lot of work with LPDDR across its product stack too, so AMD may not have a monopoly here.
 
  • Like
Reactions: Tlh97 and marees

MS_AT

Senior member
Jul 15, 2024
812
1,637
96
The Unreal compilation workload shows memory activity in HWiNFO similar to running a memory stress test like Karhu or OCCT.
I don't deny it. But I suspect you don't hit peak bandwidth; at least I don't when compiling LLVM with LLVM on Windows, which I'd guess is a somewhat comparable load to Unreal. Besides, looking at various Phoronix tests, compilation does not really scale well with memory bandwidth (the compilation benchmarks are usually in the middle of the page):
https://www.phoronix.com/review/threadripper-9000-ddr5-6400-4800/2
https://www.phoronix.com/review/amd-ryzen9-9950x-ddr5/4
https://www.phoronix.com/review/8-12-channel-epyc-9005/4

Or maybe it would be better to say that once you satisfy a specific bandwidth threshold (likely related to how many cores you have and how much parallelism exists in the build), the compilation workload doesn't care anymore, and the bandwidth requirement is relatively modest in the first place.
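That threshold idea can be sketched as a toy roofline-style model: throughput is the lesser of a compute limit and a bandwidth limit, so bandwidth beyond the crossover buys nothing. All numbers here are invented for illustration:

```python
# Toy model of the "BW threshold": build throughput is capped by
# min(compute limit, bandwidth limit). Numbers are made up.

def build_throughput(cores: int, per_core_rate: float,
                     bw_gbs: float, gb_per_unit: float) -> float:
    """Compilation units per second under a min(compute, bandwidth) cap."""
    compute_limit = cores * per_core_rate    # units/s if BW were infinite
    bandwidth_limit = bw_gbs / gb_per_unit   # units/s if cores were infinite
    return min(compute_limit, bandwidth_limit)

# 16 cores, 0.5 units/s per core, ~4 GB of memory traffic per unit:
# throughput stops improving once bandwidth passes 32 GB/s.
for bw in (20, 40, 80, 160):
    print(bw, build_throughput(16, 0.5, bw, 4.0))
```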
 

blackangus

Senior member
Aug 5, 2022
247
451
106
I don't deny it. But I suspect you don't hit peak bandwidth; at least I don't when compiling LLVM with LLVM on Windows, which I'd guess is a somewhat comparable load to Unreal. Besides, looking at various Phoronix tests, compilation does not really scale well with memory bandwidth (the compilation benchmarks are usually in the middle of the page):
https://www.phoronix.com/review/threadripper-9000-ddr5-6400-4800/2
https://www.phoronix.com/review/amd-ryzen9-9950x-ddr5/4
https://www.phoronix.com/review/8-12-channel-epyc-9005/4

Or maybe it would be better to say that once you satisfy a specific bandwidth threshold (likely related to how many cores you have and how much parallelism exists in the build), the compilation workload doesn't care anymore, and the bandwidth requirement is relatively modest in the first place.
I'd GUESS that compilation in both cases is bound by memory operations (latency) rather than by bandwidth. But that is just a guess.
 

Magras00

Junior Member
Aug 9, 2025
21
46
46
The funny thing about "AMD traded MALL in Strix Point for the NPU" is that both the MALL and the NPU are deprecated in the future.
Cost-saving measures, right?

NPU + GPU vs. a GPU-only design with beefed-up ML hardware: the latter gives higher TOPS/mm^2.
Imagination Technologies is also deprecating the NPU with its E-Series GPU.

Navi 48's MALL has around 10% area overhead. IDK how much area L2+MALL unification (matching NVIDIA's L2) would add vs. L2 only.

AMD is clearly going the NVIDIA route with no MALL in RDNA 5. Look at how NVIDIA's GB203 squeezes all non-shader GPU core logic into an area footprint only ~10% higher than Navi 48's. Obviously a different µarch, but Navi 48 could've been significantly smaller with a 64MB L2 instead of 8MB L2 + 64MB MALL.
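As a back-of-the-envelope illustration of that L2-vs-MALL area point: the SRAM density and tag/control overhead figures below are rough assumptions I picked for illustration, not measured Navi 48 numbers.

```python
# Toy area comparison: split 8MB L2 + 64MB MALL vs. a unified 64MB L2.
# Density (~20 mm^2 per 32 MB of last-level SRAM) and the 30% overhead
# factor are assumptions, not die-shot measurements.

MM2_PER_MB = 20 / 32   # assumed high-density SRAM, mm^2 per MB

def cache_area_mm2(mb: float, overhead: float = 1.3) -> float:
    """SRAM array area plus an assumed 30% tag/control overhead."""
    return mb * MM2_PER_MB * overhead

split = cache_area_mm2(8) + cache_area_mm2(64)   # 8 MB L2 + 64 MB MALL
unified = cache_area_mm2(64)                     # single 64 MB L2
print(round(split, 1), round(unified, 1))        # split is larger
```

Under these assumptions the saving is only the 8MB L2's worth of area; the bigger wins a unified design might offer (shared tags, one coherence point) aren't captured by a pure SRAM-area count.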
 
  • Like
Reactions: Joe NYC and marees

Magras00

Junior Member
Aug 9, 2025
21
46
46
For actual chiplet stuff (SED+AID+MID) I think so, but for the quasi-monolithic GPUs like ATx with just GMD+MID it's gone.

What is GMD and AID?

SED = Shader Engine Die

MID = Media Interface Die

GMD = ?

AID = ?

GFX13 and later introduce SE autonomy (L1 cache), so could the MALL effectively be the L2 in actual chiplet designs? I just can't see why SED chiplets would require one more cache level when SEs are almost completely autonomous in GFX13 and later. Unless AMD is clustering SEs and needs a larger local data store than the L1s within the SEs. That would make sense, as IIRC one RDNA 4 SE is only ~26-27mm^2.
This assumes SEs aren't completely rearchitected and blown up to massive size in RDNA 6 and later (RDNA 5 isn't going full MCM). Do we even know anything about RDNA 6 right now?

Maybe UDNA is about laying the groundwork for heterogeneous compute in GFX14+? Hypothetically, mixing and matching CDNAx, RDNAx, and ASIC chiplets behind one frontend (the OS only sees one GPU). Effectively a Zen-like GPU chiplet architecture.
Maybe we'll get a few tidbits at AMD's 2025 Financial Analyst Day in November?
 

ToTTenTranz

Senior member
Feb 4, 2021
508
920
136
I went to the mountain, and I was given these words of wisdom, carved on marble tablets, to share with you:

[Image attachment 129070]


I guess Medusa Halo Mini is the actual Strix Point successor that goes into handhelds and small gaming laptops.
128-bit LP5X might be a bottleneck, but LP6 would be a big improvement.


AMD going with LPDDR for its dGPUs sounds crazy, though. How does pJ-per-byte for LP6 compare to GDDR7?
It does lend itself to the very high memory capacities that many will be looking for to run AI models locally.
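For what it's worth, converting energy-per-bit into interface watts is simple; the energy figures below are ballpark assumptions (real GDDR7 and LPDDR6 pJ/bit numbers aren't public), used only to show the conversion:

```python
# Interface power from energy-per-bit and bandwidth.
# 7 pJ/bit (GDDR-class) and 3.5 pJ/bit (LP-class) are assumed figures.

def interface_power_watts(pj_per_bit: float, bandwidth_gbs: float) -> float:
    """pJ/bit * bits/s -> watts, simplified: pj * 8 bits/byte * GB/s / 1000."""
    return pj_per_bit * 8 * bandwidth_gbs / 1000

print(interface_power_watts(7.0, 500))   # 28.0 W at an assumed 7 pJ/bit
print(interface_power_watts(3.5, 500))   # 14.0 W at an assumed 3.5 pJ/bit
```

If the roughly 2x pJ/bit gap holds, halving interface power at the same bandwidth is exactly the kind of win that would make LPDDR attractive below the high end.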
 
  • Like
Reactions: bearmoo and marees

marees

Golden Member
Apr 28, 2024
1,446
2,031
96
I guess Medusa Halo Mini is the actual Strix Point successor that goes into handhelds and small gaming laptops.
128-bit LP5X might be a bottleneck, but LP6 would be a big improvement.


AMD going with LPDDR for its dGPUs sounds crazy, though. How does pJ-per-byte for LP6 compare to GDDR7?
It does lend itself to the very high memory capacities that many will be looking for to run AI models locally.
If true, then Medusa (Mini) Premium will also go into the 2027 version of the Xbox ROG Ally handhelds.
 
  • Like
Reactions: ToTTenTranz

soresu

Diamond Member
Dec 19, 2014
3,975
3,421
136
And Sony's rumored "PS6 handheld" is DOA unless it's a third of the price.
If people can play PS4 and PS5 games on it (plus maybe PS3, and certainly PS1 + PS2 via emulation on Zen 6), then it has something other handhelds don't have, at least not legally or without some measure of technical aptitude.

Don't underestimate the draw of off-the-shelf legacy game libraries.
 

Kepler_L2

Senior member
Sep 6, 2020
954
3,928
136
What is GMD and AID?

SED = Shader Engine Die

MID = Media Interface Die

GMD = ?

AID = ?

GFX13 and later introduce SE autonomy (L1 cache), so could the MALL effectively be the L2 in actual chiplet designs? I just can't see why SED chiplets would require one more cache level when SEs are almost completely autonomous in GFX13 and later. Unless AMD is clustering SEs and needs a larger local data store than the L1s within the SEs. That would make sense, as IIRC one RDNA 4 SE is only ~26-27mm^2.
This assumes SEs aren't completely rearchitected and blown up to massive size in RDNA 6 and later (RDNA 5 isn't going full MCM). Do we even know anything about RDNA 6 right now?

Maybe UDNA is about laying the groundwork for heterogeneous compute in GFX14+? Hypothetically, mixing and matching CDNAx, RDNAx, and ASIC chiplets behind one frontend (the OS only sees one GPU). Effectively a Zen-like GPU chiplet architecture.
Maybe we'll get a few tidbits at AMD's 2025 Financial Analyst Day in November?
GMD = Graphics Memory Die

AID = Active Interposer Die