32 GB for AT3 looks like overkill, and it would be really odd when AT2 will feature only 18/24 GByte of GDDR7 memory.
Maybe, but it would also allow AMD to ask a (relatively) high price for the AT3 top dog with LPDDR6, and to cut the number of PCIe lanes even on AT3, which they didn't do on N44.
It would also allow running local LLMs on it.
- 508 GByte/s should be enough for AT3. AT2 with 192-bit GDDR7 running at 32 Gbps will have 768 GB/s of bandwidth, so +50% CUs and +50% bandwidth would match perfectly.
- And then use 16 or 24 GByte for AT3
We don't know yet whether desktop AT2 will get more than the 64 active CUs the leaked MLID slide suggested; if it stays at 64, that would only be 33% more CUs.
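For clarity, here's the back-of-the-envelope math behind those percentages as a small Python sketch. All inputs are the rumoured numbers from this thread (508 GB/s for AT3, 32 Gbps GDDR7, 64 active CUs on AT2), not confirmed specs, and the 48-CU AT3 is just what the "+33%" figure implies:

```python
# Rough bandwidth/CU scaling check (all inputs are rumoured figures)

def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    # Peak bandwidth in GB/s = bus width (bits) * per-pin data rate (Gbps) / 8
    return bus_width_bits * data_rate_gbps / 8

at2_bw = bandwidth_gbs(192, 32.0)   # 192-bit GDDR7 @ 32 Gbps
print(at2_bw)                       # 768.0 GB/s
print(at2_bw / 508)                 # ~1.51 -> roughly +50% over the assumed AT3 bandwidth

# CU scaling: 64 active CUs on AT2 (per the leaked slide) vs. the 48-CU AT3
# implied by the "+33%" above; 72 CUs would be needed for a clean +50%.
print(64 / 48)                      # ~1.33
print(72 / 48)                      # 1.5
```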
There are many 48 Gbit LPDDR5X-9600 x64 packages listed. It depends on whether Samsung can crank up the speed of such modules.
They're lower capacity because they're older generations on older, already fairly mature processes, so there's unlikely to be much of an improvement.
And the presented 12.7 GT/s module uses only 16 Gbit chips:
https://ieeexplore.ieee.org/document/10904794
- 8 * 16 Gbit = 16 GByte
16 GByte for both AT3 and AT4 would be sufficient for gaming cards.
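Just to spell out the capacity side, a trivial sketch; the 16 Gbit density is from the linked paper, and the 16-die case is purely hypothetical, only there to show what 32 GByte would take with such chips:

```python
# Module/card capacity = number of dies * die density (Gbit) / 8
def capacity_gbyte(num_dies: int, density_gbit: int) -> float:
    return num_dies * density_gbit / 8

print(capacity_gbyte(8, 16))    # 16.0 GByte - the presented 12.7 GT/s module
print(capacity_gbyte(16, 16))   # 32.0 GByte - hypothetical, would need twice the dies
```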
Reading the description of that 12.7 GT/s LP5X, and considering that only Samsung makes it, the question is what its yield, volume, required voltage and - most importantly - price per GB will look like in practice.
LP6-10667 offers higher bandwidth per channel (equivalent to a hypothetical 16 GT/s LP5X), probably needs less voltage, and will be produced by all 3 memory manufacturers, so it will probably be cheaper per GB.
So even cutting AT3's interface to 75% width and going with 24 GB of LP6-10667@288bit might still be a better overall solution than 16 GB of this "Ultra-Pro" (and probably also ultra-expensive) Samsung-only LP5X-12700.
LP6-10667@288bit would still give 480 GB/s, and 24 GB carries less risk of the PCIe interface ever becoming a bottleneck.
A full-config AT3 will likely perform around the 9070 and has only 8 PCIe lanes, so putting only 16 GB on it may actually be risky.
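To back up the per-channel remark above, a quick sketch assuming the usual 16-bit LP5X channels and 24-bit LP6 channels (the GPU configs themselves remain speculation):

```python
# Per-channel peak bandwidth in GB/s = channel width (bits) * data rate (Gbps) / 8
def channel_bw_gbs(channel_width_bits: int, data_rate_gbps: float) -> float:
    return channel_width_bits * data_rate_gbps / 8

print(channel_bw_gbs(16, 12.7))    # 25.4  GB/s - Samsung's 12.7 GT/s LP5X
print(channel_bw_gbs(16, 16.0))    # 32.0  GB/s - hypothetical 16 GT/s LP5X
print(channel_bw_gbs(24, 10.667))  # ~32.0 GB/s - LP6-10667, same per-channel figure
```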
Edit:
I am wondering what RDNA5's Universal Compression will bring us. If data can be compressed by, let's say, 1.3x on average, that would mean a net 1.3x more effective DRAM capacity.
Are we sure it works that way?
We've had DeltaColorCompression and other internal compression on GPUs, with ongoing improvements, for over a decade, but it never really reduced VRAM capacity requirements in any noticeable way; it only improved bandwidth efficiency.
The only way it could reduce capacity needs would be if the data were actually stored compressed in VRAM.
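A toy example of why that distinction matters; the 1.3x ratio is the hypothetical number from the post above, not anything confirmed for RDNA5:

```python
footprint_gb = 16.0   # uncompressed working set
ratio = 1.3           # assumed average compression ratio (hypothetical)

# If data is actually stored compressed in VRAM, the resident footprint shrinks:
print(footprint_gb / ratio)   # ~12.3 GB resident -> ~1.3x effective capacity

# If compression only applies to transfers (classic DCC-style), bandwidth
# improves but the full footprint still has to be resident:
print(footprint_gb)           # 16.0 GB resident -> no capacity saving
```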