Discussion RDNA 5 / UDNA (CDNA Next) speculation


MrMPFR

Member
Aug 9, 2025
190
386
96
SK Hynix has presented 48Gbps GDDR7 memory die
Just an engineering PoC, similar to Samsung's 42.5Gbps GDDR7 at ISSCC 2025. At ISSCC 2026 we'll know just how far-fetched the design is.

It's 10x3GB
I remember reading something about 48-64Gb densities for GDDR7. Maybe 6GB GDDR7 modules at some point, unless those are just placeholders? Quick density math below.
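A back-of-envelope sketch of the density math (in Python; die densities are in gigabits, so divide by 8 for bytes, and the 10-module count is just the rumored config quoted above):

```python
# Die density (Gb) to module capacity (GB): divide by 8 bits per byte.
for density_gb in (24, 48, 64):
    module_gb = density_gb / 8
    print(f"{density_gb} Gb die -> {module_gb:g} GB module, "
          f"10 modules -> {10 * module_gb:g} GB total")
# 24 Gb -> 3 GB  (the rumored 10x3GB = 30 GB config)
# 48 Gb -> 6 GB, 64 Gb -> 8 GB, if those densities ever materialize
```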

Even if you can do AI texture magic, you will probably also have lots of ML/AI models that would need to be loaded in memory.
Repeating past stuff, but it's not just NTC: there's neural asset compression for all kinds of pregenerated assets, neural BRDF and material shader code compression, DGF+DMM (speculative) for BVH overhead and VRAM (non-RT), work graphs for scratchpad and procedural generation, sampler feedback, and probably more.

So effective VRAM increases well beyond the on-paper spec. 24-30GB leaves a ton of VRAM for AI stuff.

I remember seeing it was 192bit LPDDR5X.
That's from MLID. Has Kepler confirmed the memory type used?
 

MrMPFR

Member
Aug 9, 2025
190
386
96
Once again following up on prev stuff.
Do we know when AMD's dGPU GFX13 IP and Sony's PS6 SoC each went design-complete? Kepler said this in Q1, but that leaves a lot of questions unanswered.
That would be useful for ruling patents out of RDNA5 going forward; anything filed after design completion can't have made it into the design.

There was also some info about the PS6 GPU being an early fork of GFX13. I thought AMD and Sony were trying to merge IP with Project Amethyst. Is that just an R&D collab?
 
  • Like
Reactions: Tlh97 and marees

CakeMonster

Golden Member
Nov 22, 2012
1,661
841
136
Repeating past stuff, but it's not just NTC: there's neural asset compression for all kinds of pregenerated assets, neural BRDF and material shader code compression, DGF+DMM (speculative) for BVH overhead and VRAM (non-RT), work graphs for scratchpad and procedural generation, sampler feedback, and probably more.

So effective VRAM increases well beyond the on-paper spec. 24-30GB leaves a ton of VRAM for AI stuff.
I'd be overjoyed to be proven 100% wrong and ignorant on this, but how fast can such a transition of the whole pipeline to AI be made? Are any studios working on this for upcoming titles? Doesn't development of bigger games take like 7 years now? The end of next gen will probably be ~10 years from now; if games by then need no more than a couple GB of VRAM for actual game assets, with AI doing its magic on top, that would be a miracle, one I'd love to see. But I guess I'm just old enough to assume things always take more time than most people think.
 

MrMPFR

Member
Aug 9, 2025
190
386
96
I'd be overjoyed to be proven 100% wrong and ignorant on this, but how fast can such a transition of the whole pipeline to AI be made? Are any studios working on this for upcoming titles? Doesn't development of bigger games take like 7 years now? The end of next gen will probably be ~10 years from now; if games by then need no more than a couple GB of VRAM for actual game assets, with AI doing its magic on top, that would be a miracle, one I'd love to see. But I guess I'm just old enough to assume things always take more time than most people think.
No earlier than post-crossgen I think, 2031-2032 or later likely. TW4 might be the first game to selectively use some of it (thinking NRC, NTC and some other MLPs), as NVIDIA mentioned "latest RTX technologies" in its CES 2025 blog, but all of it probably lands no earlier than 6-7 years from now.
So plenty of time for the tech to mature and HW to become more powerful. Hopefully the PS6 GPU can run it all.

That ~10 years is probably a stretch, and these technologies can be bolted on later, as we've seen with stuff like ReSTIR PT games.
Adoption will likely be gradual: thinking NTC (with a fallback path for DX12U-compliant HW) and DGF+DMM first, then neural materials and other neural code compression, and work graphs (PCG and scratchpad savings) last, as those require a complete engine and game design revamp.

NTC has already received a lot of research interest from IHVs and game companies, so it'll happen sooner or later, maybe as soon as next gen.
Geometry will prob be handled by DGF combined with nextgen DMM; 6-7X compression is easily doable here.
For some types of games PCG might make pregenerated asset storage entirely redundant (see AMD's HPG 2025 tree paper).

BVH-side savings can be massive with DMMs on top of DGF. How much, IDK, but large gains vs RTX MG for sure.

Work graphs have already shown massive potential for scratchpad savings. The compute rasterization example from GDC 2024 went from 3400MB to 55MB, a 98.4% reduction, or 62 times smaller.
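For anyone double-checking those two figures, the arithmetic is just:

```python
# GDC 2024 compute rasterization example: scratchpad size before/after work graphs.
before_mb, after_mb = 3400, 55
print(f"reduction: {(1 - after_mb / before_mb) * 100:.1f}%")  # 98.4%
print(f"factor:    {before_mb / after_mb:.0f}x smaller")      # 62x
```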

Neural shader code compression shows great potential as well, but it's still very early. The Zorah demo claims 46MB -> 16MB despite going from a simple standard game material to offline-renderer-quality materials.

~10X overall savings at iso-asset complexity is not unreasonable; that's subject to change and might increase over time given how new the tech is. Devs can then decide whether they want more asset variety or to spend the VRAM on AI and other things. A toy calculation of how the ratios compound follows below.
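To illustrate how the per-category ratios quoted above could compound into an overall figure, here's a toy calculation. The baseline budget split is invented purely for illustration, not from any real game profile; only the savings factors come from the numbers in this post:

```python
# Hypothetical asset budget in MB; the split is made up, NOT a real profile.
budget = {"textures": 6000, "geometry": 3000, "materials": 1000, "scratchpad": 2000}

# Savings factors quoted above, applied per category.
factor = {
    "textures":   6.5,      # NTC, midpoint of the 6-7X claim
    "geometry":   6.5,      # DGF + nextgen DMM, same ballpark
    "materials":  46 / 16,  # Zorah demo, ~2.9X
    "scratchpad": 62,       # GDC 2024 work graphs example
}

before = sum(budget.values())
after = sum(mb / factor[k] for k, mb in budget.items())
print(f"{before} MB -> {after:.0f} MB (~{before / after:.1f}X overall)")  # ~6.8X
```

With this particular made-up mix the result lands under 10X, which mostly shows how sensitive the headline number is to what share of the budget each category occupies.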
 
  • Like
Reactions: CakeMonster

MrMPFR

Member
Aug 9, 2025
190
386
96
Excellent question. Maybe it's the rumored LPDDR for the lower end?
No, it's just 36Gbps 24Gb GDDR7.

Samsung just talked about slow LPDDR6 winning some award at CES, and that's too slow for AT3. There's also no Samsung product page for it yet, so at best it's sampling rn.
 
  • Like
Reactions: marees

Tigerick

Senior member
Apr 1, 2022
940
852
106
Hoho, the stars have started aligning in Nov 2025 ... ;)

[Attached image: Samsung GDDR7.jpg]

FYI, the RTX 5080 Super is supposed to use 32Gbps 3GB GDDR7 memory dies. What's the use for a 36Gbps memory die besides RDNA5? NV currently commands more than 90% of the TAM, so it's NV who will initiate and consume shipments of next-gen GDDR7... :p

PS: MLID still thinks RDNA5 is two years away, hoho...
 
  • Like
Reactions: RnR_au

marees

Platinum Member
Apr 28, 2024
2,193
2,849
96
RDNA 5 RAM (by Samsung) is sampling now — MLID
Details

This next-generation memory chip is built on Samsung's 12 nm (10 nm-class) DRAM node and operates at 40 Gbps with a 24 Gb (3 GB) per-chip capacity, designed for the next generation of graphics cards. The announcement follows Samsung's recent sampling of its fastest-ever GDDR7 memory, which runs at 36 Gbps. These are not Samsung's only 3 GB modules: the company also has 28 Gbps 3 GB modules in mass production, likely for NVIDIA's upcoming mid-cycle SUPER refresh.

 
  • Like
Reactions: SolidQ

MrMPFR

Member
Aug 9, 2025
190
386
96
- CDNA5 is rumored to use the same CU and cache structure as RDNA5 (although different sizes)
- Universal Compression could be a transparent drop-in IP around caches and memory subsystem. Should not require a major rework of the SoC and memory architecture
- Neural Arrays would be a natural fit for ML accelerators

If I had to guess, CDNA5 will share many similarities with RDNA5.
gfx1250 indicates at least a spin-off from RDNA4. But as RDNA5 will arrive some time after CDNA5, gfx1250 for CDNA5 and gfx1300 for RDNA5 makes some sense.
That would be a boring outcome. If RDNA5 is wiping the slate clean, then I would expect a lot more.
AMD patents and research papers introduce forward-looking ideas such as flexcache (like Apple's), compiler-directed dataflow + control-flow execution, DynEB-guided globally shared L1, and much more. It'll be interesting to see how many of these ideas materialize in shipping products.

The GFX12.5 tag might just be about losing the GCN ISA baggage and aligning closer with RDNA, a half-step gen before the merged R&D pipeline with RDNA5 and CDNA 6 perhaps? Again, really disappointing if RDNA5 is just about merging and iterating on prev best practices from RDNA and CDNA. An ideal outcome would be a "no stones left unturned" foundational architecture, with gaming-side extensions on top for GFX13 and HPC and ML extensions on top for CDNA 6, but that's just speculation for now.
 
  • Like
Reactions: marees

MrMPFR

Member
Aug 9, 2025
190
386
96
I would much rather do 256-bit without clamshell to get the extra memory bandwidth.
32Gbps x 160-bit / 8 = 640GB/s; the 9070XT is ~645GB/s.
^ 50-55 vs 64 CUs, with Universal Compression and other RDNA5 mem-saving secret sauce acting as an effective mem BW multiplier.

A 256-bit bus is a huge waste. With bog-standard 32Gbps G7 that's more BW than an RTX 5080, even an RTX 4090. 192-bit would've been better for a long-term 24GB config without clamshell, but it seems Sony has made up their mind regarding 30GB. The formula behind these numbers is sketched below.
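The bandwidth numbers above all come from the standard peak-bandwidth formula, nothing vendor-specific: per-pin rate times bus width, divided by 8 bits per byte.

```python
# Peak memory bandwidth in GB/s from per-pin rate (Gbps) and bus width (bits).
def bandwidth_gbs(gbps_per_pin: float, bus_bits: int) -> float:
    return gbps_per_pin * bus_bits / 8

print(bandwidth_gbs(32, 160))  # 640.0  GB/s, ~9070XT territory (~645 GB/s)
print(bandwidth_gbs(32, 256))  # 1024.0 GB/s, above RTX 5080 (960) and RTX 4090 (~1008)
print(bandwidth_gbs(32, 192))  # 768.0  GB/s, the 192-bit middle ground
```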
 
  • Like
Reactions: booklib28 and Tlh97

MrMPFR

Member
Aug 9, 2025
190
386
96
Remember there isn't going to be much if any IC on the thing.
Sure, but like I said, GFX13 has UC and other mem efficiency stuff. It's really impossible to say anything for certain without leaks or official info, but the design is prob fine.
 
  • Like
Reactions: basix