Discussion RDNA 5 / UDNA (CDNA Next) speculation

marees · Dec 7, 2025

When we would actually expect to see next-gen Nvidia Rubin or AMD RDNA 5 is surely going to be 2027 at this point - and if RTX 50-series Super is delayed (Nvidia will argue that it hasn't announced anything, therefore there is no delay) and arrives in, say, Q3 2026, we would expect that to push back Rubin until much later into 2027.

Graphics innovation drives the PC gaming market and updates to existing lines could be a long way off, let alone true next-gen upgrades. How long really depends on what 3nm inventory AMD and Nvidia have secured from chip manufacturer TSMC and if memory is available at a reasonable price to make a sizeable roll-out possible.

https://www.digitalfoundry.net/news/2025/12/the-ai-tech-crunch-are-we-looking-at-a-dark-age-for-gaming-hardware

Win2012R2 · Dec 8, 2025

marees said:
and arrives in, say, Q3 2026

It won't because memory prices will be super high at least all of 2026, it's far more sensible for Nvidia to use GDDR7 for their Rubin CPX thingy.

AMD's unique chance is to launch RDNA 5 next year, hope they don't squander it.

basix · Dec 8, 2025

jpiniero said:
Being stuck with Clamshell the entire generation would be rough.

I'm sure IO is going to be mega expensive on N3... but I would much rather do 256-bit without clamshell to get the extra memory bandwidth.

Well, maybe it is simply not required to have more memory bandwidth:
- Revamped CUs and respective low level caches (bigger capacity)
- Out-of-order execution (increase hardware utilization of ALUs and cache)
- Maybe L0 cache sharing across multiple CUs (reduce wasted SRAM capacity, reduce LLC & DRAM bandwidth requirements)
- Universal compression (smaller memory footprint, reduce bandwidth requirements)
- DGF & DMM (smaller memory footprint, reduce bandwidth requirements)
- Neural techniques like NTC which aim to reduce data fetching from DRAM but rather use more compute from matrix engines (whose performance mostly rely on CU low level caches) to generate or extract data and information
- Work graphs and procedural algorithms with dynamic execution on CU level (reduces code footprints and reduces bandwidth pressure from higher level caches and DRAM)

All those things aim to maximize usage of low level CU resources, increase data locality and reduce load on higher level structures like LLC and DRAM.
It seems that there is much going on regarding rethinking GPU architecture as a whole.

Win2012R2 · Dec 9, 2025

Tigerick said:
What is the usage of 36Gbps memory die beside RDNA5?

It's for Rubin CPX

basix · Dec 10, 2025

As 42 Gbps and 48 Gbps are already announced: Wouldn't Rubin CPX benefit from more bandwidth?

For gaming I do not see bandwidth demands at these levels. 32...36 Gbps seem to be fine on e.g. a 512bit 6090 or AMDs equivalent based on AT0.

adroc_thurston · Dec 10, 2025

basix said:
As 42 Gbps and 48 Gbps are already announced: Wouldn't Rubin CPX benefit from more bandwidth?

yes and no.
prefill is compute-bound, that's why they're even using an HBMless part anywhere GPGPU.

basix · Dec 12, 2025

Sure, no doubt about that. One of the best overviews you get in this paper: https://arxiv.org/pdf/2410.18038

pod-attention-unlocking-full-prefill-decode-overlap-for-faster-llm-inference-0.png

48 Gbps at 512bit results in 3 TB/s bandwidth. "Big Rubin" with HBM4 will feature 20 TB/s or even more. That is still a huge difference.
There will be cases, where higher bandwidth on Rubin CPX will be beneficial. And when I pay millions for a NVL144 setup, a few Dollars more for faster GDDR7 will not matter regarding overall cost.

marees · Dec 12, 2025

Coming back to this long ago leaked roadmap by wccftech, a few observations:

Absolutely no laptop/mobile discrete GPUs. It is all medusa premium/halo if it comes to that
No halo desktop part. Does it mean AT0 is strictly for xcloud, professional use cases & not for gaming 🤔
RDNA 5 desktop replaces not only N44 but also N33. A big clue of AT4 then. (With AT3 & AT2 positioned above it)

https://twitter.com/x/status/1833491928739930330

adroc_thurston · Dec 12, 2025

marees said:
Coming back to this long ago leaked roadmap by wccftech, a few observations

this is horribly antique and stuff changed a few times since

RnR_au · Dec 12, 2025

adroc_thurston said:
this is horribly antique and stuff changed a few times since

Well it is kinda rude of AMD not to leak a new consumer gpu roadmap. I mean, its Christmas and all...

adroc_thurston · Dec 12, 2025

RnR_au said:
Well it is kinda rude of AMD not to leak a new consumer gpu roadmap

They were kind enough to not even mention client graphics in the client section of the FAD.

Thunder 57 · Dec 12, 2025

adroc_thurston said:
They were kind enough to not even mention client graphics in the client section of the FAD.

Not really worth mentioning on FAD, but I guess that was your point.

ToTTenTranz · Dec 12, 2025

marees said:
Coming back to this long ago leaked roadmap by wccftech, a few observations:

Absolutely no laptop/mobile discrete GPUs. It is all medusa premium/halo if it comes to that

No halo desktop part. Does it mean AT0 is strictly for xcloud, professional use cases & not for gaming 🤔

RDNA 5 desktop replaces not only N44 but also N33. A big clue of AT4 then. (With AT3 & AT2 positioned above it)

https://twitter.com/x/status/1833491928739930330

I'm guessing this is from before AMD regained a bit of trust in the GPU guys thanks to RDNA4 and decided to greenlight AT0 consumer variations.

adroc_thurston · Dec 12, 2025

ToTTenTranz said:
I'm guessing this is from before AMD regained a bit of trust in the GPU guys thanks to RDNA4 and decided to greenlight AT0 consumer variations.

This has nothing to do with RDNA4 and everything to do with Nvidia.

basix · Dec 12, 2025

ToTTenTranz said:
I'm guessing this is from before AMD regained a bit of trust in the GPU guys thanks to RDNA4 and decided to greenlight AT0 consumer variations.

AT0 could be used as a Rubin CPX alike SKU. Who knows what the main purpose of that chip is.

Elfear · Dec 12, 2025

basix said:
AT0 could be used as a Rubin CPX alike SKU. Who knows what the main purpose of that chip is.

Or it could go towards glorious gaming rigs. Give it to us AMD! Give it to the masses!! Maybe if we all say it together? 😄

adroc_thurston · Dec 12, 2025

basix said:
AT0 could be used as a Rubin CPX alike SKU.

Nowhere near enough matmul for that.

reaperrr3 · Dec 13, 2025

Win2012R2 said:
AMD's unique chance is to launch RDNA 5 next year, hope they don't squander it.

Track record of recent years suggests that new gens launch either around the most pessimistic rumored time, or even later.
And the latest most pessimistic rumors point at Computex 2027 for RDNA5, and H2 2027 for consumer Rubin... so yeah.

Though I don't think AMD is "squandering" anything, because that would suggest they could rake in tons of money/loads of market share if they launched early, and I highly doubt that.
We don't know the wafer price delta of N3P vs. N4P, and we don't know how expensive the G7/LP5X mem configs of ATx will stack up per GB vs current G6 solutions, or NV's G7 contracts, for that matter.

Anyway, I think 2026 is largely dead as far as real new (GPU) products go, because neither the memory situation nor N3P maturity are where either IHV wants them to be.

Win2012R2 · Dec 14, 2025

reaperrr3 said:
the memory situation

It's dire for sure, but for high end model this can still work - a lot bigger pressure on lower end models, sadly most likely AMD will delay release to 2027 based on all those factors

Gideon · Dec 16, 2025

Slightly OT, but people here asked me why I think that current Vulkan and DX12 are ancient and stupid and should have been replaced with the previous console gen:

Sebbi's fresh blog post is a gem discussing this matter:

No Graphics API — Sebastian Aaltonen

Graphics APIs and shader languages have significantly increased in complexity over the past decade. It’s time to start discussing how to strip down the abstractions to simplify development, improve performance, and prepare for future GPU workloads.

www.sebastianaaltonen.com

marees · Dec 17, 2025

Gideon said:
Slightly OT, but people here asked me why I think that current Vulkan and DX12 are ancient and stupid and should have been replaced with the previous console gen:

Sebbi's fresh blog post is a gem discussing this matter:

No Graphics API — Sebastian Aaltonen

Graphics APIs and shader languages have significantly increased in complexity over the past decade. It’s time to start discussing how to strip down the abstractions to simplify development, improve performance, and prepare for future GPU workloads.

www.sebastianaaltonen.com

Takeaways:

My prototype API shows what is achievable with modern GPU architectures today, if we mix the best bits from all the latest APIs. It is possible to build an API that is simpler to use than DirectX 11 and Metal 1.0, yet it offers better performance and flexibility than DirectX 12 and Vulkan. We should embrace the modern bindless hardware.

HLSL and GLSL shading languages were designed over 20 years ago as a framework of 1:1 elementwise transform functions (vertex, pixel, geometry, hull, domain, etc). Memory access is abstracted and array handling is cumbersome as there’s no support for pointers. Despite 20 years of existence, HLSL and GLSL have failed to accumulate a library ecosystem. CUDA in contrast is a composable language exposing memory directly and new features (such as AI tensor cores) though intrinsics. CUDA has a broad library ecosystem, which has propelled Nvidia into $4T valuation. We should learn from it.

Min spec hardware

Nvidia Turing (RTX 2000 series, 2018) introduced ray-tracing, tensor cores, mesh shaders, low latency raw memory paths, bigger & faster caches, scalar unit, secondary integer pipeline and many other future looking features. Officially PCIe ReBAR support launched with RTX 3000 series, but there exists hacked Turing drivers that support it too, indicating that the hardware is capable of it. This 7 year old GPU supports everything we need. Nvidia just ended GTX 1000 series driver support in fall 2025. All currently supported Nvidia GPUs could be supported by our new API.

AMD RDNA2 (RX 6000 series, 2020) matched Nvidia’s feature set with ray-tracing and mesh shaders. One year earlier, RDNA 1 introduced coherent L2$, new L1$ level, fast L0$, generic DCC read/write paths, fastpath unfiltered loads and a modern SIMD32 architecture. PCIe ReBAR is officially supported (brand name “Smart Access Memory”). This 5 year old GPU supports everything we need. AMD ended GCN driver support already in 2021. Today RDNA 1 & RDNA 2 only receive bug fixes and security updates, RDNA 3 is the oldest GPU receiving game optimizations. All the currently supported AMD GPUs could be supported by our API.

Intel Alchemist / Xe1 (2022) were the first Intel chips with SM 6.6 global indexable heap support. These chips also support ray-tracing, mesh shaders, PCIe ReBAR (discrete) and UMA (integrated). These 3 year old Intel GPUs support everything we need.

Apple M1 / A14 (MacBook M1, iPhone 12, 2020) support Metal 4.0. Metal 4.0 guarantees GPU memory visibility to CPU (UMA on both phones and computers), and allows the user to write 64-bit pointers and 64-bit texture handles directly into GPU memory. Metal 4.0 has a new residency set API, solving a crucial usability issue with bindless resource management in the old useResource/useHeap APIs. iOS 26 still supports iPhone 11. Developers are not allowed to ship apps that require Metal 4.0 just yet. iOS 27 likely deprecates iPhone 11 support next year. On Mac, if you drop Intel Mac support, you have guaranteed Metal 4.0 support. M1-M5 = 5 generations = 5 years.

ARM Mali-G710 (2021) is ARMs first modern architecture. It introduced their new command stream frontend (CSF), reducing the CPU dependency of draw call building and adding crucial features like multi-draw indirect and compute queues. Non-uniform index texture sampling is significantly faster and the AFBC lossless compressor now supports 16-bit floating point targets. G710 supports Vulkan BDA and descriptor buffer extensions and is capable of supporting the new 2025 unified image layout extension with future drivers. The Mali-G715 (2022) introduced support for ray-tracing.

Qualcomm Adreno 650 (2019) supports Vulkan BDA, descriptor buffer and unified image layout extensions, 16-bit storage/math, dynamic rendering and extended dynamic state with the latest Turnip open source drivers. Adreno 740 (2022) introduced support for ray-tracing.

PowerVR DXT (Pixel 10, 2025) is PowerVRs first architecture that supports Vulkan descriptor buffer and buffer device address extensions. It also supports 64-bit atomics, 8-bit and 16-bit storage/math, dynamic rendering, extended dynamic state and all the other features we require.

ToTTenTranz · Dec 17, 2025

marees said:
PowerVR DXT (Pixel 10, 2025) is PowerVRs first architecture that supports Vulkan descriptor buffer and buffer device address extensions. It also supports 64-bit atomics, 8-bit and 16-bit storage/math, dynamic rendering, extended dynamic state and all the other features we require.

Too bad that Pixel 11's Tensor G6 is apparently going back a generation to CXT, though.

randomhero · Dec 17, 2025

Gideon said:
Slightly OT, but people here asked me why I think that current Vulkan and DX12 are ancient and stupid and should have been replaced with the previous console gen:

Sebbi's fresh blog post is a gem discussing this matter:

No Graphics API — Sebastian Aaltonen

Graphics APIs and shader languages have significantly increased in complexity over the past decade. It’s time to start discussing how to strip down the abstractions to simplify development, improve performance, and prepare for future GPU workloads.

www.sebastianaaltonen.com

I remember Sebbi being proponent of "no API" way back in early 2010s when GCN arch came out and consoles with same arch. And he could always back it up with deep expertise.
Thank you for the link, I always enjoy Sebbi's deep dives(even though I'm way out of my depth here, pun intended 😀).

blackangus · Dec 18, 2025

ToTTenTranz said:
Too bad that Pixel 11's Tensor G6 is apparently going back a generation to CXT, though.

Yeah I always enjoyed reading what Sebbi has to say. Smart and well thought out dude.
Thanks for this!

soresu · Dec 18, 2025

ToTTenTranz said:
Too bad that Pixel 11's Tensor G6 is apparently going back a generation to CXT, though.

Probably because DXT has a bug IMG TEC haven't disclosed but Google noticed during initial SoC engineering.

Discussion RDNA 5 / UDNA (CDNA Next) speculation

Platinum Member

Golden Member

Senior member

Golden Member

Senior member

Diamond Member

Senior member

Platinum Member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Diamond Member

Member

Golden Member

Platinum Member

Platinum Member

Min spec hardware​

Senior member

Member

Senior member

Diamond Member

Min spec hardware