Discussion RDNA 5 / UDNA (CDNA Next) speculation


reaperrr3

Member
May 31, 2024
156
460
96
It’s been a while since we saw those leaked specs. I doubt AT0 is still alive.
lol, why :laughing:
AT0 has better prospects than (desktop) AT3/4, imo.

We got zero leaks for Rubin other than that "CPX" die shot that suggests GR102 (or whatever the codename is) stays at 192 SM. You think that means Rubin is dead?
I think it's wrong to make assumptions about the current status of any chip based on a lack of leaks.
If anything, that's usually a good sign; cancellations tend to get reported earlier and more reliably than exact specs.
 
  • Like
Reactions: Tlh97

dangerman1337

Senior member
Sep 16, 2010
417
61
91
lol, why :laughing:
AT0 has better prospects than (desktop) AT3/4, imo.

We got zero leaks for Rubin other than that "CPX" die shot that suggests GR102 (or whatever the codename is) stays at 192 SM. You think that means Rubin is dead?
I think it's wrong to make assumptions about the current status of any chip based on a lack of leaks.
If anything, that's usually a good sign; cancellations tend to get reported earlier and more reliably than exact specs.
Do we know that "CPX" is the next-gen RTX GeForce flagship die? I'm 50/50 on that personally.

I mean, RTX 60/GeForce Rubin leaks are absolutely threadbare; even the venerable Kopite7Kimi hasn't leaked anything. It just feels all... weird?
 

reaperrr3

Member
May 31, 2024
156
460
96
Do we know that "CPX" is the next-gen RTX GeForce flagship die? I'm 50/50 on that personally.
Technically we don't, no.
But even with all the money they have, I just don't see Nvidia doing both CPX and a separate big gaming-focused chip.
More precisely, I think it would be naive, wishful thinking to believe they would.

Why spend hundreds of millions of dollars extra on a 2nd big die, if you can just make the AI-focused one a bit faster than GB202 in gaming and be done with it?
Gaming cards are too low-margin to bother with an extra chip (from NV's perspective, not ours ofc).
It's more cost-efficient to just make an AI-focused chip that is also capable of running games some 20-30% faster than GB202, and sell the bad salvage bins to gamers.
I could even see them go back to 384-bit for the 6090, just with faster 3GB chips (hey, 4GB extra in total!). Good enough for the gamer peasants (again, their perspective), and it lets them sell dies with some defective memory interfaces or controllers, too.
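For reference, the capacity math behind that "4GB extra" works out as in the quick sketch below; the GB202 figures match the 5090's 512-bit/2GB setup, while the 384-bit/3GB config is just this thread's speculation, not a confirmed spec.

```cpp
#include <cstdio>

// VRAM capacity = (bus width / 32 bits per GDDR7 device) * density per device.
// GB202 numbers match the shipping 5090; the 384-bit / 3GB config is speculative.
int main() {
    const int gb202_bus = 512, gb202_chip_gb = 2;   // 16 devices x 2GB = 32GB
    const int spec_bus  = 384, spec_chip_gb  = 3;   // 12 devices x 3GB = 36GB

    int gb202_vram = (gb202_bus / 32) * gb202_chip_gb;
    int spec_vram  = (spec_bus  / 32) * spec_chip_gb;

    printf("512-bit with 2GB chips: %d GB\n", gb202_vram);                     // 32 GB
    printf("384-bit with 3GB chips: %d GB (+%d GB)\n",
           spec_vram, spec_vram - gb202_vram);                                 // 36 GB, +4 GB
    return 0;
}
```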

If we get lucky and they get generous, they might even upgrade the Gxxx3 chip to 96 SM, clock it high enough to reach 4090 perf, and throw in a whopping 24GB for the 6080 at launch, for only $1,499. /s
 

luro

Member
Dec 11, 2022
94
124
76
lol, why :laughing:
AT0 has better prospects than (desktop) AT3/4, imo.

We got zero leaks for Rubin other than that "CPX" die shot that suggests GR102 (or whatever the codename is) stays at 192 SM. You think that means Rubin is dead?
I think it's wrong to make assumptions about the current status of any chip based on a lack of leaks.
If anything, that's usually a good sign; cancellations tend to get reported earlier and more reliably than exact specs.
Because it's AMD that has a reputation for not shipping halo cards.
 

MrMPFR

Member
Aug 9, 2025
154
312
96
Well, maybe more memory bandwidth simply isn't required:
- Revamped CUs and respective low level caches (bigger capacity)
- Out-of-order execution (increase hardware utilization of ALUs and cache)
- Maybe L0 cache sharing across multiple CUs (reduce wasted SRAM capacity, reduce LLC & DRAM bandwidth requirements)
- Universal compression (smaller memory footprint, reduce bandwidth requirements)
- DGF & DMM (smaller memory footprint, reduce bandwidth requirements)
- Neural techniques like NTC, which aim to reduce data fetching from DRAM and instead use more compute from matrix engines (whose performance mostly relies on CU low-level caches) to generate or extract data and information
- Work graphs and procedural algorithms with dynamic execution on CU level (reduces code footprints and reduces bandwidth pressure from higher level caches and DRAM)

All those things aim to maximize usage of low level CU resources, increase data locality and reduce load on higher level structures like LLC and DRAM.
It seems that there is much going on regarding rethinking GPU architecture as a whole.
This is a great summary but perhaps AMD wants to go even further than this. It really depends on just how clean slate GFX13 is.

#1: Maybe instead of bigger caches, a universal M3-esque flexible cache to maximize compute per area and cut cache/memory overhead.
#2: Hopefully without massive area overhead. The GhOST paper indicated this is plausible.
#3: Yeah, like the 2020 AMD paper. Flexible clustering and global/private management based on compiler and other hints. Combined with dataflow execution this could be a gamechanger for ML, with much less pressure on L2 and VRAM. Maybe it could be expanded to other forms of WGP caches and register files. Perhaps the WGP VGPR takeover mode Cerny talked about during the Road to PS5 Pro talk could be extended across multiple WGPs.
#5: This is probably a compute-to-cache tradeoff, but yeah, a considerable benefit for on-die cache/memory usage.
#7: Hopefully well beyond that.
8 days ago NVIDIA published this research paper suggesting it's possible to basically bolt dataflow execution onto existing architectures with only modest adjustments, although it's still far from feature complete (see section 7). Despite this, sizeable speedups and reductions in VRAM bandwidth traffic were achieved for inference and training. Interestingly, section 8 clearly outlines how Kitsune leverages tile programming, mirroring recent moves with CUDA Tile, and also how it's generally far more applicable than Work Graphs, which are mostly limited to shader pipelines.
I'm bringing this up because AMD is already exploring a dataflow API paradigm shift with Work Graphs, so why not go all the way and implement sweeping changes on the HW and compiler side to fundamentally change how workloads are managed on GPUs? While Work Graphs might be a push for next-gen graphics API standardization, even with the impressive patent-derived optimizations I doubt they give anywhere close to the full picture of what GFX13 and later could be capable of in terms of compute and ML perf. They'd probably need a brand-new API and clean-slate compiler to fully tap into this.
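To make the dataflow / work-graph idea a bit more concrete, here's a toy CPU-side sketch of the execution model: nodes emit records for downstream nodes, and everything gets drained from one queue with no host round-trips in between. Purely illustrative; this is not the D3D12 Work Graphs API and not anything from the Kitsune paper.

```cpp
#include <cstdio>
#include <queue>

// Toy model of work-graph / dataflow style execution: producer nodes push records
// that downstream nodes consume, all drained from a single queue without going
// back to the "host" between stages. Plain C++ stand-in for the concept only.
struct Record { int node; int payload; };

int main() {
    std::queue<Record> q;
    q.push({0, 4});                       // root node seeded once by the "host"

    while (!q.empty()) {                  // the "GPU" keeps scheduling itself
        Record r = q.front(); q.pop();
        if (r.node == 0) {
            // Broadcast node: expands one input record into several child records.
            for (int i = 0; i < r.payload; ++i) q.push({1, i});
        } else {
            // Leaf node: does the actual work on each record it receives.
            printf("leaf processed payload %d\n", r.payload);
        }
    }
    return 0;
}
```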

Some other considerations to reduce memory and cache pressure (far from complete):
Summarizing prev info
- Decentralized and locally autonomous distributed scheduling and dispatch (less pressure on higher level caches)
- Mapping data accesses to exploit ^ (^)
- Leaf nodes (ray/tri): Prefiltering and DGF nodes = parallel fixed-point testers (increased intersections/kB of cache)
- Payload sorting (^^^)
- Deferred any-hit shaders (increased cache and ALU utilization)

New
- TLAS and upper BLAS (ray/box): Sorting rays into coherent bundles to be tested together to reduce redundant calculations (less cachemem overhead)
- Sophisticated lookup tables to reuse expensive calculations: transcendental math, more general vector calculations (^); see the quick sketch below
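On that lookup-table point, a minimal CPU-side sketch of the general idea; the table size, input range, and linear interpolation are arbitrary choices for illustration, not anything AMD has described.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Precompute an expensive transcendental (sin) over [0, 2*pi] once, then answer
// later queries with a cheap table lookup + linear interpolation instead of
// re-evaluating the full function every time.
struct SinTable {
    static constexpr int N = 1024;
    static constexpr float TWO_PI = 6.2831853f;
    std::vector<float> t;

    SinTable() : t(N + 1) {
        for (int i = 0; i <= N; ++i)
            t[i] = std::sin(TWO_PI * i / N);
    }

    float lookup(float x) const {               // assumes x in [0, 2*pi)
        float f = x / TWO_PI * N;
        int i = static_cast<int>(f);
        float frac = f - i;
        return t[i] + (t[i + 1] - t[i]) * frac; // interpolate between neighbours
    }
};

int main() {
    SinTable lut;
    const float x = 1.234f;
    printf("lut: %.5f  libm: %.5f\n", lut.lookup(x), std::sin(x));
    return 0;
}
```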

The tail end of Moore's Law demands that every stone be turned, and I just hope AMD has taken the bold route rather than the cautious one. We'll see in ~2027.
 

MrMPFR

Member
Aug 9, 2025
154
312
96
Just read the excellent blog post by Sebastian Aaltonen shared by @Gideon last week. Shocking how flawed the "modern APIs" are; new ones can't come soon enough. DX12's legacy bloat with Work Graphs bolted on top would hold back post-crossgen releases.
Compare that with a feature-complete No Graphics API (DX13 and Vulkan 2.0) with accommodations (native design + extensions) for a dataflow execution architecture, as described in my prev comment, that could greatly benefit the 10th era of gaming. Basically sounds like DX13 + WGs on steroids.
Especially true for developers that can't afford wiz SWEs, as highlighted by @marees' post. The API's design philosophy means it's "...simpler to use than DirectX 11 and Metal 1.0, yet it offers better performance and flexibility than DirectX 12 and Vulkan." Oh, and someone is working on an actual API implementation.


Some hypothetical changes and implications summarized below
- Grain of salt advised, no professional background

Sebbi's No graphics API:
  • Unified memory
  • 64-bit GPU pointers everywhere
  • Shaders = C++ like kernels
  • Bindless everything
  • Raster/RT as libraries and intrinsics
  • No descriptors
  • No PSOs, permutations and pipeline caching
  • No resource types
  • No barriers
  • No stateful driver
  • No heap enumeration
  • No memory type guessing
  • No legacy shader languages

DX12 → DX13 + Dataflow extensions:
- Command buffers → dataflow graphs
- Resource objects → pointers
- Descriptors → bindless
- PSOs → dynamic pipelines
- CPU orchestration → GPU autonomy
- Fixed pipelines → unified compute
- Legacy bloated APIs → sleek modern API
- Bloated driver → thin driver
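To make a few of those bullets concrete ("64-bit GPU pointers everywhere", "no descriptors", "shaders = C++ like kernels"), here's a toy CPU-side sketch of roughly what that programming model looks like. All names here are made up for illustration; this isn't sebbi's actual proposal or any real API.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy illustration of "pointers instead of descriptors": the kernel receives raw
// addresses packed in a plain struct rather than resources bound to descriptor
// slots. Plain host memory stands in for a unified GPU heap. Hypothetical names.
struct DrawArgs {
    const float* positions;    // would be a 64-bit GPU pointer in the real model
    const uint32_t* indices;
    float* output;
    uint32_t index_count;
};

// "Shader" written as an ordinary C++-like kernel over the argument struct.
void scale_kernel(const DrawArgs& args) {
    for (uint32_t i = 0; i < args.index_count; ++i) {
        uint32_t v = args.indices[i];
        args.output[i] = args.positions[v] * 2.0f;   // trivial per-element work
    }
}

int main() {
    std::vector<float> positions = {1.0f, 2.0f, 3.0f};
    std::vector<uint32_t> indices = {2, 0, 1};
    std::vector<float> output(indices.size());

    DrawArgs args{positions.data(), indices.data(), output.data(),
                  static_cast<uint32_t>(indices.size())};
    scale_kernel(args);                  // no PSO, no descriptor heap, no barriers

    for (float f : output) printf("%.1f ", f);       // 6.0 2.0 4.0
    printf("\n");
    return 0;
}
```

The point being that the binding model collapses into ordinary pointer passing, which is roughly what the "Resource objects → pointers" and "Descriptors → bindless" rows above are getting at.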


Fingers crossed RDNA5 and PS6 go all the way architecturally, and even if they don't, a hypothetical DX13 still sounds much better than DX12 + WGs. Sounds like we're in for an inevitable programming paradigm shift of a similar magnitude to pure fixed function → programmable shaders in the early 2000s. Add ML and PT on top and the 2030s will be truly next-gen.
 

del42sa

Member
May 28, 2013
187
347
136
What is the expectation for the memory situation?
When will it cool down?
Will LPDDR5X & LPDDR6 be affected?

When will RDNA 5 launch now?
What will the SKUs be?

Will it be something like this?
  1. 50 XT 12GB (AT4, 24 CU) — $300
  2. 60 16GB (AT3, 40? CU) — $400
  3. 60 XT 16GB/24GB/32GB (AT3, 48 CU) — $500/$550/$600
  4. 70 18GB (AT2, 56/60 CU) — $650
  5. 70 XT 18GB/24GB (AT2, 72 CU) — $700/$800
  6. 80 XT 24GB (AT0, 128? CU) — $1,200+
  7. 90 XT 36GB (AT0, 144? CU) — $1,500+
 

ToTTenTranz

Senior member
Feb 4, 2021
819
1,330
136
MLID... C'mon man.

IMHO, people should cope better with the fact that, for the past 4+ years, MLID has been the primary source of AMD rumors that end up being right.


Up until a couple of days ago, we had people here swearing up and down that the 9950X3D2, which MLID leaked, didn't exist. Before that, we had people here swearing AMD wasn't going to launch any high-end RDNA5 SKU, until MLID showed the whole range with AT0.



His Intel rumors have been weak and pretty much dead wrong, but his AMD ones are solid at the moment.
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,496
5,947
136
Just read the excellent blog post by Sebastian Aaltonen shared by @Gideon last week. Shocking how flawed the "modern APIs" are; new ones can't come soon enough. DX12's legacy bloat with Work Graphs bolted on top would hold back post-crossgen releases.
Compare that with a feature-complete No Graphics API (DX13 and Vulkan 2.0) with accommodations (native design + extensions) for a dataflow execution architecture, as described in my prev comment, that could greatly benefit the 10th era of gaming. Basically sounds like DX13 + WGs on steroids.
Especially true for developers that can't afford wiz SWEs, as highlighted by @marees' post. The API's design philosophy means it's "...simpler to use than DirectX 11 and Metal 1.0, yet it offers better performance and flexibility than DirectX 12 and Vulkan." Oh, and someone is working on an actual API implementation.


Some hypothetical changes and implications summarized below
- Grain of salt advised, no professional background

Sebbi's No graphics API:
  • Unified memory
  • 64-bit GPU pointers everywhere
  • Shaders = C++ like kernels
  • Bindless everything
  • Raster/RT as libraries and intrinsics
  • No descriptors
  • No PSOs, permutations and pipeline caching
  • No resource types
  • No barriers
  • No stateful driver
  • No heap enumeration
  • No memory type guessing
  • No legacy shader languages

DX12 → DX13 + Dataflow extensions:
- Command buffers → dataflow graphs
- Resource objects → pointers
- Descriptors → bindless
- PSOs → dynamic pipelines
- CPU orchestration → GPU autonomy
- Fixed pipelines → unified compute
- Legacy bloated APIs → sleek modern API
- Bloated driver → thin driver


Fingers crossed RDNA5 and PS6 go all the way architecturally, and even if they don't, a hypothetical DX13 still sounds much better than DX12 + WGs. Sounds like we're in for an inevitable programming paradigm shift of a similar magnitude to pure fixed function → programmable shaders in the early 2000s. Add ML and PT on top and the 2030s will be truly next-gen.
I'm still reading through Sebbi's blog post (it's a big boi), and it definitely sounds interesting. I'm not a graphics programmer, but I've done some CUDA in a past career and have had to poke around in Unreal's render code to debug issues, and the mess of shader types, resource types, etc. in DX12 is pretty daunting. Getting it simplified and cleaned up definitely sounds like a big win for programmer productivity and debugging.

I'm more dubious of any claims of potential performance wins. We've been down this road before with DX12/Mantle, and we didn't see a great deal. Graphics hardware is constantly changing, as is the software running on top of it, and today's beautiful simple API will undoubtedly hold back tomorrow's exciting new idea. I'm sure when we're all doing neural shading and path tracing in 10 years' time we'll all be cursing how unsuitable DX13 is.
 

dangerman1337

Senior member
Sep 16, 2010
417
61
91
Fingers crossed RDNA5 and PS6 go all the way architecturally, and even if they don't, a hypothetical DX13 still sounds much better than DX12 + WGs. Sounds like we're in for an inevitable programming paradigm shift of a similar magnitude to pure fixed function → programmable shaders in the early 2000s. Add ML and PT on top and the 2030s will be truly next-gen.
But games built from the ground up for RDNA5/PS6 are a very long way off, probably 2031 at the earliest assuming PS6 arrives in 2027. And even then, most developers will want their games to run on pre-DX13 hardware like your RTX 4090s etc.; I can already see the dumb discourse in 2033 or so about developers "discriminating" against 4090 owners.
 
  • Like
Reactions: marees

jpiniero

Lifer
Oct 1, 2010
17,026
7,419
136
20% faster than a 4080 wouldn't be too bad for the top product (assuming no consumer products using AT0 end up shipping)

So it'd be like a 6070 Ti? Figure even without the memory surge that'd be close to the 5080's MSRP... so maybe AMD would try $750-$800?
 

reaperrr3

Member
May 31, 2024
156
460
96
Ayy Eye workstation cards?
Even if all the worst-case circumstances (AI demand, high wafer prices, high memory prices, etc.) hold up until then, AMD could probably ask $1.5k or more for a 128-144 CU / 384-bit / 24GB salvage-bin SKU and it'd still sell, since NV's alternatives would be either slower (6080) or MUCH more expensive (6090). So I don't see why they shouldn't ship at least one token desktop flagship at modest volume, for appearances.

I mean, if they hadn't cancelled N4C a bit prematurely (or if they'd at least had an N48+50% mono replacement above N48), they surely would've shipped that too.
AMD clearly overestimated Blackwell's gaming perf and underestimated how much the AI craze would trickle down to desktop in terms of local LLMs, otherwise they'd have done something above N48.