Discussion RDNA 5 / UDNA (CDNA Next) speculation

Page 71

soresu

Diamond Member
Dec 19, 2014
it's just that the current WMMA/dual-issue implementation is too limited and doesn't trigger often enough, so AMD stopped advertising dual-issue FLOPs.
Oh thank god, it's like they were just begging for nVidia to point it out.
 

Kepler_L2

Golden Member
Sep 6, 2020
RDNA4 (and RDNA3) already have 128 ALUs per CU, it's just that the current WMMA/dual-issue implementation is too limited and doesn't trigger often enough, so AMD stopped advertising dual-issue FLOPs.
RDNA5 will have the same ALU count per WGP/CU, only with much better utilisation.
They still advertise dual-issue FLOPs, what they don't do is advertise 2x number of "streaming processors" because they don't want another Bulldozer lawsuit. I expect that will change with RDNA5
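To make the dual-issue point concrete: peak FP32 throughput is usually quoted as CUs × FP32 lanes per CU × 2 FLOPs per FMA × clock, and dual-issue doubles that on paper. A minimal sketch, assuming 64 FP32 lanes per CU and a made-up CU count and clock rather than any real SKU:

```python
def peak_fp32_tflops(cus: int, lanes_per_cu: int, clock_ghz: float,
                     dual_issue: bool = False) -> float:
    """Theoretical peak FP32 TFLOPS.

    Each lane retires one FMA per clock, counted as 2 FLOPs. Dual-issue is
    modelled as a second FMA per lane per clock, a best case that real
    shaders rarely hit (the utilisation problem discussed above).
    """
    issues_per_clock = 2 if dual_issue else 1
    flops_per_clock = cus * lanes_per_cu * 2 * issues_per_clock
    return flops_per_clock * clock_ghz / 1000.0  # GFLOPS -> TFLOPS


# Hypothetical 64-CU part at 2.5 GHz with 64 FP32 lanes per CU.
single = peak_fp32_tflops(64, 64, 2.5)
dual = peak_fp32_tflops(64, 64, 2.5, dual_issue=True)
print(f"{single:.2f} TFLOPS single-issue, {dual:.2f} TFLOPS dual-issue")
```

The dual-issue figure is the one that can be advertised; whether each CU is then described as 64 or 128 "stream processors" is a separate marketing choice.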
 

marees

Platinum Member
Apr 28, 2024
The Medusa 3nm-SoC-Die only still has an NPU because it'll be used as a stand-alone APU as well, and its small RDNA3.5 IGP can't handle AI workloads.
What are the specs of the Medusa Point 1 monolithic APU?

Is it 4 + 8 + 2 ??
 

dangerman1337

Senior member
Sep 16, 2010
Updated RDNA5-AT2 Lineup Speculation


  • As explained in the SWV thread, RDNA5 will get double the SPs per CU. Thus AT2, with a maximum of 70 CUs, will get the equivalent of 140 CUs in the old format. That explains the ~20% higher performance than the RTX 4080. It also means the AT2 GPU is severely bound by memory bandwidth.
  • Therefore, AMD does not need to clock as high as RDNA4; I am expecting 2GHz+, not ~3GHz. It also means AT2 has headroom to grow. That's why I suspect AMD is reserving an XTX model with the full 70 CU die for when 40Gbps, 4GB GDDR7 dies appear. That explains the cancellation of AT1, because AT2 XTX is good enough to compete with the upcoming Rubin-70Ti with 24GB on a 256-bit memory bus.
  • There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. Thus, the cancellation of RTX-50 Super makes sense, because it will be a bloodbath for NV :p: no amount of overclocking will save the RTX-50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
  • That's why I am predicting AMD will set a higher price point for AT2-70XT and AT2-70. AMD will keep selling the RX 9070 XT until NV is able to launch Rubin-60.
  • AMD will most likely keep selling N48 in the form of the 9070 GRE by then. And no, AT3 and AT4 are NOT for a cheap dGPU lineup, period. Now that we know Medusa will have XDNA3, where do you think the NPU will reside in AT3, huh? ;)
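The bandwidth and capacity side of those bullets is plain bus arithmetic. A sketch, assuming one 32-bit GDDR7 device per 32 bits of bus, a 192-bit bus for AT2 (the figure that comes up in the thread, not a confirmed spec) and the 40 Gbps / 4 GB per-die numbers from the post:

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus pins x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def capacity_gb(bus_width_bits: int, gb_per_device: int, clamshell: bool = False) -> int:
    """Total VRAM, assuming one 32-bit GDDR7 device per 32 bits of bus
    (two devices per 32 bits in clamshell mode)."""
    devices = bus_width_bits // 32
    if clamshell:
        devices *= 2
    return devices * gb_per_device


# AT2 guess: 192-bit bus with 40 Gbps, 4 GB (32 Gbit) dies.
print(bandwidth_gb_s(192, 40), "GB/s,", capacity_gb(192, 4), "GB")
# The 24 GB / 256-bit Rubin configuration from the post works out with 3 GB dies.
print(capacity_gb(256, 3), "GB")
```

Spread over 70 CUs, 960 GB/s is roughly 13.7 GB/s per CU, which is the kind of ratio to compare against current cards when judging the "bandwidth-bound" claim.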
Are we sure AT2 is 70 CUs? If AT0 is 192 CUs, it feels like AT2 should be 72 CUs, since that's the multiplier you get by scaling with memory bus width (512-bit / 192-bit ≈ 2.667, and 2.667 × 72 = 192). Unless AMD has a reason to cut 2 CUs out there.
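The scaling argument spelled out: if CU count tracked bus width 1:1, AT2's count would be AT0's divided by the 512/192 ratio. A sketch using the rumoured 192 CUs / 512-bit for AT0 and 192-bit for AT2 (rumours, not specs), with exact fractions so nothing gets rounded:

```python
from fractions import Fraction


def cus_scaled_by_bus(big_cus: int, big_bus_bits: int, small_bus_bits: int) -> Fraction:
    """CU count for the smaller die if CUs scaled 1:1 with memory bus width."""
    ratio = Fraction(big_bus_bits, small_bus_bits)  # 512/192 reduces to 8/3
    return big_cus / ratio


ratio = Fraction(512, 192)
print(ratio, float(ratio))               # 8/3, roughly 2.667
print(cus_scaled_by_bus(192, 512, 192))  # 72
```

Nothing forces that 1:1 scaling, which is why the 70 vs 72 question comes down to layout and yield.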
 

reaperrr3

Member
May 31, 2024
Unless AMD has a reason to cut 2 CUs out there.
If both kepler_l2 (4 SE) and MLID (70 CUs for full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
Cutting 1 WGP/2 CUs from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout.
72 CUs possibly would've required a different layout with more whitespace, leading to worse PPA.
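The 9-9-9-8 idea is just per-Shader-Engine WGP counts at 2 CUs per WGP. A minimal sketch, assuming 4 SEs as in Kepler_L2's info; the layouts are the ones floated in the thread, not confirmed configurations:

```python
CUS_PER_WGP = 2  # RDNA pairs two CUs into one WGP


def total_cus(wgps_per_se: list[int]) -> int:
    """Total CUs for a chip given the WGP count in each Shader Engine."""
    return CUS_PER_WGP * sum(wgps_per_se)


layouts = {
    "symmetric 8-8-8-8": [8, 8, 8, 8],
    "asymmetric 9-9-9-8": [9, 9, 9, 8],
    "symmetric 9-9-9-9": [9, 9, 9, 9],
}
for name, layout in layouts.items():
    print(f"{name}: {total_cus(layout)} CUs")
```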
 

maddie

Diamond Member
Jul 18, 2010
Are we sure AT2 is 70 CUs? If AT0 is 192 CUs, it feels like AT2 should be 72 CUs, since that's the multiplier you get by scaling with memory bus width (512-bit / 192-bit ≈ 2.667, and 2.667 × 72 = 192). Unless AMD has a reason to cut 2 CUs out there.
Yield?
 

ETI4711

Junior Member
Oct 25, 2025
There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. Thus, the cancellation of RTX-50 Super makes sense, because it will be a bloodbath for NV :p: no amount of overclocking will save the RTX-50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
At Financial Analyst Day 2025, AMD talked about the lineup for 2026. Medusa was mentioned because it is based on Zen 6.

Why do you think AMD has not mentioned RDNA5? I would suggest it's because RDNA5 is not in the 2026 lineup.
 

marees

Platinum Member
Apr 28, 2024
At Financial Analyst Day 2025, AMD talked about the lineup for 2026. Medusa was mentioned because it is based on Zen 6.

Why do you think AMD has not mentioned RDNA5? I would suggest it's because RDNA5 is not in the 2026 lineup.
Still doesn't rule out Xbox.

I believe Xbox has a great opportunity to release RDNA 5 hardware (magnus) along with GTO (provided the software guys are ready for it).

Valve is coming for Microsoft's lunch. The Xbox team needs to communicate its strategy better to the public. There is now a decent window to strike if they launch next gen in another year's time (the 25th anniversary) and capitalize on the GTO delay. Xbox Next could be the best way to play GTO, if they manage to fix the Windows-Xbox software merge in time.

Any next-gen Xbox demands a UI that works. In that way, Valve is already far ahead of whatever is happening at Microsoft. Valve’s Linux-based SteamOS software is easier to navigate on handhelds, and it could be coming to more devices like a VR headset or a PC-like console. Microsoft “has always been chasing Valve,” video game researcher and NYU Stern School of Business professor Joost van Dreunen told Gizmodo. It’s a smaller, more agile company run by ex-Microsoft programmer Gabe Newell. Steam—which most developers think is a monopoly—makes so much money, you could consider it a yacht factory for the Valve CEO. It’s not likely to sell out to Microsoft or anybody else any time soon.
 

Win2012R2

Golden Member
Dec 5, 2024
I believe Xbox has a great opportunity to release RDNA 5 hardware (magnus)
No chance with current RAM/NAND trends for the next couple of years. Frankly, they won't sell many even at old pricing, and even if they could match whatever Sony's cooking, they've lost the war. Sony themselves recently said that console cycles are getting longer; it's a no-brainer for them to wait until 2029 and get a proper upgrade on N2.
 

eek2121

Diamond Member
Aug 2, 2005
When's the last time DRAM prices remained stupid for 2 years?

We are in a bubble. The last bubble (crypto) absolutely wrecked GPU affordability. We will never recover from that. The same could happen for DRAM pricing, or the AI bubble could pop and prices will drop due to a large amount of product being on the market.

The future is unpredictable.

My biggest regret is not getting a high-speed 96GB kit. Instead, I got a DDR5-6000 64GB kit.
 

ToTTenTranz

Senior member
Feb 4, 2021
My biggest regret is not getting a high-speed 96GB kit. Instead, I got a DDR5-6000 64GB kit.
I was on the verge of getting an X870E + 9950X3D + 4x48GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD.


I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.
 

lightmanek

Senior member
Feb 19, 2017
I was on the verge of getting an X870E + 9950X3D + 4x48GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD.


I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.
Recently, investing in gold is almost as profitable as investing in RAM .... go for it :D
 

Win2012R2

Golden Member
Dec 5, 2024
When's the last time DRAM prices remained stupid for 2 years?
This time it's different ™️

A console launch needs component prices locked in at least 12-18 months ahead, but 2026 supply is already sold out and 2027 will no doubt sell out very soon, followed at best by a year of slow price decreases, so we are now looking at 2028 at the earliest.

Sony is under zero pressure now, if anything they are better off maxing out PS5 sales, GTA6 will help push PS5 Pro nicely too.
 

dangerman1337

Senior member
Sep 16, 2010
If both kepler_l2 (4 SE) and MLID (70 CUs for full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
Cutting 1 WGP/2 CUs from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout.
72 CUs possibly would've required a different layout with more whitespace, leading to worse PPA.
I thought Kepler L2 speculated it was 72 CUs?
Sony is under zero pressure now, if anything they are better off maxing out PS5 sales, GTA6 will help push PS5 Pro nicely too.
Better hope Sony can get GTA VI at 60 FPS or so on the PS5 Pro then...
 

reaperrr3

Member
May 31, 2024
I thought Kepler L2 speculated it was 72 CUs?
It sounded like kepler is only sure about the SE count, not the CU count, while MLID's info says 70 CUs for the full chip, of which 68 will be enabled on Xbox Next and possibly as few as 64 enabled for desktop cards (may depend on yields).

MLID's info could be wrong, of course, but some of the info in his leaked slide on ATx was already confirmed by others later, so there's a chance the 70CU-info could turn out to be correct, too.
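On the yield angle (70 CUs full, 68 enabled on Xbox Next, maybe 64 on desktop): harvesting works because a die with a few dead WGPs can still ship as a cut-down SKU. A toy binomial model, assuming each of the 35 WGPs is independently defect-free with a made-up probability, and ignoring defects outside the WGPs and any per-SE balancing rules:

```python
from math import comb


def p_at_least_good(n_units: int, k_needed: int, p_good: float) -> float:
    """Probability that at least k_needed of n_units independent units are good."""
    return sum(
        comb(n_units, i) * p_good**i * (1 - p_good) ** (n_units - i)
        for i in range(k_needed, n_units + 1)
    )


WGPS = 35      # 70 CUs / 2 CUs per WGP
P_GOOD = 0.98  # hypothetical per-WGP yield, for illustration only

for cus in (70, 68, 64):
    k = cus // 2
    print(f"{cus} CUs enabled: {p_at_least_good(WGPS, k, P_GOOD):.1%} of dies qualify")
```

Each step down in enabled CUs lets more dies qualify, which is the whole point of shipping a cut-down desktop part.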
 

Win2012R2

Golden Member
Dec 5, 2024
1,300
1,358
96
Better hope Sony can get GTA VI at 60 FPS or so on the PS5 Pro then...
Seems unlikely, as they have almost the same CPU, which is going to be the main bottleneck for 60 FPS, but I guess we'll see; the incentive for Sony to help make it happen will be very high.
 

Vikv1918

Member
Mar 12, 2025
Seems unlikely, as they have almost the same CPU, which is going to be the main bottleneck for 60 FPS, but I guess we'll see; the incentive for Sony to help make it happen will be very high.
CPU load can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
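The scaling argument is frame-budget arithmetic: a 60 fps mode halves the per-frame CPU time compared with 30 fps, so whatever scales with NPC/vehicle count has to shrink. A sketch with invented costs (a fixed 6 ms per frame plus 0.02 ms per simulated agent), not measurements from any game:

```python
import math


def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps


def max_agents(fps: float, fixed_ms: float, per_agent_ms: float) -> int:
    """How many agents fit in the CPU frame budget after fixed per-frame costs."""
    spare = frame_budget_ms(fps) - fixed_ms
    return max(0, math.floor(spare / per_agent_ms))


for fps in (30, 60):
    print(f"{fps} fps: {frame_budget_ms(fps):.2f} ms budget, "
          f"~{max_agents(fps, fixed_ms=6.0, per_agent_ms=0.02)} agents")
```

Because the fixed cost does not shrink, the agent count drops by more than half (1366 to 533 with these numbers), which is why a 60 fps mode usually needs a visibly lighter simulation.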
 

Win2012R2

Golden Member
Dec 5, 2024
CPU load can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
It can be, but that would just be the standard Performance mode that also runs on the base PS5, assuming their game logic is still CPU-bottlenecked. Chances are they shifted a lot of it to the GPU, if that's humanly possible.
 

marees

Platinum Member
Apr 28, 2024
Anything else is pretty suboptimal compared to work graphs, especially if the underlying workload is branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG, etc.



No, it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely change how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or programmable-shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."
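A CPU-side toy can make that definition concrete: nodes consume records and may emit new records to other nodes (or to themselves), and a scheduler keeps launching work until no records remain, so the amount and shape of the work is decided at run time. This only illustrates the execution model; it is not the D3D12 work graphs API, and the `split`/`leaf` nodes are hypothetical:

```python
from collections import deque
from typing import Any, Callable, Dict, List, Tuple

Record = Any
Emitted = List[Tuple[str, Record]]  # (target node name, record)


def run_work_graph(nodes: Dict[str, Callable[[Record], Emitted]],
                   entry: str, seeds: List[Record]) -> int:
    """Launch each record on its node until no records remain; return launch count."""
    queue = deque((entry, rec) for rec in seeds)
    launches = 0
    while queue:
        node_name, rec = queue.popleft()
        launches += 1
        queue.extend(nodes[node_name](rec))  # data-dependent fan-out
    return launches


# Hypothetical recursive node: split a range until it is small, then hand it to a leaf.
leaves: List[Tuple[int, int]] = []


def split(rng: Tuple[int, int]) -> Emitted:
    lo, hi = rng
    if hi - lo <= 4:
        return [("leaf", rng)]
    mid = (lo + hi) // 2
    return [("split", (lo, mid)), ("split", (mid, hi))]


def leaf(rng: Tuple[int, int]) -> Emitted:
    leaves.append(rng)
    return []  # leaf nodes emit nothing


launches = run_work_graph({"split": split, "leaf": leaf}, "split", [(0, 16)])
print(launches, leaves)  # 11 [(0, 4), (4, 8), (8, 12), (12, 16)]
```

Nothing outside the graph decided that the input needed 7 split launches and 4 leaf launches; the data did.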

This sounds nothing like ExecuteIndirect. ExecuteIndirect was about addressing the shortcomings of high-level APIs, not completely reinventing the API as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 aligns the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache), and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: much lower scheduling latency and finer-grained scheduling and dispatch, benefiting most tasks, although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.
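Why local launch helps the branchy/recursive case can be shown with a back-of-envelope latency model: in a chain of dependent launches, each hop pays a scheduling round trip before the next node can start. The hop costs and work time below are invented, and `local_fraction` is a hypothetical knob for how many launches stay inside a WGP/SE instead of going through the command processor:

```python
def critical_path_us(depth: int, work_us: float, central_hop_us: float,
                     local_hop_us: float, local_fraction: float) -> float:
    """Latency of a chain of `depth` dependent launches.

    Each level does `work_us` of work plus one scheduling hop; a hop is local
    with probability `local_fraction`, otherwise it takes the central path.
    """
    avg_hop = local_fraction * local_hop_us + (1 - local_fraction) * central_hop_us
    return depth * (work_us + avg_hop)


for frac in (0.0, 0.5, 1.0):
    t = critical_path_us(depth=8, work_us=2.0, central_hop_us=10.0,
                         local_hop_us=1.0, local_fraction=frac)
    print(f"local fraction {frac:.1f}: {t:.1f} us")
```

When per-node work is small, the hop cost dominates the chain, so small, data-dependent tasks gain the most from shorter hops.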

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
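That coalescing behaviour amounts to batching: records aimed at the same node are packed into groups of at most N, and each group gets one thread-group launch. In D3D12 HLSL this is expressed, as far as I know, with a `[NodeLaunch("coalescing")]` node, `[NumThreads(...)]`, and a `[MaxRecords(N)]` `GroupNodeInputRecords<T>` input; the Python below only models the batching step:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def coalesce(records: Sequence[T], max_records: int) -> List[List[T]]:
    """Pack records for one node into groups of at most `max_records`."""
    if max_records < 1:
        raise ValueError("max_records must be >= 1")
    return [list(records[i:i + max_records]) for i in range(0, len(records), max_records)]


groups = coalesce(list(range(100)), max_records=32)
print(len(groups), [len(g) for g in groups])  # 4 [32, 32, 32, 4]
```

A larger N means fewer launches but a bigger ragged last group; the group's thread count is chosen separately.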

RDNA introduced Wave32. I wouldn't be at all surprised if RDNA 5 introduces additional wave-execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's still no patent for this.
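The wave-granularity point is about lane utilisation: a group of n active threads occupies ceil(n / W) waves of width W, and idle lanes are wasted. A sketch comparing Wave64, Wave32 and a purely hypothetical Wave16 (nothing says RDNA 5 has one):

```python
from fractions import Fraction
from math import ceil


def lane_utilisation(active_threads: int, wave_size: int) -> Fraction:
    """Fraction of SIMD lanes doing useful work for one group of threads."""
    waves = ceil(active_threads / wave_size)
    return Fraction(active_threads, waves * wave_size)


for n in (20, 40):
    row = {w: lane_utilisation(n, w) for w in (64, 32, 16)}
    print(n, {w: f"{float(u):.0%}" for w, u in row.items()})
```

Narrower waves only pay off when groups are small or ragged, which is exactly what dynamic work graph launches tend to produce.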

@Kepler_L2 is this accurate or am I missing some important details?
developers also noted that this version also brings experimental support for D3D12 Work Graphs. They note that there are no games that use this technology yet, but based on their testing of it, it already runs faster than native implementation after converting it to normal compute shaders.

Source: VKD3D-Proton
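One generic way to run a work graph without native support is to lower it to ordinary compute: buffer each node's records and issue one dispatch per non-empty node, repeating until nothing is pending. This sketches that idea only; it is not a description of how vkd3d-proton actually implements its emulation, and the node functions are hypothetical:

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Any
Emitted = List[Tuple[str, Record]]


def run_as_dispatch_passes(nodes: Dict[str, Callable[[Record], Emitted]],
                           entry: str, seeds: List[Record]) -> int:
    """Execute a work graph as repeated batched 'dispatches'; return dispatch count."""
    pending: Dict[str, List[Record]] = {name: [] for name in nodes}
    pending[entry].extend(seeds)
    dispatches = 0
    while any(pending.values()):
        for name in nodes:  # dict order stands in for a topological order
            batch, pending[name] = pending[name], []
            if not batch:
                continue
            dispatches += 1  # one dispatch covers the whole batch
            for rec in batch:
                for target, out in nodes[name](rec):
                    pending[target].append(out)
    return dispatches


def split(rng: Tuple[int, int]) -> Emitted:
    lo, hi = rng
    if hi - lo <= 4:
        return [("leaf", rng)]
    mid = (lo + hi) // 2
    return [("split", (lo, mid)), ("split", (mid, hi))]


leaf_ranges: List[Tuple[int, int]] = []


def leaf(rng: Tuple[int, int]) -> Emitted:
    leaf_ranges.append(rng)
    return []


print(run_as_dispatch_passes({"split": split, "leaf": leaf}, "split", [(0, 16)]))  # 4
```

Each recursion level costs a full pass here, so the dispatch count tracks graph depth rather than the number of records.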

 

MrMPFR

Member
Aug 9, 2025
developers also noted that this version also brings experimental support for D3D12 Work Graphs. They note that there are no games that use this technology yet, but based on their testing of it, it already runs faster than native implementation after converting it to normal compute shaders.

Source: VKD3D-Proton

Didn't know emulating Work Graphs was even possible. Compute shaders FTW!

Also look at the performance characteristics of vkd3d-proton / RADV-based emulation vs native (supported on RDNA 3+ and Ampere+) below. I didn't expect this, but maybe WG is that early in development:


SimpleClassify
GPU      | FPS (native) | FPS (emulated)
RX 7600  | 80           | ~500
RX 6800  | n/a          | 1339
RTX 4070 | 400          | 650


AMD's compute shader rasterizer (WG = work graphs, EI = ExecuteIndirect)
GPU      | WG native (ms) | WG emulated (ms) | EI native (ms) | EI emulated (ms)
RX 7600  | 1.7            | 0.9              | 2.9            | 1
RTX 4070 | 0.5            | 5                | n/a            | 3.1


NVIDIA's Work Graph demo
GPU      | native WG (ms) | native compute dispatch/EI (ms) | Proton WG (ms) | Proton compute dispatch/EI (ms)
RTX 4070 | 3.2            | 3.1                             | 5.5            | 3.9
RX 7600  | 6.8            | 5.8                             | 5.1            | 5.8

Source: VKD3D-Proton/docs/workgraphs.md
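To read those numbers as ratios: for FPS, emulated/native above 1 means the emulation is faster; for frame times, native/emulated above 1 means the same. A small sketch using the SimpleClassify and NVIDIA-demo work graph figures from the tables (the RX 7600 SimpleClassify emulated figure was given as ~500, so treat that ratio as approximate):

```python
# Figures copied from the tables above; "~500" is taken as 500.
simple_classify_fps = {  # GPU: (native, emulated)
    "RX 7600": (80, 500),
    "RTX 4070": (400, 650),
}
nvidia_demo_wg_ms = {  # GPU: (native WG, Proton WG)
    "RTX 4070": (3.2, 5.5),
    "RX 7600": (6.8, 5.1),
}


def speedup_from_fps(native: float, emulated: float) -> float:
    return emulated / native


def speedup_from_ms(native: float, emulated: float) -> float:
    return native / emulated


for gpu, (nat, emu) in simple_classify_fps.items():
    print(f"SimpleClassify {gpu}: emulated is {speedup_from_fps(nat, emu):.2f}x native")
for gpu, (nat, emu) in nvidia_demo_wg_ms.items():
    print(f"NVIDIA demo {gpu}: emulated is {speedup_from_ms(nat, emu):.2f}x native")
```

On these figures the emulation wins on the RX 7600 in both tests but loses to native on the RTX 4070 in NVIDIA's own demo.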


Maybe there's a chance we'll see workgraphs adoption a lot sooner than post-crossgen?