Discussion RDNA 5 / UDNA (CDNA Next) speculation


Magras00

Junior Member
Aug 9, 2025
17
40
46
Seems like work graphs got a lot more potential benefits

100%. It's a bigger deal than DX12 and Vulkan, but it's a new paradigm that requires re-educating engine SWEs and a complete rewrite of engine components. It will take a while, but everyone should switch once crossgen is done. It's just much easier to work with than Execute Indirect (EI). The GDC 2024 video explains the benefits:



Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.

Still no PS5 Pro support? Stupid question, of course it doesn't. RDNA 2 binaries. Custom ML and RT are the only things beyond RDNA 2.

@Kepler_L2 are these HW optimizations in GFX13 for Work graphs?
  1. SWC (reordering)
  2. WGS (Shader Engine level)
  3. Local launchers (WGP level)?
  4. Anything else I missed?

AMD already touted work graphs vs EI = 39% less ms for procedural content generation (PCG) with the 7900XTX, and you're telling me it isn't even properly optimized for performance at the HW level. Impressive!
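
For scale, "39% less ms" is not the same thing as "39% faster". A quick conversion, assuming the 39% refers to time spent on the PCG workload only (the function name is just for illustration):

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional time reduction (0.39 = 39% less ms) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # If the new time is (1 - reduction) of the old time, throughput scales by the inverse.
    return 1.0 / (1.0 - reduction)


print(f"{speedup_from_time_reduction(0.39):.2f}x")  # prints "1.64x"
```

So the quoted figure works out to roughly 1.64x on that workload, not on whole-frame performance.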
Large gap in IPC gain (EI vs work graphs) expected for RDNA 4 vs RDNA 5
With GPU work graphs we should expect a significantly larger IPC gain from RDNA 4 -> 5 compared to the gain with EI. It sounds like launch IPC testing definitely won't be indicative of RDNA 5's full potential.

For Work graphs vs EI does a doubling of RDNA 4 -> RDNA 5 raster IPC gain sound too high?

Edit: Retracted unrealistic claims.
 

soresu

Diamond Member
Dec 19, 2014
3,960
3,402
136
AMD already touted work graphs vs EI = 39% less ms for PCG with 7900XTX and you're telling me it isn't even optimized for performance at HW level. Impressive!
Could just as easily mean that the current sw solution(s) that work graphs are designed to replace are pretty suboptimal, such that even a relatively unoptimised hw µArch can extract a decent perf uptick.

Vulkan is in a constant state of revision as the various hw and sw stakeholders make their bids to extend it, make it easier to use or optimise.

It wouldn't surprise me at all to learn that certain features of the API are essentially just lower level equivalents to what was already in OpenGL or DirectX at the time of Vulkan's inception and hadn't yet been replaced with a more optimal solution.
 

Magras00

Junior Member
Aug 9, 2025
17
40
46
Could just as easily mean that the current sw solution(s) that work graphs are designed to replace are pretty suboptimal, such that even a relatively unoptimised hw µArch can extract a decent perf uptick.

Anything is pretty suboptimal compared to work graphs especially if the underlying workload is either branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG etc...

It wouldn't surprise me at all to learn that certain features of the API are essentially just lower level equivalents to what was already in OpenGL or DirectX at the time of Vulkan's inception and hadn't yet been replaced with a more optimal solution.

No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high level APIs, not completely reinventing the API as is the case with GPU work graphs.
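
To make that definition a bit more concrete, here is a toy CPU-side sketch of the idea, not how any driver or GPU actually implements it. Each node consumes one record and can emit new records for other nodes (or for itself), and the loop keeps draining work without a host deciding the next dispatch. All names (`run_work_graph`, `split`, `leaf`) are made up for illustration.

```python
from collections import deque


def run_work_graph(nodes, entry, records):
    """Drain a toy work graph: every node call may emit (target_node, record) pairs."""
    queue = deque((entry, r) for r in records)
    results, launches = [], 0
    while queue:
        name, record = queue.popleft()
        launches += 1
        for target, out in nodes[name](record):
            if target is None:
                results.append(out)          # terminal output, nothing left to schedule
            else:
                queue.append((target, out))  # data-dependent follow-up work
    return results, launches


def split(r):
    """Recursively halve a range until it is small (think subdividing PCG tiles)."""
    lo, hi = r
    if hi - lo <= 2:
        return [("leaf", r)]
    mid = (lo + hi) // 2
    return [("split", (lo, mid)), ("split", (mid, hi))]


def leaf(r):
    """Do the actual work on a small range."""
    lo, hi = r
    return [(None, sum(range(lo, hi)))]


results, launches = run_work_graph({"split": split, "leaf": leaf}, "split", [(0, 8)])
print(results, launches)
```

The amount of work and the shape of the recursion are only known while it runs, which is exactly the branchy, recursive, data-dependent case. With EI the CPU still has to lay out the sequence of indirect dispatches up front.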

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentral and localized as possible which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache) and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: Much lower scheduling latency and more fine grained scheduling and dispatch benefiting most tasks although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
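
A rough sketch of what that quote describes, written as plain Python rather than shader code. The two knobs are the maximum number of records per group and the thread count of the group that processes them. The helper names and the strided thread assignment are my own assumptions, and as far as I understand the real runtime is also free to launch groups that are not full.

```python
def coalesce(records, max_records):
    """Split one node's pending records into launches of at most max_records each."""
    if max_records < 1:
        raise ValueError("max_records must be >= 1")
    return [records[i:i + max_records] for i in range(0, len(records), max_records)]


def assign_to_threads(group, threads_per_group):
    """Map each thread index to the records it would loop over (simple strided split)."""
    return {t: group[t::threads_per_group] for t in range(threads_per_group)}


pending = list(range(10))  # 10 records waiting for the same node
launches = coalesce(pending, max_records=4)
print(len(launches), "launches instead of", len(pending))
for group in launches:
    print(group, assign_to_threads(group, threads_per_group=2))
```

Fewer, fuller launches is the whole point of coalescing, which is why hardware that can reorder or regroup records looks like a natural fit.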

RDNA introduced Wave-32, wouldn't at all be surprised if RDNA 5 introduces additional wave execution granularity and flexibility since this could greatly benefit work graphs in some cases, but there's still no patent for this.

@Kepler_L2 is this accurate or am I missing some important details?
 

Win2012R2

Golden Member
Dec 5, 2024
1,103
1,146
96
No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here, just like with Mantle, and it's unlikely Nvidia likes this.
 

Tup3x

Golden Member
Dec 31, 2016
1,245
1,374
136
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here, just like with Mantle, and it's unlikely Nvidia likes this.
And why do you feel that way? NVIDIA has supported work graphs for >=Ampere since 551.76. I think they were first with driver support if I remember correctly.
 

marees

Golden Member
Apr 28, 2024
1,389
2,001
96
CDNA 5 launches in H1 2026 — Vamsi Bopanna & Anush Elangovan

“With the MI400, slated to launch in early 2026 and purpose-built for large-scale AI training and inference, we are seeing up to 10 times the gain in some applications. That kind of rapid progress is exactly what the agentic AI era demands.”


~Vamsi Bopanna, senior vice president of the Artificial Intelligence Group at AMD

Anush (2 months ago): So we, we want to be able to deliver hardware every year. So we did the MI300, the MI325. Now we have the 350 series. And, uh, like Lisa mentioned, the 400 series right around the corner that's, you know, less than 12 months

 

Magras00

Junior Member
Aug 9, 2025
17
40
46
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here, just like with Mantle, and it's unlikely Nvidia likes this.

Yes and they have blogposts about it. Really don't think they have a choice.

Likely another repeat of GCN+Mantle. AMD is spearheading it and basically building an entire GPU architecture around it with GFX13.

NVIDIA has superior official backwards HW support (Ampere and later). One gen earlier than AMD (RDNA 3 and later).

They were forced to adapt last time (Turing) but didn't mind as they were already building an AI and compute monster. Maybe Rubin will be a repeat of that.
 

marees

Golden Member
Apr 28, 2024
1,389
2,001
96
Yes and they have blogposts about it. Really don't think they have a choice.

Likely another repeat of GCN+Mantle. AMD is spearheading it and basically building an entire GPU architecture around it with GFX13.

NVIDIA has superior official backwards HW support (Ampere and later). One gen earlier than AMD (RDNA 3 and later).

They were forced to adapt last time (Turing) but didn't mind as they were already building an AI and compute monster. Maybe Rubin will be a repeat of that.
Will AMD again delay like RDNA 4 to launch behind Nvidia or take the lead with RDNA 5 ??
 

Kepler_L2

Senior member
Sep 6, 2020
945
3,877
136
Anything is pretty suboptimal compared to work graphs especially if the underlying workload is either branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG etc...



No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high level APIs, not completely reinventing the API as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentral and localized as possible which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache) and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: Much lower scheduling latency and more fine grained scheduling and dispatch benefiting most tasks although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."

RDNA introduced Wave-32, wouldn't at all be surprised if RDNA 5 introduces additional wave execution granularity and flexibility since this could greatly benefit work graphs in some cases, but there's still no patent for this.

@Kepler_L2 is this accurate or am I missing some important details?
Pretty accurate
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here, just like with Mantle, and it's unlikely Nvidia likes this.
It will probably end up a repeat of Async Compute performance advantage for AMD for a few years
Will AMD again delay like RDNA 4 to launch behind Nvidia or take the lead with RDNA 5 ??
Rubin will launch first
 

maddie

Diamond Member
Jul 18, 2010
5,152
5,539
136
Anything is pretty suboptimal compared to work graphs especially if the underlying workload is either branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG etc...



No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high level APIs, not completely reinventing the API as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentral and localized as possible which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache) and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: Much lower scheduling latency and more fine grained scheduling and dispatch benefiting most tasks although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."

RDNA introduced Wave-32, wouldn't at all be surprised if RDNA 5 introduces additional wave execution granularity and flexibility since this could greatly benefit work graphs in some cases, but there's still no patent for this.

@Kepler_L2 is this accurate or am I missing some important details?
Seems perfect for distributed chiplet designs.
 

Win2012R2

Golden Member
Dec 5, 2024
1,103
1,146
96
Will AMD again delay like RDNA 4 to launch behind Nvidia or take the lead with RDNA 5 ??
Hopefully they have accelerated RDNA 5 as much as possible, MLID keeps saying it's early 2027 but it sounds bull to me.
Rubin will launch first
Has it been confirmed that Rubin will also be in gaming cards? I guess after Blackwell fiasco Nvidia got no choice really, they'd better put out cracking 6000 series or else
 

Kepler_L2

Senior member
Sep 6, 2020
945
3,877
136
Hopefully they have accelerated RDNA 5 as much as possible, MLID keeps saying it's early 2027 but it sounds bull to me.

Has it been confirmed that Rubin will also be in gaming cards? I guess after Blackwell fiasco Nvidia got no choice really, they'd better put out cracking 6000 series or else
It's just a name, Ampere and Blackwell have the same name for datacenter and gaming GPUs but the architectures are completely different.
 

Win2012R2

Golden Member
Dec 5, 2024
1,103
1,146
96
It's just a name, Ampere and Blackwell have the same name for datacenter and gaming GPUs but the architectures are completely different.
Volta was also the name, no?

Either way if Nvidia pushes out 6000 series next year then AMD can't wait till 2027, I hope they'll do it before Nvidia
 

soresu

Diamond Member
Dec 19, 2014
3,960
3,402
136
No it's not another DX11 -> DX12 shift in code abstraction
Not what I meant.

I meant that it wouldn't surprise me if many/most API features in Vulkan v1 were just lower level equivalent implementations of features with the same name in OGL or DX11.

As in literally just blank replication of the same feature set in the higher level gfx APIs without redesigning for modern hardware capabilities or more modern software paradigms.
 

soresu

Diamond Member
Dec 19, 2014
3,960
3,402
136
MLID keeps saying it's early 2027 but it sounds bull to me
It wouldn't be the first time AMD had taken so long between one high end SKU and another.

(assuming we are talking 7900 XTX -> RDNA5 XTX)

Given Kepler's take that RDNA5 represents the biggest change to AMD's GPU µArch since Southern Islands or GCN v1 then it's a significantly bigger step forward vs Vega -> RDNA1 or even Vega -> RDNA2, which will have no small technical debt attached to it on the driver side of things.

Especially if AMD wants to get serious about non CDNA ROCm use and give RDNA5 GPUs first class citizenship in ROCm from day 1, which is a whole extra thing on top of graphics API compliance and optimisation.

Bearing in mind that real time RT/PT added a whole extra chunk of work to do on the gfx side already vs a decade ago.

Even if they eliminated driver level work on OGL + DX7-11 and just left it to Zink/DXVK Vulkan translation it's still so much work to stand up a new µArch.
 

Magras00

Junior Member
Aug 9, 2025
17
40
46
Not what I meant.

I meant that it wouldn't surprise me if many/most API features in Vulkan v1 were just lower level equivalent implementations of features with the same name in OGL or DX11.

As in literally just blank replication of the same feature set in the higher level gfx APIs without redesigning for modern hardware capabilities or more modern software paradigms.

Sorry for misinterpreting your comment.

Given Kepler's take that RDNA5 represents the biggest change to AMD's GPU µArch since Southern Islands or GCN v1 then it's a significantly bigger step forward vs Vega -> RDNA1 or even Vega -> RDNA2, which will have no small technical debt attached to it on the driver side of things.

Especially if AMD wants to get serious about non CDNA ROCm use and give RDNA5 GPUs first class citizenship in ROCm from day 1, which is a whole extra thing on top of graphics API compliance and optimisation.

Bearing in mind that real time RT/PT added a whole extra chunk of work to do on the gfx side already vs a decade ago.

Even if they eliminated driver level work on OGL + DX7-11 and just left it to Zink/DXVK Vulkan translation it's still so much work to stand up a new µArch.

AMD better go on a software side hiring spree ASAP. The last thing they want is a repeat of RDNA 1's launch driver debacle.

Was wondering about that greatest leap since GCN claim. How many gens do we skip to find a change equivalent to RDNA 4 -> RDNA 5 @Kepler_L2?

Add 30-50% (guesstimate) to 9060XT = ~4070S level perf baseline for RDNA 5. Should be plenty for older games even with no native support (DXVK). But DXVK needs to be reliable ALL the time, which isn't the case rn.

Assuming RDNA 5 name sticks around that's unlike the previous rebrands (Terascale, GCN, and RDNA). Is UDNA still being used internally @Kepler_L2?
 

Kepler_L2

Senior member
Sep 6, 2020
945
3,877
136
Assuming RDNA 5 name sticks around that's unlike the previous rebrands (Terascale, GCN, and RDNA). Is UDNA still being used internally @Kepler_L2?
These are marketing names and not used internally. They use either the gfx number or, in some cases, MIxxx, Navix (and now ATx).