Discussion RDNA 5 / UDNA (CDNA Next) speculation

Kepler_L2 · Aug 15, 2025

Win2012R2 said:
They can't run on base PS5?

Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.

Win2012R2 · Aug 15, 2025

Kepler_L2 said:
Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.

Ah, that's a pity, but at least this looks like nice exciting stuff for next gen - far more promising than mesh stuff and RTRT obviously limited by hardware

MrMPFR · Aug 15, 2025

Win2012R2 said:
Seems like work graphs got a lot more potential benefits

100%. It's a bigger deal than DX12 and Vulkan, but it's a new paradigm that requires re-educating engine SWEs and a complete rewrite of engine components. It will take a while, but everyone should switch when crossgen is done. It's just much easier to work with than Execute Indirect (EI). GDC 2024 video explains the benefits:

Kepler_L2 said:
Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.

~~Still no PS5 Pro support?~~ Stupid question, of course it doesn't. RDNA 2 binaries. Custom ML and RT are the only things beyond RDNA 2.

@Kepler_L2 are these HW optimizations in GFX13 for Work graphs?

SWC (reordering)
WGS (Shader Engine level)
Local launchers (WGP level)?
Anything else I missed?

AMD already touted work graphs vs EI = 39% less ms for ~~PCG~~ procedural content generation with 7900XTX and you're telling me it isn't even properly optimized for performance at HW level. Impressive!
~~Large gap in IPC gain (EI vs Work graphs) for RDNA 4 vs RDNA 5 expected~~ With GPU Workgraphs we should expect a significantly larger IPC gain from RDNA 4 -> 5 compared to the gain with EI. It sounds like launch IPC testing def won't be indicative of RDNA 5's full potential.
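
For scale, here's a quick back-of-the-envelope conversion of that "39% less ms" figure into a speedup factor. It's only arithmetic on the number quoted above, the function name is made up for illustration, and it says nothing about what RDNA 5 will actually deliver.

```python
def speedup_from_time_reduction(fraction_saved):
    """Convert 'X% less time' into an 'N times faster' factor."""
    # If the new time is (1 - fraction_saved) of the old time,
    # throughput scales by the inverse of that ratio.
    return 1.0 / (1.0 - fraction_saved)

# 39% less frame time is the figure quoted in this post.
speedup = speedup_from_time_reduction(0.39)
print(f"{speedup:.2f}x")
```

So 39% less time works out to roughly 1.64x the throughput on that workload, which is why percent-less-ms and percent-faster numbers shouldn't be compared directly.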

~~For Work graphs vs EI does a doubling of RDNA 4 -> RDNA 5 raster IPC gain sound too high?~~

Edit: Retracted unrealistic claims.

soresu · Aug 15, 2025

Magras00 said:
AMD already touted work graphs vs EI = 39% less ms for PCG with 7900XTX and you're telling me it isn't even optimized for performance at HW level. Impressive!

Could just as easily mean that the current sw solution(s) that work graphs are designed to replace are pretty suboptimal, such that even a relatively unoptimised hw µArch can extract a decent perf uptick.

Vulkan is in a constant state of revision as the various hw and sw stakeholders make their bids to extend it, make it easier to use or optimise.

It wouldn't surprise me at all to learn that certain features of the API are essentially just lower level equivalents to what was already in OpenGL or DirectX at the time of Vulkan's inception and hadn't yet been replaced with a more optimal solution.

MrMPFR · Aug 16, 2025

soresu said:
Could just as easily mean that the current sw solution(s) that work graphs are designed to replace are pretty suboptimal, such that even a relatively unoptimised hw µArch can extract a decent perf uptick.

Anything is pretty suboptimal compared to work graphs especially if the underlying workload is either branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG etc...

soresu said:
It wouldn't surprise me at all to learn that certain features of the API are essentially just lower level equivalents to what was already in OpenGL or DirectX at the time of Vulkan's inception and hadn't yet been replaced with a more optimal solution.

No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high level APIs, not completely reinventing the API as is the case with GPU work graphs.
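
To make that definition a bit more concrete, here's a toy CPU-side sketch in Python. It is not D3D12 or HLSL, and the node names (`split`, `shade`), the record format and `run_work_graph` are all made up for illustration. The only point is that nodes emit records for other nodes (or themselves) while running, and the amount of work is discovered on the fly instead of being pre-planned by the CPU.

```python
from collections import deque

def run_work_graph(nodes, entry, initial_records):
    """Drain a queue of (node_name, record) pairs; nodes may emit more work."""
    pending = deque((entry, record) for record in initial_records)
    results = []
    launches = 0
    while pending:
        name, record = pending.popleft()
        launches += 1
        # Each node returns a list of (target_node, record) outputs.
        for target, out in nodes[name](record):
            if target is None:
                results.append(out)            # leaf output, nothing left to launch
            else:
                pending.append((target, out))  # dynamic, data-dependent launch
    return results, launches

# Hypothetical nodes: recursively split a 1D tile until it is small, then "shade" it.
def split(tile):
    x, size = tile
    if size <= 2:
        return [("shade", tile)]
    half = size // 2
    return [("split", (x, half)), ("split", (x + half, half))]

def shade(tile):
    return [(None, tile)]

NODES = {"split": split, "shade": shade}
results, launches = run_work_graph(NODES, "split", [(0, 8)])
print(results, launches)
```

On a real GPU the queue would live on the device and many records would run in parallel. This loop is serial and only shows the dependency-driven part.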

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache) and, when required, each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: Much lower scheduling latency and more fine grained scheduling and dispatch benefiting most tasks although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
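
Roughly what that quote describes, as a CPU-side Python sketch. This is an illustration and not the actual API: in the real thing the limit is declared per node in the shader, whereas here `coalesce` and `max_records_per_group` are hypothetical names and the records are just tuples.

```python
from collections import defaultdict

def coalesce(records, max_records_per_group):
    """Group records aimed at the same node into launches of bounded size."""
    by_node = defaultdict(list)
    for node, payload in records:
        by_node[node].append(payload)

    launches = []
    for node, payloads in by_node.items():
        # Cut each node's records into chunks of at most max_records_per_group.
        for start in range(0, len(payloads), max_records_per_group):
            launches.append((node, payloads[start:start + max_records_per_group]))
    return launches

# Seven individual records arriving for two hypothetical nodes.
incoming = [("shade", 0), ("cull", 1), ("shade", 2), ("shade", 3),
            ("cull", 4), ("shade", 5), ("shade", 6)]
launches = coalesce(incoming, max_records_per_group=3)
for node, batch in launches:
    print(node, batch)
```

Seven records become three thread group launches here, which is the kind of batching that reordering hardware would presumably help with.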

RDNA introduced Wave-32. I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's still no patent for this.

@Kepler_L2 is this accurate or am I missing some important details?

Win2012R2 · Aug 16, 2025

Magras00 said:
No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here just like with Mantle, and it's unlikely Nvidia likes this.

Tup3x · Aug 16, 2025

Win2012R2 said:
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here just like with Mantle, and it's unlikely Nvidia likes this.

And why do you feel that way? NVIDIA has supported work graphs for >=Ampere since 551.76. I think they were first with driver support if I remember correctly.

marees · Aug 16, 2025

CDNA 5 launches in H1 2026 — Vamsi Bopanna & Anush Elangovan

“With the MI400, slated to launch in early 2026 and purpose-built for large-scale AI training and inference, we are seeing up to 10 times the gain in some applications. That kind of rapid progress is exactly what the agentic AI era demands.”

https://www.reddit.com/r/AMD_Stock/comments/1mp3awv/according_to_an_amd_svp_mi400_will_launch_in

~Vamsi Bopanna, senior vice president of the Artificial Intelligence Group at AMD

Anush (2 months ago): So we, we want to be able to deliver hardware every year. So we did the MI300, the MI325. Now we have the 350 series. And, uh, like Lisa mentioned, the 400 series right around the corner that's, you know, less than 12 months

https://www.reddit.com/r/AMD_Stock/comments/1lfzty7/amds_vision_for_an_open_ecosystem_anush_elangovan

MrMPFR · Aug 16, 2025

Win2012R2 said:
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here just like with Mantle, and it's unlikely Nvidia likes this.

Yes and they have blogposts about it. Really don't think they have a choice.

Likely another repeat of GCN+Mantle. AMD is spearheading it and basically building an entire GPU architecture around it with GFX13.

NVIDIA has superior official backwards HW support (Ampere and later). One gen earlier than AMD (RDNA 3 and later).

They were forced to adapt last time (Turing) but didn't mind as they were already building an AI and compute monster. Maybe Rubin will be a repeat of that.

marees · Aug 16, 2025

Magras00 said:
Yes and they have blogposts about it. Really don't think they have a choice.

Likely another repeat of GCN+Mantle. AMD is spearheading it and basically building an entire GPU architecture around it with GFX13.

NVIDIA has superior official backwards HW support (Ampere and later). One gen earlier than AMD (RDNA 3 and later).

They were forced to adapt last time (Turing) but didn't mind as they were already building an AI and compute monster. Maybe Rubin will be a repeat of that.

Will AMD again delay like RDNA 4 to launch behind Nvidia or take the lead with RDNA 5 ??

Kepler_L2 · Aug 16, 2025

Magras00 said:
Anything is pretty suboptimal compared to work graphs especially if the underlying workload is either branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG etc...

No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high level APIs, not completely reinventing the API as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache) and, when required, each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: Much lower scheduling latency and more fine grained scheduling and dispatch benefiting most tasks although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."

RDNA introduced Wave-32. I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's still no patent for this.

@Kepler_L2 is this accurate or am I missing some important details?

Pretty accurate

Win2012R2 said:
Is Nvidia fully onboard with this?

I get the feeling AMD is ahead of the curve here just like with Mantle, and it's unlikely Nvidia likes this.

It will probably end up as a repeat of the Async Compute performance advantage for AMD for a few years

marees said:
Will AMD again delay like RDNA 4 to launch behind Nvidia or take the lead with RDNA 5 ??

Rubin will launch first

maddie · Aug 16, 2025

Magras00 said:
Anything is pretty suboptimal compared to work graphs especially if the underlying workload is either branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG etc...

No it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high level APIs, not completely reinventing the API as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.

RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache) and, when required, each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: Much lower scheduling latency and more fine grained scheduling and dispatch benefiting most tasks although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.

This probably only scratches the surface and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will prob be upgraded to better handle the complexity of work graphs.

As for SWC that is probably most beneficial to coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."

RDNA introduced Wave-32. I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's still no patent for this.

@Kepler_L2 is this accurate or am I missing some important details?

Seems perfect for distributed chiplet designs.

Win2012R2 · Aug 16, 2025

Tup3x said:
And why do you feel that way?

Feels like really exciting stuff and it isn't "ai", that's why!

Win2012R2 · Aug 16, 2025

marees said:
Will AMD again delay like RDNA 4 to launch behind Nvidia or take the lead with RDNA 5 ??

Hopefully they have accelerated RDNA 5 as much as possible, MLID keeps saying it's early 2027 but it sounds bull to me.

Kepler_L2 said:
Rubin will launch first

Has it been confirmed that Rubin will also be in gaming cards? I guess after Blackwell fiasco Nvidia got no choice really, they'd better put out cracking 6000 series or else

Kepler_L2 · Aug 16, 2025

Win2012R2 said:
Hopefully they have accelerated RDNA 5 as much as possible, MLID keeps saying it's early 2027 but it sounds bull to me.

Has it been confirmed that Rubin will also be in gaming cards? I guess after Blackwell fiasco Nvidia got no choice really, they'd better put out cracking 6000 series or else

It's just a name, Ampere and Blackwell have the same name for datacenter and gaming GPUs but the architectures are completely different.

Win2012R2 · Aug 16, 2025

Kepler_L2 said:
It's just a name, Ampere and Blackwell have the same name for datacenter and gaming GPUs but the architectures are completely different.

Volta was also the name, no?

Either way if Nvidia pushes out 6000 series next year then AMD can't wait till 2027, I hope they'll do it before Nvidia

soresu · Aug 16, 2025

Magras00 said:
No it's not another DX11 -> DX12 shift in code abstraction

Not what I meant.

I meant that it wouldn't surprise me if many/most API features in Vulkan v1 were just lower level equivalent implementations of features with the same name in OGL or DX11.

As in literally just blank replication of the same feature set in the higher level gfx APIs without redesigning for modern hardware capabilities or more modern software paradigms.

soresu · Aug 16, 2025

Win2012R2 said:
MLID keeps saying it's early 2027 but it sounds bull to me

It wouldn't be the first time AMD had taken so long between one high end SKU and another.

(assuming we are talking 7900 XTX -> RDNA5 XTX)

Given Kepler's take that RDNA5 represents the biggest change to AMD's GPU µArch since Southern Islands or GCN v1 then it's a significantly bigger step forward vs Vega -> RDNA1 or even Vega -> RDNA2, which will have no small technical debt attached to it on the driver side of things.

Especially if AMD wants to get serious about non CDNA ROCm use and give RDNA5 GPUs first class citizenship in ROCm from day 1, which is a whole extra thing on top of graphics API compliance and optimisation.

Bearing in mind that real time RT/PT added a whole extra chunk of work to do on the gfx side already vs a decade ago.

Even if they eliminated driver level work on OGL + DX7-11 and just left it to Zink/DXVK Vulkan translation it's still so much work to stand up a new µArch.

MrMPFR · Aug 16, 2025

maddie said:
Seems perfect for distributed chiplet designs.

Maybe the goal with GFX14? Here's the chiplet-GPU patent if you're interested:

CONFIGURABLE MULTIPLE-DIE GRAPHICS PROCESSING UNIT
- From <https://www.patents-review.com/a/20240193844-configurable-multiple-die-graphics-processing-unit.html>

MrMPFR · Aug 16, 2025

soresu said:
Not what I meant.

I meant that it wouldn't surprise me if many/most API features in Vulkan v1 were just lower level equivalent implementations of features with the same name in OGL or DX11.

As in literally just blank replication of the same feature set in the higher level gfx APIs without redesigning for modern hardware capabilities or more modern software paradigms.

Sorry for misinterpreting your comment.

soresu said:
Given Kepler's take that RDNA5 represents the biggest change to AMD's GPU µArch since Southern Islands or GCN v1 then it's a significantly bigger step forward vs Vega -> RDNA1 or even Vega -> RDNA2, which will have no small technical debt attached to it on the driver side of things.

Especially if AMD wants to get serious about non CDNA ROCm use and give RDNA5 GPUs first class citizenship in ROCm from day 1, which is a whole extra thing on top of graphics API compliance and optimisation.

Bearing in mind that real time RT/PT added a whole extra chunk of work to do on the gfx side already vs a decade ago.

Even if they eliminated driver level work on OGL + DX7-11 and just left it to Zink/DXVK Vulkan translation it's still so much work to stand up a new µArch.

AMD better go on a software side hiring spree ASAP. The last thing they want is a repeat of RDNA 1's launch driver debacle.

Was wondering about that greatest leap since GCN claim. How many gens do we skip to find a change equivalent to RDNA 4 -> RDNA 5 @Kepler_L2?

Add 30-50% (guesstimate) to 9060XT = ~4070S level perf baseline for RDNA 5. Should be plenty for older games even with no native support (DXVK). But DXVK needs to be reliable ALL the time, which isn't the case rn.

Assuming RDNA 5 name sticks around that's unlike the previous rebrands (Terascale, GCN, and RDNA). ~~Is UDNA still being used internally @Kepler_L2?~~

Kepler_L2 · Aug 16, 2025

Magras00 said:
Assuming RDNA 5 name sticks around that's unlike the previous rebrands (Terascale, GCN, and RDNA). Is UDNA still being used internally @Kepler_L2?

These are marketing names and not used internally. They either use the gfx number or in some cases MIxxx, Navix (and now ATx).

adroc_thurston · Aug 16, 2025

Magras00 said:
How many gens do we skip to find a change equivalent to RDNA 4 -> RDNA 5 @Kepler_L2?

gfx9 to gfx10 lmao

SolidQ · Aug 16, 2025

Kepler_L2 said:
Rubin will launch first

Is it the same time interval as first Ampere and then RDNA2, or a bigger gap?

MrMPFR · Aug 17, 2025

adroc_thurston said:
gfx9 to gfx10 lmao

No

@Kepler_L2 claim verbatim: "Well yeah they are changing everything, gfx13 is the biggest architectural overhaul since GCN."

GCN to RDNA ≠ Terascale to GCN

We can only speculate as to how many gens will be required to skip to match the architectural shift, but GFX13 is not another Vega -> Navi moment.

Win2012R2 · Aug 17, 2025

If UDNA is unified RDNA5 and CDNA4, then given that MI400 is officially said to be out in H1 2026, then surely consumer RDNA5 should also be out by end of that year at least?
