Discussion RDNA 5 / UDNA (CDNA Next) speculation

Joe NYC · 2026-01-10T14:17:50-0500

adroc_thurston said:
That's a 1.5 years later refresh.
How does this impact launch day z6 product binning?

9800x3d was close to the bottom bin and 9850x3d is not. That's what changed.

I don't think the clock speeds improved by 400 MHz so that 9850x3d would still be the bottom bin.

adroc_thurston · 2026-01-10T14:19:17-0500

Joe NYC said:
9800x3d was close to the bottom bin and 9850x3d is not. That's what changed.

that's normal for every refresh ever.

Joe NYC said:
I don't think the clock speeds improved by 400 MHz so that 9850x3d would still be the bottom bin.

how does this relate to anything Zen6?
Different node, different package, different power delivery.

Joe NYC · 2026-01-10T14:25:26-0500

adroc_thurston said:
how does this relate to anything Zen6?
Different node, different package, different power delivery.

The way it relates to Zen 6 is that prior to the launch of Zen 6, AMD changed its approach to selling the main chip for gaming.

The old approach was relegating the bottom, say, 40% of the bins to the 9800x3d; the new approach is to use the top 40% of the binned chips for the top gaming chip. That's because these gaming chips turned out to be such a good business for AMD (compared to the rest of the desktop lineup).

So, Zen 6 will launch after this change has been put in place.

adroc_thurston · 2026-01-10T14:36:50-0500

Joe NYC said:
The way it relates to Zen 6 is that prior to the launch of Zen 6, AMD changed its approach to selling the main chip for gaming.

no lol.

Joe NYC said:
The old approach was relegating bottom, say 40% of the bins to 9800x3d to now use top 40% of the binned chips to the top gaming chip

no lol.

Joe NYC said:
Because these gaming chips turned out to be such a good business for AMD (compared to the rest of the desktop line up).

Yeah but you can sell gutter trash there and they'll buy it still.

Joe NYC said:
So, Zen 6 will launch after this change has been put in place.

No.
If you're desperate for good CCD Si, buy an SP8 Venice. Or MDS1 hi. Or higher Gator Range SKUs.

itsmydamnation · 2026-01-10T14:40:50-0500

I assume desktop would want the leaky chips. I would also assume most other product ranges would want the not leaky chips.

reaperrr3 · 2026-01-11T00:13:04-0500

Joe NYC said:
9800x3d was close to the bottom bin

What? Where's that coming from?

X3D parts need massively lower power to stay cool enough even under meh-ish cooling solutions, and iirc L3/VCache runs at the clock of the fastest core, so they limited PT and turbo clocks vs. the vanilla models.
Pretty sure that in terms of low leakage, the 5800X3D, 7800X3D and 9800X3D are better bins than most 5700X, 7700X and 9700X, respectively.

The 9850X3D has to be a better bin of course; it can probably just hit higher clocks at the same voltages, but like adroc said, they had 1.5 years of yield improvements and time to accumulate top-bin chips for this SKU.

marees · 2026-01-11T03:26:53-0500

MrMPFR said:
Inclusion of Dedicated Accelerators in Graph Nodes

From <https://patents.google.com/patent/US20240202003A1>

MrMPFR said:
#4: Fixed-function units in Work Graphs and Task Graph on steroids

is it pointing to the right patent ?

why does it state the below inside

Abstract

Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed.

MrMPFR · 2026-01-11T05:47:07-0500

marees said:
is it pointing to the right patent ?

why does it state the below inside

Abstract
Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed.

Yes.
Well there's a lot of overlap between these patents, so I didn't want to post the same info multiple times for every patent including that functionality.

While each patent describes and/or includes new functionality, they could all easily be summarized as follows: Work Graph-centric scheduling HW accommodation taken to the limit.
Not surprising when Chajdas, the technical lead for the entire Work Graphs effort at AMD, is listed under all the patents. It looks like AMD have gone back to the drawing board and rebuilt the entire architecture around the Work Graphs idea. This is similar to GCN and Mantle, although changes are more significant this time to accommodate the big API paradigm shift.

While the benefits of new scheduling extend to all workloads, I'm not sure if #3-6 can be enabled through the compiler or need an API change.

marees · 2026-01-11T05:55:59-0500

MrMPFR said:
Yes.
Well there's a lot of overlap between these patents, so I didn't want to post the same info multiple times for every patent including that functionality.

While each patent describes and/or includes new functionality, they could all easily be summarized as follows: Work Graph-centric scheduling HW accommodation taken to the limit.
Not surprising when Chajdas, the technical lead for the entire Work Graphs effort at AMD, is listed under all the patents. It looks like AMD have gone back to the drawing board and rebuilt the entire architecture around the Work Graphs idea. This is similar to GCN and Mantle, although changes are more significant this time to accommodate the big API paradigm shift.

While the benefits of new scheduling extend to all workloads, I'm not sure if #3-6 can be enabled through the compiler or need an API change.

does all this benefit RDNA 5 (& DirectX 13 / xbox next etc.)

MrMPFR · 2026-01-11T06:54:24-0500

marees said:
does all this benefit RDNA 5 (& DirectX 13 / xbox next etc.)

If the patents (#1-6) are part of RDNA 5 then sure helluva lot. Scheduling is tailored to Programmable Shaders 2.0. Maybe @Kepler_L2 can confirm if any will be missing?

The potential speedup is massive, especially with rewritten game engines (GCN Mantle fine wine on steroids). That's if we only focus on the six patents I listed in an earlier post, ignoring all the potential data management, cachemem, dispatch, and execution related changes from patent filings. Add those on top and the gains are probably even greater.

This quote from the WGP Local launcher patent illustrates it quite well:
"In some implementations, the workgroup processor local launch mechanism provides an order-of-magnitude improvement to thread launch performance, allowing finer-grained dispatches, local consumption of data within a compute unit, and much improved performance in highly variable workloads. For example, the local launch mechanism improves the performance of application program interfaces that utilize work graphs by allowing a workgroup processor to self-schedule work without needing to submit a request to a work scheduling mechanism such as a command processor. Additionally, enabling resources to be allocated by either the local launcher or the SPI allows for better distribution of workloads that use both at the same time, such as graphics functions running concurrently with compute functions."

Also dug up some old responses from Kepler related to patents (see below). Seems like #6 (WGP local launcher) is confirmed to work through compiler and I would guess #1-3 (WGS related) are hardcoded into design, but not sure if that needs compiler as well. Assume it's very likely that some aspects of #5 (smarter graphics pipeline) could be enabled via compiler.

Clarifying API change isn't mandatory:

Kepler_L2 said:
Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.

Work Graphs RDNA3 gen support:

Kepler_L2 said:
Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.

Meanwhile on Linux it can be emulated through VKD3D-Proton:

MrMPFR said:
Didn't know emulating Work Graphs was even possible. Compute shaders FTW!

Assessing my old RDNA3 vs RDNA5 Work Graphs description + GCN fine wine 2.0:

Kepler_L2 said:
Pretty accurate

It will probably end up a repeat of Async Compute performance advantage for AMD for a few years

Can recommend exchange from early to mid August last year (begins at page 33). Quite a lot of confirmations by Kepler regarding RDNA5.

IF they've implemented some limited form of true dataflow execution like in this patent, then the perf and efficiency gains should extend even further, but have no idea if that's part of RDNA5.

adroc_thurston · 2026-01-11T07:24:33-0500

MrMPFR said:
especially with rewritten game engines (GCN Mantle fine wine on steroids)

ah no one's doing that man.
All of that patentware is mostly irrelevant.

marees · 2026-01-11T08:22:27-0500

MrMPFR said:
If the patents (#1-6) are part of RDNA 5 then sure helluva lot. Scheduling is tailored to Programmable Shaders 2.0. Maybe @Kepler_L2 can confirm if any will be missing?

AMD blog on this — for those who want text version

GDC 2024 Work graphs and draw calls – a match made in heaven! - AMD GPUOpen

Introducing "mesh nodes", which make draw calls an integral part of the work graph, providing a higher perf alternative to ExecuteIndirect dispatches.

gpuopen.com

& follow up demo paper/pdf here

https://twitter.com/x/status/1813318845769097417

extracts from blog & paper (as screenshots)

MrMPFR · 2026-01-11T08:27:39-0500

marees said:
AMD blog on this — for those who want text version

GDC 2024 Work graphs and draw calls – a match made in heaven! - AMD GPUOpen

Introducing "mesh nodes", which make draw calls an integral part of the work graph, providing a higher perf alternative to ExecuteIndirect dispatches.

gpuopen.com

& follow up demo paper/pdf here

https://twitter.com/x/status/1813318845769097417

extracts from blog & paper (as screenshots)

That's the boring paper for mesh nodes launch.

At HPG 2025 they went full on crackpot with GPU tree gen: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf

marees · 2026-01-11T08:39:48-0500

MrMPFR said:
That's the boring paper for mesh nodes launch.

At HPG 2025 they went full on crackpot with GPU tree gen: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf

AMD's solution to the VRAM problem & memory price issues ??

https://twitter.com/x/status/1937338942157898113

MrMPFR · 2026-01-11T09:58:41-0500

marees said:
AMD's solution to the VRAM problem & memory price issues ??

https://twitter.com/x/status/1937338942157898113

Depends on the game. For AAA and AA, procedural stuff is too unreliable. Limited to scratchpad.
Neural compression and smarter geometry compression schemes for pre-generated assets, BVH reduction, shader code, etc.

This is too far out to matter rn. By the time this has wide or even selective adoption, the RAM situation should've been resolved a long time ago. Think of it as a VRAM multiplier in the PS6 post-crossgen era.

That's extremely misleading. Footprint for generation code has nothing to do with Work Graph itself, but that's no good for a stupid clickbait title. 55kB vs 1.5GB. They're off by orders of magnitude.
