Discussion RDNA 5 / UDNA (CDNA Next) speculation


Joe NYC

Diamond Member
Jun 26, 2021
4,081
5,619
136
how does this relate to anything Zen6?
Different node, different package, different power delivery.

The way it relates to Zen 6 is that prior to the launch of Zen 6, AMD changed its approach to selling the main chip for gaming.

The old approach relegated the bottom, say, 40% of the bins to the 9800X3D; the new approach uses the top 40% of the binned chips for the top gaming chip. This is because these gaming chips turned out to be such a good business for AMD (compared to the rest of the desktop lineup).

So, Zen 6 will launch after this change has been put in place.
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,144
10,893
106
The way it relates to Zen 6 is that prior to the launch of Zen 6, AMD changed its approach to selling the main chip for gaming.
no lol.
The old approach relegated the bottom, say, 40% of the bins to the 9800X3D; the new approach uses the top 40% of the binned chips for the top gaming chip.
no lol.
Because these gaming chips turned out to be such a good business for AMD (compared to the rest of the desktop line up).
Yeah but you can sell gutter trash there and they'll buy it still.
So, Zen 6 will launch after this change has been put in place.
No.
If you're desperate for good CCD Si, buy an SP8 Venice. Or MDS1 hi. Or higher Gator Range SKUs.
 

reaperrr3

Member
May 31, 2024
163
477
96
9800x3d was close to the bottom bin
What? Where's that coming from?

X3D parts need massively lower power to stay cool enough even under meh-ish cooling solutions, and iirc L3/VCache runs at the clock of the fastest core, so they limited PT and turbo clocks vs. the vanilla models.
Pretty sure that in terms of low leakage, the 5800X3D, 7800X3D and 9800X3D are better bins than most 5700X, 7700X and 9700X, respectively.

The 9850X3D has to be a better bin of course; it can probably just hit higher clocks at the same voltages, but like adroc said, they had 1.5 years of yield improvements and to accumulate top-bin chips for this SKU.
 

marees

Platinum Member
Apr 28, 2024
2,161
2,778
96

MrMPFR

Member
Aug 9, 2025
178
362
96
is it pointing to the right patent ?

why does it state the below inside

Abstract

Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed.
Yes.
Well there's a lot of overlap between these patents, so I didn't want to post the same info multiple times for every patent including that functionality.

While each patent describes and/or includes new functionality, they could all easily be summarized as follows: Work Graph-centric scheduling HW accommodation taken to the limit.
Not surprising when Chajdas, the technical lead for the entire Work Graphs effort at AMD, is listed under all the patents. It looks like AMD has gone back to the drawing board and rebuilt the entire architecture around the Work Graphs idea. This is similar to GCN and Mantle, although the changes are more significant this time to accommodate the big API paradigm shift.

While the benefits of the new scheduling extend to all workloads, I'm not sure if #3-6 can be enabled through the compiler or need an API change.
 

marees

Platinum Member
Apr 28, 2024
2,161
2,778
96
Yes.
Well there's a lot of overlap between these patents, so I didn't want to post the same info multiple times for every patent including that functionality.

While each patent describes and/or includes new functionality, they could all easily be summarized as follows: Work Graph-centric scheduling HW accommodation taken to the limit.
Not surprising when Chajdas, the technical lead for the entire Work Graphs effort at AMD, is listed under all the patents. It looks like AMD has gone back to the drawing board and rebuilt the entire architecture around the Work Graphs idea. This is similar to GCN and Mantle, although the changes are more significant this time to accommodate the big API paradigm shift.

While the benefits of the new scheduling extend to all workloads, I'm not sure if #3-6 can be enabled through the compiler or need an API change.
Does all this benefit RDNA 5 (and DirectX 13 / Xbox Next etc.)?
 

MrMPFR

Member
Aug 9, 2025
178
362
96
Does all this benefit RDNA 5 (and DirectX 13 / Xbox Next etc.)?
If the patents (#1-6) are part of RDNA 5, then sure, a helluva lot. Scheduling is tailored to Programmable Shaders 2.0. Maybe @Kepler_L2 can confirm if any will be missing?

The potential speedup is massive, especially with rewritten game engines (GCN Mantle fine wine on steroids). That's if we only focus on the six patents I listed in my earlier post, ignoring all the potential data management, cachemem, dispatch, and execution-related changes from patent filings. Add those on top and the gains are probably even greater.

This quote from the WGP Local launcher patent illustrates it quite well:
"In some implementations, the workgroup processor local launch mechanism provides an order-of-magnitude improvement to thread launch performance, allowing finer-grained dispatches, local consumption of data within a compute unit, and much improved performance in highly variable workloads. For example, the local launch mechanism improves the performance of application program interfaces that utilize work graphs by allowing a workgroup processor to self-schedule work without needing to submit a request to a work scheduling mechanism such as a command processor. Additionally, enabling resources to be allocated by either the local launcher or the SPI allows for better distribution of workloads that use both at the same time, such as graphics functions running concurrently with compute functions."
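To make the quoted idea concrete, here is a toy Python sketch of why self-scheduling matters for work graphs. It is not AMD's design: the function name, the graph, and both cost constants are assumptions made up for illustration, with the 10:1 ratio chosen only to mirror the "order-of-magnitude" wording in the quote.

```python
from collections import deque

# Hypothetical toy model: all names and costs below are illustrative assumptions.
CENTRAL_LAUNCH_COST = 10  # assumed cost of a round trip to a central scheduler
LOCAL_LAUNCH_COST = 1     # assumed cost when the WGP launches follow-up work itself


def run_graph(children, root, local_launch):
    """Walk a tree-shaped work graph breadth-first and total the assumed launch cost."""
    queue = deque([root])
    launches = 0
    cost = 0
    while queue:
        node = queue.popleft()
        launches += 1
        # The root is always submitted centrally; child nodes are
        # self-scheduled only when local launch is enabled.
        is_local = local_launch and node != root
        cost += LOCAL_LAUNCH_COST if is_local else CENTRAL_LAUNCH_COST
        queue.extend(children.get(node, []))
    return launches, cost


# One root fanning out into small, fine-grained work items.
GRAPH = {
    "root": ["a", "b"],
    "a": ["a0", "a1"],
    "b": ["b0", "b1", "b2"],
}

central = run_graph(GRAPH, "root", local_launch=False)
local = run_graph(GRAPH, "root", local_launch=True)
print("central scheduler only:", central)
print("with local launch:     ", local)
```

Both runs launch the same eight nodes; only the assumed per-launch overhead changes, which is why the gap grows as dispatches get finer-grained.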


Also dug up some old responses from Kepler related to patents (see below). Seems like #6 (WGP local launcher) is confirmed to work through the compiler, and I would guess #1-3 (WGS related) are hardcoded into the design, but I'm not sure if that needs the compiler as well. I assume it's very likely that some aspects of #5 (smarter graphics pipeline) could be enabled via the compiler.

Clarifying API change isn't mandatory:
Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.

Work Graphs RDNA3 gen support:
Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.
Meanwhile on Linux it can be emulated through VKD3D-Proton:

Didn't know emulating Work Graphs was even possible. Compute shaders FTW!

Assessing my old RDNA3 vs RDNA5 Work Graphs description + GCN fine wine 2.0:
Pretty accurate

It will probably end up a repeat of Async Compute performance advantage for AMD for a few years

Can recommend exchange from early to mid August last year (begins at page 33). Quite a lot of confirmations by Kepler regarding RDNA5.

If they've implemented some limited form of true dataflow execution like in this patent, then the perf and efficiency gains should extend even further, but I have no idea if that's part of RDNA5.
 

marees

Platinum Member
Apr 28, 2024
2,161
2,778
96
If the patents (#1-6) are part of RDNA 5 then sure helluva lot. Scheduling is tailored to Programmable Shaders 2.0. Maybe @Kepler_L2 can confirm if any will be missing?
AMD blog on this — for those who want text version


& follow up demo paper/pdf here


extracts from blog & paper (as screenshots)
Screenshot_20260111_183643_Opera.jpg

Screenshot_20260111_182555_Drive.jpg


Screenshot_20260111_182755_Drive.jpg
 


MrMPFR

Member
Aug 9, 2025
178
362
96
AMD blog on this — for those who want text version


& follow up demo paper/pdf here


extracts from blog & paper (as screenshots)
View attachment 136538

View attachment 136541


View attachment 136540
That's the boring paper for mesh nodes launch.

At HPG 2025 they went full on crackpot with GPU tree gen: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf
 

MrMPFR

Member
Aug 9, 2025
178
362
96
AMD's solution to the VRAM problem & memory price issues ??

Depends on the game. For AAA and AA, procedural stuff is too unreliable. Limited to scratchpad.
Neural compression and smarter geometry compression schemes for pre-generated assets, BVH reduction, shader code, etc.

This is too far out to matter rn. By the time this has wide or even selective adoption, the RAM situation should've been resolved a long time ago. Think of it as a VRAM multiplier in the PS6 post-crossgen era.

That's extremely misleading. Footprint for generation code has nothing to do with Work Graph itself, but that's no good for a stupid clickbait title. 55kB vs 1.5GB. They're off by orders of magnitude.
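For scale, a quick check of how far apart those two figures are. This assumes decimal units (1 kB = 10^3 bytes, 1 GB = 10^9 bytes); the variable names are just labels for the two numbers in the post.

```python
import math

small_bytes = 55 * 10**3    # 55 kB figure from the post (decimal units assumed)
large_bytes = 1.5 * 10**9   # 1.5 GB figure from the post (decimal units assumed)

ratio = large_bytes / small_bytes
orders = math.log10(ratio)
print(f"ratio: about {ratio:,.0f}x, i.e. {orders:.1f} orders of magnitude")
```

So the gap is roughly 27,000x, a bit over four orders of magnitude.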
 