Discussion RDNA 5 / UDNA (CDNA Next) speculation


Keller_TT

Member
Jun 2, 2024
146
170
76
There is no way RT will be "a bit better" - it will need to be x4 at least for the PS6 to last another 7-8 years (that's until 2035; we should have flying cars running on garbage by then)
It will easily be x4 compared to the PS5 if you take a hypothetical 5070S, or a 9070 with 20% better RT, and compare it to the RX 6700. I'd expect 6-8x in heavy RT scenarios. It's also a combination of CPU+GPU.
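
Rough back-of-the-envelope on how that multiplier compounds (Python just for illustration; both input factors are my own assumptions, not measured numbers):

```python
# Back-of-the-envelope RT uplift estimate. Both inputs are assumptions, not benchmarks.
rt_9070_vs_6700 = 3.3   # assumed heavy-RT advantage of a 9070-class part over an RX 6700
extra_rt_gain   = 1.2   # "a 9070 with 20% better RT"

total_uplift = rt_9070_vs_6700 * extra_rt_gain
print(f"Estimated RT uplift vs PS5-class RT: ~{total_uplift:.1f}x")  # ~4.0x with these inputs
```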

They've got good enough RT + PSSR with 16GB of VRAM, limited by the CPU. VRAM is the plug-and-play aspect of it, and Sony can get it cheap enough. I'm sure the FSR4+ variant that ships with the PS6 will have enough optimizations. Sony and AMD also have their own philosophy on AI/ML in gaming, and on what a game is.

With all that said, it's also about the price. It wouldn't be like the PS5 Pro, but more like the PS5 at launch. That's packing plenty for Sony's elite studios to get cracking.
 

Magras00

Junior Member
Aug 9, 2025
7
13
36
looks at AT0 gfx13
Yeah?

gfx13 does introduce a grab bag of tricks to make high-WGP-count blobs work a lot better.

Distributed, hierarchical, decentralized scheduling is clearly a cornerstone of GFX13. It bypasses the most severe command processor bottlenecks.

With all the patents, AMD clearly isn't pulling any punches. The entire architecture basically says "made for GPU Workgraphs", like GCN's "made for Mantle and async compute". The Shader Engine schedulers = Workgraph Schedulers.
 

Kepler_L2

Senior member
Sep 6, 2020
939
3,853
136
Distributed, hierarchical, decentralized scheduling is clearly a cornerstone of GFX13. It bypasses the most severe command processor bottlenecks.

With all the patents, AMD clearly isn't pulling any punches. The entire architecture basically says "made for GPU Workgraphs", like GCN's "made for Mantle and async compute". The Shader Engine schedulers = Workgraph Schedulers.
Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.
 

Magras00

Junior Member
Aug 9, 2025
7
13
36
Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.

That's very reassuring.

RDNA 5 has autonomous Shader Engine-level scheduling via the Workgraph Scheduler (WGS) and dispatch via the Asynchronous Dispatch Controller (ADC). Combined with the clever load-balancing scheme ("work stealing" via the command processor), it is basically built for scale and even enables a Zen-like chiplet paradigm for GPUs.
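
To make the load-balancing idea concrete, here's a toy sketch of what per-shader-engine queues with work stealing could look like. This is purely my own illustration of the general technique, not anything lifted from the patents; the engine count, queue structure, and names are made up.

```python
from collections import deque
import random

# Toy model of distributed scheduling with work stealing.
# Each "shader engine" owns a local queue; an idle engine steals from the busiest
# one instead of waiting on a single global front end. Everything here is illustrative.

NUM_ENGINES = 8
queues = [deque() for _ in range(NUM_ENGINES)]

# Seed work unevenly to mimic an imbalanced dispatch.
for i, q in enumerate(queues):
    for j in range(random.randint(0, 40)):
        q.append(f"wave_{i}_{j}")

def steal(idle_idx):
    """Move half the work from the busiest queue into the idle engine's queue."""
    victim = max(range(NUM_ENGINES), key=lambda k: len(queues[k]))
    for _ in range(len(queues[victim]) // 2):
        queues[idle_idx].append(queues[victim].pop())

completed = 0
while any(queues):
    for idx, q in enumerate(queues):
        if q:
            q.popleft()          # "execute" one wave
            completed += 1
        elif max(len(x) for x in queues) > 1:
            steal(idx)           # idle engine rebalances the load itself

print(f"Completed {completed} waves across {NUM_ENGINES} engines")
```

The point is that no single global front end has to hand out every wave; idle engines rebalance themselves.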

Is AT0 monolithic?

So it really doesn't appear AT0 will be held back by the usual global frontend scheduling problems. NVIDIA better have something similar nextgen or things could get very ugly at the high end for them. Look at their current issues with core scaling from RTX 5080 to 5090.


Edit: Retracted info not related to RDNA 5. See Kepler_L2's Reply.
Edit #2: More precise wording around WGS related changes.
 
Last edited:
  • Like
Reactions: Tlh97

branch_suggestion

Senior member
Aug 4, 2023
733
1,567
106
Looking back through the patent history, N4C would've been the only way to scale to hundreds of CUs at the time; an 8SE/192CU RDNA4 monodie would've had horrible scaling.
N4C was really a very brute-force way of scaling. Glad that with GFX13 you can make a realistic monodie (AT0) that scales beyond what N4C probably could've managed, though not as high as N5C or whatever.
GFX14 patents are right back on the chiplet train, this time the train is blasting on all of the drugs.
Is AT0 monolithic?
Well, the compute is monolithic; there is a MID with all of the IO/multimedia connected via fanout. AT0/AT2 share this, and there's also a beefier MID for Radeon Pro and the like.
So it really doesn't appear AT0 will be held back by the usual global frontend scheduling problems. NVIDIA better have something similar nextgen or things could get very ugly at the high end for them. Look at their current issues with core scaling from RTX 5080 to 5090.
Yeah, GB202 scaling is not good; neither was AD102's, really.
 
  • Like
Reactions: Tlh97 and Magras00

Magras00

Junior Member
Aug 9, 2025
7
13
36
That's for CDNA

That's already on RDNA4
Guess you cannot assume anything based on filing dates :-/ The RDNA 4 patent was filed in July 2024.


Do any of you know if some of the following patents have already been implemented or if they are related to GFX 13 and later, and perhaps future CDNA? Or alternatively if some of them are just software patents?
https://docs.google.com/document/d/1p396eB3Fa3eBVhDexvAkkNN5mCvHLziiOEufT4BBZoM

I've sorted the patents into categories and tagged everything with a number, making it easier to reference and correct in the future. I count a total of ~75 patents.

Once the patent status is confirmed by a reputable source or by someone who properly understands the patents, I'll tag them accordingly and keep the doc online in case anyone wants to reference the RDNA 5 patents in the future. You're free to copy, edit, and use the info as you like.
 

Kepler_L2

Senior member
Sep 6, 2020
939
3,853
136
Guess you cannot assume anything based on filing dates :-/ The RDNA 4 patent was filed in July 2024.


Do any of you know if some of the following patents have already been implemented or if they are related to GFX 13 and later, and perhaps future CDNA? Or alternatively if some of them are just software patents?
https://docs.google.com/document/d/1p396eB3Fa3eBVhDexvAkkNN5mCvHLziiOEufT4BBZoM

I've sorted the patents into categories and tagged everything with a number, making it easier to reference and correct in the future. I count a total of ~75 patents.

Once the patent status is confirmed by a reputable source or by someone who properly understands the patents, I'll tag them accordingly and keep the doc online in case anyone wants to reference the RDNA 5 patents in the future. You're free to copy, edit, and use the info as you like.
General #5 is in RDNA4, #6 is in RDNA3, and #15 is in RDNA5. The others I'm not sure about, but most of them are either in RDNA5 or have been superseded by a different implementation.
 

Magras00

Junior Member
Aug 9, 2025
7
13
36
That should be a favourable comparison if anything: DXR 1.2 SER/OMM is supported with SW emulation on RDNA4 while RDNA5 has HW accel, and NRC/ray denoising should be much faster on RDNA5 thanks to FP4 support.

SWC = SER support, yes?
How does RDNA 5 get HW OMM support? I can't find a patent filing for it. DGF encoding opacity micromaps + OMM HW accel?
 

Magras00

Junior Member
Aug 9, 2025
7
13
36
SWC goes way beyond SER

I don't think there's any patent for it, but it is supported in HW

Is it at all possible for you to share glimpses into what "way beyond" means?

OMM is supported directly in DGF, so gfx13 probably has FF HW in the RT core similar to OMM in Blackwell. That's my assessment after reading the patents.

Filings suggest DGF is quite beneficial even to current GPUs: it reduces cache traffic, cache storage requirements, and cache transactions.

RDNA 5 benefits: when paired with prefiltering and decompression logic in HW, this gets turbocharged: skip FP tests completely for distant detail, and perform low-precision integer bulk testing in parallel, using many smaller INT testers as a prefilter stage before FP.

Conclusion: Significant benefits to power efficiency, performance per area, processing speed, and cache traffic.
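
To make the prefilter idea concrete, here's a minimal sketch of the general technique as I read it: boxes get conservatively quantized onto a coarse integer grid, a cheap integer overlap test throws out most candidates, and only the survivors get the full-precision slab test. Grid size, bit width, and function names are my own illustration, not AMD's actual design.

```python
import math

# Conceptual sketch: integer prefilter before full-precision ray/box tests.
# Boxes are quantized conservatively (expanded outward) onto an 8-bit grid, so the
# cheap integer overlap test can produce false positives but never false negatives.

GRID = 255  # 8-bit quantization grid

def quantize_box(lo, hi, scene_lo, scene_hi):
    """Snap an FP box outward onto the integer grid (conservative)."""
    qlo, qhi = [], []
    for l, h, slo, shi in zip(lo, hi, scene_lo, scene_hi):
        scale = GRID / (shi - slo)
        qlo.append(max(0, math.floor((l - slo) * scale)))
        qhi.append(min(GRID, math.ceil((h - slo) * scale)))
    return qlo, qhi

def int_overlap(a, b):
    """Cheap integer AABB-vs-AABB overlap test (the prefilter)."""
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

def fp_slab_test(origin, inv_dir, lo, hi):
    """Full-precision ray/AABB slab test, only run on prefilter survivors."""
    tmin, tmax = 0.0, float("inf")
    for o, inv, l, h in zip(origin, inv_dir, lo, hi):
        t0, t1 = (l - o) * inv, (h - o) * inv
        tmin, tmax = max(tmin, min(t0, t1)), min(tmax, max(t0, t1))
    return tmin <= tmax

# Example: a short ray segment against a couple of boxes inside a unit scene.
scene_lo, scene_hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
origin, inv_dir = (0.1, 0.1, 0.1), (1.0, 1.0, 1.0)
# Conservative AABB of the ray segment we care about (origin out to ~0.3 on each axis).
ray_box = quantize_box((0.1, 0.1, 0.1), (0.3, 0.3, 0.3), scene_lo, scene_hi)

boxes = [((0.2, 0.2, 0.2), (0.25, 0.25, 0.25)), ((0.0, 0.8, 0.0), (0.1, 0.9, 0.1))]
for lo, hi in boxes:
    if int_overlap(ray_box, quantize_box(lo, hi, scene_lo, scene_hi)):
        print("prefilter pass -> FP test:", fp_slab_test(origin, inv_dir, lo, hi))
    else:
        print("rejected by integer prefilter, FP test skipped")
```

Because quantization only ever expands boxes outward, the integer stage can waste a little work on false positives but never wrongly reject a real hit.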


DGF + prefiltering related patents:

Kepler said RDNA 5 has feature parity with the 50 series and then some. Patent filings for ray tracing, the scheduling overhaul, and other general changes clearly support this. Then add the prefiltering step with dedicated decompression on top, and GFX13 should bury Blackwell per SKU (same raster).

Edit: moved pictures to other site + small rewrites
 
Last edited:
  • Like
Reactions: Mopetar

Magras00

Junior Member
Aug 9, 2025
7
13
36
It will easily be x4 compared to the PS5 if you take a hypothetical 5070S, or a 9070 with 20% better RT, and compare it to the RX 6700. I'd expect 6-8x in heavy RT scenarios.

6-8x? That figure might not be high enough. Isn't the PS5 Pro already 3-4x the PS5 in raw ray intersect/traversal throughput? Based on patent filings, a PS6 with ~9070/9070 XT raster and RDNA 5 RT gains is probably ~8-20x more powerful (gigarays/sec).

Does anyone know how the PS6 could compare to the 5070 Ti and 9070 XT in raw ray processing throughput?

Yep, add ML + SR on top and it becomes even more potent. Project Amethyst is the source of PS6-gen fine wine.

But that's not good enough for a platform that will need to last till 2035; Sony is better off waiting an extra couple of years and launching when N2 is cheap enough and plentiful.

Indeed. AMD and Sony need the PS6 to target mainstream neurally augmented PT or it won't feel truly next-gen.

AD102 was almost double Ampere; that was a good enough uplift.

Clock speed and moar cores, as @adroc_thurston said. The problem is scaling from the x80 to the x90 tier: 4080 -> 4090 was bad, and 5080 -> 5090 is bad as well, like @branch_suggestion said. NVIDIA needs to fix this next gen or AMD will dominate in PPA at the very high end.

Edit: I was referencing core scaling problems on NVIDIA's high end, @Win2012R2, but yeah, the gap between the x80 and x90 tiers is getting absurd.
 
Last edited:

Magras00

Junior Member
Aug 9, 2025
7
13
36
Am I the only one confused by @Kepler_L2's patent post on the other forum site?

- It mentions SWC as OoO execution, but the patent only mentions reordering by thread (thread coherency sorting).
So how is SWC OoO execution? (Rough sketch of what I mean by coherency sorting at the end of this post.)

- It then lists the three patents related to an AMD DMM implementation as beyond the current µArch, when DMM has been supported on the RTX 40 series since 2022.
How is AMD's implementation beyond Ada's DMM decompression engine (Blackwell removed it)?

It would be interesting if GFX13 has HW-accelerated DGF + DMM alongside each other.

With all the changes, RDNA 5 RT performance should significantly exceed NVIDIA Blackwell. The PS6 (~9070 XT) could get to ~5080-4090 level in a rendering path tracer (not ReSTIR PT), but right now that's just my speculation based on public patents.
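
On the first point above (the sketch I promised): my reading is that "reordering by thread" means coherency sorting, not true out-of-order execution. Divergent rays get binned by the hit shader they're about to invoke and repacked into full waves, so each wave runs one shader coherently. Toy illustration only; the wave size, shader IDs, and binning scheme are made up, not from the patent.

```python
from collections import defaultdict

# Toy illustration of thread coherency sorting (in the spirit of SER-style reordering,
# not any specific patent): instead of executing rays in launch order, where neighbouring
# lanes may want different hit shaders, rays are binned by the shader they will invoke
# and repacked so each wave runs one shader coherently.

WAVE_SIZE = 32

# (ray_id, hit_shader_id) in launch order -- divergent as-is.
rays = [(i, i % 5) for i in range(256)]

bins = defaultdict(list)
for ray_id, shader_id in rays:
    bins[shader_id].append(ray_id)

# Repack into waves: each wave now contains rays that all run the same shader.
waves = []
for shader_id, ray_ids in sorted(bins.items()):
    for i in range(0, len(ray_ids), WAVE_SIZE):
        waves.append((shader_id, ray_ids[i:i + WAVE_SIZE]))

for shader_id, wave in waves[:3]:
    print(f"shader {shader_id}: wave of {len(wave)} coherent rays")
```

Nothing retires out of program order here; the work is just regrouped before execution, which is why I'd call it sorting rather than OoO.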
 

soresu

Diamond Member
Dec 19, 2014
3,945
3,385
136
Filings suggest DGF is quite beneficial even to current GPUs: it reduces cache traffic, cache storage requirements, and cache transactions.
Bear in mind that many such things are utterly useless unless actually coded into the rendering backend of the game engine.

AMD will of course plaster their GPUOpen website with code samples and examples, but that's not the same as wheeling half a dozen free programmers into the HQ of Epic Games, DICE, Unity etc and having them do most of the work for those companies as nVidia has a tendency to do when promoting their (often proprietary) gfx bells and whistles.

No small amount of nVidia's success is simply throwing a lot of money at the problem of software dissemination, ensuring that 'lock in' becomes more and more likely.

(until companies like Epic get sick of it and start creating their own solutions like Chaos instead of PhysX)
 
  • Like
Reactions: marees and Saylick