Discussion RDNA 5 / UDNA (CDNA Next) speculation

Keller_TT · Aug 9, 2025

Win2012R2 said:
There is no way RT will be "a bit better" - it will need to be x4 at least for PS6 to last another 7-8 years (that's till 2035, we should have flying cars on garbage by then)

It will be easily x4 compared to PS5, if you take a hypothetical 5070S, or a 9070 with 20% better RT, compared to the RX 6700. Would expect 6-8x in heavy RT scenarios. It's also a combination of CPU+GPU.

They've got a good enough RT+PSSR with 16GB VRAM, limited by CPU. VRAM is the plug & play aspect of it, and Sony can get it cheap enough. I'm sure the FSR4+ variant that'll ship with PS6 will have enough optimizations. Sony and AMD also have their philosophy on AI/ML in gaming, and what a game is.

With all that said, it's also about the price. It wouldn't be like the PS5 Pro, but more like PS5 launch. That's packing plenty for Sony's elite studios to get cracking.

marees · Aug 9, 2025

RDNA 5 ??

adroc_thurston said:
No.

That's called having MFMA in shader cores and guess what has it.

MrMPFR · Aug 9, 2025

adroc_thurston said:
looks at AT0 gfx13
Yeah?

gfx13 does introduce a grabbag of tricks to make high WGP count blobs work a lot better.

Distributed and hierarchical decentralized scheduling clearly a corner stone of GFX13. Bypasses most severe command processor bottlenecks.

With all the patents AMD clearly isn't holding back any punches. The entire architecture basically says "made for GPU Workgraphs" like GCN's "made for Mantle and Async compute". The Shader Engine schedulers = Workgraph Schedulers.

Kepler_L2 · Aug 9, 2025

Magras00 said:
Distributed and hierarchical decentralized scheduling clearly a corner stone of GFX13. Bypasses most severe command processor bottlenecks.

With all the patents AMD clearly isn't holding back any punches. The entire architecture basically says "made for GPU Workgraphs" like GCN's "made for Mantle and Async compute". The Shader Engine schedulers = Workgraph Schedulers.

Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.

MrMPFR · Aug 9, 2025

Kepler_L2 said:
Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.

That's very reassuring.

RDNA 5 has Autonomous Shader Engine level scheduling via Workgraph Scheduler (WGS) and dispatch via Asynchronous Dispatch Controller (ADC). When combined with the clever load balancing scheme ("work stealing" via command processor) it is basically made for scale and even enables a Zen-like chiplet paradigm for GPUs.

Is AT0 monolithic?

So it really doesn't appear AT0 will be held back by the usual global frontend scheduling problems. NVIDIA better have something similar nextgen or things could get very ugly at the high end for them. Look at their current issues with core scaling from RTX 5080 to 5090.

Edit: Retracted info not related to RDNA 5. See Kepler_L2's Reply.
Edit #2: More precise wording around WGS related changes.

branch_suggestion · Aug 9, 2025

Looking back through the patent history, N4C would've been the only way to scale to hundreds of CUs at the time, an 8SE/192CU RDNA4 monodie would've had horrible scaling.
N4C was really a very bruteforce way of scaling, glad that with GFX13 you can make a realistic monodie (AT0) which scales beyond what N4C probably could've managed, not as high as N5C or whatever though.
GFX14 patents are right back on the chiplet train, this time the train is blasting on all of the drugs.

Magras00 said:
Is AT0 monolithic?

Well the compute is monolithic, there is a MID with all of the IO/multimedia connected via fanout. AT0/2 share this, also a beefier MID for Radeon Pro and the like.

Magras00 said:
So it really doesn't appear AT0 will be held back by the usual global frontend scheduling problems. NVIDIA better have something similar nextgen or things could get very ugly at the high end for them. Look at their current issues with core scaling from RTX 5080 to 5090.

Yeah GB202 scaling is not good, neither was AD102 really.

Kepler_L2 · Aug 9, 2025

Magras00 said:
MIMD-like execution at the thread level using Finite State Machines

That's for CDNA

Magras00 said:
WGP takeover mode with shared wavefront collaboration.

That's already on RDNA4

MrMPFR · Aug 9, 2025

Kepler_L2 said:
That's for CDNA

That's already on RDNA4

Guess you cannot assume anything based on filing dates :-/ The RDNA 4 patent was filed in July 2024.

Info pulled.

Win2012R2 · Aug 9, 2025

Keller_TT said:
It will be easily x4 compared to PS5

But that's not good enough for a platform that will need to last till 2035, Sony better off waiting extra couple of years and launch when N2 is cheap enough and plentiful.

Kepler_L2 · Aug 9, 2025

Magras00 said:
Guess you cannot assume anything based on filing dates :-/ The RDNA 4 patent was filed in July 2024.

Do any of you know if some of the following patents have already been implemented or if they are related to GFX 13 and later, and perhaps future CDNA? Or alternatively if some of them are just software patents?
https://docs.google.com/document/d/1p396eB3Fa3eBVhDexvAkkNN5mCvHLziiOEufT4BBZoM

I've sorted the patents into categories and tagged everything with a number making it easier to reference and correct in the future. I've counted it to a total of ~75 patents.

Once the patent status is confirmed by a reputable source or expertise (understanding patent properly), I'll be sure to tag them accordingly and keep the doc online in case anyone wants to reference the RDNA 5 patents in the future. You're free to copy, edit and use the info as you like.

General #5 is in RDNA4, #6 is in RDNA3, #15 is in RDNA5. The others I'm not sure, but most of them are either in RDNA5 or have been superseded by a different implementation.

Win2012R2 · Aug 9, 2025

branch_suggestion said:
Yeah GB202 scaling is not good, neither was AD102 really.

AD102 was almost double from Ampere, that was good enough uplift

MrMPFR · Aug 9, 2025

Kepler_L2 said:
That should be a favourable comparison if anything, DXR 1.2 SER/OMM is supported with SW emulation on RDNA4 while RDNA5 has HW accel, and NRC/Ray denoising should be much faster on RDNA5 thanks to FP4 support.

SWC = SER support yes?
How does RDNA 5 get HW OMM support? I can't find a patent filing for it. DGF encoding opacity micro maps + OMM HW accel?

dangerman1337 · Aug 9, 2025

Win2012R2 said:
AD102 was almost double from Ampere, that was good enough uplift

Though we never sadly got to see a 4090 Ti/Titan SKU released though.

adroc_thurston · Aug 9, 2025

Win2012R2 said:
AD102 was almost double from Ampere, that was good enough uplift

Tons more clocks carried datboi.

soresu · Aug 9, 2025

A snippet from that GATE paper dealing with more efficient ways to encode neural data in 3D scenes...

Kepler_L2 · Aug 9, 2025

Magras00 said:
SWC = SER support yes?

SWC goes way beyond SER

Magras00 said:
How does RDNA 5 get HW OMM support? I can't find a patent filing for it. DGF encoding opacity micro maps + OMM HW accel?

I don't think there's any patent for it, but it is supported in HW

MrMPFR · Aug 10, 2025

Kepler_L2 said:
SWC goes way beyond SER

I don't think there's any patent for it, but it is supported in HW

Is it at all possible for you to share glimpses into what "well beyond" means?

OMM is supported directly in DGF so gfx13 prob FF HW in RT core similar to OMM in Blackwell. My accessment after reading the patents.

Filings suggest DGF is quite beneficial even to current GPUs. Enables reduced cache traffic, and cache storage requirements and transactions.

RDNA 5 benefits: when paired with prefiltering and decompression logic in HW this gets turbocharged: Skip FP tests completely for far detail, perform low precision integer bulk testing in parallel using many smaller INT testers as a prefilter state before FP.

Conclusion: Significant benefits to power efficiency, performance per area, processing speed, and cache traffic.

https://imgur.com/a/pxMyU99

DGF + prefiltering related patents:

PRE-FILTERING NODES FOR BOUNDING VOLUME HIERARCHY,
- From <https://www.patents-review.com/a/20250111586-pre-filtering-nodes-bounding-volume-hierarchy.html>
Intersection Testing on Dense Geometry Data using Triangle Prefiltering
- From <https://www.patents-review.com/a/20250131640-intersection-testing-dense-geometry-data-triangle.html>
Dense Geometry Format
- From <https://www.patents-review.com/a/20250131639-dense-geometry-format.html>
SIMPLIFIED LOW-PRECISION RAY INTERSECTION THROUGH ACCELERATED HIERARCHY STRUCTURE PRECOMPUTATION
- From <https://www.patents-review.com/a/20...w-precision-ray-intersection-accelerated.html>
System and Method for Low-precision Ray Tests
- From <https://www.patents-review.com/a/20250209723-system-method-low-precision-ray-tests.html>
DISCRETE ROTATIONS FOR ORIENTED BOUNDING BOXES BASED ON PLATONIC SOLIDS
- From <https://www.patents-review.com/a/20...s-oriented-bounding-boxes-based-platonic.html>

Kepler said RDNA 5 has feature parity with 50 series + then some. Patent filings for ray tracing, scheduling overhaul, and other general clearly changes support this. Then add the prefiltering step with dedicated decompression top and GFX13 should bury Blackwell per SKU (same raster).

Edit: moved pictures to other site + small rewrites

Kepler_L2 · Aug 10, 2025

Magras00 said:
Is it at all possible for you to share glimpses into what "well beyond" means?

SER applies to Ray Tracing only and requires use of a developer API. SWC can apply to any shader workload and be done at either HW or shader compiler level.

MrMPFR · Aug 10, 2025

Keller_TT said:
It will be easily x4 compared to PS5, if you take a hypothetical 5070S, or a 9070 with 20% better RT, compared to the RX 6700. Would expect 6-8x in heavy RT scenarios.

6-8X That figure might not be high enough. Isn't PS5 Pro already 3-4X more than PS5 in raw ray intersect/traversal throughput? Based on patent filings a PS6 ~9070/9070XT raster with RDNA 5 RT gains prob ~8-20X more powerful (gigarays/sec).

Does someone knows how PS6 could compare to 5070 TI and 9070XT in raw ray processing throughput?

Yep add ML + SR on top and it becomes even more potent. Project Amethyst source of PS6 gen fine wine.

Win2012R2 said:
But that's not good enough for a platform that will need to last till 2035, Sony better off waiting extra couple of years and launch when N2 is cheap enough and plentiful.

Indeed. AMD and Sony needs PS6 targeting mainstream neurally augmented PT or it won't feel truly nextgen.

Win2012R2 said:
AD102 was almost double from Ampere, that was good enough uplift

Clockspeed and moar cores as @adroc_thurston said. The problem is scaling from x80 to x90 tier. 4080 -> 4090 bad, 5080 to 5090 bad as well like @branch_suggestion said. NVIDIA needs to fix this nextgen or AMD will dominate in PPA at very high end.

Edit: Was referencing core scaling problems on NVIDIA high end @Win2012R2, but yeah the gap between x80 and x90 tier is getting absurd.

Win2012R2 · Aug 10, 2025

Magras00 said:
The problem is scaling from x80 to x90 tier

The problem is that there is no 900-1200 gaming GPU market segment, people are either prepared to pay a lot less OR a lot more but for top notch device like 4090 was.

MrMPFR · Aug 11, 2025

Am I the only one confused with @Kepler_L2's patent post on the other forum site?

- Mentions SWC as OoO execution but the patent only mentions reordering by thread (thread coherency sorting).
So how is SWC OoO execution?

- Then lists the three patents related to an AMD DMM implementation as beyond current µArch, when DMM has been supported on RTX 40 series since 2022.
How is AMD's implementation beyond Ada's DMM decompression engine (Blackwell removed it)?

Interesting if GFX13 has HW accel DGF + DMM alongside each other.

WIth all the changes RDNA 5 RT performance should significantly exceed NVIDIA Blackwell. PS6 (~9070XT) could get ~5080-4090 level in rendering path tracer (not ReStir PT) but rn that's just my speculation based on public patents.

soresu · Aug 11, 2025

Magras00 said:
Filings suggest DGF is quite beneficial even to current GPUs. Enables reduced cache traffic, and cache storage requirements and transactions.

Bear in mind that many such things are utterly useless unless actually coded into the rendering backend of the game engine.

AMD will of course plaster their GPUOpen website with code samples and examples, but that's not the same as wheeling half a dozen free programmers into the HQ of Epic Games, DICE, Unity etc and having them do most of the work for those companies as nVidia has a tendency to do when promoting their (often proprietary) gfx bells and whistles.

No small amount of nVidia's success is simply throwing a lot of money at the problem of software dissemination, ensuring that 'lock in' becomes more and more likely.

(until companies like Epic get sick of it and start creating their own solutions like Chaos instead of PhysX)

branch_suggestion · Aug 11, 2025

AMD to Host Financial Analyst Day on November 11, 2025

SANTA CLARA, Calif., Aug. 11, 2025 (GLOBE NEWSWIRE) -- AMD (NASDAQ: AMD) announced it will host its 2025 Financial Analyst Day on Tuesday, Nov.…...

ir.amd.com

This is gonna be special, should get a preview of all 2026 products plus some 2027, and new roadmaps.

adroc_thurston · Aug 12, 2025

branch_suggestion said:
AMD to Host Financial Analyst Day on November 11, 2025

SANTA CLARA, Calif., Aug. 11, 2025 (GLOBE NEWSWIRE) -- AMD (NASDAQ: AMD) announced it will host its 2025 Financial Analyst Day on Tuesday, Nov.…...

ir.amd.com

This is gonna be special, should get a preview of all 2026 products plus some 2027, and new roadmaps.

Swell.
Poggers, dr. Su. Poggers.

Win2012R2 · Aug 12, 2025

soresu said:
Bear in mind that many such things are utterly useless unless actually coded into the rendering backend of the game engine.

Which it will be en masse for PS6

Discussion RDNA 5 / UDNA (CDNA Next) speculation

Member

Platinum Member

Member

Golden Member

Member

Senior member

Golden Member

Member

Golden Member

Golden Member

Golden Member

Member

Senior member

Diamond Member

Diamond Member

Golden Member

Member

Golden Member

Member

Golden Member

Member

Diamond Member

Senior member

Diamond Member

Golden Member