Discussion RDNA 5 / UDNA (CDNA Next) speculation


MrMPFR

Member
Aug 9, 2025
That's just one patent. There are more related to traversal in HW:
Traversal recursion for acceleration structure traversal
Graphics processing unit traversal engine

With SWC that brings RT up to Level 3.5, matching Alchemist, Ada Lovelace, and later.
Everything else is unfortunately shaky (except DGF in HW). This was barely even a teaser for RDNA 5.

But this part was interesting:
"One top of those performance increases (BVH traversal in HW), there's other features in the works, too, such as flexible and efficient data structures for the geometry being ray traced."

Have to assume this goes well beyond DMMs and DGF. How far, who knows.
This patent implementing partial BVH computations directly within RT cores (sorting and reductions) popped up last week:
System and Method for Bounding Volume Hierarchy Construction

But it's more likely referring to something akin to the overlay trees and delta instance compression patents:
Acceleration structures with delta instances
Overlay trees for ray tracing

^Just patents. Who knows what actually ends up in RDNA 5.


Ignore^. @Kepler_L2 has spoken: it's just old, boring DGF. Really hoped for more in RDNA 5, even if it's still beyond Blackwell.

Edit: Huynh talked about the new BVH traversal HW reducing load on GPU shaders AND the CPU (it's possible he misspoke). Is the reduced CPU load from BVH traversal in HW and/or actual HW BVH management as mentioned in the newer patent?

Again, probably ignore. Seems like the novel partial-BVH-build-in-HW patent is most likely absent in RDNA 5. What a shame :( A Level 5 RT implementation would've been massive.
 

MrMPFR

Member
Aug 9, 2025
because they don't really do it like anyone else.
Didn't expect it to be that different.

RDNA5 will have more stuff.
Sure. Just a teaser if you can even call it that.

was it really hype.
They just talked a bit about challenges ahead.
You're right but the lazy tech press will find a way to spin it as hype xD

FAD is for roadmaps and serious people, not console toddlerslop. get real.
Rewatch the 2020 FAD. There's tons of detail on RDNA 2 and confirmation for the NG consoles, but the format would have had to be completely different.
 

adroc_thurston

Diamond Member
Jul 2, 2023
Didn't expect it to be that different.
They like meth.
Sure. Just a teaser if you can even call it that.
Yuh.
You're right but the lazy tech press will find a way to spin it as hype xD
wccftech article in 3... 2... 1.
Rewatch the 2020 FAD. There's tons of detail on RDNA 2 and confirmation for the NG consoles, but the format would have had to be completely different.
The 2022 one had like one or two slides for RDNA3.
 

Keller_TT

Member
Jun 2, 2024
Hehe - Python is just being used as a scripting language calling highly optimised 'AI primitives' coded in C/C++.

There is a thing called MegaKernel: you describe the computation graph for your LLM in Python code and then it compiles a single GPU kernel that is highly optimised in terms of memory accesses. Very interesting stuff. Very fast and no C++ :p


A smidge off-topic though... looking forward to the 128GB RDNA 5 AI cards!! :D
Btw, my alma mater, Uni Heidelberg, started a project called hipSYCL, since renamed to AdaptiveCpp (fully open source on GitHub), which uses standard C++17 to move away from CUDA for HPC/GPGPU work. It was specifically started to extract the best from AMD GPUs, and its foundational papers were published on an AMD testbench, but it is meant to be vendor neutral across CPUs + GPUs. It is a super project, and I'm glad that I could do a few little things for it. It is not specifically targeted at ML, but one can write ML kernels nevertheless.
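To give a flavour, here's a minimal sketch of mine (not lifted from the AdaptiveCpp docs, so forgive me if I've fumbled some boilerplate): plain SYCL 2020 in standard C++, no vendor extensions. Built with AdaptiveCpp's acpp compiler driver, the same source can target AMD, NVIDIA, or Intel GPUs, or fall back to the CPU.

```cpp
// Minimal SYCL 2020 vector add in plain C++17, no vendor extensions.
// Build with AdaptiveCpp, e.g.: acpp -O2 vadd.cpp -o vadd
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    constexpr size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;  // default selector: whatever device the runtime finds
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";
    {
        sycl::buffer<float> A(a.data(), sycl::range<1>(n));
        sycl::buffer<float> B(b.data(), sycl::range<1>(n));
        sycl::buffer<float> C(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor ra(A, h, sycl::read_only);
            sycl::accessor rb(B, h, sycl::read_only);
            sycl::accessor wc(C, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];
            });
        });
    }  // buffers go out of scope here, so results are copied back into c

    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
}
```

The point being: that's the whole program. No CUDA, HIP, or OpenCL in sight, and the vendor targeting happens at compile time.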

Whatever Lattner critiqued about OpenCL blowing it through terrible governance and a mismatch of competing vendor interests holding it back, this project goes a long way toward solving that, as it was started and is managed by university-led, pure scientific research for real-world needs.

Regarding Python instead of C++ for ML/AI and making CUDA moot: that's called Mojo, though Mojo is much more than that. They just recently added GPU programming support for RDNA 3 and 4.
 

Mopetar

Diamond Member
Jan 31, 2011

MrMPFR

Member
Aug 9, 2025
Does it become self-aware at that level?
I'm just using Imagination Technologies' old levels of RT (each higher level builds upon the previous): https://gfxspeak.com/featured/the-levels-tracing/
Level 1 = SW emulation
Level 2 = Ray/tri and ray/box testing in HW (RDNA 2+)
Level 3 = HW BVH processing (RTX 20-30)
Level 3.5 = Thread coherency sorting (Arc, RTX 40-50 series, M3 and later, and possibly RDNA 5)
Level 4 = Ray coherency sorting (PowerVR Photon)
Level 5 = HW BVH construction (PowerVR GR6500)

It's completely meaningless for performance but a good gauge of architectural sophistication (number of fixed-function HW blocks). BTW Imagination scrapped Level 5 since it wasn't worth it.
Don't take it too seriously.

This 'level' stuff is Fake and Gay since none of that slop addresses the main issue of doing RTRT on things not Larrabee.
It'll be interesting to see where RDNA 5 lands. Register renaming is already a step towards CPU territory, but not enough.

Also the entire point of Level 4 is to avoid that overhead entirely by making RT behave differently so it aligns with SIMD rather than MIMD. SER/SWC are bandaids; they don't fix the problem at its root the way ray coherency sorting does. Rays heading in the same direction need to be batched and run together, instead of rays heading left and right being assigned at random within a SIMD. Until that happens RT will always prefer MIMD.
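Rough toy sketch of the batching idea (entirely my own illustration, not IMG's or AMD's actual scheme): bucket rays by quantized direction so each SIMD wave gets rays that will take similar paths through the BVH.

```cpp
// Toy ray coherency sort: bin rays by direction octant so a wavefront
// processes rays that traverse similar BVH paths. Real HW (e.g. Photon's
// packet coherency gather) does this in fixed-function logic with far
// finer criteria; this only shows the batching principle.
#include <array>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz; float dx, dy, dz; };

// Quantize a direction to one of 8 octants via the sign of each axis.
static uint32_t directionBin(const Ray& r) {
    return (r.dx < 0.0f ? 1u : 0u)
         | (r.dy < 0.0f ? 2u : 0u)
         | (r.dz < 0.0f ? 4u : 0u);
}

// Gather rays into direction-coherent batches instead of launching them
// in arbitrary order; each bin then dispatches as SIMD-friendly waves.
std::array<std::vector<Ray>, 8> coherencyGather(const std::vector<Ray>& rays) {
    std::array<std::vector<Ray>, 8> bins;
    for (const Ray& r : rays)
        bins[directionBin(r)].push_back(r);
    return bins;
}
```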

RT Level 5 is only
- hard(a)ware
- hard-aware
- what(a)ever
Lol
 

adroc_thurston

Diamond Member
Jul 2, 2023
Also the entire point of Level 4 is to avoid that overhead entirely by making RT behave differently so it aligns with SIMD rather than MIMD
it's all Fake and Gay since you're still adding chains of very latency-sensitive ops to a hardware pipeline that is just not built for it.
RTRT is just a really, really, really bad workload for anything, but especially GPUs that have like 200ns of L2 latency alone.
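napkin math (illustrative numbers, not measured):

```cpp
// Each BVH step is a dependent load, so per-ray latency is roughly
// fetch latency x traversal depth. Both numbers below are assumptions.
#include <cstdio>

int main() {
    const double l2_ns         = 200.0;  // assumed L2 hit latency
    const double nodes_per_ray = 25.0;   // assumed serial node visits per ray
    std::printf("~%.1f us of dependent latency per ray\n",
                l2_ns * nodes_per_ray / 1000.0);  // ~5.0 us to hide via occupancy
    return 0;
}
```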
 

Keller_TT

Member
Jun 2, 2024
YouTube decided to show me this channel "Threat Interactive", and this guy lays into the RT/PT kool aid, the current Unreal slop, and Digital Foundry's crap about "pushing gaming tech".

The guy has subsequently released a 2nd part to this today, but this part from 10 days ago is about Callisto Protocol's implementation of BRDF:

 

marees

Golden Member
Apr 28, 2024
YouTube decided to show me this channel "Threat Interactive", and this guy lays into the RT/PT kool aid, the current Unreal slop, and Digital Foundry's crap about "pushing gaming tech".

The guy has subsequently released a 2nd part to this today, but this part from 10 days ago is about Callisto Protocol's implementation of BRDF:

What is the RDNA 5 connection?

Is it PT??
 

poke01

Diamond Member
Mar 8, 2022
Don’t fall for marketing buzzwords from any company; that AMD/PS video was pure puke.

Likewise, that RT level chart is funny coming from IMG.
 

marees

Golden Member
Apr 28, 2024
Not specific to RDNA 5 or any graphics card, but just the current trajectory pushed by the incumbent powers that be, read: Epic, Nvidia, and graphics built on Unreal Engine.

His channel is about graphics tech in game engines.
The combo of Nanite with RT has wrecked many games.

Plus the Lumen implementation produces results that are very hard to optimize for the low end (SVOGI, i.e. voxel cone tracing, gives much better bang for the buck).

I believe Epic has some work to do on UE5 for performance on low-end cards.
 

MrMPFR

Member
Aug 9, 2025
it's all Fake and Gay since you're still adding chains of very latency-sensitive ops to a hardware pipeline that is just not built for it.
RTRT is just a really, really, really bad workload for anything, but especially GPUs that have like 200ns of L2 latency alone.
Was just reporting the stuff mentioned in the patent filing and the PowerVR Photon whitepaper (ignore this, as the patent is more interesting). Leaving the Packet Coherency Gather-related patent here in case anyone is interested: https://patents.google.com/patent/US20220068008A1

A few quotes from the patent (there's more):
"It (coherency gathering) can allow geometry information to be read once, and to be tested against multiple rays. This also facilitates parallel implementation—for example, using a Single Instruction Multiple Data (SIMD) model—whereby separate hardware-units process the different rays (of the same group) in parallel against the same geometry information."

"By gathering rays according to each specific instance of each BLAS node, the system can arrange for a group of rays that share the same transform as well as the same BLAS node to be scheduled for testing together. Therefore, at most one memory request should be required to retrieve the transform for intersection-testing a given group of rays. According to examples, this is further facilitated by using an instance transform cache."

"When an instance transform is first required, it is loaded into the instance transform cache. The next time the same instance transform is used for intersection testing, it can be expected that it can be retrieved from the instance transform cache without needing to load it from the external memory. This reduces the memory access overhead."


Doesn't sound fake to me, but rather like a well-thought-out system of multiple HW optimizations that AMD would want to license or reach through different means and then include in RDNA 5. That's assuming they're serious about solving the subpar coherency and cache/memory overload problems plaguing RT right now.
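Toy model of the instance transform cache idea from the quoted patent (all names and structure mine, purely illustrative):

```cpp
// Rays gathered per (BLAS node, instance) hit a small transform cache
// instead of refetching the same transform from external memory.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct Transform { float m[12]; };  // 3x4 object-to-world matrix

class InstanceTransformCache {
public:
    uint64_t hits = 0, misses = 0;

    // Return the transform for an instance, loading from "memory" on a miss.
    const Transform& get(uint32_t instanceId) {
        auto it = cache_.find(instanceId);
        if (it != cache_.end()) { ++hits; return it->second; }
        ++misses;
        return cache_.emplace(instanceId, loadFromMemory(instanceId)).first->second;
    }

private:
    static Transform loadFromMemory(uint32_t) { return Transform{}; }  // stand-in for a DRAM fetch
    std::unordered_map<uint32_t, Transform> cache_;  // real HW: small and fixed-size
};

int main() {
    InstanceTransformCache cache;
    // A coherency-gathered batch: 32 rays against the same instance means
    // one memory fetch and 31 cache hits for the transform.
    for (int ray = 0; ray < 32; ++ray) cache.get(/*instanceId=*/7);
    std::printf("hits=%llu misses=%llu\n",
                (unsigned long long)cache.hits, (unsigned long long)cache.misses);
}
```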

Edit: Arrrgh I added the wrong link. You can find the real patent now.
 

Win2012R2

Golden Member
Dec 5, 2024
No, the devs are just incompetent.
Including Gearbox?

If they can't do it, then who can?

I believe Epic has some work to do on UE5 for performance on low-end cards.

They must do two things:
1) Get to UE6 real quick, because UE5 is now a more or less toxic keyword; new games that are well made with it are better off not saying which engine they're on.
2) Fix the upgrade situation: devs who start a game on major version X should be able to upgrade seamlessly to any minor version, otherwise it's total BS.
 

Bigos

Senior member
Jun 2, 2019
Threat Interactive is a sludge posting grifter who pays his rent by appealing to 104 IQ redditors with a big stiffy for hating games made after 2015. Replace any graphics buzzwords he hates with "woke" and the output is 1:1

And the fact modern games look bad is just me imagining things?

I am in the camp "truth is between the two opposing sides".
 