Discussion: RDNA 5 / UDNA (CDNA Next) speculation


MrMPFR

Member
Aug 9, 2025
The combo of Nanite with RT has wrecked many games.

Plus the Lumen implementation produces a result that is very hard to optimize for low end (SVOGI / voxel cone tracing gives much better bang for buck).
Nanite (AC Shadows' geometry =/= Nanite) is only used in UE5 games, and most UE5 games don't bother with RTXGI, PTGI or HW Lumen. RT is not the problem. DDGI, for example, is blazing fast but only covers diffuse lighting. For GI, SW Lumen is an inferior and slow software solution that tries to do everything (GI and reflections). The HW version looks much better but is still heavy.

Yeah, but SVOGI only covers diffuse lighting, and the lighting in KCD 2 or Crysis Remastered doesn't come close to SW Lumen in UE5 titles or DDGI in titles such as Metro Exodus Enhanced Edition. DDGI > SVOGI.

They heavily customize UE5 for their two games, unlike some other devs. AFAICT there's no Lumen or Nanite in Arc Raiders or The Finals, plus some serious in-house engine tweaking and a 2016 midrange GPU in the min specs. Not surprising they run well. IIRC both use DDGI (RTXGI) plus a fallback GI solution.

Probably as a rule of thumb, the more customized the UE5 version in a game is, the better it'll run. Engineers always wanna tweak and optimize stuff. TW4 is prob the first UE5 game using the latest tech AND basically running flawlessly. TurboTECH FTW!

They must do two things:
1) get to UE6 real quick, because UE5 is now a more or less toxic keyword; new games that are well made with it are better off not saying which engine they use
2) they need to fix the upgrade situation - devs who start making a game on major version X should be able to upgrade seamlessly to a minor version, otherwise it's total BS
1) Not gonna happen when UE6 isn't launching anytime soon. Sweeney said this spring that a release (preview) is a few years out, so ~2030 release, 8 years after UE5's release. But does the "UE5 is toxic" perception really carry over to the average PC gamer and to console gamers? He also said they're going to abandon the old code completely and rewrite everything to be multithreaded, and that's not easy. Similar to what Unity did with DOTS, but I suspect more profound changes, including leveraging Work Graphs (a big deal for PS6 and RDNA 5) for virtually everything, based on Epic's public statements in 2024. UE6 mass adoption in the early to mid 2030s could be when RDNA 5 really begins to shine.
2) That is impossible considering how much they change with each release. But every single serious AA and AAA dev should commit to all the UE5.6 + 5.7 experimental stuff right away: UAF, FastGeo, Nanite Foliage...

This is the early adopter phase. UE4 all over again. Give it a few more years, and by 2028, post PS6 launch, a lot of new UE5 games will leverage all the experimental UE5.6 tech to eliminate traversal stutters and just run much better overall. By then the HW will be more capable (fingers crossed RDNA 5 and Rubin are good) and Advanced Shader Delivery will be pervasive.

Not specific to RDNA 5 or any graphics card, but just the current trajectory pushed by the incumbent powers that be - Read Epic, Nvidia, and graphics built on Unreal Engine.
This is an RDNA 5 thread, so please post this somewhere else in the future, or don't post it at all.
 

MrMPFR

Member
Aug 9, 2025
Found a proposed explanation for why Sony bothered with Neural Arrays here:

"As I mentioned in another thread, it appears like the Neural Arrays solution likely is a means of providing groups of CUs that have additional circuits that can either passively work like the PS5/Pro or can treat the array of CU registers as sharing a memory address space, so that tensor tiles bigger than an individual CU register memory's L1 cache can be spanned across the CU's by a higher level Neural Array controller and eliminate a lot of the 40-70% wasted tile border processing (TOPs) that PSSR on PS5 Pro suffers from in the PS5 Pro technical seminar video at 23:54.

By allowing for much larger tiles via Neural Arrays the hardware could either be retasked to a Transformer model like DLSS4 or would already be operating on such large titles at lower resolution tensors that the holistic benefits of Transformers would already be achieved by the CNNs.

Assuming a Neural array tile was already big enough for a full 360x240 tensor to fit. If the Array was able to work like I'm guessing it would effectively be processing an entire Mip of the whole scene all at once."
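For intuition on the tile-border waste mentioned in the quote, here's a rough back-of-envelope sketch. The tile sizes and halo width below are made-up illustrative numbers, not PSSR's or FSR's actual parameters; the point is just how quickly the halo overhead shrinks as tiles get bigger:

```
#include <cstdio>

// Work "wasted" on the overlapping border (halo) around each tile:
// to produce a T x T interior, the network actually processes (T + 2h)^2 pixels.
double haloWaste(int tileInterior, int halo)
{
    double processed = double(tileInterior + 2 * halo) * (tileInterior + 2 * halo);
    double useful    = double(tileInterior) * double(tileInterior);
    return 1.0 - useful / processed;
}

int main()
{
    const int halo    = 16;                   // assumed receptive-field halo, in pixels
    const int tiles[] = {64, 128, 256, 512};  // small per-WGP tiles vs. a big Neural Array tile
    for (int t : tiles)
        std::printf("%3dx%3d tile, halo %d -> %2.0f%% of tensor work spent on the halo\n",
                    t, t, halo, 100.0 * haloWaste(t, halo));
    return 0;
}
```

With a 16-pixel halo, a 64x64 tile wastes roughly 56% of its tensor work on the border, while a 512x512 tile wastes about 11%, which lines up with why much larger per-array tiles would help.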



As per SIE's Road to PS5 Pro vid, the PS5 Pro has a WGP takeover mode to process one tile per WGP. With RDNA 5, AMD took the next logical step, which was to implement takeover mode at the SE level and process tiles not on a per-WGP/CU basis but on a per-Neural-Array basis.
I think this is the patent for WGP takeover mode: https://patents.google.com/patent/US12033275B2
Wonder if this takeover mode is a PS5 Pro customization or in RDNA 4 as well?

Unhinged speculation, but if scheduling, synchronization, and control logic is relegated to a higher level anyway (the Shader Engine), AMD could decouple the ML logic completely from the current four SIMDs within a WGP and merge it into one giant systolic array per WGP/CU. With AMDFP4 (they need their own answer to NVFP4) and doubled FP8 throughput (4X/WGP), that's prob a 16 times larger FP4 WGP-level systolic array than RDNA 4's FP8 SIMD-level systolic array. In effect, something like the systolic array found in a DC-class Tensor core or an NPU.
Doing this SIMD decoupling would require massive cache system changes. Perhaps with some tweaks to this patent AMD could implement a scheme where the systolic array gobbles up most of or the entire LDS+L0 and VGPR and allocates it as one giant shared tensor memory, or a combination of that and private data stores. RDNA 4 has 4 x 192kB VGPR + 1 x 128kB LDS + 2 x 32kB L0 = 960kB maximum theoretical tensor memory per WGP/CU. Possibly RDNA 5's is even larger if the VRF and LDS+L0 get bigger with GFX13.
To connect it all together, implement the relevant SE-level logic AND an inter-WGP/CU fabric, and process enormous FSR5 tiles on a per-Neural-Array basis.
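Just to sanity-check the napkin math above (every number here is the speculative assumption from this post, not anything AMD has confirmed):

```
#include <cstdio>

int main()
{
    // RDNA 4 per-WGP/CU register and local storage, as listed in the post:
    const int vgpr_kB = 4 * 192;   // four 192 kB vector register files
    const int lds_kB  = 128;       // shared LDS
    const int l0_kB   = 2 * 32;    // two 32 kB L0 vector caches
    std::printf("max theoretical tensor memory per WGP/CU: %d kB\n",
                vgpr_kB + lds_kB + l0_kB);                          // prints 960 kB

    // Speculated WGP-level FP4 array vs. RDNA 4's per-SIMD FP8 array:
    const int merge_four_simds = 4;   // one array per WGP instead of one per SIMD
    const int fp8_rate_doubled = 2;   // assumed doubled FP8 throughput in RDNA 5
    const int fp4_vs_fp8       = 2;   // FP4 packs twice as many elements as FP8
    std::printf("relative systolic array size: %dx\n",
                merge_four_simds * fp8_rate_doubled * fp4_vs_fp8);  // prints 16x
    return 0;
}
```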

Sounds cool but prob not happening.

Whatever ends up happening, it's still a shame DF's latest vid didn't press Cerny on this. Some clarification would've been nice. All we got was:
"Neural Arrays will allow us to process a large chunk of the screen in one go, and the efficiencies that come from that are going to be a game changer as we begin to develop the next generation of upscaling and denoising technologies together."
 

MrMPFR

Member
Aug 9, 2025
gfx13 has it because gfx1250 has it.
2022: Hopper Adds DSMEM + TBC
2022: Ada Lovelace ignores^
2025: Blackwell consumer ignores ^^

AMD could've chosen to cut it from consumer like NVIDIA did (Kepler confirmed it's not on consumer) but opted to include it anyway.
Kepler is wrong. @adroc_thurston is correct. Blackwell consumer has DSMEM + TBC.

Cerny already said the point was larger tiles. I assume this is targeting mostly the CNN portion of FSR5, assuming it sticks with FSR4's hybrid CNN+ViT design. Working on a larger tile is effectively a larger "context window" = improved fidelity + less wasted tile-border processing.
That's the idea. How it carries over to the actual FSR5 implementation, who knows.
 

adroc_thurston

Diamond Member
Jul 2, 2023
2022: Ada Lovelace ignores^
because it was sm89.
2025: Blackwell consumer ignores ^^
it does have dsmem actually.
NVIDIA (Kepler confirmed it's not on consumer)
It is on consumer, see the CUDA cc12 feature compatibility matrix.
[Attached screenshot: CUDA compute capability 12 feature compatibility matrix]
Cerny already said the point was larger tiles. I assume this is targeting mostly the CNN portion of FSR5, assuming it sticks with FSR4's hybrid CNN+ViT design. Working on a larger tile is effectively a larger "context window" = improved fidelity + less wasted tile-border processing.
That's the idea. How it carries over to the actual FSR5 implementation, who knows.
?
the point is that you get accelerated shmem transfers.
Without hammering the L2.
 

MrMPFR

Member
Aug 9, 2025
Thanks for the screenshot of the table. Really impressive they managed to cram all this new tech into a GB206 die that's still smaller than AD106. Tons of low-level optimizations, and/or the silicon overhead is just minimal.

?
the point is that you get accelerated shmem transfers.
Without hammering the L2.
Yeah my explanation isn't great. Maybe someone else can explain it better.

Sure but you can't do the massive image processing tiles Cerny talked about without that. Just not feasible.

A related quote in case anyone is interested: "DSMEM enables more efficient data exchange between SMs, where data no longer must be written to and read from global memory to pass the data. The dedicated SM-to-SM network for clusters ensures fast, low latency access to remote DSMEM. Compared to using global memory, DSMEM accelerates data exchange between thread blocks by about 7x."
- From https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
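To make the quoted feature a bit more concrete, here's a minimal sketch of how distributed shared memory is exposed through CUDA thread block clusters (cooperative groups, sm_90+ / CUDA 12). The kernel, sizes, and launch configuration are illustrative only, not taken from the blog post:

```
#include <cooperative_groups.h>
#include <cstdio>
namespace cg = cooperative_groups;

// Each thread block fills its own shared-memory tile, then reads its
// neighbour's tile directly over the SM-to-SM network (DSMEM), with no
// round trip through L2 or global memory.
__global__ void __cluster_dims__(2, 1, 1) exchange(int *out)
{
    __shared__ int tile[256];
    cg::cluster_group cluster = cg::this_cluster();
    const unsigned int rank = cluster.block_rank();

    tile[threadIdx.x] = rank * 1000 + threadIdx.x;   // stage data in this block's smem
    cluster.sync();                                  // make it visible to the whole cluster

    // Map the other block's shared memory into this block's address space.
    int *remote = cluster.map_shared_rank(tile, (rank + 1) % cluster.num_blocks());
    out[blockIdx.x * blockDim.x + threadIdx.x] = remote[threadIdx.x];

    cluster.sync();   // keep remote smem alive until every block is done reading
}

int main()
{
    int *out = nullptr;
    cudaMalloc(&out, 2 * 256 * sizeof(int));
    exchange<<<2, 256>>>(out);   // 2 blocks form 1 cluster of size 2
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```

The map_shared_rank call is the piece that lets one block read another block's shared memory directly over the SM-to-SM network instead of bouncing the data through L2/global memory, which is exactly the "accelerated shmem transfers without hammering the L2" point above.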
 

Panino Manino

Golden Member
Jan 28, 2017
Won't argue about Threat Interactive, but I have to say, it's hard for me to understand something.
Why, as more and more processing power and memory become available over time, do game graphics seem to involve more and more compromises?
It always seems that more and more is required to do the same things that were perfected before.
 

itsmydamnation

Diamond Member
Feb 6, 2011
Won't argue about Threat Interactive, but I have to say, it's hard for me to understand something.
Why, as more and more processing power and memory become available over time, do game graphics seem to involve more and more compromises?
It always seems that more and more is required to do the same things that were perfected before.
I think the only real regression has been deferred rendering. Hopefully at some point we can get back to some form of forward rendering that supports a large number of light sources. Then we can have real AA again.
 

soresu

Diamond Member
Dec 19, 2014
And the fact modern games look bad is just me imagining things?
Look bad?

Doom Dark Ages looks great IMHO, a significant step up from Doom Eternal.

Though clearly good art direction and/or cinematography is a big part of it, as the later addition of path tracing makes little difference to the cut scene visual fidelity.

Merely having a great rendering engine isn't nearly so good as having a director who actually knows what they are doing with it.

I doubt that merely putting Doom Eternal assets in the id Tech 8 engine would be anywhere near as effective.
 