Discussion RDNA 5 / UDNA (CDNA Next) speculation


MrMPFR

Member
Aug 9, 2025
The combo of nanite with RT has wrecked many games

Plus Lumen implementation produces a result that is very hard to optimize for low end (SVOGI - Voxel Cone Tracing gives much better bang for buck)
Nanite (AC Shadows Geo =/= Nanite) is only used in UE5 games, and most UE5 games don't bother with RTXGI, PTGI or HW Lumen. RT is not the problem. For example, DDGI is blazing fast, but only covers diffuse lighting. SW Lumen is an inferior, slow software solution that tries to do everything (GI and reflections); the HW version looks much better but is still heavy.

Yeah, but SVOGI only covers diffuse lighting, and the lighting in KCD 2 or Crysis Remastered doesn't come close to SW Lumen in UE5 titles or DDGI in titles such as Metro Exodus Enhanced Edition. DDGI > SVOGI.

They heavily customize UE5 for their two games, unlike some other devs. AFAICT no Lumen and Nanite in Arc Raiders or The Finals, some serious in-house engine tweaking, and 2016 midrange GPU on min specs. Not surprising they run well. IIRC both use DDGI (RTXGI) and a fallback GI solution.

As a rule of thumb, the more customized a game's UE5 build is, the better it'll run. Engineers always wanna tweak and optimize stuff. TW4 will prob be the first UE5 game using the latest tech AND basically running flawlessly. TurboTECH FTW!

They must do two things:
1) get to UE6 real quick, because UE5 is now more or less a toxic keyword; new games that are well made with it are better off not saying which engine they use
2) fix the upgrade situation - devs who start making a game on major version X should be able to upgrade seamlessly to a minor version, otherwise it's total BS
1) Not gonna happen; UE6 isn't launching anytime soon. Back in the spring Sweeney said a (preview) release is a few years out, so a ~2030 release, 8 years after UE5's. And does the UE5 stigma really carry over to the average PC gamer and console gamers? He also said they're going to abandon the old code completely and rewrite everything to be multithreaded, and that's not easy. Similar to what Unity did with DOTS, but I suspect more profound changes, including leveraging Work Graphs (a big deal for PS6 and RDNA 5) for virtually everything, based on Epic's public statements in 2024. UE6 mass adoption in the early-to-mid 2030s could be when RDNA 5 really begins to shine.
2) That is impossible considering how much they change with each release. But every serious AA and AAA dev should commit to all the UE5.6+5.7 experimental stuff right away: UAF, FastGeo, Nanite Foliage...

This is an early-adopter phase, UE4 all over again. Give it a few more years: by 2028, post PS6 launch, a lot of new UE5 games will leverage all the experimental UE5.6 tech to eliminate traversal stutters and just run much better overall. By then the HW will be more capable (fingers crossed RDNA 5 and Rubin are good) and Advanced Shader Delivery will be pervasive.

Not specific to RDNA 5 or any graphics card, but just the current trajectory pushed by the incumbent powers that be - read: Epic, Nvidia, and graphics built on Unreal Engine.
This is an RDNA 5 thread, so please post this somewhere else in the future, or don't.
 

MrMPFR

Member
Aug 9, 2025
Found a proposed explanation for why Sony bothered with Neural Arrays here:

"As I mentioned in another thread, it appears like the Neural Arrays solution likely is a means of providing groups of CUs that have additional circuits that can either passively work like the PS5/Pro or can treat the array of CU registers as sharing a memory address space, so that tensor tiles bigger than an individual CU register memory's L1 cache can be spanned across the CU's by a higher level Neural Array controller and eliminate a lot of the 40-70% wasted tile border processing (TOPs) that PSSR on PS5 Pro suffers from in the PS5 Pro technical seminar video at 23:54.

By allowing for much larger tiles via Neural Arrays, the hardware could either be retasked to a Transformer model like DLSS4, or would already be operating on such large tiles at lower-resolution tensors that the holistic benefits of Transformers would already be achieved by the CNNs.

Assuming a Neural Array tile were already big enough for a full 360x240 tensor to fit, and the Array worked the way I'm guessing, it would effectively be processing an entire mip of the whole scene all at once."
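For intuition on the border-waste figure in that quote: if the upscaler needs a halo of B pixels of context around each T x T tile it processes, the fraction of TOPS spent on the halo is roughly

\[ \text{waste} = 1 - \frac{T^2}{(T+2B)^2} \]

With a small per-CU tile of T = 64 and a halo of B = 16 (my own guess, not a Sony figure), that's 1 - 4096/9216 ≈ 56%, right inside the quoted 40-70% band. Quadruple the tile to T = 256 with the same halo and the waste drops to 1 - 65536/82944 ≈ 21%, which is the whole pitch for spanning one tile across a Neural Array.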



As per the SIE Road to PS5 Pro video, PS5 Pro has a WGP takeover mode to process one tile per WGP. With RDNA 5, AMD took the next logical step: implement takeover mode at the SE level and process tiles not on a per-WGP/CU basis but on a per-Neural-Array basis.
I think this is the patent for WGP takeover mode: https://patents.google.com/patent/US12033275B2
Wonder if this takeover mode is a PS5 Pro customization or present in RDNA 4 as well?

Unhinged speculation, but if scheduling, synchronization, and control logic is relegated to a higher level anyway (the Shader Engine), AMD could decouple the ML logic completely from the current four SIMDs within a WGP and merge it into one giant systolic array per WGP/CU. With AMDFP4 (they need their own answer to NVFP4) and doubled FP8 throughput (4X/WGP), that's prob a 16 times larger FP4 WGP-level systolic array than RDNA 4's FP8 SIMD-level systolic array. In effect something like the systolic array found in a DC-class Tensor core or an NPU.
Doing this SIMD decoupling would require massive cache system changes. Perhaps with some tweaks to this patent AMD could implement a scheme where the systolic array gobbles up most of, or the entire, LDS+L0 and VGPR and allocates it as a giant shared Tensor memory, or a combination of this and private data stores. RDNA 4 has 4 x 192kB VGPR + 1 x 128kB LDS + 2 x 32kB L0 = 960kB maximum theoretical Tensor memory per WGP/CU. Possibly RDNA 5 is even larger if the VRF and LDS+L0 get bigger with GFX13.
To connect it all together, implement the relevant SE-level logic AND an inter-WGP/CU fabric, and process enormous FSR5 tiles on a per-Neural-Array basis.

Sounds cool but prob not happening.

Whatever ends up happening, it's still a shame DF's latest video didn't press Cerny on this. Some clarification would've been nice. All we got was:
"Neural Arrays will allow us to proces a large chunk of the screen in one go, and the efficiencies that come from that are going to be a game changer as we begin to develop the next generation of upscaling and denoising technologies together."
 

MrMPFR

Member
Aug 9, 2025
gfx13 has it because gfx1250 has it.
2022: Hopper Adds DSMEM + TBC
2022: Ada Lovelace ignores^
2025: Blackwell consumer ignores ^^

AMD could've chosen to cut it from consumer like NVIDIA (Kepler confirmed it's not on consumer) but opted to include it anyway.
Kepler is wrong. @adroc_thurston is correct. Blackwell consumer has DSMEM + TBC.

Cerny already said the point was larger tiles. I assume this is mostly targeting the CNN portion of FSR5, assuming it sticks with FSR4's hybrid CNN+ViT design. Working on a larger tile is effectively a larger "context window" = improved fidelity + less wasted tile-border processing.
That's the idea. How it carries over to the actual FSR5 implementation, who knows.
 

adroc_thurston

Diamond Member
Jul 2, 2023
2022: Ada Lovelace ignores^
because it was sm89.
2025: Blackwell consumer ignores ^^
it does have dsmem actually.
NVIDIA (Kepler confirmed it's not on consumer)
It is on consumer, see the CUDA cc12 feature compatibility matrix.
[attached screenshot: CUDA compute capability feature compatibility matrix]
Cerny already said the point was larger tiles. I assume this is mostly targeting the CNN portion of FSR5, assuming it sticks with FSR4's hybrid CNN+ViT design. Working on a larger tile is effectively a larger "context window" = improved fidelity + less wasted tile-border processing.
That's the idea. How it carries over to the actual FSR5 implementation, who knows.
?
the point is that you get accelerated shmem transfers.
Without hammering the L2.
 

MrMPFR

Member
Aug 9, 2025
Thanks for the screenshot of the table. Really impressive they managed to cram all this new tech into a GB206 die that's still smaller than AD106. Tons of low-level optimizations, and/or the silicon overhead is just minimal.

?
the point is that you get accelerated shmem transfers.
Without hammering the L2.
Yeah my explanation isn't great. Maybe someone else can explain it better.

Sure but you can't do the massive image processing tiles Cerny talked about without that. Just not feasible.

A related quote in case anyone is interested: "DSMEM enables more efficient data exchange between SMs, where data no longer must be written to and read from global memory to pass the data. The dedicated SM-to-SM network for clusters ensures fast, low latency access to remote DSMEM. Compared to using global memory, DSMEM accelerates data exchange between thread blocks by about 7x."
- From https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
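For anyone curious what that looks like from the software side, here's a minimal CUDA sketch using thread block clusters (sm_90+, launched with a grid that's a multiple of the cluster size). The kernel, sizes, and names are my own illustration, but this_cluster() and map_shared_rank() are the actual cooperative-groups API:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Each block fills its own shared memory, then reads a neighbour block's
// shared memory directly over the SM-to-SM network (DSMEM) instead of
// bouncing the data through global memory and the L2.
__global__ void __cluster_dims__(4, 1, 1) exchange(const float* in, float* out) {
    __shared__ float smem[256];
    cg::cluster_group cluster = cg::this_cluster();

    unsigned rank = cluster.block_rank();              // this block's index in the cluster
    smem[threadIdx.x] = in[blockIdx.x * 256 + threadIdx.x];
    cluster.sync();                                    // all smem in the cluster is now valid

    // Map the neighbour block's smem into our address space and read it.
    unsigned neighbour = (rank + 1) % cluster.num_blocks();
    const float* remote = cluster.map_shared_rank(smem, neighbour);
    out[blockIdx.x * 256 + threadIdx.x] = remote[threadIdx.x];

    cluster.sync();                                    // don't exit while others still read our smem
}
```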
 

Panino Manino

Golden Member
Jan 28, 2017
Won't argue about Threat Interactive, but I have to say, it's hard for me to understand something.
Why? With the flow of time, even as more and more processing power and memory becomes available, why do game graphics seem to involve more and more compromises?
It always seems that more and more is required to do the same things that were perfected before.
 

itsmydamnation

Diamond Member
Feb 6, 2011
Won't argue about Threat Interactive, but I have to say, it's hard for me to understand something.
Why? With the flow of time, even as more and more processing power and memory becomes available, why do game graphics seem to involve more and more compromises?
It always seems that more and more is required to do the same things that were perfected before.
I think the only real regression has been deferred rendering; hopefully at some point we can get back to some form of forward rendering that supports large numbers of light sources. Then we can have real AA again.
 

soresu

Diamond Member
Dec 19, 2014
And the fact modern games look bad is just me imagining things?
Look bad?

Doom Dark Ages looks great IMHO, a significant step up from Doom Eternal.

Though clearly good art direction and/or cinematography is a big part of it, as the later addition of path tracing makes little difference to the cut scene visual fidelity.

Merely having a great rendering engine isn't nearly so good as having a director who actually knows what they are doing with it.

I doubt that merely putting Doom Eternal assets in the id Tech 8 engine would be anywhere near as effective.
 

MrMPFR

Member
Aug 9, 2025
The post was too long, so I pushed the in-depth reporting to this Google Doc. Unlike my AMD patent docs, I promise this one will stay up forever:

Intro and Disclaimer (Skip if you like)

My last major post discussed RDNA 5's scheduling and dispatch changes. This one selectively addresses RDNA 5's potential RT changes, and significantly expands on info disclosed in my previous posts here and elsewhere, as well as the stuff disclosed by reputable leakers such as Kepler_L2. The scope of proposed changes is massive, which explains the length of the linked Doc. It probably takes 10-15 minutes to read, but I've summarised the most important insights here for your convenience.

The analysis is patent-derived, so the usual caveats apply. Nothing is confirmed yet, but it's all very likely all things considered, and while the exact implementations may not mirror the patents 1:1, the impacts should be roughly the same. We still need confirmation from a reputable leaker like Kepler_L2 to be certain what is and isn't in RDNA 5.

I'm sorry, Kepler, for my old ignorant comments; I've tried my best to address these in the docs, and I'll link them below.

The Good Stuff:

I know I've talked about these RDNA 5 RT changes before, even at length, but after reading the patents again, properly this time, the changes now appear much more profound in effect and scope, easily enough to warrant a summary:
  1. Pre-Filtering Pipeline: Implements very wide, parallel, low-precision (integer) intersection testers (pre-filtering) that mass-cull triangles in DGF/pre-filtering nodes and DMMs. These units have a tiny area overhead and low latency, so perf/area can be massively increased. They also reduce cachemem load for many reasons, including less control circuitry (integers are much easier to process than floating point) and a reduced-precision scheme of roughly half-precision integer math (~INT16, or more precisely Q+3) vs the full-precision FP math (FP32) used today.
  2. Integer dominates FP: By default INT tests handle all boxes/tris/primitives in a node. If one of these is inconclusive, a single traditional FP test is used to confirm the result (see the sketch after this list). In some instances, when the increased fidelity isn't needed or can't be appreciated (too far off in the distance), FP tests are never required, providing an even further speedup.
  3. Very wide BVH: Since the INT pipeline has a tiny cachemem load and area cost per ray intersection, very wide and shallow BVHs can be used. BVH8-16 is discussed in the patents, but maybe it will be even wider.
  4. Versatile Pre-filtering: Pre-filtering can be used for ray intersection tests against all sorts of primitives, including linear swept spheres, quads, and bounding boxes. Only limited by the HW used.
  5. Benefits of DGF and DMM: DGF and DMMs both have lower cachemem overhead (footprint, BW, and circuitry load). Both boost performance on their own, without pre-filtering.
  6. Always cache aligned: GFX13's RT pipeline strives to bundle geometry into cache-aligned, fixed-size data structures whether DGF is used or not. It uses these to reduce memory transactions and load on the cachemem system.
  7. DGF Fallback method: When DGF hasn't been implemented by a dev for an asset, a fallback method called pre-filtering nodes is used. I call it DGF Lite since it only compresses vertices.
  8. Less decode and data prefetching: With pre-filtering, the decoding overhead from DGF and pre-filtering nodes can be reduced, since less data needs to be fetched and stored. Full-precision data is never prefetched and only fetched when the pipeline requires a floating-point test.
  9. Novel DMM encoding: The DMM encoding scheme replaces 64 subdivided triangles with 14 and can be evaluated with a single traversal step and a BVH14 node, in contrast to previous methods reliant on three traversal steps using BVH4 or two traversal steps using BVH8.
  10. Prism Volume HW: Dedicated Bounding Circuitry in RT cores constructs prism volumes to accelerate DMM evaluation.
  11. Math enabling pre-filtering: Various precomputations that enable the low-precision tests are run at BVH build and ray setup instead of at runtime, which finally makes pre-filtering feasible.
  12. Quantized OBBs: Oriented bounding boxes (OBBs) are quantized using platonic solids, which enables pre-filtering of ray/box intersections against OBBs.
  13. Goated CBLAS: DGF and pre-filtering nodes are basically made for a compacted CBLAS BVH architecture. With DMM on top this is even more insane, as we can see up to a ~16,400 times reduction in the number of leaf nodes compared to conventional methods. It achieves more than an order of magnitude reduction in BVH footprint vs RTX Mega Geometry at iso-geometric complexity.
  14. Less redundant math: Configurable inside/outside ray/edge test sharing reduces redundant math; it directly benefits from DGF adoption but can work with all kinds of geometry. Pre-filtering provides a further speedup here.
  15. The Holy Grail - Partial Ray Coherency Sorting: Ray coherency sorting for leaf nodes is achieved by coalescing rays against the same DGF/pre-filtering node. The pipeline then executes them all at once within an RT core before switching to the next node. It exploits spatial coherency to deliver unprecedented (except for PowerVR Photon) scheduling and data coherency, allowing for superior data locality and reuse. In other words, a massive cachemem load reduction and speedup for RT leaf node evaluations.
  16. DGF + DMM Ray Coherency Sorting: With DGF/pre-filter nodes + DMMs, the scope of ray coherency sorting can be expanded to cover more of the BVH, and it may be applied an additional time at the DMM base triangle to minimize load on the cachemem system even further. It also eases the load on the Bounding Circuitry for Prism Volumes by avoiding duplicative builds.
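To make points 1-2 concrete, here's a toy CUDA sketch of the tri-state "integer first, FP32 only when inconclusive" flow. The quantization and error-bound math is my own simplification; the patents' actual Q+3 scheme and conservative rounding are precisely the clever parts I'm not reproducing:

```cuda
#include <cstdint>

// Tri-state verdict of the cheap test: exact misses get culled, everything
// else is either a guaranteed hit or needs the single FP32 confirm.
enum Verdict : uint8_t { CULL, ACCEPT, NEEDS_FP32 };

// Box corners quantized to 8-bit cell coordinates inside the parent node's
// local grid (DGF-style), widened outward at BVH build time so CULL is safe.
struct QBox { uint8_t lo[3], hi[3]; };

// Fixed-point slab test. org/invdir are the ray origin and 1/direction,
// precomputed once per ray in node-local 16.8 fixed point at ray setup.
// err is a precomputed bound on quantization + rounding error.
__device__ Verdict prefilter_box(QBox b, const int32_t org[3],
                                 const int32_t invdir[3], int32_t err) {
    int32_t tmin = -(1 << 30), tmax = (1 << 30);
    for (int a = 0; a < 3; ++a) {
        int64_t d0 = ((int32_t)b.lo[a] << 8) - org[a];  // plane minus origin, 16.8
        int64_t d1 = ((int32_t)b.hi[a] << 8) - org[a];
        int32_t t0 = (int32_t)(d0 * invdir[a] >> 8);
        int32_t t1 = (int32_t)(d1 * invdir[a] >> 8);
        tmin = max(tmin, min(t0, t1));
        tmax = min(tmax, max(t0, t1));
    }
    if (tmin - err > tmax + err) return CULL;     // miss even in the worst case
    if (tmin + err <= tmax - err) return ACCEPT;  // hit even in the worst case
    return NEEDS_FP32;                            // too close to call cheaply
}
```

A wide node would run 8-16 of these tiny compare chains in parallel, and per point 2 only the NEEDS_FP32 survivors ever touch the full-precision tester.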

Implications for Nextgen

With RDNA 5 it appears AMD has managed to design a novel and groundbreaking ray tracing pipeline. It's a monumental leap over RDNA 4's pipeline that easily qualifies as a clean slate. Note this conclusion was derived from an incomplete analysis; there are many more public and likely soon-to-be-published patents that will further expand the scope of changes, further solidifying this excellent architecture.

This shows AMD's architectural team is extremely talented. The changes are not about brute-forcing things by mindlessly throwing more logic and cache at the problem; they are about redesigning the entire pipeline from scratch with ingenious optimisations derived from first-principles thinking. The results are as expected: RT in RDNA 5 appears mighty impressive.

Compared against the competition, GFX13 RT is well ahead of the 50 series in architectural sophistication; likely multiple gens of leapfrogging at NVIDIA's usual cadence. So unless Rubin is a massive leap as well, AMD will easily have the architectural upper hand. But in the end this is just one side of the coin, since area investment is equally important, so Rubin remains a joker. And if NVIDIA loses in ray tracing, they absolutely need to find a new thing to chase.

Addressing previous ignorant comments

- Then lists the three patents related to an AMD DMM implementation as beyond current µArch, when DMM has been supported on RTX 40 series since 2022.
How is AMD's implementation beyond Ada's DMM decompression engine (Blackwell removed it)?
I'm sorry Kepler for not bothering to actually read the patent. The leapfrogging is obvious and significant.

It's just old boring DGF. Really hoped for more in RDNA 5 even if it's still beyond Blackwell.
I'm taking that back; DGF is actually amazing, especially when you build an architecture around it. Cerny was 100% correct when he said that DGF enables flexible and efficient data structures. It keeps as much data as possible wrapped in cache-aligned, fixed-size packages.
I was just expecting additional changes related to data structures like overlay trees and delta instances in HW.

Edit: @vinifera found another patent that is actually related to the new geo encoding scheme beyond DMMs (see Docs).
Second Edit: Formatting, rephrasing for better reading experience, improved info in point list and summary + moved in-depth to Docs.
 

MrMPFR

Member
Aug 9, 2025
@vinifera you're free to steal this formatting and include it in your post. Makes it easier to reference later. I'll delete this when it's moved:
END
---------------------------

Reporting and analysis

#1: Looks like deadlock and long-latency-stall mitigation that can make GPUs more versatile (i.e. supporting more application types). Introduces fine-grained context saves and restores on the GPU, down to the wavefront level.
Might be related to this patent, which sounds a lot like Volta's Independent Thread Scheduling: https://patents.google.com/patent/US20250306946A1
- Guessing this is CDNA 5 related.

#2: A Shader Engine level payload sorting circuit coupled to the Work Graph Scheduler. Might also be implemented at CU level. It is a specific HW optimization for work graphs independent of compute units. It "...improves coherency recovery time by sorting payloads to be consumed by the same consumer compute unit(s) into the same bucket(s). The producer compute units are able to perform processing while the sorting operations are being performed by the sorting circuit in parallel."
While the main target is work graphs the technique "...applicable to other operations, such as raytracing or hit shading, and other objects, such as rays and material identifiers (IDs)." Complementary to the Streaming Wave Coalescer.
- Since they mention rays, it's very possible this unit is responsible for the ray coalescing against DGF nodes that I described earlier. Very likely a RDNA 5 patent. Chajdas is involved, and once again this optimization is crucial for Work Graphs.
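A toy software analogue of what that sorting circuit would do, assuming my reading of the patent is right (all names and structures invented for illustration):

```cuda
#include <cstdint>

struct Payload { uint32_t consumer_id; uint32_t data; };  // e.g. material/node ID + args

// Pass 1 (not shown): count payloads per consumer, exclusive-scan the counts
// into per-bucket base offsets, and copy those offsets into `cursor`.
// Pass 2: scatter every payload into its consumer's contiguous bucket, so a
// consumer CU later drains one coherent bucket instead of a random interleave.
__global__ void bin_payloads(const Payload* in, int n, Payload* out, int* cursor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int slot = atomicAdd(&cursor[in[i].consumer_id], 1);  // claim a slot in the bucket
    out[slot] = in[i];
}
```

The hardware presumably does this without round trips through memory, and, per the quote, producers keep running while the sorting happens in parallel.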

#3: This allows a resource for a second task to be assessed in advance without interfering with the first task. It works as follows: execute the first task, then initiate the second task but pause before accessing said resource; if the resource for the second task is ready after completion of the first task, the second task gets executed. Looks like this is implemented at the Shader Engine level. The patent states: "...sequential tasks can be executed more quickly and/or GPU resources can be utilized more fully and/or efficiently."
- Not sure about this one, but could end up in RDNA 5 or perhaps CDNA 5.

#4: A method of animating compressed geometry based on curved surface patches. This is related to the beyond-DMM patent I discussed in my previous post.
- Looks too novel to be in RDNA 5 + no HW blocks specified. Gruen is the sole originator.

#5: A method of deferring any-hit shader execution, which makes it "...possible to group the execution of an any hit shader for multiple work-items together, thereby reducing divergence."
- This is a big deal, possibly even bigger than SER, if they can make the any-hit shader evaluation very coherent. NVIDIA said this at the launch of SER: "With increasingly complex renderer implementations, more workloads are becoming limited by shader execution rather than the tracing of rays." Until fairly recently I thought SER was for coalescing ray tracing operations. Yeah, I know, it's stupid.
- This patent has McAllister listed alongside many researchers. It has to be in RDNA 5, since not including it would be asinine.
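A software-flavoured sketch of what #5's deferral might amount to (structures invented; the real mechanism would live in HW/driver): instead of each lane invoking the any-hit shader the moment it finds a candidate hit, candidates are queued per shader and drained later in full waves:

```cuda
#include <cstdint>

struct AnyHitCandidate { uint32_t ray_id; uint32_t prim_id; float t; };

// Append a candidate to the queue of its any-hit shader instead of calling
// the shader immediately (which would diverge the wavefront).
__device__ void defer_any_hit(AnyHitCandidate c, uint32_t shader_id,
                              AnyHitCandidate* queues, int* tails, int capacity) {
    int slot = atomicAdd(&tails[shader_id], 1);   // claim a slot in that shader's queue
    if (slot < capacity)
        queues[shader_id * capacity + slot] = c;
    // A later dispatch runs one shader over one full queue at a time, so every
    // lane in a wavefront executes the same any-hit code: minimal divergence.
}
```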

#6: This looks like the technique behind the Animated DMM GPUOpen paper unveiled at Eurographics 2024 and shared by @basix.
- I don't see specific HW mentions of logic for the animated DMMs beyond the basic DMM HW pipeline, but AMD needs this or a better approach, because the paper stated that on RDNA 3 it has "...∼ 2.3−2.5× higher ray tracing costs than regular ray tracing of static DMMs." Gruen is the sole originator.

What can we expect?

#2 and #5 are the most important and will almost certainly end up in RDNA 5, on top of what I discussed in my last comment. It strongly implies their GFX13 RT implementation is leapfrogging NVIDIA Blackwell by several gens, at least in sophistication. AMD could decide to just gimp the RT cores to save on die space, but overall it looks like AMD might turn the tables on NVIDIA in RT nextgen. Rubin is still a joker, so anything could happen; we'll see.
If they lose, NVIDIA will prob go: "RT is for console peasants, now here's a selection of generated AI games that can run on the new 6090 at 20 frames per second. We use DLSS and MFG to run it at 120 FPS xD." or "Now our tensor cores are so powerful that we can replace most of the ray tracing pipeline and it looks better."

Regardless, I'm not surprised AMD and Sony are openly talking about path tracing on future HW when the pipeline looks this capable. Hope they resist the temptation of offsetting architectural sophistication with less HW, cutting it down because it's "good enough". It can be amazing if they let it shine.

hey that's normal, Intel GFX R&D guys got swallowed whole by AMD.
Think we're beginning to see the results of that in patent filings rn.

Looks like RDNA5 def won't be short of paradigm shifts and novel ideas.
 

soresu

Diamond Member
Dec 19, 2014
Guessing this is CDNA 5 related
Going forward it would seem CDNA -> RDNA -> CDNA -> RDNA.

At least as far as the CU goes, so unless the patent specifically talks about matrix cores you can assume it's likely going to end up in RDNA too if it ends up in anything.
 

marees

Golden Member
Apr 28, 2024
Going forward it would seem CDNA -> RDNA -> CDNA -> RDNA.

At least as far as the CU goes, so unless the patent specifically talks about matrix cores you can assume it's likely going to end up in RDNA too if it ends up in anything.
I am excited about the CCU patent, but it looks like CDNA stuff for now. At least it doesn't seem to be in PS6.
 

soresu

Diamond Member
Dec 19, 2014
If they lose, NVIDIA will prob go: "RT is for console peasants, now here's a selection of generated AI games that can run on the new 6090 at 20 frames per second. We use DLSS and MFG to run it at 120 FPS xD." or "Now our tensor cores are so powerful that we can replace most of the ray tracing pipeline and it looks better."
There's nothing probable about it.

They are already laying the foundation for such a pivot with all their neural rendering language in PR for RTX 50.
 

tsamolotoff

Senior member
May 19, 2019
And the fact modern games look bad is just me imagining things?
No, it's simply that some people don't really see all the noise and blur, and praise TAA and its derivatives despite the fact that they destroy image clarity and small details and make your head spin with ghosting in motion. This alone makes me not play modern games if there is no way to disable the temporal stuff. What is the point of fancy 'realistic' (tm) (C) lighting (which means fractions of a ray sample per pixel, temporally smeared and accumulated) if you can't see anything on the screen and your eyes start bleeding after a minute or so? (Talos Principle 2 is an egregious example of this.)
 

tsamolotoff

Senior member
May 19, 2019
Doom Dark Ages looks great IMHO, a significant step up from Doom Eternal.
At a cost of literally 10x the fps (and 5x if I enable ray tracing in this scene), is it really worth it? For us users, that is; it's obvious that full RT mode allowed id to save lots of bux on the development process.

[attached image: doom.jpg]