Discussion RDNA 5 / UDNA (CDNA Next) speculation

adroc_thurston · Thursday at 2:49 AM

igor_kavinski said:
That just shows me that Nvidia has more to lose and AMD more to gain when the tides turn.

the what turn

igor_kavinski · Thursday at 2:54 AM

adroc_thurston said:
the what turn

Something other than the GPU excels at AI workloads at lower power.

adroc_thurston · Thursday at 2:55 AM

igor_kavinski said:
Something other than the GPU excels at AI workloads at lower power.

That's a net loss for both AMD and NV.
They're the only ones selling GPUs.

igor_kavinski · Thursday at 3:03 AM

adroc_thurston said:
That's a net loss for both AMD and NV.

Not if it comes out of AMD

adroc_thurston · Thursday at 3:04 AM

igor_kavinski said:
Not if it comes out of AMD

AMD has a GPU roadmap and that's it.

igor_kavinski · Thursday at 3:06 AM

adroc_thurston said:
AMD has a GPU roadmap and that's it.

They have Xilinx too which is not a GPU.

adroc_thurston · Thursday at 3:10 AM

igor_kavinski said:
They have Xilinx too which is not a GPU.

GPU guys won.

Win2012R2 · Thursday at 6:14 AM

ToTTenTranz said:
Not if you go with Samsung's foundries.

If their "N2" was any good they would never drop prices so much at time of max "AI" demand.

lucasworais · Thursday at 6:26 AM

Microsoft has changed all the Xbox Game Pass plans and they all include cloud gaming now which appears to finally be out of beta.
They also raised the prices by 50%+ (99% in my country). Gotta pay for all those AT0s.

marees · Thursday at 7:32 AM

lucasworais said:
Microsoft has changed all the Xbox Game Pass plans and they all include cloud gaming now which appears to finally be out of beta.
They also raised the prices by 50%+ (99% in my country). Gotta pay for all those AT0s.

& the ultimate plan has a higher quality cloud gaming

What I would prefer is a cheap as chips (& pay as you play ) cloud gaming only plan (no live or Xbox or pc) with options to stack individual games — at discounted rates

MrMPFR · Thursday at 2:41 PM

Seeking some clarification for info + updating and providing new info potentially relevant for RDNA 5. Hope y'all don't mind.

Kepler_L2 said:
Yeah but gfx13 does the real cocaine stuff like register renaming.

Is this patent related to the gfx13 implementation?

US20230315536A1 - Dynamic register renaming in hardware to reduce bank conflicts in parallel processor architectures - Google Patents

To reduce inter- and intra-instruction register bank access conflicts in parallel processors, a processing system includes a remapping circuit to dynamically remap virtual registers to physical registers of a parallel processor during execution of a wavefront. The remapping circuit remaps...

patents.google.com

For OoO execution SWC was less clear but here there can be no doubts. Why would AMD do something this insane if it weren't to do actual OoO execution on a GPU. Just hope it doesn't suffer from the same issues as LOOG, but perhaps AMD can reign it in through the compiler.
Comparison between the LOOG and the lightweight SOTA GhOST OoO scheduler available here: https://liberty.cs.princeton.edu/Publications/isca24_ghost.pdf

Kepler_L2 said:
Not the case with RDNA4 anymore, they support round-robin scheduling now

Meanwhile here's one AMD patent dissing round-robin for load balancing: https://patents.google.com/patent/US11941723B2

Maybe this was the original low effort plan for RDNA 5 or possibly what was to be used in N4C (cancelled gargantuan RDNA 4 chiplet GPU).
Possibly also these as well two for N4C:
#1 https://patents.google.com/patent/US12361628B2
#2 https://patents.google.com/patent/US11755336B2

Kepler_L2 said:
MI450X doesn't have any texture/geometry/RT engine stuff, it also doesn't have any of the gfx13 goodies like SWC/WGS/DGF.

All that was before the AMD R&D heavy weights Matthäus G. Chajdas, Michael J. Mantor, and Christopher J. Brennan got involved and proposed the CP decoupled scheduling and dispatch with WGS+ADC:
#1 https://patents.google.com/patent/US20240111574A1
#2 https://patents.google.com/patent/US12153957B2
#3 https://patents.google.com/patent/US20240111575A1

Kepler_L2 said:
More like they improved SE-level scaling with some trickery: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2025144455&_cid=P21-MCPWUB-70374-1

Not surprisingly the local launcher patent first spotted by Kepler lists him as well alongside the other heavy weights:
Here's the Google Patents version: https://patents.google.com/patent/US20250217195A1

Completely instrumental for accelerating workgraphs prime candidate code (branchy):
"...local launch mechanism provides an order-of-magnitude improvement to thread launch performance, allowing finer-grained dispatches, local consumption of data within a compute unit, and much improved performance in highly variable workloads. For example, the local launch mechanism improves the performance of application program interfaces that utilize work graphs by allowing a workgroup processor to self-schedule work without needing to submit a request to a work scheduling mechanism such as a command processor."

Here's another patent where he's listed that rewrites the pipeline for fixed-function units, unconfirmed for now:

WO2024129349A1 - Inclusion of dedicated accelerators in graph nodes - Google Patents

Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed. In various implementations, a processor includes a pipeline comprising a plurality of fixed-function units and a scheduler. The scheduler is configured to schedule a...

patents.google.com

Might be going out on a limb but especially patents that list Matthäus G. Chajdas are very likely to inform finalized HW implementations going forward. Part of DGF and GPU Work Graphs SW teams, multiple patents already implemented etc... Keep an eye out on his work and patents for sure.

WGS+ADC+Work stealing (load balancing) and local launchers in RDNA 5 alone is enough to warrant gfx13 a clean slate designation. RDNA tackled CU level scheduling and dispatch and dispatch while it looks like RDNA 5 will adress the entire stack from CP to SIMD unit in a radical fashion. Remember that this is just one aspect of a GPU architecture. Imagine this kind of clean slate approach or at the bare minimum major changes across the entire GPU architecture as Kepler alluded to here:

"Well yeah they are changing everything, gfx13 is the biggest architectural overhaul since GCN."

The most crazy thing is that as Kepler aluded to this redesign is just one aspect of a GPU, and while obviously I can't say this is how it'll be for sure, the patents suggest AMD is tackling every single aspect of the GPU. Indicates AMD is building a power efficient and data locality focused GPU architecture heavily reliant on adaptive and feedback driven mechanisms that are is extremely fine-grained and flexible while also implementing clever and novel techniques to maximize processing and data efficiency. RDNA 5 isn't a repeat of RDNA, that tried to overcome GCNs limitations by align itself with Maxwell µarch derivatives and it isn't a repeat of GCN, that while forward looking in multiple areas was unfortunately still limited by its circumstances of being mostly funded by Sony and MS during the comatose AMD era. It's AMD finally tackling nearly every single aspect of GPU design in a manner generally forward looking, novel and well beyond NVIDIA's current gen.

The shared design and IP pipeline moving forward means R&D spent on consumer and DC overlaps, Many of these optimizations also carry over to DC which makes the monetary incentives much stronger. Besides the gaming specific optimizations (no overlap with DC) I doubt RDNA 5 is primarely funded by MS and Sony as these changes are too big to be primarely for nextgen consoles and gaming GPU family. Would wager that the combined R&D funding for ~~CDNA 5~~ edit. oops CDNA 6 and RDNA 5 massively eclipses anything previous even when accounting for number of implementations (dies) and node related design cost explosion.

With such a big redesign comes enourmous risk of HW bugs and flaws so I really hope AMD finally has the neccesary money and manpower to avoid any major issues and unused features similar to Vega's DSBR, RDNA 1 HW bugs, and the RDNA 3 chiplet bugs. So AMD take your time with this one.

poke01 · Thursday at 3:05 PM

MrMPFR said:
Seeking some clarification for info + updating and providing new info potentially relevant for RDNA 5. Hope y'all don't mind.

Is this patent related to the gfx13 implementation?

US20230315536A1 - Dynamic register renaming in hardware to reduce bank conflicts in parallel processor architectures - Google Patents

To reduce inter- and intra-instruction register bank access conflicts in parallel processors, a processing system includes a remapping circuit to dynamically remap virtual registers to physical registers of a parallel processor during execution of a wavefront. The remapping circuit remaps...

patents.google.com

For OoO execution SWC was less clear but here there can be no doubts. Why would AMD do something this insane if it weren't to do actual OoO execution on a GPU. Just hope it doesn't suffer from the same issues as LOOG, but perhaps AMD can reign it in through the compiler.
Comparison between the LOOG and the lightweight SOTA GhOST OoO scheduler available here: https://liberty.cs.princeton.edu/Publications/isca24_ghost.pdf

Meanwhile here's one AMD patent dissing round-robin for load balancing: https://patents.google.com/patent/US11941723B2

Maybe this was the original low effort plan for RDNA 5 or possibly what was to be used in N4C (cancelled gargantuan RDNA 4 chiplet GPU).
Possibly also these as well two for N4C:
#1 https://patents.google.com/patent/US12361628B2
#2 https://patents.google.com/patent/US11755336B2

All that was before the AMD R&D heavy weights Matthäus G. Chajdas, Michael J. Mantor, and Christopher J. Brennan got involved and proposed the CP decoupled scheduling and dispatch with WGS+ADC:
#1 https://patents.google.com/patent/US20240111574A1
#2 https://patents.google.com/patent/US12153957B2
#3 https://patents.google.com/patent/US20240111575A1

Not surprisingly the local launcher patent first spotted by Kepler lists him as well alongside the other heavy weights:
Here's the Google Patents version: https://patents.google.com/patent/US20250217195A1

Completely instrumental for accelerating workgraphs prime candidate code (branchy):
"...local launch mechanism provides an order-of-magnitude improvement to thread launch performance, allowing finer-grained dispatches, local consumption of data within a compute unit, and much improved performance in highly variable workloads. For example, the local launch mechanism improves the performance of application program interfaces that utilize work graphs by allowing a workgroup processor to self-schedule work without needing to submit a request to a work scheduling mechanism such as a command processor."

Here's another patent where he's listed that rewrites the pipeline for fixed-function units, unconfirmed for now:

WO2024129349A1 - Inclusion of dedicated accelerators in graph nodes - Google Patents

Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed. In various implementations, a processor includes a pipeline comprising a plurality of fixed-function units and a scheduler. The scheduler is configured to schedule a...

patents.google.com

Might be going out on a limb but especially patents that list Matthäus G. Chajdas are very likely to inform finalized HW implementations going forward. Part of DGF and GPU Work Graphs SW teams, multiple patents already implemented etc... Keep an eye out on his work and patents for sure.

WGS+ADC+Work stealing (load balancing) and local launchers in RDNA 5 alone is enough to warrant gfx13 a clean slate designation. RDNA tackled CU level scheduling and dispatch and dispatch while it looks like RDNA 5 will adress the entire stack from CP to SIMD unit in a radical fashion. Remember that this is just one aspect of a GPU architecture. Imagine this kind of clean slate approach or at the bare minimum major changes across the entire GPU architecture as Kepler alluded to here:

"Well yeah they are changing everything, gfx13 is the biggest architectural overhaul since GCN."

The most crazy thing is that as Kepler aluded to this redesign is just one aspect of a GPU, and while obviously I can't say this is how it'll be for sure, the patents suggest AMD is tackling every single aspect of the GPU. Indicates AMD is building a power efficient and data locality focused GPU architecture heavily reliant on adaptive and feedback driven mechanisms that are is extremely fine-grained and flexible while also implementing clever and novel techniques to maximize processing and data efficiency. RDNA 5 isn't a repeat of RDNA, that tried to overcome GCNs limitations by align itself with Maxwell µarch derivatives and it isn't a repeat of GCN, that while forward looking in multiple areas was unfortunately still limited by its circumstances of being mostly funded by Sony and MS during the comatose AMD era. It's AMD finally tackling nearly every single aspect of GPU design in a manner generally forward looking, novel and well beyond NVIDIA's current gen.

The shared design and IP pipeline moving forward means R&D spent on consumer and DC overlaps, Many of these optimizations also carry over to DC which makes the monetary incentives much stronger. Besides the gaming specific optimizations (no overlap with DC) I doubt RDNA 5 is primarely funded by MS and Sony as these changes are too big to be primarely for nextgen consoles and gaming GPU family. Would wager that the combined R&D funding for CDNA 5 and RDNA 5 massively eclipses anything previous even when accounting for number of implementations (dies) and node related design cost explosion.

With such a big redesign comes enourmous risk of HW bugs and flaws so I really hope AMD finally has the neccesary money and manpower to avoid any major issues and unused features similar to Vega's DSBR, RDNA 1 HW bugs, and the RDNA 3 chiplet bugs. So AMD take your time with this one.

Crazy work. Thanks for the detailed explanation.

MrMPFR · Thursday at 3:29 PM

poke01 said:
Crazy work. Thanks for the detailed explanation.

Yw but I should probably stop looking torturing myself by compiling AMD GPU patent filings. AMD files a crazy amount of patents. Almost 1200 patents glanced with publication date from EoY 2022-now. 150 saved for RDNA 5, later RDNA or never used. I've given a few dozen a closer look. Every new patent I read makes the wait increasingly unbearable. Can't wait ~2 years xD

Kepler_L2 said:
Mid to late 2027

A combined launch with NG consoles, Zen 7 (to counter Intel), CDNA AND RDNA 5 around Black Friday-Holidays 2027 would be insane, but we'll see.

MrMPFR · Thursday at 3:46 PM

@marees Someone on Reddit found this additional useful info about DGF. It's from JoeMan in Disgus section below the Videocardz article on Animated DGF. Explains it much better than I ever could:

https://videocardz.com/newz/amd-details-dense-geometry-format-dgf-with-hardware-acceleration-support-for-upcoming-rdna5-gpus

"So let’s clear this up, because an incredible amount of nonsense is being said about it.

NVIDIA DMM and AMD DGF are fundamentally different approaches to the same problem. DMM works by taking the original surface and then radically reducing its detail, so in the end it consists of fewer triangles, while storing the surface information in the form of a displacement map. This way, the BVH acceleration structure can remain simple, since only a fraction of the surface’s real detail needs to be taken into account for the actual calculations. DGF, on the other hand, is a scalable, lossy compression method for meshlets, and unlike DMM, it actually represents the geometry instead of only reproducing it through a displacement map. The result is similar, and the BVH acceleration structure can remain relatively simple. Since both methods are lossy compression techniques, there will be some quality degradation, but the benefits gained from compression are significantly greater than the loss in quality.

The principle is therefore similar, but the advantages and disadvantages lie in different places. DMM compresses much more effectively, so in theory the gain is greater, but because of the required surface preprocessing it imposes a significant overhead on the CPU, and it is also not compatible with the actual geometry used in today’s games, since the content has to be tailored specifically for DMM. DGF compresses less effectively, so in theory the gain is smaller, but it is compatible with all kinds of geometry, and it does not impose any significant overhead on the CPU either.

Because DMM proved so impractical that no developer was willing to adopt it, NVIDIA decided to discontinue the technology in the RTX 50 series, meaning it is unlikely to ever see use in practice.

Since DMM is practically unusable, NVIDIA introduced Mega Geometry as its replacement, which primarily works by clustering triangles rather than manipulating the surface itself. This addresses DMM’s compatibility issues and imposes relatively low additional overhead on the CPU, but it does not perform actual geometric compression, meaning its memory requirements are extremely high compared to both DMM and DGF.

A simple comparison of the situation:
DMM: very limited surface compatibility, high CPU overhead, extremely low memory usage
Mega Geometry: full surface compatibility, moderate CPU overhead, very high memory usage
DGF: full surface compatibility, low CPU overhead, low memory usage"

DMM well ahead of DGF in compression factor when it works, but limited usability, which explains deprecation.
DMM memory usage superior to RTX Mega Geometry reliant on raw uncompressed triangle clustering.

"One more point: while DMM and DGF target similar problem areas and therefore cannot really be used alongside each other, DGF and Mega Geometry are not direct replacements and these can actually complement each other, as they approach the situation differently. A Mega Geometry-like solution can works on DGF surfaces, so it is likely that both will eventually be utilized together, as they work very well side by side. It is very easy to implement a Mega Geometry-like solution in hardware, while DGF can be emulated via compute shaders, which is important, because a GDF-like decompression engine can be extremely complex in hardware, so it will take a while for Intel and NVIDIA to implement it."

DMM and RTX Mega Geometry are complementary. DGF + RTX MG = Goated RT

marees · Thursday at 5:08 PM

MrMPFR said:
It is very easy to implement a Mega Geometry-like solution in hardware, while DGF can be emulated via compute shaders, which is important, because a GDF-like decompression engine can be extremely complex in hardware, so it will take a while for Intel and NVIDIA to implement it."

Can this DGF work be outsourced to CCU ?

MrMPFR · Thursday at 5:37 PM

marees said:
Can this DGF work be outsourced to CCU ?

In theory yes but it depends on the overall HW implementation: what is in CCU and CU.

Kepler_L2 · Thursday at 6:56 PM

marees said:
Can this DGF work be outsourced to CCU ?

That patent is not in RDNA5

vinifera · Thursday at 8:58 PM

MrMPFR said:
@marees Someone on Reddit found this additional useful info about DGF. It's from JoeMan in Disgus section below the Videocardz article on Animated DGF. Explains it much better than I ever could:

https://videocardz.com/newz/amd-details-dense-geometry-format-dgf-with-hardware-acceleration-support-for-upcoming-rdna5-gpus

"So let’s clear this up, because an incredible amount of nonsense is being said about it.

NVIDIA DMM and AMD DGF are fundamentally different approaches to the same problem. DMM works by taking the original surface and then radically reducing its detail, so in the end it consists of fewer triangles, while storing the surface information in the form of a displacement map. This way, the BVH acceleration structure can remain simple, since only a fraction of the surface’s real detail needs to be taken into account for the actual calculations. DGF, on the other hand, is a scalable, lossy compression method for meshlets, and unlike DMM, it actually represents the geometry instead of only reproducing it through a displacement map. The result is similar, and the BVH acceleration structure can remain relatively simple. Since both methods are lossy compression techniques, there will be some quality degradation, but the benefits gained from compression are significantly greater than the loss in quality.

The principle is therefore similar, but the advantages and disadvantages lie in different places. DMM compresses much more effectively, so in theory the gain is greater, but because of the required surface preprocessing it imposes a significant overhead on the CPU, and it is also not compatible with the actual geometry used in today’s games, since the content has to be tailored specifically for DMM. DGF compresses less effectively, so in theory the gain is smaller, but it is compatible with all kinds of geometry, and it does not impose any significant overhead on the CPU either.

Because DMM proved so impractical that no developer was willing to adopt it, NVIDIA decided to discontinue the technology in the RTX 50 series, meaning it is unlikely to ever see use in practice.

Since DMM is practically unusable, NVIDIA introduced Mega Geometry as its replacement, which primarily works by clustering triangles rather than manipulating the surface itself. This addresses DMM’s compatibility issues and imposes relatively low additional overhead on the CPU, but it does not perform actual geometric compression, meaning its memory requirements are extremely high compared to both DMM and DGF.

A simple comparison of the situation:
DMM: very limited surface compatibility, high CPU overhead, extremely low memory usage
Mega Geometry: full surface compatibility, moderate CPU overhead, very high memory usage
DGF: full surface compatibility, low CPU overhead, low memory usage"

DMM well ahead of DGF in compression factor when it works, but limited usability, which explains deprecation.
DMM memory usage superior to RTX Mega Geometry reliant on raw uncompressed triangle clustering.

"One more point: while DMM and DGF target similar problem areas and therefore cannot really be used alongside each other, DGF and Mega Geometry are not direct replacements and these can actually complement each other, as they approach the situation differently. A Mega Geometry-like solution can works on DGF surfaces, so it is likely that both will eventually be utilized together, as they work very well side by side. It is very easy to implement a Mega Geometry-like solution in hardware, while DGF can be emulated via compute shaders, which is important, because a GDF-like decompression engine can be extremely complex in hardware, so it will take a while for Intel and NVIDIA to implement it."

DMM and RTX Mega Geometry are complementary. DGF + RTX MG = Goated RT

Related to DGF/DMM, at 20:25, One of the AMD RT Fellow Architects is asked that can DGF and DMM be combined and he basically says yes but it will require architecture work to do so and the goal is to have a DGF base layer with DMM encoded on top of the DGF base layer.

branch_suggestion · Thursday at 10:54 PM

MrMPFR said:
Zen 7 (to counter Intel)

Intel has to counter Zen6 first, NVL ain't it.

poke01 · Friday at 12:06 AM

branch_suggestion said:
Intel has to counter Zen6 first, NVL ain't it.

Umm in terms of gaming it has to beat Zen5X3D too. Forget Zen6 lol

marees · Friday at 6:28 AM

Round-up of a few news stories that could impact future RDNA 5 (2027 to 2029)

1. Memory prices to stay high for next decade — https://www.club386.com/nand-shorta...e-of-ai-data-centre-demand-claims-phison-ceo/

2. Xbox raises gamepass prices just as it adds the most games ever. Adds cloud gaming to all console tiers. (Nadella was earlier head of Azure) https://insider-gaming.com/microsoft-tries-to-defend-xbox-game-pass-price-increase/

3. Strix Halo (zen 5 + RDNA 3.75) scales linearly from 25watts to 45 watts — raising hope that medusa halo (zen 6 + AT3) will scale well from 15 watts itself

https://twitter.com/x/status/1973750063626068304

4. Current Leaked RDNA 5 line-up

1. Series S / handheld / z3 extreme = AT4 in medusa premium

2. Halo tablets/NUC/All-in-one/AI box = AT3 in medusa halo

3. Xbox = AT2 in magnus

4. scrapped? = AT1 (6080 competitor)

5. Xcloud = AT0

SolidQ · Friday at 8:04 AM

marees said:
Xcloud = AT0

There is Gaming AT0, with lower CU's

marees · Friday at 11:38 AM

marees said:
Round-up of a few news stories that could impact future RDNA 5 (2027 to 2029)

1. Memory prices to stay high for next decade — https://www.club386.com/nand-shorta...e-of-ai-data-centre-demand-claims-phison-ceo/

2. Xbox raises gamepass prices just as it adds the most games ever. Adds cloud gaming to all console tiers. (Nadella was earlier head of Azure) https://insider-gaming.com/microsoft-tries-to-defend-xbox-game-pass-price-increase/

3. Strix Halo (zen 5 + RDNA 3.75) scales linearly from 25watts to 45 watts — raising hope that medusa halo (zen 6 + AT3) will scale well from 15 watts itself
https://twitter.com/x/status/1973750063626068304

4. Current Leaked RDNA 5 line-up

1. Series S / handheld / z3 extreme = AT4 in medusa premium
2. Halo tablets/NUC/All-in-one/AI box = AT3 in medusa halo
3. Xbox = AT2 in magnus
4. scrapped? = AT1 (6080 competitor)
5. Xcloud = AT0

Huge AT0 news

Free xcloud tier (ala GeForce Now)

https://twitter.com/x/status/1974132006838841848

adroc_thurston · Friday at 11:45 AM

marees said:
Strix Halo (zen 5 + RDNA 3.75) scales linearly from 25watts to 45 watts — raising hope that medusa halo (zen 6 + AT3) will scale well from 15 watts itself

Idk what did you snort, but mdsH is an 85W part.

marees · Friday at 12:34 PM

adroc_thurston said:
Idk what did you snort, but mdsH is an 85W part.

Buy 2 extra battery packs with the 2028 version of gpd win 5 😉

Discussion RDNA 5 / UDNA (CDNA Next) speculation

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Golden Member

Member

Golden Member

Member

Diamond Member

Member

Member

Golden Member

Member

Senior member

Junior Member

Senior member

Diamond Member

Golden Member

Golden Member

Golden Member

Diamond Member

Golden Member