Discussion RDNA 5 / UDNA (CDNA Next) speculation


adroc_thurston

Diamond Member
Jul 2, 2023
7,101
9,858
106
Even if it goes beyond that still nothing new (M3 in 2023).
Apple has to source baby amounts of bandwidth outta their SRAM slab.
Yes they do. Read the patents, go beyond SOTA.
Patentware has no relation to any actual products.
Very novel and we might finally see AMD pioneer tech for once.
Pioneering is worthless in a vacuum.
AMD just build good products.
You seem to just see what you want. A narrow mind always has a narrow view ;)
Reality is often disappointing.
You'll learn to accept the limits of it soon enough.
 

MrMPFR

Member
Aug 9, 2025
103
207
71
Not directly, but you can read between the lines.
[two LLVM source screenshots attached]
"supportsWGP" flag where you would expect a "isGFX1250Plus" and "has gfx1250 instructions" condition rather than checking the GPU generation directly (gfx* instruction flags carry over from one generation to the next, i.e gfx1250 has gfx9/10/11/12 instructions).

There's still a chance they did keep RDNA1-4's CU mode in RDNA5. Wasn't backwards compatibility (BWC) for the consoles the entire point of the split CU mode in RDNA2?

So by default the GFX13 WGP as defined in LLVM is gone, but it can still split the shared L0+LDS in half, resulting in two pseudo-CUs that align with CU mode from RDNA1-4. For the PS6 this would be needed for ideal PS4 BWC, right?
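To make the distinction concrete, here's a purely illustrative sketch (not actual LLVM code; the struct and function bodies are made up, only the flag names echo the ones described above) of a capability-flag check versus a generation check:

```cpp
// Hypothetical illustration only -- not LLVM source. The point: instruction-set
// flags are inherited by later generations, while a capability flag like
// supportsWGP can simply be left unset for gfx13 even though the gfx1250
// instructions carry over.
struct GPUInfo {
    bool hasGFX1250Insts; // gfx* instruction flags carry over to later gens
    bool supportsWGP;     // capability flag: can two CUs pair up as a WGP?
};

// Generation-style check: "gfx1250 or newer" (true for gfx13 as well, since it
// inherits the gfx1250 instruction set).
bool isGFX1250Plus(const GPUInfo &g) { return g.hasGFX1250Insts; }

// Capability-style check: asks the flag directly, so WGP-specific code paths
// are dropped for any target that reports supportsWGP == false.
bool useWGPCodePath(const GPUInfo &g) { return g.supportsWGP; }
```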
 

MrMPFR

Member
Aug 9, 2025
103
207
71
Apple has to source baby amounts of bandwidth outta their SRAM slab.

Patentware has no relation to any actual products.

Pioneering is worthless in a vacuum.
AMD just build good products.

Reality is often disappointing.
You'll learn to accept the limits of it soon enough.

Yeah, but it's still better than the alternative (fixed caches and VRFs). This is obviously at iso combined-cache capacity; otherwise a worse but much bigger cache will always win.

Kepler_L2 listed many of those patents in a thread a month ago when making claims about GFX13, and already confirmed most things. Misleading: Kepler never said this is 100% how it'll be; it was a reply to a post about what could come beyond Blackwell, so not a leak.

Good products (at launch) that aren't looking ahead will age poorly over time. Look at Kepler vs GCN1. Same thing with RDNA1 vs Turing.
 
  • Like
Reactions: Tlh97

adroc_thurston

Diamond Member
Jul 2, 2023
7,101
9,858
106
Yeah but it's still better than the alternative (fixed caches and VRFs).
Nope, you're making a pile of tradeoffs.
Kepler_L2 listed many of those patents in a thread a month ago when making claims about GFX13. Already confirmed most things.
Patentware that adheres to your imaginary checkbox list has nothing to do with the product™ at large.
All you need to know is that gfx13 cachemem is different.
Good products (at launch) that aren't looking ahead will age poorly over time. Look at Kepler vs GCN1. Same thing with RDNA1 vs Turing.
GPUs don't live long enough to 'age'.
Client graphics upgrade cycles are really short (and used to be much shorter).
Embedded lands have hardware living for decades.
 

branch_suggestion

Senior member
Aug 4, 2023
826
1,805
106
Yes cloud gaming. That's what AT0 is for.
Man, I love ROI. Also, AT0 really will be a massive service-density leap for Xbox Cloud.
Like, you can have 8 AT0 cards in a 2S Venice box serving something like 32 XSX streams or 16 Nextbox streams.
The existing blade is 4x XSX in 2U; this will probably be a 5U air-cooled rackmount.
Total GFN death. Wildcard is what Sony does really.
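For scale, taking those figures at face value (4 XSX streams per 2U blade today vs. 32 streams in a 5U box), the density works out to roughly:

$$\frac{4}{2\,\mathrm{U}} = 2\ \text{streams/U} \quad\text{vs.}\quad \frac{32}{5\,\mathrm{U}} = 6.4\ \text{streams/U} \;\approx\; 3.2\times$$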

Every part has reuse beyond just being a dGPU; gotta credit Huynh here with such resourcefulness.
 

reaperrr3

Member
May 31, 2024
131
374
96
Good products (at launch) that aren't looking ahead will age poorly over time. Look at Kepler vs GCN1. Same thing with RDNA1 vs Turing.
What adroc said.

Timing is more important than being first in general.

ATI's first DX9 implementation (9700) was superior to Nvidia's (5800).
Did it help ATI?
Not nearly as much as it should have, because by the time it became relevant, Nvidia's 6k series came out.

ATI's first SM3.0 implementation (X1800 and X1900) later turned out to be superior to GF6k and 7k implementations in proper shader-heavy DX9.0c games.
Did it help them?
No, because by the time it became relevant, GF8k was out and demolished everything that came before in DX9.

AMD was far ahead on (async) compute until Turing.
Did it help them much in desktop?
All in all, no.

And outside some NV-sponsored implementations to sell the feature, RT didn't become truly relevant until what, 2024?
If N31 and N32 had gotten closer to their frequency, perf/W and maybe IPC targets (assuming they missed that a bit, too), they would've been good enough for their planned lifecycle.

Never mind that the memory capacities of the GeForce RTX 3070 and 3080 were definitely not very forward-looking; they still flew off the shelves (in part because of mining, but still).

Supporting checkbox features early is mostly an additional marketing benefit when your architecture is otherwise better too, because then it might help give potential customers the final push.

But in terms of actual value?
The only GPU feature in recent memory that in my opinion really panned out well in that regard even for older architectures was DLSS.
Everything else only became relevant enough (or performant enough) 2-3 gens down the line.
 
  • Like
Reactions: Tlh97 and marees

adroc_thurston

Diamond Member
Jul 2, 2023
7,101
9,858
106
IMHO if not for nVidia's pivot to ray tracing they would be significantly longer still by now.
Nope, RTRT hasn't been the upgrade cycle driver so far, like at all.
It's all general performance creep plus VRAM limitations etc.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
Nope, RTRT hasn't been the upgrade cycle driver so far, like at all.
It's all general performance creep plus VRAM limitations etc.
Not quite what I meant.

I meant that the raster-complexity increase in each new generation of games was, if not plateauing, then certainly slowing significantly, to the point that playable 4K was quite achievable for high-end GPUs of the pre-Turing generation.

Without RT/PT to add a ginormous new compute burden to the mix, the gaming GPU market was destined to get pretty stale once "good enough", à la the vast majority of smartphone users, was achieved.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,101
9,858
106
I meant that raster complexity increase in each new generation of games was if not plateauing then certainly decreasing significantly to the point that playable 4K was quite achievable for hi end GPUs of the pre Turing generation
Oh no, openworld bloat plus Nanite and friends promised the infinite future of GPU torture anyhow.
Without RT/PT to add a ginormous new compute burden to the mix the gaming GPU market was destined to get pretty stale as "good enough"
Again, not the case.
RTRT remained and remains a gimmick.
 

MrMPFR

Member
Aug 9, 2025
103
207
71
Nope, you're making a pile of tradeoffs.

Patentware that adheres to your imaginary checkbox list has nothing to do with the product™ at large.
All you need to know is that gfx13 cachemem is different.

GPUs don't live long enough to 'age'.
Client graphics upgrade cycles are really short (and used to be much shorter).
Embedded lands have hardware living for decades.

Might be different in implementation, but the overall idea and benefits would still be the same. As for it having nothing to do with the final product, we'll see who's right about that in 2027.

I would be extremely disappointed if a unified L0+LDS is the only major change in GFX13. That would be an incredibly boring "AMD catching up to NVIDIA's Volta ~10 years later" situation. Misleading: AMD's cache hierarchy is way different, but the overall goal is the same, increased cache-allocation flexibility. If true, no more fixed caches in the programming model for either vendor (and LLVM is as good a source as it gets, so yeah, true). More context under #1,201.

Depends on the user's upgrade cadence, but for people playing the latest games, in general yes.


(quoting reaperrr3's post above in full)

I don't disagree with any of that (see the comment). When AMD has a bad µarch compared to NVIDIA, of course nothing will fix or compensate for that.
But people will never go AMD if they always see NVIDIA as the premium, forward-looking option. AMD needs to not only undercut NVIDIA but also appear forward-looking. If you give NVIDIA consumers a reason to stick with NVIDIA they'll do it, even if AMD's card is better in perf/$. Equalizing the playing field on features + undercutting them on price is the only way. It's good to see them finally taking this path with RDNA4 (laying the groundwork) and beyond.

Let's not compare the 2000s with the 2020s; wildly different markets and upgrade cadences.

VRAM argument very fair.

Sure, but unlike past gens, future ones will be sold on forward-looking features whether you like it or not. Welcome to the phone era of PC gaming. Post-N3 PPA scaling is atrocious and will turn sharply negative when adjusted for wafer prices. N3 is the final frontier for price-competitive dGPU products.
Unless Intel and Samsung actually get their act together for once (unlikely), or we see a paradigm shift in manufacturing process or interconnect architecture (photonics), what we get next gen will be the best we can get in terms of perf/$ given the underlying BOM costs. SW and tricks will supplant raw power gains and become increasingly important as a distinguishing factor between AMD and NVIDIA. We've already seen that, and it'll only get worse.
 

MrMPFR

Member
Aug 9, 2025
103
207
71
I know this is old, but it might be important to the RDNA5 discussion.


GFX12.5 (CDNA5) has 256 shaders/ALUs per CU.

If RDNA5 doubles INT over RDNA4 then they'll reach this number per WGP/CU or whatever AMD will call the new local compute block.
With Project Amethyst AMD has the same motivations as NVIDIA so this is very likely to happen, unless they want to fall behind.

RDNA 4 WGP = 4 × (32 FMA/INT + 32 FMA-only) = 128 FP/INT cores
RDNA 5 [insert name] = 4 × 64 FMA/INT = 256 FP/INT cores

While CDNA5 might add dedicated datapaths for FP and INT similar to Hopper DC, I doubt RDNA5 can afford such a thing; the area cost is simply too big.
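As a worked version of those counts (the RDNA 5 line is, as above, pure speculation, and only the full FMA/INT lanes are counted):

$$\text{RDNA 4 WGP: } 4 \times 32 = 128 \qquad \text{speculative RDNA 5 block: } 4 \times 64 = 256$$

i.e. exactly the 256 ALUs per CU listed for GFX12.5.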
 

Win2012R2

Golden Member
Dec 5, 2024
1,209
1,247
96
Price is very important.
US $1500-2000 has been proven to be an acceptable price for at least a few million buyers ready to spend on a top-end consumer GPU.

If AMD's next top-end GPU isn't faster than the 5090 then it won't sell. I hope they don't use that stupid 16-pin connector.
 
  • Like
Reactions: Tlh97 and MrMPFR

ToTTenTranz

Senior member
Feb 4, 2021
686
1,147
136
I hope they don't use that stupid 16 pin connector.
They probably will, because the AI Pro 9700 reference board already uses it, but as long as they demand an ATX 3.1 PSU they're probably fine.

Truth be told, how else are they supposed to power a 500-600 W TDP graphics card, which AT0 probably is? With 4x 8-pin connectors?

Either they try to bring a new connector into the market to compete with Nvidia's, or they're bound to follow. At least until something like Asus BTF comes up as a standard (which it should have, years ago).
 

basix

Senior member
Oct 4, 2024
249
505
96
Man I love ROI, also AT0 really will be a massive service density leap for Xbox Cloud.
Like you can have 8 AT0 cards in a 2S Venice box serving like 32 XSX streams or 16 Nextbox streams.
They could probably even aim at 64x XBSX streams with such a system:
- 8x AT0 cards with 184 CU at ~2.7 GHz deliver ~1000 TFLOPS => 82x XBSX
- 2S Venice could feature 2*256C = 512C => 64x XBSX
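A rough sanity check of those numbers, assuming dual-issue FP32 is counted (256 FLOPs per CU per clock) and taking the XBSX at ~12.15 FP32 TFLOPS with 8 Zen 2 cores:

$$8 \times 184 \times 256 \times 2.7\,\mathrm{GHz} \approx 1017\ \text{TFLOPS}, \qquad 1017 / 12.15 \approx 84$$
$$2 \times 256\ \text{CPU cores} \;/\; 8\ \text{cores per XBSX} = 64$$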
 

basix

Senior member
Oct 4, 2024
249
505
96
Nope, RTRT hasn't been the upgrade cycle driver so far, like at all.
RTRT remained and remains a gimmick.
The push towards RTRT, or especially HWRT, will change with the next console cycle. Not because RTRT drives graphics forward (which it definitely will), but because it will change the production pipelines of games.
No more, or at least vastly reduced, baking; shortened development cycles. RTRT is mainly a game changer for developers, not for gamers. But we as gamers might get better or, to be more precise, more consistent quality, because HWRT gets rid of the biggest illumination flaws of raster approximations.

The PS5 etc., as last-gen consoles by then, feature HWRT in a good-enough fashion to make that shift. You can see it happening live if you look at UE5 development: SW Lumen has been frozen since UE 5.4 or 5.5, with no further development.
MegaLights should scale from the PS5, Switch 2, etc. up to the PS6 and eventually high-end PC, replacing SW and HW Lumen as they are available today. Read Epic's MegaLights presentation at SIGGRAPH 2025: https://advances.realtimerendering.com/s2025/content/MegaLights_Stochastic_Direct_Lighting_2025.pdf
And MegaLights must scale that widely. Why? Because you do not want to design a game with hundreds of light sources and then scale down to a PS5 without MegaLights. That will not work.

HWRT together with a very scalable RTGI solution like MegaLights with virtually no limits on light source count will be the future. Starting with PS6 and Xbox-Next release. Main reason: Game development.
In that regard, RTXDI from Nvidia based on ReSTIR is conceptually the very same thing as MegaLights. Just geared towards the upper end of the quality spectrum (and HW requirements).

Regarding PC customers, HWRT or RTRT will not really drive buying decisions too much, because the hardware is already here today. It is more about the general performance of your card.
RDNA4 supports HWRT well, and Nvidia cards have for ages (if they have enough VRAM). I expect the gap to close even further with RDNA5.
And when the next console cycle begins, not too many PC users will own a card with bad RT acceleration (Nvidia's market share is too big for it to be otherwise).
RDNA2 and RDNA3 cards might struggle, I don't know. But if techniques like MegaLights run on a PS5, they should also run on a 6700 XT at similar quality settings.
 
  • Like
Reactions: Tlh97 and MrMPFR

adroc_thurston

Diamond Member
Jul 2, 2023
7,101
9,858
106
The push towards RTRT or especially HWRT will change with the next console cycle. Not because RTRT drives graphics forward, which RTRT definitely will, but because it will change production pipelines of games.
No more or at least vastly reduced baking, shortened development cycles. RTRT is mainly a game changer for developers. Not for gamers. But we as gamers might get better or, to be more precisely, more consistent quality. Because HWRT gets rid of the biggest illumination flaws of raster approximations.
Wrong because we're gonna be doing even less RTRT than we do now, replacing it with ML approximations.
HWRT together with a very scalable RTGI solution like MegaLights with virtually no limits on light source count will be the future. Starting with PS6 and Xbox-Next release. Main reason: Game development.
In that regard, RTXDI from Nvidia based on ReSTIR is conceptually the very same thing as MegaLights. Just geared towards the upper end of the quality spectrum (and HW requirements).
words words words completely disconnected from reality.
congrats.
 

MrMPFR

Member
Aug 9, 2025
103
207
71
Nope.

Fermi had a unified L1$/shmem slab 15 years ago.
Are you really that new?

Guess we'll have to agree to disagree. Fermi did, but Maxwell changed that by moving things around in the programming model; I misunderstood this.
The Volta tuning guide mentions an underlying combined data cache, and note that this is not presented as a distinguishing feature. What Volta did was allow greater flexibility in how that cache gets allocated towards different uses; for example, the L1 can now take up the entire data cache, plus there were some changes to shared memory as well. So it's likely about a change to the programming model made possible by making the underlying data cache more flexible, not about physically splitting and merging the underlying cache. Likely the same thing for AMD as well, but we can't know for sure. See #1,201 for documentation.


That's not what I'm talking about, and that's nothing like the Volta-and-later implementation on NVIDIA's side. Fermi has L1/shared memory, a uniform cache, and a texture cache. NVIDIA Volta merged the texture cache and L1/shared memory into one big shared cache.

The L0 in RDNA4 is literally the texture cache (check the RDNA4 WGP diagram), while the LDS is equivalent to L1/shared. GFX12.5 merges these, aligning with Volta and later on NVIDIA's side, and, as Kepler has established through LLVM, GFX13 will do the same.
Ignore all this, it's misleading. The LDS is not like L1/shared, and Pascal had a different setup. The architectures are so different that you can't compare them directly. But the overarching goal is the same: to allow the data cache to align with a more flexible programming model, so the application can allocate cache in the most effective manner. Maxwell made the underlying data cache more efficient, and Turing only extended that with the new unified cachemem design.
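For reference, a minimal CUDA host-side sketch of what that "flexible allocation" looks like in NVIDIA's programming model (illustrative only; placeholder_kernel is a stand-in, the carveout value is a hint rather than a guarantee, and AMD's controls for a merged L0+LDS would of course differ):

```cpp
#include <cuda_runtime.h>

__global__ void placeholder_kernel() {}

int main() {
    // Pre-Volta style: a coarse per-kernel preference between L1 and shared memory.
    cudaFuncSetCacheConfig(placeholder_kernel, cudaFuncCachePreferShared);

    // Volta-and-later style: request a percentage of the unified data cache as
    // shared memory for this kernel. 100 asks for the maximum shared-memory
    // carveout the part supports; 0 leaves the slab to L1/texture.
    cudaFuncSetAttribute(placeholder_kernel,
                         cudaFuncAttributePreferredSharedMemoryCarveout, 100);

    placeholder_kernel<<<1, 32>>>();
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}
```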
 