Discussion RDNA 5 / UDNA (CDNA Next) speculation


Magras00

Member
Aug 9, 2025
60
103
61
oh my god you're going by the NV marketing blurbs.
it's over

Since no one is correcting you, I'll assume you're correct, but I don't understand why. I'm very confused and trying to grasp this, and let's just say the info online is not easy to find. Please read the entire comment before commenting.

I still don't understand how the cache architecture can be unchanged since Fermi when NVIDIA went out of their way to communicate changes to developers with Volta, and no doubt earlier ones too, like Maxwell's reworked SM cache programming model (IDK what else to call it). Here's an example (there's more) from the NVIDIA Volta tuning guide for SWEs (not marketing):
- The answer is hidden in plain sight: read the entire description of the cache system in the Volta tuning guide on NVIDIA's website and connect the dots. Prev comments now reflect this.

"Like Pascal, Volta combines the functionality of the L1 and texture caches into a unified L1/Texture cache which acts as a coalescing buffer for memory accesses, gathering up the data requested by the threads of a warp prior to delivery of that data to the warp.

Volta increases the maximum capacity of the L1 cache to 128 KB, more than 7x larger than the GP100 L1. Another benefit of its union with shared memory, the Volta L1 improves in terms of both latency and bandwidth compared to Pascal."


Link: https://docs.nvidia.com/cuda/volta-tuning-guide/index.html#unified-shared-memory-l1-texture-cache

Are you saying that NVIDIA is communicating BS in this tuning guide, or is there something here that explains the discrepancy between the misleading SM diagram (marketing blurb) and the true HW implementation, which if I understand you correctly is unchanged since Fermi? Why would NVIDIA change the cache programming model for Maxwell and Volta if the HW is unchanged since Fermi? It makes no sense. Again, what am I missing here? Do both companies have one big slab with different separate datapaths, or is it something else entirely?
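For what it's worth, the programming-model side of the Volta change is visible in CUDA itself: since CUDA 9 a kernel can hint how the unified 128 KB per-SM array is split between L1 and shared memory, a knob that only makes sense if the two really share one physical structure. A minimal sketch (the carveout attribute is the real, documented one from the tuning guide; the empty kernel is just a placeholder):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel; stands in for any kernel whose shared-memory
// footprint you want to trade against L1 capacity.
__global__ void my_kernel() {}

int main() {
    // On Volta, L1 and shared memory are carved out of the same 128 KB
    // per-SM array. This attribute is a per-kernel *hint* for the split,
    // in percent of the array preferred as shared memory; the driver may
    // choose a different value.
    cudaError_t err = cudaFuncSetAttribute(
        my_kernel, cudaFuncAttributePreferredSharedMemoryCarveout, 50);
    std::printf("carveout hint: %s\n", cudaGetErrorString(err));
    return 0;
}
```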

If none of the info NVIDIA is providing online is correct, then I assume AMD's WGP diagram is, by extension, also extremely misleading. If so, how can we uncover the true architecture at the HW level, devoid of any abstractions and simplifications (marketing BS)? I couldn't find any publicly available info about this online, or about what that would even encompass.
By extension, without intricate knowledge of the HW, the L0+LDS merger in CDNA5 is meaningless (can't be understood); it isn't something you can grasp just by looking at "L0+LDS" (which implies they're shared). I mean, there has to be a HW-level change, because this functionality is for GFX12.5+ only and thus cannot be exposed via the HW we have rn.

If the following is inaccurate, then please tell me how AMD and NVIDIA could be roughly compared. There has to be a way to compare the architectures' cache systems, right?

Do we agree that the AMD L0 vector cache is roughly equivalent to NVIDIA's texture cache (a misleading name since the GPGPU era) and that the L1 instruction cache in Maxwell-Pascal is roughly equivalent in function to the scalar and instruction caches in RDNA1-4?
For AMD that only leaves LDS, and for NVIDIA, L1 and shared memory, pre-Turing. So what's comparable here: LDS and shared memory, or LDS and shared memory + L1? Or is it just impossible to compare this and/or the other SM/CU cache structures between NVIDIA and AMD?

Also, I still don't understand why NVIDIA would remove the L1 instruction cache from the SM diagram when it's there. They don't do this for DC parts and didn't do it pre-Turing, yet every single client GPU SM description since Turing has no L1-i cache. I've also searched specifically for "L1 instruction cache Turing" and "L1-i Turing" and got zero hits. It makes no sense that this is a simple case of lying by omission. Please explain this discrepancy.

In case @adroc_thurston doesn't want to address my questions, is there anyone else who wants to give it a go?
There's no need. I think I figured it out myself.
 
Last edited:
  • Like
Reactions: marees

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
I had a Guillemot Maxi Gamer Phoenix Voodoo Banshee, the PCI version. There was an AGP version as well.

The sheer number of options and constant newcomers made the 1995-2004 GPU market mighty fun to keep up with. At one point you could go to a computer store and find either graphics cards or northbridges with iGPUs from Intel, 3dfx, ATi, Nvidia, Matrox, S3, PowerVR, 3DLabs/Creative and SiS/XGI. And those were only the ones making graphics cards for the DIY market.

In the early-2000s days of the internet we'd get news about recent and upcoming GPU models almost every two weeks. Nowadays we get maybe one actually relevant leak every 3 months, a mild refresh every 18 months from AMD and Nvidia, and an actually new architecture release every 2.5 years.


I guess the closest thing we have now is all the companies doing their own AI accelerators, but we're not seeing any of those in stores because most of them are B2B.
 
  • Like
Reactions: Magras00 and soresu

DaaQ

Golden Member
Dec 8, 2018
1,947
1,394
136
The GPU term was only coined by Nvidia when it introduced the GeForce 256 with Transform and Lighting acceleration, but that was only in 99.
The first consumer 3D accelerators came from 3dfx and PowerVR in 96 or something. Those you still had to connect to a 2D graphics card like the super popular S3 Virge because they didn't have that function.

Back when I got my first graphics card, the Voodoo 2 in 98, there were two OEMs selling voodoo cards in my country: Diamond and Creative.
My first PC had a Pentium 266 MMX (instruction set) and a Matrox Millennium graphics accelerator. I want to say it was 128, maybe 256? All I know is it introduced me to Diablo and ran great.
May even have had a 56k modem, unless there was an in-between step after the 28.8k one. 33.6 maybe?
 

maddogmcgee

Senior member
Apr 20, 2015
410
421
136
I have not really been following this thread. Do we have any idea if UDNA is actually going to be better for gaming than RDNA? From memory, the last time AMD tried a unified architecture they ended up with power-hungry cards like the 290X. Recently got the itch to upgrade and almost bought a 9070 XT last night...
 

soresu

Diamond Member
Dec 19, 2014
4,011
3,460
136
It's "unified" in the sense that CDNA is picking up some RDNA features and ISA (they're both supposed to be gfx13 right?)
From what someone told me a while back AI/ML ops aside the ISA in RDNA is still mostly similar to GCN despite the significant µArch changes?
 
  • Like
Reactions: Magras00

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
You need to stop smoking MLID. It'll rot your mind

Did the people at Ubisoft, Nvidia and InWorld AI also smoke MLID a couple of years ago when they built a team to prototype game experiences with LLM-enhanced NPCs, which they demoed at GTC 2024?



Was Nvidia high on MLID when they created a full set of tools exactly for that?


Was this developer high on this week's MLID rumors when they started making a game all about LLM-enhanced NPCs, due out this year?


Was Microsoft high on MLID in 2023 when they announced DirectML support for running Llama 2 7B on DirectX 12 GPUs?



If so, then yes I guess I am high on MLID. 🤷‍♂️


So apparently, on August 27 Meta also launched their LLM-enhanced NPC tool for their Meta Horizon game creation platform.


Surely because they were high on MLID's video from August 28, and not because developers are planning to have LLM NPCs in their videogames at all.


Just look at it not working.
 

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
It's almost as if they intentionally omitted any meaningful show of substance regarding the dialogue interactions this generates.

I get that you were expecting an Oscar-for-Best-Actor monologue performance from the very first iteration of a high-level tool of this kind, but that's just not how things work.

Next-gen consoles are only releasing ~2 years from now, and games with this could only be releasing ~4 years from now. There's plenty of time for things to improve.


In the meantime you can check out Microsoft's latest 1.5B text-to-speech model, which they made open source under the MIT license and which already handles emotional speech:



Also, from Meta's article:

Later this year we’ll also be adding functionality that enables your characters to feel even more authentic by leveraging AI to trigger in-world actions, dynamically converse with real players and more.

So yes, they're already bringing agentic AI to the game. E.g. you ask a character, in natural language, to go to a place, and at the right time they'll be there. The natural next step is to implement incremental changes to the world's state.
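To make the shape of that concrete, here's a minimal sketch of the "system prompt in, dialogue plus triggered action out" loop. Everything here is hypothetical: the llm_complete stub stands in for whatever local inference runtime a game would actually link against, and the ACTION tag convention is mine, not Meta's.

```cpp
#include <iostream>
#include <string>

// Hypothetical stub: a real build would call a local inference runtime
// (llama.cpp, ONNX Runtime, etc.) here. Canned reply for illustration.
std::string llm_complete(const std::string& /*system_prompt*/,
                         const std::string& /*player_utterance*/) {
    return "Sure thing! See you there.\nACTION: goto(the docks)";
}

// The "text field" in tools like Meta's is essentially this: a system
// prompt built from the character's backstory and personality, prepended
// to every dialogue turn.
std::string build_system_prompt(const std::string& backstory,
                                const std::string& personality) {
    return "You are an NPC in a game. Backstory: " + backstory +
           " Personality: " + personality +
           " If the player asks you to go somewhere, end your reply with"
           " a line of the form ACTION: goto(<place>).";
}

int main() {
    const std::string sys = build_system_prompt(
        "A bartender on a Mars colony.", "Cheerful, gossipy.");
    const std::string reply =
        llm_complete(sys, "Can you meet me at the docks at sunset?");

    // Agentic part: scan the reply for the structured action tag and hand
    // it to ordinary game logic (pathfinding, schedules) to execute.
    const std::string tag = "ACTION: goto(";
    const auto pos = reply.find(tag);
    if (pos != std::string::npos) {
        const auto start = pos + tag.size();
        const auto end = reply.find(')', start);
        std::cout << "schedule NPC travel to: "
                  << reply.substr(start, end - start) << "\n";
    }
    return 0;
}
```

The game-side cost is basically string handling; the real cost is the model's VRAM and latency, which is where the GPU discussion comes back in.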
 
Last edited:
  • Like
Reactions: Magras00 and marees

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,849
3,228
146
I get that you were expecting an Oscar-for-Best-Actor monologue performance from the very first iteration of a high-level tool of this kind, but that's just not how things work.

Next-gen consoles are only releasing ~2 years from now, and games with this could only be releasing ~4 years from now. There's plenty of time for things to improve.


In the meantime you can check out Microsoft's latest 1.5B text-to-speech model, which they made open source under the MIT license and which already handles emotional speech:

Explain to me how the following sample of generated dialogue is, in any way, a useful demonstration of "it working".

Note that these are all shown one-sided, with no player input or prompting to show it working interactively. There's also zero context given for what the NPC is saying or why.

"..."
"Yeah, man. What time"
"Whoa,"
"You're in!"
"The party starts at Mars' sunset, which is around 20:47 Martial Local Time."

They spent 70% of the video showing the setup and 10% of it speedrunning through some meaningless dialogue bits, without actually demonstrating it working in a live environment.
 

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
Explain to me how the following sample of generated dialogue is, in any way, a useful demonstration of "it working".

Note that these are all shown one-sided, with no player input or prompting to show it working interactively. There's also zero context given for what the NPC is saying or why.

"..."
"Yeah, man. What time"
"Whoa,"
"You're in!"
"The party starts at Mars' sunset, which is around 20:47 Martial Local Time."

They spent 70% of the video showing the setup and 10% of it speedrunning through some meaningless dialogue bits, without actually demonstrating it working in a live environment.

Whoa, it's almost like they're announcing and promoting a tool, not a game that uses the tool.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,849
3,228
146
Whoa, it's almost like they're announcing and promoting a tool, not a game that uses the tool.
Either it does what they claim and it's demonstrable or it doesn't. This doesn't demonstrate it doing what they claim it can do, therefore it must not.

As a developer this does nothing to grow my interest in it. I've seen too many promises never materialize. Either you prove it works or your product is vaporware.
 
  • Like
Reactions: marees

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
Either it does what they claim and it's demonstrable or it doesn't. This doesn't demonstrate it doing what they claim it can do, therefore it must not.

As a developer this does nothing to grow my interest in it. I've seen too many promises never materialize. Either you prove it works or your product is vaporware.


Why would you, "as a developer" not believe you can integrate a LLM into a NPC's dialogue? What exactly is the technological barrier you envision here?
Have you not used chatgpt, grok or gemini ever before?
 
  • Like
Reactions: Magras00

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
A better question is: have you?

I'm currently leading a project for implementing agentic AI with LLMs and RAG embeddings at my company, using open source libraries and models to run on local hardware, yes.

Which is why I'm asking where exactly you see a technical or technological barrier in using LLMs for NPC dialogue.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,849
3,228
146
I'm currently leading a project for implementing agentic AI with LLMs and RAG embeddings at my company, using open source libraries and models to run on local hardware, yes.

Which is why I'm asking where exactly you see a technical or technological barrier in using LLMs for NPC dialogue.
This isn't even on topic for the thread, and you're really gunning to bait. The video you posted does not demonstrate it working to do what is advertised. The end.
 

Magras00

Member
Aug 9, 2025
60
103
61
I have not really been following this thread. Do we have any idea if UDNA is actually going to be better for gaming than RDNA? From memory, the last time AMD tried a unified architecture they ended up with power-hungry cards like the 290X. Recently got the itch to upgrade and almost bought a 9070 XT last night...
Yes, of course. Kepler already talked about this at length in another forum. He said AMD is changing everything with RDNA5/[insert name], that it's the largest redesign since GCN, and that the HW feature set eclipses NVIDIA Blackwell.
"Largest since GCN" will have massive implications. Remember that RDNA at its core still carries a lot of GCN baggage; in other words, it wasn't a near-total clean-slate µarch the way GCN was vs TeraScale. Every gen retires more and more of GCN, and RDNA4 especially did a lot (refer to C&C's RDNA4 LLVM article).
Still, I wouldn't get too hyped for massive Vega -> RDNA-style IPC gains in raster; I suspect the changes will be geared more towards specific domains like RTRT, ML, and GPU Work Graphs acceleration, and in general will just be forward-looking, but we'll see.

The 290X wasn't a very good architecture: it required too much memory bandwidth, hence a 512-bit bus vs NVIDIA's 384-bit. That had nothing to do with unification; it was just too many memory PHYs (512-bit GDDR is ludicrous). It wasn't a hyper-optimized gaming architecture like Maxwell, nor a very power-efficient compute design like NVIDIA's Turing.
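For reference, the back-of-envelope numbers (going by the launch specs as I remember them): 512 bit × 5 Gbps ÷ 8 = 320 GB/s on the 290X, vs 384 bit × 7 Gbps ÷ 8 = 336 GB/s on the 780 Ti. NVIDIA matched (even beat) that bandwidth with a narrower, cheaper bus simply by clocking the memory higher.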
Also, the 290X's power draw was nothing special; it just had a terrible jet-engine stock cooler. 290W for the 290X vs 304W for the 9070 XT. But NVIDIA was at or below 250W IIRC, and it stayed that way until the 30 series.

AMD isn't on life support anymore; it can afford proper design teams and can go all out if it wants. Look at what they managed even with RDNA in 2019, before the Zen money printer went brrr, although MS and Sony paid for most of that IIRC.
 

ToTTenTranz

Senior member
Feb 4, 2021
576
1,009
136
This isn't even on topic for the thread, and you're really gunning to bait. The video you posted does not demonstrate it working to do what is advertised. The end.
First you complained about "show of substance", then you accused the demo of not showing any player interaction (which is a lie because the video shows the player talking to the NPC at the 30s mark), then you failed to say why you thought the shown interaction was staged.

If you had no idea what you were talking about and simply kept changing the goalposts until running away, why even enter the conversation at all?
Do you even know why this was being discussed in this thread? It's about VRAM demands for running smaller LLMs locally and how that will affect GPUs like the RDNA5 family releasing 2 years from now. Yes, it's on topic.
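To put rough numbers on that (assuming a Llama-2-7B-class model, since that's what the DirectML announcement targeted): the weights at 4-bit are about 7B × 0.5 bytes ≈ 3.5 GB, and an fp16 KV cache is about 2 × 32 layers × 4096 hidden × 2 bytes ≈ 0.5 MB per token, so roughly 2 GB at a 4k context. Call it 5-6 GB on top of whatever the game itself needs, which is exactly why VRAM sizing on upcoming GPUs matters here.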



BTW, NPCs using LLMs for dialogue are already in commercially available videogames. I already posted them in this thread.
Meta is simply releasing a tool to accelerate their implementation. There's a text field where people write a system prompt for the LLM (i.e. the character's backstory and personality), a choice of the available text-to-speech voices, and that's it.





Kepler already talked about this at length in another forum. He said AMD is changing everything with RDNA5/[insert name], that it's the largest redesign since GCN, and that the HW feature set eclipses NVIDIA Blackwell.
"Largest since GCN" will have massive implications.
My fear with this is RDNA5 needing a complete driver overhaul that will massively hinder the GPUs' performance at launch, resulting yet again in those "Fine Wine" months/years where AMD loses all the market share, because release-date performance is realistically all that matters, as most GPU reviews happen in those first months.
 
Last edited:
  • Like
Reactions: Magras00

soresu

Diamond Member
Dec 19, 2014
4,011
3,460
136
My fear with this is RDNA5 needing a complete driver overhaul that will massively hinder the GPUs' performance at launch
For Sony at least, the consideration of retaining some level of backward compatibility with the PS4 and PS5 generations means it will never go quite that far, I think.
 
  • Like
Reactions: Magras00

marees

Golden Member
Apr 28, 2024
1,539
2,133
96
AT can be "Transformer" because RDNA chiplets can transform & attach to APUs etc.

Orion & Canis can also be interpreted as constellations

To the west of Orion and standing upright is his perennially faithful hunting companion, Canis Major, the Great Dog. Sirius, the brightest star in the sky as seen from both hemispheres, marks the head of the dog and the star Wezen marks his hind quarters.​

 
  • Like
Reactions: Josh128

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,849
3,228
146
First you complained about "show of substance", then you accused the demo of not showing any player interaction (which is a lie because the video shows the player talking to the NPC at the 30s mark), then you failed to say why you thought the shown interaction was staged.

If you had no idea what you were talking about and simply kept changing the goalposts until running away, why even enter the conversation at all?
Do you even know why this was being discussed in this thread? It's about VRAM demands for running smaller LLMs locally and how that will affect GPUs like the RDNA5 family releasing 2 years from now. Yes, it's on topic.



BTW, NPCs using LLMs for dialogue are already in commercially available videogames. I already posted them in this thread.
Meta is simply releasing a tool to accelerate their implementation. There's a text field where people write a system prompt for the LLM (i.e. the character's backstory and personality), a choice of the available text-to-speech voices, and that's it.






My fear with this is RDNA5 needing a complete driver overhaul that will massively hinder the GPUs' performance at launch, resulting yet again in those "Fine Wine" months/years where AMD loses all the market share, because release-date performance is realistically all that matters, as most GPU reviews happen in those first months.
(image attachment)