Discussion RDNA 5 / UDNA (CDNA Next) speculation


stayfrosty

Member
Apr 4, 2024
What exactly makes a GDDR7 card (i.e. the CPX solution) so much better for large-context inference? It appears to be a normal 202-class die that will be relatively cheap with relatively large memory, so using it for inference rather than the super-expensive HBM versions makes sense.

AMD's equivalent response would be this AT0 chip.
The larger the context, the longer the prefill phase takes. Prefill needs lots of low-precision compute but barely any bandwidth, so the big, expensive HBM sits idle during prefill.

I'm guessing AT0 is not optimized for low precision like Rubin CPX is. Rubin CPX has 6x the FP4 compute of a 5090 at a similar die size. This isn't just GR202.
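The prefill-vs-decode split can be put in rough numbers. A back-of-envelope sketch (every figure here is an illustrative assumption, not from any leak): a hypothetical 70B-parameter model in 4-bit weights, fed a 100k-token prompt, has enormously higher arithmetic intensity during prefill than during token-by-token decode.

```python
# Rough arithmetic intensity (FLOPs per byte of weights read) for the
# two inference phases. All numbers are illustrative assumptions.
params = 70e9                 # hypothetical 70B-parameter model
weight_bytes = params * 0.5   # 4-bit (FP4) weights ~= 0.5 bytes/param
prompt_tokens = 100_000       # a large context

# Prefill: the whole prompt is processed in one batched pass, so the
# weights are streamed once while ~2*params FLOPs run per token.
prefill_intensity = (2 * params * prompt_tokens) / weight_bytes

# Decode: one token per step, so every step re-reads all the weights
# for only ~2*params FLOPs.
decode_intensity = (2 * params) / weight_bytes

print(f"prefill: ~{prefill_intensity:,.0f} FLOPs/byte")  # ~400,000
print(f"decode:  ~{decode_intensity:,.0f} FLOPs/byte")   # ~4
```

At hundreds of thousands of FLOPs per weight byte, prefill saturates compute long before it touches memory bandwidth; decode at a few FLOPs per byte is the opposite, which is the usual argument for pairing a cheap compute-heavy GDDR7 part with an HBM part.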
 

ToTTenTranz

Senior member
Feb 4, 2021
So, what do you guys think AMD will do in response to Rubin CPX?
AT0 is a rasterization-capable 512-bit GDDR7 part, possibly close to the reticle limit as well.


  1. Introduce their own low-precision-optimized co-processors based on tweaked client chips, like Nvidia
RDNA5 probably supports FP8 and FP4 as well.


  2. Introduce MI500(?) flavours that replace part of (or all of) the HBM bus with more compute chiplets and a GDDR7 PHY
The MiXXX series doesn't compete with rasterizers like this CPX.
 

adroc_thurston

Diamond Member
Jul 2, 2023
So, what do you guys think AMD will do in response to Rubin CPX?
Nothing.
That and using the xtor budget for bigger systolic arrays. Ain’t rocket science.
Duhhhhh.
Delightfully delicious 4bit multipliers. Neat and nice.
AT0 is a rasterization-capable 512-bit GDDR7 part, possibly close to the reticle limit as well.
It's not made for inference.
It's a cloud gaming part.
RDNA is driven by Sony
It's driven by AMD.
 

Saylick

Diamond Member
Sep 10, 2012
Duhhhhh.
Delightfully delicious 4bit multipliers. Neat and nice.
Yeah. It's becoming more and more evident that consumer Nvidia GPUs are trickle-down enterprise parts now. Consumer Rubin (GR202) sounds like we'll get 20-30% faster pure raster from the same SM count at higher clocks, likely a bigger boost to RT workloads via 2x ray/triangle intersection, and an even bigger boost to tensor performance. Oh, and they'll invent some new version of an AI feature that requires tapping into a ton of FP4, which coincidentally Rubin has.
 

adroc_thurston

Diamond Member
Jul 2, 2023
It's becoming more and more evident that consumer Nvidia GPUs are trickle-down enterprise parts now.
Yeah, but at least some are purpose-built parts.
RDNA5 is an all-star lineup of hand-me-downs from other markets.
likely a bigger boost to RT workloads via 2x ray/triangle intersection
See, the problem is that we're not really limited by box/tri testing for RTRT.
Blackwell was barely an improvement RTRT-wise, and that's because making RTRT faster is hard without going for some mildly unorthodox ways to build a GPU shader core.
 

Saylick

Diamond Member
Sep 10, 2012
Yeah, but at least some are purpose-built parts.
RDNA5 is an all-star lineup of hand-me-downs from other markets.
Understandably so. Nvidia are re-purposing server AI parts for consumers, and AMD are re-purposing parts from other, higher-volume markets for consumers. Makes sense in my mind.
See, the problem is that we're not really limited by box/tri testing for RTRT.
Blackwell was barely an improvement RTRT-wise, and that's because making RTRT faster is hard without going for some mildly unorthodox ways to build a GPU shader core.
Cue a new AI feature that leverages FP4 out the wazoo as a crutch for the lackluster RT performance uplift.
 

Saylick

Diamond Member
Sep 10, 2012
The large-context inference market is the most lucrative. Dunno if AMD can afford to just give that away.
They aren't "giving it away". They just don't have an equivalent product to the CPX. For large-scale inference, they'll provide the MI450.
 

Win2012R2

Golden Member
Dec 5, 2024
They just don't have an equivalent product to the CPX
Why not just use AT0? Maybe it's not equivalent, but if it's profitable to sell at $3k for that role then it can work easily, as long as the same code runs on it as well as on MI400.

Very odd that Nvidia announced it now - they are clearly worried about the threat from AMD, or perhaps are trying to counter custom ASICs.
I'm guessing AT0 is not optimized for low precision like Rubin CPX is.
Why wouldn't it be? RDNA4 already got FP4, and its use is clearly growing even bigger next gen. It's pretty clearly the single direction of travel for "AI" stuff, and 4 bits looks like the floor for the time being, so AMD can go all-in on it just like Nvidia appears to be doing.

On the real plus side, that means we'll see faster GDDR7 sooner; perhaps 32 Gbit modules will be the normal parts, so it should be a very nice jump for RDNA5.

And it also means AMD must push AT0 out as soon as possible - it might be my first AMD card since I moved on from the Riva128...
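For what it's worth, the capacity math behind that hope is simple. A sketch assuming a 512-bit AT0 with one GDDR7 chip per 32-bit channel (both assumptions, not confirmed specs):

```python
# VRAM capacity if 32 Gbit GDDR7 chips land on a 512-bit bus.
# Bus width and chip density are assumptions for illustration.
bus_width = 512        # assumed AT0 bus width, bits
chip_interface = 32    # GDDR7 chips use a 32-bit interface
chips = bus_width // chip_interface
density_gbit = 32      # hoped-for 32 Gbit modules

capacity_gb = chips * density_gbit / 8
print(f"{chips} chips -> {capacity_gb:.0f} GB")  # 16 chips -> 64 GB
```

Doubling density from today's 16 Gbit modules would double capacity at the same chip count, with no clamshell needed.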
 

Win2012R2

Golden Member
Dec 5, 2024
Cuz AT0 is only ~7 PFLOPS or so; AMD is not doing the same matrix cores on gaming dGPUs and DC GPUs like NVIDIA is doing with Rubin.
So that means AMD at least has a good chance to catch up in performance and exceed Nvidia in the top-end gaming card?
 

Saylick

Diamond Member
Sep 10, 2012
So that means AMD at least has a good chance to catch up in performance and exceed Nvidia in the top-end gaming card?
Who knows where it will land in gaming performance. It won't touch GR202, that's for sure. Nvidia can afford to bring that die to consumers at reduced margin because it's a hand-me-down from the newly announced CPX line, essentially. AMD doesn't have that luxury.
 

branch_suggestion

Senior member
Aug 4, 2023
Who knows where it will land in gaming performance. It won't touch GR202, that's for sure. Nvidia can afford to bring that die to consumers at reduced margin because it's a hand-me-down from the newly announced CPX line, essentially. AMD doesn't have that luxury.
Whilst NV certainly should win with an ~800mm^2-ish die, I'm pretty sure AMD will have better gaming PPA across the stack, given how much die area NV has put into systolic arrays and associated stuff like TMEM.
And they seem to be keeping the shader count the same, so both should be 24,576 FP32 lanes.
If the SM is the same across the stack, I can say that nobody is making dGPUs primarily for gaming anymore. Don't give me the forward-looking neural-rendering talk: you simply cannot fire up that much low-precision compute without running into memory and power bottlenecks.
Or NV is trolling everyone.

AMD really needs to build the trvthnvke now more than ever.
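If the 24,576-lane figure holds, the peak FP32 math is easy to sketch (the clock speed here is a pure assumption, not a leak):

```python
# Peak FP32 throughput for an assumed 24,576-lane flagship.
shaders = 24_576       # FP32 lanes, per the speculation above
flops_per_clock = 2    # one FMA = 2 FLOPs per lane per clock
clock_hz = 3.0e9       # assumed ~3.0 GHz boost clock

tflops = shaders * flops_per_clock * clock_hz / 1e12
print(f"~{tflops:.1f} TFLOPS FP32")  # ~147.5 TFLOPS at 3.0 GHz
```

With identical lane counts, any raster gap between the two flagships would come down almost entirely to clocks and memory, not shader width.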
 

branch_suggestion

Senior member
Aug 4, 2023
And what's the volume and margin expected for cloud gaming? Probably laughable compared to Nvidia's AI volumes.
AT0 should ship over a million units over its lifetime just in the cloud, and at several thousand dollars a pop, so very good margins.
RDNA is for games, past, present and future.
CDNA is for GPGPU, whatever is needed to win, it shall be made.