stayfrosty
Member
- Apr 4, 2024
The larger the context, the longer the prefill phase takes. Prefill needs lots of low-precision compute but barely any bandwidth, so the big, expensive HBM sits mostly idle during prefill.

What exactly makes a GDDR7 card (aka the CPX solution) so much better for large-context inference? It appears to be a normal 202-class die that will be relatively cheap with relatively large memory, so using it for inference rather than the super expensive HBM versions makes sense.
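A quick back-of-envelope makes the prefill point concrete: prefill pushes every prompt token through the weights in one pass, so its arithmetic intensity (FLOPs per byte of weights fetched) scales with prompt length, while decode reads all the weights just to emit a single token. A minimal sketch with made-up model numbers (nothing below is a vendor spec):

```python
# Back-of-envelope roofline comparison of prefill vs. decode.
# All numbers are illustrative assumptions, not measured specs.

def phase_intensity(tokens, params_bytes, flops_per_token):
    """FLOPs executed per byte of weights read from memory."""
    total_flops = tokens * flops_per_token
    return total_flops / params_bytes

# Hypothetical 70B-parameter model in FP4 (0.5 bytes/param),
# ~2 FLOPs per parameter per token for the forward pass.
params = 70e9
params_bytes = params * 0.5
flops_per_token = 2 * params

# Prefill processes the whole prompt in one pass; decode reads
# all weights to produce one token.
prefill_intensity = phase_intensity(128_000, params_bytes, flops_per_token)
decode_intensity = phase_intensity(1, params_bytes, flops_per_token)

print(f"prefill: {prefill_intensity:,.0f} FLOPs/byte")  # ~512,000
print(f"decode:  {decode_intensity:,.0f} FLOPs/byte")   # ~4
```

At hundreds of thousands of FLOPs per byte, prefill is bound by compute rather than memory bandwidth, which is exactly why trading HBM for cheaper GDDR7 costs little in that phase.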
AMD's equivalent response will be the AT0 chip.
I'm guessing AT0 is not optimized for low-precision compute the way Rubin CPX is. Rubin CPX has 6x the FP4 compute of a 5090 on a similar die size, so this isn't just a GR202.
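Since prefill is compute-bound, that FP4 advantage should translate roughly linearly into prefill latency. A toy estimate with placeholder throughput figures (the 1x/6x ratio is the only thing taken from the post; the base throughput is an assumption):

```python
# Rough prefill-time comparison under different FP4 throughputs.
# Throughput figures are placeholders, not official specs.

def prefill_seconds(prompt_tokens, flops_per_token, fp4_flops_per_s):
    return prompt_tokens * flops_per_token / fp4_flops_per_s

flops_per_token = 2 * 70e9   # same hypothetical 70B model as above
prompt = 128_000

base = 3e15                  # stand-in "1x" FP4 throughput, FLOPs/s
for label, tput in [("1x FP4", base), ("6x FP4", 6 * base)]:
    print(f"{label}: {prefill_seconds(prompt, flops_per_token, tput):.1f} s")
```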