Discussion RDNA 5 / UDNA (CDNA Next) speculation

Z O X · Sep 13, 2025

Kepler_L2 said:
RT benefits from stuff that CPUs do like out-of-order execution and branch prediction.

Hmmm... OoO will certainly increase IPC (very excited to see it implemented in a GPU), but I'm not sure if it's crucial for RT.
From my limited understanding, BHV structure is "fixed" per frame and once the ray is cast, calculating intersections is not a random task.
Changes to BHV structure with better caches should bring more benefits for RT I suppose ...

Josh128 · Sep 13, 2025

soresu said:
Lol, at this point I'm pretty sure Lisa Su would rather he just acts as if AMD doesn't exist.

His endorsement certainly isn't going to do them favors now.

Lisa doesnt give one single shat about his politics or ramblings. Whether or not he can move the sales needle is ALL that matters.

Joe NYC · Sep 14, 2025

adroc_thurston said:
Yeah they do, it's a cloud gaming part.

How big is the cloud gaming market?

Vs. the inference market Nvidia is targeting. It seems to me cloud gaming is much smaller, by an order of magnitude or more.

soresu · Sep 14, 2025

Z O X said:
Hmmm... OoO will certainly increase IPC (very excited to see it implemented in a GPU), but I'm not sure if it's crucial for RT.

There are several different ways to do physically correct light transport and to augment the performance of rendering through techniques like ReSTIR, path guiding and such.

I'd be surprised if OoO compute had no significant impact on them.

jpiniero · Sep 14, 2025

Josh128 said:
Whether or not he can move the sales needle is ALL that matters.

Probally doesn't even care about that. All that matters if he could move the stock price.

Joe NYC · Sep 14, 2025

tsamolotoff said:
Maybe it is just me, but for me CPX isn't some sort of innovative new market grab rather than the admission that the VRAM bandwidth chase has become too expensive even for nVidia and its godzilla $10000+ offerings

If it is about ratio of: Compute / Memory Bandwidth

And the big invention of Rubin CPX is to cut (save on) memory bandwidth, there is another way to affect the Compute / Memory Bandwidth ratio.

Namely, increasing the compute. If FP4 is what's it all about, couldn't AMD release Mi400 version with chiplets that are FP4 optimized?

AMD is already planning 2 version of Mi400, HPC and AI oriented. So AMD already has experience in turning the knobs to add more FP64, less FP64.

So this would be just another version, using the same platform, that is all FP4. Or FP4+FP8

MrMPFR · Sep 14, 2025

Z O X said:
Hmmm... OoO will certainly increase IPC (very excited to see it implemented in a GPU), but I'm not sure if it's crucial for RT.
From my limited understanding, BHV structure is "fixed" per frame and once the ray is cast, calculating intersections is not a random task.
Changes to BHV structure with better caches should bring more benefits for RT I suppose ...

It has to be significant otherwise Kepler wouldn't have mentioned it alongside branch prediction. But an explanation would be appreciated.

CMIIW but isn't OoO execution and branch prediction a slippery slope towards MIMD and CPU style mega-cores, and isn't the entire point of GPUs SIMD parallelism?
Seems like it would be preferable to prioritize data locality and other methods for boosting SIMD occupancy vs brute forcing the issue with complex branch prediction and OoO execution.

A future GPU design could accomplish this by implementing the following (non-exhaustive):

SWC/Thread coherency sorting
Other forms of coherency sorting like ray coherency sorting as seen in the PowerVR Photon.
Memory and scheduling changes prioritizing data locality and fine-grained control

Nr3 requires GPU Work Graphs API to maximize performance and benefits.

But OoO scheduling on GPU could still happen at some point. Not the kind CPUs do but seems like there's a method going more than half of the way towards idealized implementation with a tiny area overhead of 0.007%. 6.9% median speedup, up to 36%, no slowdowns. 100X less area overhead than implementing MIMD on a GPU via FSMs (not the same thing, just for comparison). This obviously won't be AMD's or NVIDIA's exact implementations but GhOST is the first method without the usual drawbacks like slowdowns and large area overhead, so they could resemble it in some areas.
- GhOST OoO paper: https://ieeexplore.ieee.org/document/10609594
- MIMD execution on GPU patent (AMD) https://www.patents-review.com/a/20...s-enabling-mimd-like-execution-flow-simd.html
^IIRC Kepler said this was for CDNA:

It's less about the BVH structure and more about how ray traversal related scheduling, execution and memory accesses is handled on a GPU. Ideally you would want data contained within SM from start to finish, even with multiple bounces. Should lower power consumption, slash memory and scheduling latency and boost performance.
But BVH still needs fundamental changes and RTX Mega Geometry isn't enough.

gdansk · Sep 14, 2025

Joe NYC said:
How big is the cloud gaming market?

Vs. the inference market Nvidia is targeting. It seems to me cloud gaming is much smaller, by an order of magnitude or more.

Since they actually have a shot there it is bigger than 3% of the inference market they could win.

Joe NYC · Sep 14, 2025

gdansk said:
Since they actually have a shot there it is bigger than 3% of the inference market they could win.

3% of inference may be more than all of cloud gaming. And then, there is the other 97% of inference.

https://twitter.com/x/status/1967205573503697395

gdansk · Sep 14, 2025

Joe NYC said:
3% of inference may be more than all of cloud gaming. And then, there is the other 97% of inference.

https://twitter.com/x/status/1967205573503697395

You're looking at an entirely projected market.
Still there is no scenario where AMD has an FP4-light part seizing any portion of that market.
"Cloud" visualization has market projections too, I have posted them before. I find them all questionable but larger than 3% of inference AMD could get with maximum effort and tailor made inference parts. Mind you they won't use AT0 for that market in any case. That's the other half of their graphics business.

adroc_thurston · Sep 14, 2025

Joe NYC said:
How big is the cloud gaming market?

Healthy TAM.

Joe NYC said:
It seems to me cloud gaming is much smaller, by an order of magnitude or more.

It's a market that's not gonna burst like a bubble.

igor_kavinski · Sep 14, 2025

adroc_thurston said:
It's a market that's not gonna burst like a bubble.

Why are people stupid enough to play games over a laggy streaming connection and then on top of that, pay for it?

adroc_thurston · Sep 14, 2025

igor_kavinski said:
Why are people stupid enough to play games over a laggy streaming connection and then on top of that, pay for it?

I've no idea but it's a an ok growth market.
Also the connection is only laggy if you have like, thirdworld internet.

ToTTenTranz · Sep 14, 2025

adroc_thurston said:
Healthy TAM.

Wishful thinking.

adroc_thurston · Sep 14, 2025

ToTTenTranz said:
Wishful thinking.

It's a bigger market market than PC 'handhelds' for sure.
Which is why it gets a chip, and them things, don't. Simple as.

ToTTenTranz · Sep 14, 2025

adroc_thurston said:
It's a bigger market market than PC 'handhelds' for sure.
Which is why it gets a chip, and them things, don't. Simple as.

Why would there be a need for a dedicated chip? Decent handheld chips can simply come from ultrabooks: same battery capacity, same thermal dissipation. Medusa + AT4 is going into handhelds for sure, and I bet some models will even use AT3.

And by the way, the PS6 handheld is a thing, which is pretty much PC hardware.

adroc_thurston · Sep 14, 2025

ToTTenTranz said:
Why would there be a need for a dedicated chip?

They're tablets dawg.
They need discrete 10W parts.

ToTTenTranz said:
Medusa + AT4 is going into handhelds for sure, and I bet some models will even use AT3.

That's 30 and 85W.
Idk none of that works.

ToTTenTranz said:
And by the way, the PS6 handheld is a thing, which is pretty much PC hardware.

They're wholly bespoke in their ecosystem.
And yes, that one will get custom silicon.
Volumes! Actual real volumes.

ToTTenTranz · Sep 15, 2025

adroc_thurston said:
They're tablets dawg.

Tablets are 6mm thick with passive cooling.
The Xbox Ally X is 50mm thick with 2x fans.

adroc_thurston said:
They need discrete 10W parts.

They've used Phoenix, Strix Point and Rembrandt so far. They're not 10W.
Not even the Switch 2 is a 10W part. It pulls 30W at the wall in docked mode.

adroc_thurston said:
That's 30 and 85W.

That's Strix Point and Strix Halo. There are gaming handhelds with both.
Those 30 and 85W can either be able to scale down in clocks and power or they won't get a single design win.

adroc_thurston · Sep 15, 2025

ToTTenTranz said:
they won't get a single design win.

Idk stx1/krk1 have a ton.

ToTTenTranz · Sep 15, 2025

adroc_thurston said:
Idk stx1/krk1 have a ton.

Of course.
In part, because their cTDP range is 15 to 54W. Probably lower than that in the Z2E SKU.

marees · Sep 15, 2025

ToTTenTranz said:
Of course.
In part, because their cTDP range is 15 to 54W. Probably lower than that in the Z2E SKU.

The MDS2 successor in 2029/2030 (with RDNA 5 or RDNA 6) would be the bees knees

marees · Sep 15, 2025

adroc_thurston said:
It's a bigger market market than PC 'handhelds' for sure.
Which is why it gets a chip, and them things, don't. Simple as.

I prefer the nv GeForce now model where you have prepaid/pay per play (& even free credits). Unfortunately nv not available in my region despite a compliant Samsung TV

Looks like Microsoft is gradually getting there with more affordable subscription plans.

ToTTenTranz · Sep 15, 2025

marees said:
The MDS2 successor in 2029/2030 (with RDNA 5 or RDNA 6) would be the bees knees

5 years is a ton of time, though.

Kaluan · Sep 16, 2025

marees said:
The MDS2 successor in 2029/2030 (with RDNA 5 or RDNA 6) would be the bees knees

Sorry, my brain went on vacation there for a second, what is "MDS2"?

PS: If CU=WGP is indeed a thing with RDNA5/UDNA (all of the other substantial changes aside), I'm guessing that means a CU would also house 2x the "Ray Accelerator" units as before, right? Next to the 128SPs and whatever else they double or beef up per unit.

marees · Sep 16, 2025

Kaluan said:
Sorry, my brain went on vacation there for a second, what is "MDS2"?

PS: If CU=WGP is indeed a thing with RDNA5/UDNA (all of the other substantial changes aside), I'm guessing that means a CU would also house 2x the "Ray Accelerator" units as before, right? Next to the 128SPs and whatever else they double or beef up per unit.

MDS-1 = medusa point aka strix point successor with 8 CUs & 50? TOPS

MDS-2 = medusa point "little" a new low power variant probably on a new TSMC low power node with 4? CUs & 50? TOPS

MDS-3 = medusa point "baby" aka bumblebee — future Mendocino replacement in theory but TSMC 3nm so will continue to be priced a tier above Mendocino

Now imagine zen-7 monolithic APUs in 2029-2030 with RDNA 5.5??

they will not need 50 or 75 TOPS NPU as RDNA 5.5 by itself has that capability. So let's assume the zen 7 successors to MDS1, MDS2, & MDS3 all have igpus that can do minimum of 75 TOPS

So MDS-2 successor with zen 7 & RDNA 5.5 would be ideal for a monolithic handheld to replace steam deck & compete with switch 2 & PS6 handheld

Discussion RDNA 5 / UDNA (CDNA Next) speculation

Member

Golden Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Platinum Member

Platinum Member

Senior member

Senior member

Platinum Member