Discussion RDNA 5 / UDNA (CDNA Next) speculation

poke01 · Jul 19, 2025

Win2012R2 said:
They need to double it again at the per-CU level, so even more with more CUs. Nvidia got sloppy with Blackwell, but this is unlikely to continue; AMD must properly catch up next time, ideally be better.

Nvidia always strikes back hard and unexpectedly

gdansk · Jul 19, 2025

poke01 said:
Nvidia always strikes back hard and unexpectedly

If they *always* strike back hard and you also expect it, how is it unexpected?
🤔
Really, what was unexpected is Blackwell's nearly complete lack of perf-per-watt improvement.

poke01 · Jul 19, 2025

gdansk said:
If they *always* strike back hard and you also expect it, how is it unexpected?

Because Nvidia is like a wasp: make them mad and they will sting you just when you least expect it. Biggest ego of any company nowadays.

poke01 · Jul 19, 2025

gdansk said:
Really, what was unexpected is Blackwell's nearly complete lack of perf-per-watt improvement.

It regressed in some ways too.

GodisanAtheist · Jul 19, 2025

gdansk said:
If they *always* strike back hard and you also expect it, how is it unexpected?
🤔
Really, what was unexpected is Blackwell's nearly complete lack of perf-per-watt improvement.

-They always strike back, but their method of striking back is unexpected; that was my takeaway.

Compete on performance and price across the stack?

Maybe they'll swallow a lot of margin to compete on price.

Or maybe NV will compete by leveraging their market dominance and strong-arming AIBs into spinning off crappy B-tier product lines for you.

Or maybe they'll strong-arm devs into "hidden tessellation" or "extreme RT" or whatever software feature gives them an edge.

Or maybe it will be a marketing subterfuge campaign where suddenly a bunch of old accounts with 1 post each wake up and begin complaining about crappy drivers.

Or...

Makaveli · Jul 19, 2025

branch_suggestion said:
35% faster than the 7900XTX is a 4090.

The 4090 is 20-25% faster than the XTX, so this would be faster, but still something like 15-20% slower than a 5090.

However, if the price is $1000 less than a 5090, it's a win.

branch_suggestion · Jul 19, 2025

Makaveli said:
The 4090 is 20-25% faster than the XTX

In raster. Still, hopefully RDNA5 makes it all a moot point.
TPU has it 31% ahead, which is why I added a bit of safety margin.
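Since both figures are uplifts over the same card (the 7900 XTX), they compare by dividing, not subtracting. A minimal sketch using the thread's numbers (35% for the rumored part, 31% for the 4090 per TPU); the helper name is hypothetical:

```python
def relative_to(target_uplift: float, reference_uplift: float) -> float:
    """How much faster (as a fraction) part A is than part B,
    when both are given as uplifts over the same baseline card."""
    return (1 + target_uplift) / (1 + reference_uplift) - 1


# Rumored RDNA5 part vs. RTX 4090, both relative to the 7900 XTX
print(f"{relative_to(0.35, 0.31):.1%}")  # 3.1%
```

With the older 20-25% figure for the 4090, the same math gives an 8-12.5% lead, so the newer TPU number leaves a much thinner margin.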

Makaveli · Jul 19, 2025

branch_suggestion said:
In raster. Still, hopefully RDNA5 makes it all a moot point.
TPU has it 31% ahead, which is why I added a bit of safety margin.

My percentage was also based on TPU data, but from a while back. Now when I double-check it I see the 31% there, so the 4090's lead looks to have grown over the last year and a bit.

Jan Olšan · Jul 20, 2025

GodisanAtheist said:
-They always strike back, but their method of striking back is unexpected; that was my takeaway.

Compete on performance and price across the stack?

Maybe they'll swallow a lot of margin to compete on price.

Or maybe NV will compete by leveraging their market dominance and strong-arming AIBs into spinning off crappy B-tier product lines for you.

Or maybe they'll strong-arm devs into "hidden tessellation" or "extreme RT" or whatever software feature gives them an edge.

Or maybe it will be a marketing subterfuge campaign where suddenly a bunch of old accounts with 1 post each wake up and begin complaining about crappy drivers.

Or...

Or run guerrilla marketing that floods subreddits with the notion that an AMD card has to be unrealistically cheaper to be worth considering, and because it isn't... recite a "the more you buy, the more you save" prayer, and now their more expensive card is actually cheaper or something.

Either Nvidia spends a lot of effort spamming that narrative, or they succeeded and customers actually believe it and parrot it for them, because I see this BS all the time.

DAPUNISHER · Jul 20, 2025

Jan Olšan said:
Either Nvidia spends a lot of effort spamming that narrative, or they succeeded and customers actually believe it and parrot it for them, because I see this BS all the time.

It's both.

Gaming brought in less than Nvidia's R&D costs. AMD may have an opportunity ahead.

soresu · Jul 20, 2025

DAPUNISHER said:
Gaming brought in less than Nvidia's R&D costs

Realistically though that doesn't really matter to them.

AI/ML is making them so much money at the moment that the gaming market could crash and they would probably be happier to just dedicate that wafer capacity to pro/DC instead.

DAPUNISHER · Jul 20, 2025

soresu said:
Realistically though that doesn't really matter to them.

AI/ML is making them so much money at the moment that the gaming market could crash and they would probably be happier to just dedicate that wafer capacity to pro/DC instead.

LOL, yeah man, that was my point.

reaperrr3 · Jul 21, 2025

branch_suggestion said:
Let's see, N48 has roughly 25% more gaming perf/FLOP than N31.
RDNA5 can be expected to increase that more, let's say 35%.

I wouldn't be so optimistic about (raster) perf/FLOP, for 2 reasons:

1) RDNA4 was such a huge uplift and fixed so many weaknesses of RDNA3 that RDNA5 may still deliver a smaller real-world improvement per FLOP in raster, despite bigger changes on paper.

2) If some of the "IPC" improvement comes from considerable VOPD/dual-issue improvements, AMD will likely advertise dual-issue FLOPs again, and perf/FLOP will technically go down
(perf/WGP at the same clock would still be up considerably, of course).
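Point 2 is pure bookkeeping: if the FLOP count on the spec sheet doubles while real per-WGP, same-clock performance rises by less than 2x, perf/FLOP falls even though the chip got faster. A sketch with a hypothetical 1.3x real uplift (an illustrative number, not a leak):

```python
def perf_per_flop_change(perf_gain: float, flop_gain: float) -> float:
    """Ratio of new perf/FLOP to old perf/FLOP."""
    return perf_gain / flop_gain


# Hypothetical: 1.3x real perf per WGP at the same clock,
# but the advertised FLOP count doubles because dual-issue is counted
print(perf_per_flop_change(1.3, 2.0))  # 0.65
# Same 1.3x uplift if the advertised FLOP count stays on the same basis
print(perf_per_flop_change(1.3, 1.0))  # 1.3
```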

ToTTenTranz said:
An N31 config with >3 GHz clocks, GDDR7 and higher IPC than RDNA4 sounds like a beast that could nip at a GB202's heels, to be honest.

Yeah.

Assuming the 64 CU N5x will be at least N48 x1.3 in raster, a 96 CU N5x would need to reach at least N48 x1.7-1.8 to make sense, which would put it virtually on par with the 5090 in most games.
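For reference, 96 CUs is 1.5x of 64, so perfectly linear scaling from a 1.3x base would land at 1.95x; the 1.7-1.8x target therefore assumes sub-linear scaling. A quick sketch of how much of the linear gain that target implies (the 1.3x base is the estimate above, not a known figure; the helper name is hypothetical):

```python
def scaling_efficiency(small_perf: float, big_perf: float,
                       small_cus: int, big_cus: int) -> float:
    """Fraction of the CU-count ratio that shows up as performance."""
    return (big_perf / small_perf) / (big_cus / small_cus)


for target in (1.7, 1.8):
    eff = scaling_efficiency(1.3, target, 64, 96)
    print(f"N48 x{target}: {eff:.0%} of linear CU scaling")
# N48 x1.7: 87% of linear CU scaling
# N48 x1.8: 92% of linear CU scaling
```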

basix · Jul 21, 2025

AMD is already advertising dual-issue FLOPS. They just don't do it at the stream-processor-count level.

N31 has 6144 dual-issue stream processors but 12'288 FP32 units. The peak FLOPS throughput is based on the FP32 unit count.
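Spelled out, the usual peak-FLOPS convention is FP32 units × 2 FLOPs per clock (a fused multiply-add counts as two) × clock. A sketch for N31; the 2.5 GHz clock is an assumption for illustration, roughly the 7900 XTX's rated boost:

```python
def peak_tflops(fp32_units: int, clock_ghz: float,
                flops_per_unit_per_clk: int = 2) -> float:
    """Peak TFLOPS: units x FLOPs per unit per clock (2 for FMA) x clock in GHz."""
    return fp32_units * flops_per_unit_per_clk * clock_ghz / 1000


print(peak_tflops(12288, 2.5))  # counting FP32 units (dual-issue): 61.44
print(peak_tflops(6144, 2.5))   # counting stream processors only: 30.72
```

The factor of two between the lines is exactly the dual-issue counting being discussed.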

Kepler_L2 · Jul 21, 2025

basix said:
AMD is already advertising dual-issue FLOPS. They just don't do it at the stream-processor-count level.

N31 has 6144 dual-issue stream processors but 12'288 FP32 units. The peak FLOPS throughput is based on the FP32 unit count.

Yeah, I do expect them to change the advertised core count with RDNA5, since VOPD will go from a best-case scenario to an average-case-unless-something-weird-happens scenario.

ToTTenTranz · Jul 21, 2025

Kepler_L2 said:
Yeah, I do expect them to change the advertised core count with RDNA5, since VOPD will go from a best-case scenario to an average-case-unless-something-weird-happens scenario.

If true, this should mean a massive boost in effective compute throughput for real-world scenarios.

Saylick · Jul 21, 2025

ToTTenTranz said:
If true, this should mean a massive boost in effective compute throughput for real-world scenarios.

Too bad real-world scenarios likely won’t be able to net you 2x performance even with 2x effective compute.

basix · Jul 21, 2025

It will be far from 2x. But if we see something similar to Turing to Ampere, I would be impressed. Basically 1.3x or so performance per FLOPS with relatively little additional HW. The FP units are already there.
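A deliberately simplified issue model shows why 2x is a ceiling and what ~1.3x would take. Assume a fraction p of instructions can be paired, each pair issuing in one cycle: the stream then needs N(1 - p/2) cycles instead of N. This ignores memory stalls, register-bank conflicts and everything else, so it is an upper-bound sketch, not a description of how AMD's hardware actually schedules VOPD:

```python
def dual_issue_speedup(paired_fraction: float) -> float:
    """Issue-limited speedup when paired_fraction of instructions
    are co-issued in pairs (each pair takes one cycle)."""
    if not 0.0 <= paired_fraction <= 1.0:
        raise ValueError("paired_fraction must be in [0, 1]")
    return 1.0 / (1.0 - paired_fraction / 2.0)


def fraction_needed(speedup: float) -> float:
    """Inverse of dual_issue_speedup: paired fraction for a target speedup."""
    return 2.0 * (1.0 - 1.0 / speedup)


print(f"{fraction_needed(1.3):.0%}")     # 46% of instructions paired
print(f"{dual_issue_speedup(1.0):.1f}")  # 2.0, the everything-pairs best case
```

Under this model, even 1.3x needs close to half the instruction stream to pair up, which is why better compiler scheduling or dynamic issue matters as much as the FP units themselves.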

And who knows what dynamic/OoO execution and updated caching systems will bring to the table in addition to VOPD.

dangerman1337 · Jul 21, 2025

From the use of a 384-bit bus in Magnus and the 96 CU UDNA die, I suspect AMD may be gutting Infinity Cache from their cards altogether and just relying on a lot of bandwidth.

Mopetar · Jul 21, 2025

Win2012R2 said:
They need to double it again at the per-CU level, so even more with more CUs. Nvidia got sloppy with Blackwell, but this is unlikely to continue; AMD must properly catch up next time, ideally be better.

Nvidia is too busy making truckloads of money and being the most valuable company of all time to care about consumer GPUs. The AI market doesn't seem to be dying down at all, so I don't see why Nvidia would devote time and effort to such a tiny part of their bottom line.

If the AI market dies, they can change their tune then, and everyone will gladly welcome them back. Until then they can just sell cut-down versions of massive dies made for other markets. Anyone talented enough to work on a killer gaming GPU can make an even more profitable GPU for other markets.

dangerman1337 · Jul 21, 2025

Mopetar said:
Nvidia is too busy making truckloads of money and being the most valuable company of all time to care about consumer GPUs. The AI market doesn't seem to be dying down at all, so I don't see why Nvidia would devote time and effort to such a tiny part of their bottom line.

If the AI market dies, they can change their tune then, and everyone will gladly welcome them back. Until then they can just sell cut-down versions of massive dies made for other markets. Anyone talented enough to work on a killer gaming GPU can make an even more profitable GPU for other markets.

Besides, Nvidia is aiming for robotics, with AI/ML being the stepping stone towards that. Jensen is playing the long game.

ToTTenTranz · Jul 21, 2025

Saylick said:
Too bad real-world scenarios likely won’t be able to net you 2x performance even with 2x effective compute.

No one suggested 2x performance or anything remotely similar, and putting those words into others' mouths doesn't seem very honest IMO.

Even if "real performance" per CU and per clock increases only 15%, that's already a big difference if it comes from a single additional feature that usually costs a rather small increase in transistor count (less than simply adding 15% more execution units and caches).

maddie · Jul 21, 2025

dangerman1337 said:
From the use of a 384-bit bus in Magnus and the 96 CU UDNA die, I suspect AMD may be gutting Infinity Cache from their cards altogether and just relying on a lot of bandwidth.

Nah. The bus/CU ratio is the same as RDNA4. They will probably need even better caching schemes to handle the expected improvement in instruction throughput.
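The ratio check is quick: N48 (RX 9070 XT) pairs 64 CUs with a 256-bit bus, and the rumored die pairs 96 CUs with 384 bits. The helper name is just for illustration:

```python
def bus_bits_per_cu(bus_width_bits: int, cu_count: int) -> float:
    """Memory bus width available per compute unit, in bits."""
    return bus_width_bits / cu_count


n48 = bus_bits_per_cu(256, 64)      # RDNA4 Navi 48
rumored = bus_bits_per_cu(384, 96)  # rumored 96 CU die
print(n48, rumored)  # 4.0 4.0
```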

ToTTenTranz · Jul 22, 2025

maddie said:
Nah. The bus/CU ratio is the same as RDNA4. They will probably need even better caching schemes to handle the expected improvement in instruction throughput.

Or, you know, GDDR7.

maddie · Jul 22, 2025

ToTTenTranz said:
Or, you know, GDDR7.

True, but they're almost certainly not abandoning IF caching schemes. GDDR7 alone cannot replace the bandwidth amplification of a large cache.
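Both points fit a simple model: raw DRAM bandwidth is bus width × per-pin data rate / 8, and a last-level cache with hit rate h lets the shaders see roughly DRAM bandwidth / (1 - h), assuming the cache itself is fast enough to keep up. The data rates and the 50% hit rate below are assumptions for illustration, not specs of any announced part:

```python
def dram_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s: bus bits x Gbps per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def effective_bandwidth_gbs(dram_gbs: float, hit_rate: float) -> float:
    """Idealized bandwidth seen by the shaders if only cache misses reach DRAM."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return dram_gbs / (1.0 - hit_rate)


gddr6 = dram_bandwidth_gbs(256, 20)  # 256-bit at an assumed 20 Gbps
gddr7 = dram_bandwidth_gbs(384, 28)  # 384-bit at an assumed 28 Gbps
print(gddr6, gddr7)                        # 640.0 1344.0
print(effective_bandwidth_gbs(gddr7, 0.5))  # 2688.0
```

Under these assumptions, going from 20 to 28 Gbps on the same bus is a 1.4x gain, while even a modest 50% hit rate doubles what the shaders see, which is the amplification argument in a nutshell.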
