Discussion RDNA 5 / UDNA (CDNA Next) speculation

511 · Jul 22, 2025

when is it going to launch cause i know for sure it will be on N3E/P.

marees · Jul 22, 2025

511 said:
when is it going to launch cause i know for sure it will be on N3E/P.

Along with xbox next year

Xbox nov
RDNA 5 probably dec-31st

511 · Jul 22, 2025

marees said:
Along with xbox next year

Xbox nov
RDNA 5 probably dec-31st

Xbox I don't have high hopes for next year lol ngl.

GodisanAtheist · Jul 22, 2025

maddie said:
True, but almost certainly not abandoning IF caching schemes. GDDR7 alone cannot replace the bandwidth amplification of a large cache.

- New GDDR probably means they can shrink IC and claw back some die space though.

IC is a good crutch while AMD uses GDDR6, while NV went with more exotic RAM.

GDDR7 won't eliminate the need for IC, but will likely minimize it.

basix · Jul 22, 2025

"Magnus" could use 384bit + 48 MByte IF$ instead of 96 MByte. In the end a tradeoff between effective bandwidth and amount of VRAM.

maddie · Jul 22, 2025

GodisanAtheist said:
- New GDDR probably means they can shrink IC and claw back some die space though.

IC is a good crutch while AMD uses GDDR6, while NV went with more exotic RAM.

GDDR7 won't eliminate the need for IC, but will likely minimize it.

IC was used effectively even when NVIDIA was also on GDDR6.

basix said:
"Magnus" could use 384bit + 48 MByte IF$ instead of 96 MByte. In the end a tradeoff between effective bandwidth and amount of VRAM.

I don't think cache can reduce the need for "amount of VRAM".

It amplifies bandwidth, so allows a smaller bus width. We should remember, the space needed for memory controllers is reduced, so this needs to be factored in the total cache+memory controllers area. The penalty might be a lot smaller than we think as logic shrinks faster than analog. Reduced power is another benefit.

itsmydamnation · Jul 22, 2025

Also rtrt drives latency , reducing cache will hurt rtrt regardless of overall bandwidth.

reaperrr3 · Jul 22, 2025

IC is also more power efficient than VRAM access, this was one big factor of RDNA2's massive perf/W improvement.

48MB might be enough for 80 CUs running at more modest clocks in a console (remember previous console SoCs had no GPU IC at all), but for a 96 CU desktop GPU with 20+% higher CU IPC and clocked to 3+ GHz, 96MB might be necessary to hit their perf targets at 4K and RT/PT in general, despite GDDR7.

FWIW, Nvidia didn't back down on their L2 sizes with Blackwell despite GDDR7 either, and NV is no less margin-oriented than AMD.

basix · Jul 23, 2025

maddie said:
I don't think cache can reduce the need for "amount of VRAM".

It amplifies bandwidth, so allows a smaller bus width.

No, IF$ cannot reduce the amount of needed VRAM.
But if you want a certain amount of VRAM, you need to have a certain bus width. If you widen the bus, you can reduce the size of IF$ without running into bandwidth bottlenecks. That's why I was talking about a tradeoff in this regard.
You could go the other way round: 128bit and 512MByte IF$. But the narrow bus width then limits the maximum VRAM capacity.

Other topics like energy efficiency and effective memory latency will get worse with wider bus width and less IF$. If those parameters are not your primary concern, a widened bus can be the right choice.

adroc_thurston · Jul 23, 2025

reaperrr3 said:
this was one big factor of RDNA2's massive perf/W improvement.

No, the massive RDNA2 p/w bump was just going off the physical design.
MALL allowed them to ship a 256b flagship-ish parts and it made NV really, really scared.
Fortunately, RDNA3 did not go so well.

soresu · Jul 23, 2025

reaperrr3 said:
IC is also more power efficient than VRAM access, this was one big factor of RDNA2's massive perf/W improvement.

Yes, but it's not a magic bullet for the same reason that people are bickering over the 8/16 GB cards issue at the moment.

More efficient software techniques (virtual texturing etc) and hardware µArch can mitigate this need to a limited extent, but it won't ever go away.

If they ever get around to stacking HBM directly on the GPU (or under) it will also mitigate some of the power efficiency and latency issues from VRAM access.

ToTTenTranz · Jul 23, 2025

GodisanAtheist said:
GDDR7 won't eliminate the need for IC, but will likely minimize it.

Truth be told, AMD has downscaled MALL-per-Performance a lot since RDNA2 already.
N21 had a whopping 128MB IC, then N31 ~50% was faster with 96MB albeit with a wider IC and VRAM bus, and now N48 is ~50% faster than N21 with only 64MB and same VRAM bus width (though clocked 25% faster).

maddie said:
The penalty might be a lot smaller than we think as logic shrinks faster than analog.

Cache area has also been practically stagnant across process nodes for a while and that is only changing after 3nm.

reaperrr3 said:
FWIW, Nvidia didn't back down on their L2 sizes with Blackwell despite GDDR7 either, and NV is no less margin-oriented than AMD.

Most probably because Nvidia was also planning for Blackwell to clock a lot higher than it did in the end, resulting in overkill effective bandwidth.
It was their Vega/RDNA3 moment.

Tuna-Fish · Jul 23, 2025

ToTTenTranz said:
Truth be told, AMD has downscaled MALL-per-Performance a lot since RDNA2 already.
N21 had a whopping 128MB IC, then N31 ~50% was faster with 96MB albeit with a wider IC and VRAM bus, and now N48 is ~50% faster than N21 with only 64MB and same VRAM bus width (though clocked 25% faster).

That's because MALL/performance is a completely nonsense thing that doesn't matter!

You do not need a specific amount of MALL for specific amount of performance. You need a specific amount of MALL for a given target render resolution. It doesn't matter how complex your scene is or how much time you spend in your shaders, the main bandwidth amplification is about getting your render target fit in cache across frames, so it never needs to hit ram. The only meaningful measure is is MALL/resolution. This is why MALL wasn't a thing before, you couldn't provide enough for the target resolution until cache got cheap enough.

MALL has slightly decreased since introduction because AMD has gotten better at optimizing. Both with better FB compression, and by better excluding things from being cached in the MALL.

dr1337 · Jul 23, 2025

Tuna-Fish said:
You need a specific amount of MALL for a given target render resolution

you can take that logic and shove it in the 8gb vram doesn't matter thread, these products should be built to perform as best as possible, period.

itsmydamnation · Jul 23, 2025

dr1337 said:
you can take that logic and shove it in the 8gb vram doesn't matter thread, these products should be built to perform as best as possible, period.

what? this makes no sense no matter what direction i approach it in

edit: you got 60 points to spend ( in effect cost) what is best?

dr1337 · Jul 24, 2025

itsmydamnation said:
what? this makes no sense no matter what direction i approach it in

really? you think a card faster than the 6900xt in 4k should have less less infinity cache because someone in marketing decided its not a 4k GPU?

itsmydamnation · Jul 24, 2025

dr1337 said:
really? you think a card faster than the 6900xt in 4k should have less less infinity cache because someone in marketing decided its not a 4k GPU?

thats not how reality works

adroc_thurston · Jul 24, 2025

dr1337 said:
you think a card faster than the 6900xt in 4k should have less less infinity cache because someone in marketing decided its not a 4k GPU?

Yeah that's how product planning/segmentation works.

ToTTenTranz · Jul 24, 2025

Tuna-Fish said:
That's because MALL/performance is a completely nonsense thing that doesn't matter!

I meant MALL per Performance Bracket. And it obviously matters to the engineers designing the chip and trying to balance cost (die area) towards a target selling price.

Tuna-Fish said:
You do not need a specific amount of MALL for specific amount of performance.

Zero people said this.

Tuna-Fish said:
You need a specific amount of MALL for a given target render resolution.

Yes. And for any resolution between FHD and 4K, N48 bests N21 with half the amount of MALL.

Not only that, the performance advantage of the 9070XT actually increases as resolution goes up, despite using half the MALL.
Considering AMD doubled the bus width per cell for RDNA3, we should expect around the same MALL bandwidth-per-clock in N48 versus N21.

Tuna-Fish said:
MALL has slightly decreased since introduction because AMD has gotten better at optimizing.

Halving the amount of MALL for a faster-all-around-SKU isn't a slight decrease. It's a major one.

Tuna-Fish said:
Both with better FB compression, and by better excluding things from being cached in the MALL.

This is the motive that leads to my initial statement, that AMD has significantly downsized the MALL per performance bracket since RDNA2.
Not sure why you thought that needed some kind of rebuttal.

Win2012R2 · Jul 24, 2025

basix said:
"Magnus" could use 384bit + 48 MByte IF$ instead of 96 MByte. In the end a tradeoff between effective bandwidth and amount of VRAM.

384bit is 4k card, no chance cache goes so low

adroc_thurston said:
Fortunately, RDNA3 did not go so well.

Why "fortunately"?

SolidQ · Jul 24, 2025

Next steam deck is RDNA5 apu?

Steam Deck 2 rumored to be in the works — and it may arrive with a massive AMD APU upgrade

Will Valve finally deliver a home console, too?

www.tomsguide.com

marees · Jul 24, 2025

SolidQ said:
Next steam deck is RDNA5 apu?

Steam Deck 2 rumored to be in the works — and it may arrive with a massive AMD APU upgrade

Will Valve finally deliver a home console, too?

www.tomsguide.com

I am guessing that valve will wait for 3nm to get cheap. However long that takes

Right now only 6nm & 7nm are cheap

SolidQ · Jul 24, 2025

Mlid show some RDNA5 documents(don't know real or not)

Kepler_L2 · Jul 24, 2025

SolidQ said:
Mlid show some RDNA5 documents(don't know real or not)

It's real but some of the numbers like CU count are slightly wrong (maybe intentionally to find out who leaks this).

SolidQ · Jul 24, 2025

Kepler_L2 said:
maybe intentionally to find out who leaks this

he saying one of AMD sources leak. instead 184 is 144CU?
if slightly wrong, so they gonna try compete vs 6090 with near tripple CU compare to N48?

Discussion RDNA 5 / UDNA (CDNA Next) speculation

Diamond Member

Golden Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Member

Senior member

Diamond Member

Diamond Member

Senior member

Golden Member

Senior member

Diamond Member

Senior member

Diamond Member

Diamond Member

Senior member

Golden Member

Attachments

Golden Member

Golden Member

Golden Member

Golden Member

Golden Member