Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,791
136
With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan early (perhaps to avoid a slow bring-up situation like Frontier's).

See here for the GFX940-specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5.0 was coming in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again I believe MI300 could launch before it :grimacing:

This is nuts, MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 

basix

Senior member
Oct 4, 2024
241
493
96
192 is max

Cut down Gaming would be around 150 (as per rumours)
154 would be an obvious number: -1 SE (24 CU) and -7x 2 CU (1 WGP per remaining SE). But afaik RDNA5 does not need symmetric salvaging anymore, so I expect something in the range of 154-168 CU (quick arithmetic sketched below).
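A minimal sketch of that salvage arithmetic, assuming the rumoured 192 CU total split across 8 shader engines at 24 CU (12 WGP) each; none of these figures are confirmed:

```python
# Rumoured/assumed RDNA5 top-die configuration (not confirmed).
TOTAL_CUS = 192
SHADER_ENGINES = 8
CUS_PER_SE = TOTAL_CUS // SHADER_ENGINES  # 24 CU (12 WGP) per SE

# Symmetric salvage: drop one full SE, then one WGP (2 CU) from each remaining SE.
after_se_cut = TOTAL_CUS - CUS_PER_SE                     # 192 - 24 = 168
after_wgp_cut = after_se_cut - (SHADER_ENGINES - 1) * 2   # 168 - 7*2 = 154

print(after_se_cut, after_wgp_cut)  # -> 168 154
```

Without the symmetric-salvage constraint, anything between those two endpoints (154-168 CU) becomes a plausible cut-down SKU.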

We could see more if AMD sees a way to beat Nvidia by unlocking more CUs.
 

RnR_au

Platinum Member
Jun 6, 2021
2,663
6,109
136
Can anyone tell me why the R9700 32GB card is so hard to buy?

It's almost as if AMD don't like to sell GPUs....
 
  • Haha
Reactions: lightmanek

gdansk

Diamond Member
Feb 8, 2011
4,567
7,679
136
There's some evidence that AMD isn't making a lot of N48. Every market share estimate points that way, to different degrees.

But also their partners don't want to make a bunch of 32GB cards which need to be discounted shortly.
 
  • Like
Reactions: marees

RnR_au

Platinum Member
Jun 6, 2021
2,663
6,109
136
But also their partners don't want to make a bunch of 32GB cards which need to be discounted shortly.
I thought the current rumours were that there are no new cards coming out for something like 12-18 months?
 

gdansk

Diamond Member
Feb 8, 2011
4,567
7,679
136
I thought the current rumours were that there are no new cards coming out for something like 12-18 months?
? It's a niche version of a small-volume part. They did not hold inventory for a launch; you're just going to see them as they trickle in. If they did, they'd have to discount it once ML weirdos realize it is slower, that most quantized models target 24GB, and that it is overall a false saving of $700.
 

soresu

Diamond Member
Dec 19, 2014
4,099
3,553
136
HBM distribution may be a much more obvious and visible example of prioritising more profitable market segments, but I don't doubt that this trend will bite further and further into consumer GPU orders/capacity until the AI bubble bursts, or at least starts to slow down significantly.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,058
9,797
106
*It's almost as if AMD don't like to sell RDNA GPUs when the TSMC capacity could be used for CDNA that makes them a whooooooollllleeeee lot more moneh per wafer.
Well, neither CDNA nor RDNA is wafer-limited.
AMD just exerts extremely tight inventory control in client graphics. It's all lessons learned from Hawaii and Polaris and stuff.
 

RnR_au

Platinum Member
Jun 6, 2021
2,663
6,109
136

poke01

Diamond Member
Mar 8, 2022
4,195
5,542
106

RnR_au

Platinum Member
Jun 6, 2021
2,663
6,109
136
Who even cares about that one.
My mate did. Just in the last few days he got scammed, along with a large bunch of others, on an Amazon deal for the 7900 XTX. So he looked around at what else was available. He was kinda keen on the R9700 Pro, but balked at buying a whole new system. He'll now wait to see what the Nvidia Supers will field.
 

MrMPFR

Member
Aug 9, 2025
103
194
71
Even if neural rendering stuff gets pushed into games by Nvidia, I expect RDNA5 to be no slouch on that front either. AMD will for sure add FP4 support and might also double matrix core width. That is not as extreme as the 8x width on Rubin CPX (my expectation: gaming cards will likely get cut down to 4x) but already very decent for many neural rendering use cases. Nvidia's card might be faster, but we are talking about a few percent in most cases (and far less than 2x).
154 CUs * 2.8 GHz * 8,192 FP4 sparse ops/clock * 2 / 1,000 ≈ 7,065 TFLOPS (about 7 PFLOPS), matching Kepler's figure. That's based on a quadrupling per CU vs RDNA 4 and a doubling vs Blackwell: FP8 -> FP4 = 2x, raw throughput increase = 2x.
IIRC all gaming implementations right now use INT8 and/or FP8, so that is effectively up to a 4x increase vs RDNA 4 and Blackwell/Ada Lovelace. NVFP4 is fine and AMD will match it for sure. DLSS 5 and FSR 5 will probably use NVFP4 and an "AMD" FP4 format to deliver reduced ms cost.
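A quick back-of-the-envelope sketch of that throughput estimate; the CU count, clock and per-CU rate are all rumoured or assumed figures, not confirmed specs:

```python
# Rumoured/assumed inputs for a hypothetical RDNA5 gaming die (not confirmed).
cus = 154                    # assumed salvaged CU count
clock_ghz = 2.8              # assumed boost clock
fp4_ops_per_cu_clock = 8192  # assumed sparse FP4 matrix ops per CU per clock
extra_factor = 2             # the extra x2 from the original estimate (FMA/sparsity accounting)

# CUs * GHz * ops/clock gives Gops/s; divide by 1,000 to get TFLOPS.
tflops = cus * clock_ghz * fp4_ops_per_cu_clock * extra_factor / 1000
print(f"{tflops:,.0f} TFLOPS (~{tflops / 1000:.1f} PFLOPS) sparse FP4")
# -> 7,065 TFLOPS (~7.1 PFLOPS)
```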

Let's wait for Rubin CPX's spec sheet at GTC 2026. Haven't heard anyone confirm this is the 6090 die.

Why AMD will likely extend matrix core performance:
  • Neural rendering is kinda new, but there have been papers out there since at least 2021 (the original Neural Radiance Caching paper), and AMD will bring their own "neural rendering" stuff with FSR Redstone
  • Neural rendering techniques can cut cost. E.g. neural texture compression reduces VRAM and disk footprint. If we extend "neural rendering" to SR, FG and RR it gets even more obvious: you can use a smaller chip to reach similar visual and performance results
  • AMD, Microsoft and Sony should look far into the future towards PS7 and Xbox-Next-Next. The more "neural rendering" is backed by strong matrix core acceleration, the easier PS6-to-PS7 crossgen will be
    • This trend is already obvious today: usage of neural rendering techniques will become more and more prevalent (at least for some parts of the rendering pipeline)
Neural asset compression isn't neural rendering, but yeah, it can accomplish similar things for MB overhead at iso image/asset quality.

SR, FG and RR are already neural rendering, and they're already accomplishing that right now on the NVIDIA side.

Consoles are not planned like that, and it's impossible to say what will change in the next 10 years leading up to a PS7 launch.
As for neural rendering, it's really just ReSTIR + neurally augmented path tracing. Devs can and will make a scalable lighting solution that works on PS5, nextgen handhelds and in many cases probably even the Switch 2. For all UE5 games the baseline will probably be MegaLights + an AMD-derived proper BVH SDK (similar to RTX Mega Geometry) for PS5 and XSX, with derivatives of this pairing arriving in other engines during the later part of PS5/PS6 crossgen. This solution will be well ahead of current probe-based RTGI and feel like another gen-on-gen uplift in RT, and for even wider support many could stick with a full worldspace (PT) + probe-based (DDGI) or mixed solution, essentially keeping the old version from the PS5 gen alongside the new MegaLights derivative for the PS5/PS6 gen. Neural rendering isn't a cutoff for PS5/PS6 crossgen.
AI LLMs can be offloaded to the cloud too, so that's another reason for an even longer nextgen crossgen.

Strongly suspect the real cutoff for PS5/PS6 crossgen is API support (GPU work graphs) and derived tech (procedural geometry, self-budgeting rendering systems...) plus fundamental implementations of ML essential to core gameplay: stuff like ML destruction, physics, combat mechanics etc. While some games could implement early versions of this in the late 2020s, most games will probably wait, pushing true nextgen PS6 games to 7-10 years from now. As for what lies beyond PS6-PS7 crossgen, it's impossible to know, other than that PS6 will be useless and you'll need a PS7.
 
  • Like
Reactions: Tlh97 and basix

basix

Senior member
Oct 4, 2024
241
493
96
There are only 2 viable options: N2 and N3P, because N4 is getting too old now. As N2 will become a high-demand node by the end of next year (smartphone SoCs, HPC accelerators, CPUs), the older and cheaper N3P makes more sense. N3P brings a very nice logic density increase, which is very good for GPUs.

N3P is also the choice for Zen 6 IODs (Medusa etc.). Keeping the IP on the same node for all RDNA5 implementations also makes sense. Less R&D effort.
 

marees

Golden Member
Apr 28, 2024
1,727
2,367
96
There are only 2 viable options: N2 and N3P, because N4 is getting too old now. As N2 will become a high-demand node by the end of next year (smartphone SoCs, HPC accelerators, CPUs), the older and cheaper N3P makes more sense. N3P brings a very nice logic density increase, which is very good for GPUs.

N3P is also the choice for Zen 6 IODs (Medusa etc.). Keeping the IP on the same node for all RDNA5 implementations also makes sense. Less R&D effort.
What are the chances of an RDNA 4 refresh (on N4 or N4X etc.)?