Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 being available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

mostwanted002 · Jul 15, 2025

Z O X said:
Is there a toggle to switch to lower res than 4k?
9070XT isn't a 4k GPU and in such GPU heavy situations differences are minimal ...

You can switch down resolutions within games. Or use upscaling (FSR3/4), which essentially renders at a lower resolution and then upscales to the target 4k, giving a performance boost while maintaining fidelity.

Z O X · Jul 15, 2025

mostwanted002 said:
You can switch down resolutions within games. Or use upscaling (FSR3/4), which essentially renders at a lower resolution and then upscales to the target 4k, giving a performance boost while maintaining fidelity.

By comparing averages with native res and up-scaling, it turns out bigger gains are on 4k than lower resolutions/fsr.

jpiniero · Jul 16, 2025

https://videocardz.com/newz/amd-radeon-ai-pro-r9700-rdna4-workstation-gpu-has-been-listed-at-1240

The Radeon AI Pro R9700 MSRP appears to be about $1200-1250.

mostwanted002 · Jul 16, 2025

LETSSS GOOOOO!!!!

igor_kavinski · Jul 16, 2025

Take that, 5090!

SolidQ · Jul 16, 2025

FSR4 is going to be available in C2077

So that's why Azor was meeting with the CD Projekt team

https://twitter.com/x/status/1920526199542943905

ToTTenTranz · Jul 17, 2025

jpiniero said:
https://videocardz.com/newz/amd-radeon-ai-pro-r9700-rdna4-workstation-gpu-has-been-listed-at-1240

The Radeon AI Pro R9700 MSRP appears to be about $1200-1250.

This is actually pretty decent for AI hobbyists.

gdansk · Jul 17, 2025

ToTTenTranz said:
This is actually pretty decent for AI hobbyists.

Yeah, it's a big discount to previous 32GB workstation Radeons.

ToTTenTranz · Jul 17, 2025

gdansk said:
Yeah, it's a big discount to previous 32GB workstation Radeons.

I guess AMD is finally trying harder to get people to adopt their GPUs for AI workloads.

And it's not like putting another 16GB GDDR6 on the back of the PCB is costing them too much. It's only ~$40 at the moment, so with a $500 premium over the regular 9070XT they're getting more than a 1000% markup on the memory upgrade.
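The markup figure above can be checked with a few lines of arithmetic. This is only a sketch using the poster's own estimates (~$40 for the extra 16GB of GDDR6, a $500 premium over the regular 9070XT); neither number is a confirmed price, and the function name is made up for illustration.

```python
# Both figures are the poster's estimates, not confirmed prices.
memory_cost_usd = 40.0      # approximate cost of the extra 16GB of GDDR6
price_premium_usd = 500.0   # premium over the regular 9070XT

def markup_percent(cost, price):
    """Markup over cost as a percentage: (price - cost) / cost * 100."""
    return (price - cost) / cost * 100.0

markup = markup_percent(memory_cost_usd, price_premium_usd)
print(f"Markup on the memory upgrade: {markup:.0f}%")
```

With those inputs the markup comes out at 1150%, consistent with "more than 1000%".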

mostwanted002 · Jul 18, 2025

https://www.amd.com/en/blogs/2025/amd-radeon-ai-pro-r9700-to-be-available-in-workstation.html

1. 23rd July for R9700 included in solutions provided by System Integrators
2. Standalone AIB availability in later Q3 '25

jpiniero · Jul 18, 2025

Newegg is offering a $50 GC on the Gigabyte 9060 XT 8 GB at MSRP 😱

Can't give it away I guess.

igor_kavinski · Jul 18, 2025

jpiniero said:
Can't give it away I guess.

They gonna have to dump it in third world markets.

Shmee · Jul 18, 2025

ToTTenTranz said:
This is actually pretty decent for AI hobbyists.

I am curious, how does it fare in AI against the 7900XTX, with its 24GB of memory? Obviously the 7900XTX is RDNA3 and not a pro card, and has a bit less memory, but it is a bit cheaper.

igor_kavinski · Jul 18, 2025

Shmee said:
I am curious, how does it fare in AI against the 7900XTX, with its 24GB of memory? Obviously the 7900XTX is RDNA3 and not a pro card, and has a bit less memory, but it is a bit cheaper.

AMD Radeon AI PRO R9700 Specs

AMD Navi 48, 2920 MHz, 4096 Cores, 256 TMUs, 128 ROPs, 32768 MB GDDR6, 2518 MHz, 256 bit

www.techpowerup.com

AMD Radeon RX 7900 XTX Specs

AMD Navi 31, 2498 MHz, 6144 Cores, 384 TMUs, 192 ROPs, 24576 MB GDDR6, 2500 MHz, 384 bit

www.techpowerup.com

Would be roughly 25% slower in brute speed unless significant bottlenecks have been removed in the RDNA4 drivers.
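One way to arrive at a "brute speed" figure like this is to multiply shader count by clock, using the numbers from the two spec listings above. It is a rough sketch only: cores x clock as a proxy is an assumption, and it ignores architectural differences between RDNA3 and RDNA4 (dual-issue, matrix units, drivers).

```python
def shader_throughput(cores, clock_mhz):
    """Crude proxy for raw shader throughput: cores x clock (arbitrary units)."""
    return cores * clock_mhz

# Figures from the TechPowerUp listings quoted above.
r9700 = shader_throughput(cores=4096, clock_mhz=2920)
xtx = shader_throughput(cores=6144, clock_mhz=2498)

ratio = r9700 / xtx
print(f"R9700 / 7900 XTX: {ratio:.3f}")
print(f"R9700 deficit: {(1 - ratio) * 100:.1f}%")
```

By this proxy the R9700 lands roughly 22% behind the 7900 XTX, in the same ballpark as the "roughly 25%" estimate.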

mostwanted002 · Jul 19, 2025

igor_kavinski said:
Would be roughly 25% slower in brute speed unless significant bottlenecks have been removed in the RDNA4 drivers.

FP16 is usually the training territory.
On the inference side of things, it may be faster than XTX since RDNA4 now has dedicated hardware for computing FP8 and 4 times the throughput for INT4/8.
The majority of out-of-the-box inferencing models are served in INT8 or INT4 quantizations to reduce VRAM cost and increase tokens/sec.
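To put rough numbers on the VRAM point, here is a minimal sketch of weight footprint versus quantization width. It counts weights only; KV cache, activations and per-block quantization scales are ignored, so real usage will be higher. The 14B and 32B sizes are just example model sizes.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

for params in (14, 32):
    for name, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
        size = weight_footprint_gb(params, bits)
        print(f"{params}B @ {name}: ~{size:.0f} GB of weights")
```

Under these assumptions a 32B model needs about 64 GB of weights at FP16 but about 16 GB at INT4, which is what lets it fit on a 24GB or 32GB card at all.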

RnR_au · Jul 19, 2025

igor_kavinski said:
AMD Radeon AI PRO R9700 Specs

AMD Navi 48, 2920 MHz, 4096 Cores, 256 TMUs, 128 ROPs, 32768 MB GDDR6, 2518 MHz, 256 bit

www.techpowerup.com

AMD Radeon RX 7900 XTX Specs

AMD Navi 31, 2498 MHz, 6144 Cores, 384 TMUs, 192 ROPs, 24576 MB GDDR6, 2500 MHz, 384 bit

www.techpowerup.com

Would be roughly 25% slower in brute speed unless significant bottlenecks have been removed in the RDNA4 drivers.

Compute is not that important in inference. Memory bandwidth is king and the 7900XTX has nearly 50% more membw over the 9700. So the 7900XTX should be a fair bit faster.
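The "nearly 50%" figure can be reproduced from the spec listings quoted above. The sketch assumes that the listed GDDR6 memory clock times 8 gives the effective per-pin data rate (so 2500 MHz corresponds to 20 Gbps); that convention is an assumption about how the listing reports clocks, not something stated in the thread.

```python
def gddr6_bandwidth_gb_s(bus_width_bits, memory_clock_mhz):
    """Peak memory bandwidth in GB/s.

    Assumes effective data rate per pin = listed memory clock x 8 (GDDR6).
    """
    data_rate_gbps = memory_clock_mhz * 8 / 1000   # Gbit/s per pin
    return bus_width_bits / 8 * data_rate_gbps     # bytes per transfer x rate

xtx_bw = gddr6_bandwidth_gb_s(bus_width_bits=384, memory_clock_mhz=2500)
r9700_bw = gddr6_bandwidth_gb_s(bus_width_bits=256, memory_clock_mhz=2518)

bw_ratio = xtx_bw / r9700_bw
print(f"7900 XTX: {xtx_bw:.0f} GB/s, R9700: {r9700_bw:.0f} GB/s")
print(f"7900 XTX advantage: {(bw_ratio - 1) * 100:.0f}%")
```

That gives about 960 GB/s against about 645 GB/s, an advantage of just under 50% for the 7900 XTX.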

igor_kavinski · Jul 19, 2025

RnR_au said:
Compute is not that important in inference. Memory bandwidth is king and the 7900XTX has nearly 50% more membw over the 9700. So the 7900XTX should be a fair bit faster.

Good point. So the R9700 is going to be starved massively for bandwidth 🙁

Shmee · Jul 19, 2025

Huh so the 7900XTX is actually faster with AI? Interesting. How would it compare to a 3090 or 4090?

igor_kavinski · Jul 19, 2025

Shmee said:
Huh so the 7900XTX is actually faster with AI? Interesting. How would it compare to a 3090 or 4090?

Much faster than a 3090.

The 4090 is a huge, power-hungry card, which is why it gets to be really fast.

mostwanted002 · Jul 19, 2025

RnR_au said:
Compute is not that important in inference. Memory bandwidth is king and the 7900XTX has nearly 50% more membw over the 9700. So the 7900XTX should be a fair bit faster.

Indeed. The R9700 will be a mixed bag on perf: the upside of newer compute accelerators, offset by being capped on memory bandwidth. Gonna be interesting to see initial 3rd party numbers after the embargo.
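A common rule of thumb for why bandwidth caps single-stream decode: each generated token has to stream roughly all of the weights from VRAM once, so tokens/sec cannot exceed bandwidth divided by weight size. The sketch below applies that ceiling. The bandwidth values (960 and 640 GB/s) and the 16 GB weight size (roughly a 32B model at 4-bit) are illustrative assumptions, not measurements, and real throughput will sit well below the ceiling.

```python
def decode_ceiling_tps(bandwidth_gb_s, weights_gb):
    """Upper bound on single-stream decode speed for a bandwidth-bound model.

    Assumes every token reads all weights once; ignores KV cache traffic,
    compute time and kernel overheads, so this is a ceiling, not a prediction.
    """
    return bandwidth_gb_s / weights_gb

weights_gb = 16.0  # assumed: ~32B parameters at 4 bits per weight
for card, bandwidth in (("7900 XTX", 960.0), ("R9700", 640.0)):
    ceiling = decode_ceiling_tps(bandwidth, weights_gb)
    print(f"{card}: at most ~{ceiling:.0f} tokens/s")
```

Faster FP8/INT4 hardware helps with prompt processing and batched work, but it does not raise this ceiling.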

Vikv1918 · Jul 19, 2025

igor_kavinski said:
They gonna have to dump it in third world markets.

Or just shift production to the 16GB version and only use like 5% of dies for the 8GB version, so that they don't have to discount to move stock.

marees · Jul 19, 2025

jpiniero said:
Newegg is offering a $50 GC on the Gigabyte 9060 XT 8 GB at MSRP 😱

Can't give it away I guess.

At what price is it worth buying the trap version ?

$200 ??

itsmydamnation · Jul 19, 2025

igor_kavinski said:
Much faster than a 3090.

The 4090 is a huge, power-hungry card, which is why it gets to be really fast.

Depends on workload. A 7900XTX ~= 4090 on deepseek-r1 14B/32B: I got 30 tps stock, 33 tps with a memory OC; the 4090 was 36.
Also, the 3090 is just slightly behind, if not equal to, a 7900XTX. The 3090 is kind of the local DIY AI GOAT.

gdansk · Jul 19, 2025

marees said:
At what price is it worth buying the trap version ?

$200 ??

At $250 isn't it much better than a B580 or RTX 5050?
It should be low enough, in my opinion.

The BoM savings aren't actually wise for board partners. Just make the 16GB. No one in DIY wants the 8GB version. OEMs only want Nvidia. So board partners should learn quickly to make very few of them.

igor_kavinski · Jul 19, 2025

itsmydamnation said:
Depends on workload. A 7900XTX ~= 4090 on deepseek-r1 14B/32B: I got 30 tps stock, 33 tps with a memory OC; the 4090 was 36.

Wow. 4090 doesn't look that impressive then if it only manages 36 tps. Maybe with max 600W, it could've done more?
