Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3Qs to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

itsmydamnation · Sep 30, 2024

Kepler_L2 said:
It's Int8 WMMA running on the GPU. A 300 TOPs NPU would be almost 200mm² by itself.

Also, they didn't double memory bandwidth, so what would be the point of dedicated hardware?

NV SMs are sized the way they are for exactly this reason: the tensor cores will consume 100% of the bandwidth if active 100% of the time.

ToTTenTranz · Sep 30, 2024

Kepler_L2 said:
It's not an NPU why do people keep repeating this?

I think the folks at Digital Foundry have been insisting on the dedicated NPU for a while now.

Kepler_L2 said:
It's Int8 WMMA running on the GPU. A 300 TOPs NPU would be almost 200mm² by itself.

I don't get how they're reaching 300 TOPs at the 2.21GHz clocks, though.

Even if they're using UINT4 to reach maximum throughput, RDNA3's WMMA should do 1024 operations per compute unit per clock.
On the PS5 Pro's 60 CUs at 2.21GHz that would mean 1024 * 60 * 2.21 = 136 TOPs.

So, either I'm doing some math wrong, or the 300 TOPs number is wrong, or RDNA4's WMMA is bringing a significant boost in tensor operations per clock per CU.
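
For anyone checking the arithmetic above, here is a minimal sketch of the peak-throughput formula (ops per CU per clock × CU count × clock). The function name `peak_tops` is made up for illustration; the inputs are the figures quoted in the post.

```python
def peak_tops(ops_per_cu_per_clk: int, cu_count: int, clock_ghz: float) -> float:
    """Theoretical peak in TOPs.

    ops/clk/CU * CUs * GHz gives giga-ops per second; dividing by 1000 gives TOPs.
    """
    return ops_per_cu_per_clk * cu_count * clock_ghz / 1000.0


# Figures quoted in the post: 1024 UINT4 ops per CU per clock, 60 CUs, 2.21 GHz.
ps5_pro_rdna3_rate = peak_tops(1024, 60, 2.21)
print(f"{ps5_pro_rdna3_rate:.1f} TOPs")  # prints "135.8 TOPs", i.e. the ~136 in the post
```

This is a theoretical ceiling only; it says nothing about whether the memory system can feed the units at that rate.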

itsmydamnation · Sep 30, 2024

@ToTTenTranz going by the patents: sparsity.

maddie · Sep 30, 2024

Kepler_L2 said:
It's Int8 WMMA running on the GPU. A 300 TOPs NPU would be almost 200mm² by itself.

Judging from the die shot of the 50 TOPs Strix Point, that 200mm² seems way too large.

Kepler_L2 · Sep 30, 2024

ToTTenTranz said:
I think the folks at Digital Foundry have been insisting on the dedicated NPU for a while now.

I don't get how they're reaching 300 TOPs at the 2.21GHz clocks, though.

Even if they're using UINT4 to reach maximum throughput, RDNA3's WMMA should do 1024 operations per compute unit per clock.
On the PS5 Pro's 60 CUs at 2.21GHz that would mean 1024 * 60 * 2.21 = 136 TOPs.

So, either I'm doing some math wrong, or the 300 TOPs number is wrong, or RDNA4's WMMA is bringing a significant boost in tensor operations per clock per CU.

It's not RDNA3's WMMA engine but RDNA4's, which doubles rates per cycle and adds sparsity on top of that. That's how you get to 300 (well 288 actually) TOPs.
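
A rough sketch of how those two multipliers might stack. Two things here are assumptions, not statements from the thread: that RDNA3's Int8 WMMA rate is 512 ops per CU per clock (half the UINT4 figure quoted earlier), and that sparsity counts as a flat 2x. Under those assumptions, 60 CUs at 2.21 GHz land a bit under 288, and 288 would correspond to a clock of roughly 2.34 GHz, so the exact clock behind the 288 figure is not something this sketch can confirm.

```python
# Assumption (not from the thread): RDNA3 Int8 WMMA = half the UINT4 rate.
RDNA3_INT8_OPS_PER_CU_PER_CLK = 512
RATE_MULTIPLIER = 2       # "doubles rates per cycle"
SPARSITY_MULTIPLIER = 2   # assumed flat 2x for sparsity
CU_COUNT = 60             # PS5 Pro CU count quoted in the thread

ops_per_cu_per_clk = (
    RDNA3_INT8_OPS_PER_CU_PER_CLK * RATE_MULTIPLIER * SPARSITY_MULTIPLIER
)


def peak_tops(clock_ghz: float) -> float:
    """Peak TOPs for the assumed per-CU rate at a given clock."""
    return ops_per_cu_per_clk * CU_COUNT * clock_ghz / 1000.0


tops_at_2_21 = peak_tops(2.21)

# Back-solve: which clock would give exactly 288 TOPs under these assumptions?
implied_clock_ghz = 288 * 1000.0 / (ops_per_cu_per_clk * CU_COUNT)

print(f"{ops_per_cu_per_clk} ops/CU/clk")
print(f"{tops_at_2_21:.1f} TOPs at 2.21 GHz")
print(f"288 TOPs implies about {implied_clock_ghz:.2f} GHz")
```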

ToTTenTranz · Sep 30, 2024

Kepler_L2 said:
It's not RDNA3's WMMA engine but RDNA4's, which doubles rates per cycle and adds sparsity on top of that. That's how you get to 300 (well 288 actually) TOPs.

So it's safe to assume FSR4 will have a bigger performance hit on RDNA3/3.5 than on RDNA4?

Kepler_L2 · Sep 30, 2024

ToTTenTranz said:
So it's safe to assume FSR4 will have a bigger performance hit on RDNA3/3.5 than on RDNA4?

Very likely.

SolidQ · Sep 30, 2024

ToTTenTranz said:
So it's safe to assume FSR4 will have a bigger performance hit on RDNA3/3.5 than on RDNA4?

You can see a similar hit with DLSS FG compared to FSR FG.

That's how you get to 300 (well 288 actually) TOPs.

Does that mean RDNA4 desktop could have 400+?

Kepler_L2 · Sep 30, 2024

SolidQ said:
You can see a similar hit with DLSS FG compared to FSR FG.

Does that mean RDNA4 desktop could have 400+?

If Navi48 reaches the target 3.46 GHz clocks it would have 453 Int8 TOPs.
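
As a sanity check on that number, the CU count it implies can be back-solved. The per-CU rate of 2048 sparse Int8 ops per clock used here is an assumption carried over from the "doubled rate plus sparsity" description, not a figure given in the thread, and the resulting CU count is likewise only what the arithmetic suggests.

```python
# Figures from the post.
NAVI48_TOPS = 453
NAVI48_CLOCK_GHZ = 3.46

# Assumption (not from the thread): 2048 sparse Int8 ops per CU per clock.
ASSUMED_OPS_PER_CU_PER_CLK = 2048

# TOPs * 1000 = giga-ops/s; dividing by GHz gives ops per clock for the whole chip.
ops_per_clk_total = NAVI48_TOPS * 1000.0 / NAVI48_CLOCK_GHZ
implied_cu_count = ops_per_clk_total / ASSUMED_OPS_PER_CU_PER_CLK

print(f"implied CU count: {implied_cu_count:.1f}")
```

Under that assumed per-CU rate, the result rounds to 64 CUs.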

ToTTenTranz · Sep 30, 2024

itsmydamnation said:
Also, they didn't double memory bandwidth, so what would be the point of dedicated hardware?

NV SMs are sized the way they are for exactly this reason: the tensor cores will consume 100% of the bandwidth if active 100% of the time.

This was my thought when the Digital Foundry guys started mentioning a dedicated block.

The Xilinx block in AMD's SoCs is there mostly to perform tensor operations at a lower power consumption, so that the iGPU doesn't light up pushing 25W while editing a document in Office with LLM-based Copilot.

300 TOPs UINT8 would be massive and would probably push for the full 576GB/s on the PS5 Pro, let alone having that + 36 TFLOPs GPU + 8-core Zen2 as clients for the memory controller, all at the same time.
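
To put a number on the bandwidth concern, here is a back-of-the-envelope sketch using only the two figures above. It assumes, purely for illustration, that the tensor work had the whole 576GB/s to itself, which the post already points out it would not.

```python
PEAK_TOPS = 300           # quoted 8-bit throughput figure
BANDWIDTH_GB_PER_S = 576  # quoted PS5 Pro memory bandwidth

# Operations that must be performed per byte fetched from memory
# to keep the math units busy at peak rate.
ops_per_byte = (PEAK_TOPS * 1e12) / (BANDWIDTH_GB_PER_S * 1e9)

print(f"{ops_per_byte:.0f} ops per byte of memory traffic")
```

In other words, every byte pulled from memory would have to be reused for roughly 520 operations out of caches or registers to sustain the peak, and that is before the GPU's other work and the CPU take their share of the bus.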

poke01 · Sep 30, 2024

If that’s the case FSR4 can run on Ampere and later cards too. Sparsity has been on Nvidia cards since 2020. If FSR4 is open source then it would be interesting.

adroc_thurston · Sep 30, 2024

poke01 said:
If that’s the case FSR4 can run on Ampere and later cards too

nope.
You need NVAPI to touch WMMA (the NV one) from gfx side of things.

poke01 · Sep 30, 2024

adroc_thurston said:
nope.
You need NVAPI to touch WMMA (the NV one) from gfx side of things.

ahh ok. Then if FSR4 is open source one can implement the NV dev tools too. Like make a fork for Nvidia GPUs.

del42sa · Oct 1, 2024

https://overclock3d.net/news/gpu-displays/amd-reportedly-delays-rdna-4-due-to-rdna-3-oversupply/

AMD has delayed the launch of its upcoming RDNA 4 graphics cards until Q1 2025. This RDNA 4 delay comes thanks to an oversupply of AMD’s current-generation products. While lower-end RDNA 3 GPUs are selling “ok,” AMD’s higher-end models aren’t selling fast enough. Simply put, high-end PC gamers are opting for Nvidia’s RTX 40 SUPER series GPU models.

Bigos · Oct 1, 2024

Source: MLID. So the article is utter garbage.

del42sa · Oct 1, 2024

Bigos said:
Source: MLID. So the article is utter garbage.

Well, it's actually the opposite. MLID was the only one who claimed that AMD would release RDNA4 this year, while all other sources pointed to next year. Therefore, if MLID now mentions next year (for whatever reason), it is confirmation that the release will not happen until next year 😉

marees · Oct 1, 2024

del42sa said:
While lower-end RDNA 3 GPUs are selling “ok,” AMD’s higher-end models aren’t selling fast enough. Simply put, high-end PC gamers are opting for Nvidia’s RTX 40 SUPER series GPU models.

So will AMD announce a 'clearance' sale of the 7900 XTX, 7900 XT, and 7900 GRE?

coercitiv · Oct 1, 2024

del42sa said:
Well, it's actually the opposite. MLID was the only one who claimed that AMD would release RDNA4 this year, while all other sources pointed to next year. Therefore, if MLID now mentions next year (for whatever reason), it is confirmation that the release will not happen until next year 😉

Excellent display of the logical fallacy "two wrongs make a right".

MrTeal · Oct 1, 2024

marees said:
So will AMD announce a 'clearance' sale of the 7900 XTX, 7900 XT, and 7900 GRE?

If they don't, their board partners seem to be doing it for them. An MSI Gaming Trio 7900 XTX was going for CAD1000 the other day on Amazon before selling out, which is ~US$740.

gdansk · Oct 1, 2024

MLID claims it is delayed because it didn't meet his timeline.

Meanwhile I've been saying CES for months. It seems my source needs fewer corrections than his source(s). And I don't even have a source.

marees · Oct 1, 2024

gdansk said:
MLID claims it is delayed because it didn't meet his timeline.

Meanwhile I've been saying CES for months. It seems my source needs fewer corrections than his source(s). And I don't even have a source.

Kyle Bennett from HardOCP insisted, right after an earlier quarterly results call (where console sales started to tank for the first time and Lisa gave further bad guidance for the rest of 2024), that we won't see stock on shelves until 2025.

This was confirmed officially by a PowerColor rep at Computex and unofficially by other board partners at the same event.

soresu · Oct 1, 2024

Mahboi said:
I don't think 8K was ever going to be worth

You can just stop right there - no further words necessary 😂🤣

SolidQ · Oct 1, 2024

Is that preparation for RDNA4? Interesting that HYPR-RX has FSR2?? 🙄

Ranulf · Oct 1, 2024

Yay, more long acronyms.

Mopetar · Oct 1, 2024

Can anyone tell me if AMD's BS fake frame bars are as big as Nvidia's BS fake frame bars? I'm a cretinous pillock easily impressed by gimmicks and am glad that now that AMD is a software company they can focus on delivering the kind of thing that gamers really want. I just want to know if they're delivering that experience better than Nvidia. Okay, I actually just want to know if it's close enough to make Nvidia lower their prices.

/s
