Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2 the window they push to add support for new devices is much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because US Govt is starting to prepare the SW environment for El Capitan (Maybe to avoid slow bring up situation like Frontier for example)

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time like Hopper probably only a couple of months later!
Although I believe Hopper had problems not having a host CPU capable of doing PCIe 5 in the very near future therefore it might have gotten pushed back a bit until SPR and Genoa arrives later in 2022.
If PVC slips again I believe MI300 could launch before it

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Joe NYC · Aug 22, 2025

bender250 said:
According to Morgan Stanley Research, MI300X and MI355X are not profitable to run LLM inference, having worse returns than even Huawei platform. I would get it for training, but for inference? The claim is software is to blame. I could not find the original report to understand their methodology, but just common sense seems to be off (no one would be buying such accelerators). But maybe there is still some truth and this is why the sales are not so high as "investors in the second shovel company in AI gold rush" hoped?
View attachment 129071

Secondary source: https://wccftech.com/nvidia-blackwe...ns-miles-ahead-of-amd-software-optimizations/

I have seen that disputed (on Twitter) by someone who run the cloud company using these.

jpiniero · Aug 22, 2025

Joe NYC said:
I have seen that disputed (on Twitter) by someone who run the cloud company using these.

The other products are very likely to be not profitable either.

Mopetar · Aug 22, 2025

jpiniero said:
The other products are very likely to be not profitable either.

HPC actually stands for Hype per (venture) Capital months of operation. AMD had a somewhat higher BB (bubble burst) factor so companies are more hesitant to squander some else's money on (invest in) their product. Current LLMs (largesse luring models) are proving to be unsustainable regardless of vendor though based on reports of a man in a leather jacket trying to figure out who to sell shovels to next.

adroc_thurston · Aug 22, 2025

Mopetar said:
Current LLMs (largesse luring models) are proving to be unsustainable regardless of vendor

Oh no they are actually useful in places.
Just that the current amount of cash sunk into them is silly.

dr1337 · Aug 22, 2025

bender250 said:
According to Morgan Stanley Research, MI300X and MI355X are not profitable to run LLM inference, having worse returns than even Huawei platform. I would get it for training, but for inference? The claim is software is to blame. I could not find the original report to understand their methodology, but just common sense seems to be off (no one would be buying such accelerators). But maybe there is still some truth and this is why the sales are not so high as "investors in the second shovel company in AI gold rush" hoped?
View attachment 129071

Secondary source: https://wccftech.com/nvidia-blackwe...ns-miles-ahead-of-amd-software-optimizations/

it would be very interesting to know if these are benchmark numbers, or vendor numbers

if this is purely considering customers only, then the data is bupkis. nvidia has been long entrenched with their customers and in terms of profit count, there should be more profitable clusters operating on nvidia hardware entirely from marketshare alone.

Like where are the google numbers coming from? How are they deciding profit on that? Gemini licensing sales? Are people actually paying google big money on business accounts to use gemini?

Joe NYC · Aug 22, 2025

dr1337 said:
it would be very interesting to know if these are benchmark numbers, or vendor numbers

if this is purely considering customers only, then the data is bupkis. nvidia has been long entrenched with their customers and in terms of profit count, there should be more profitable clusters operating on nvidia hardware entirely from marketshare alone.

Like where are the google numbers coming from? How are they deciding profit on that? Gemini licensing sales? Are people actually paying google big money on business accounts to use gemini?

Precisely. It is just "data" points derived through analysis, based on a set of assumptions, where half of the assumptions may be incorrect.

There is no raw data.

marees · Aug 26, 2025

AMD RDNA 4 GPU Architecture at Hot Chips 2025

By Ryan Smith - August 25, 2025

Recapping AMD’s logical design, with a single GPU being made of (up to) multiple shader engines. The L2 cache was enlarged this generation to better prep the architecture for RT workloads. This also marks the 3rd generation of AMD’s Infinity Cache. All of which works to help keep the core fed.

RT performance in a nutshell: doubling BVH throughput has added most of RDNA 4’s RT performance improvements. But OoO memory, hardware instance transform, and oriented bounding boxes all further add to that, allowing for ~2x RT performance versus RDNA 3.

RDNA 4 also makes some updates to the shader engine with dynamic register allocations.
RT tends to eat up a lot of registers, but not during all stages of RT execution. Traversal uses relatively few registers, for example.
RDNA 3 would allocate registers based on the worst case scenario. RDNA 4, on the other hand, can allocate registers dynamically, allowing only the needed registers to be used, and then released once they’re no longer needed.
In practice, this has allowed AMD to increase the number of waves in flight versus RDNA 3, by squeezing in another wave to the freed-up registers.

The media block received some major encoder updates, such as adding B-frames for AV1 encoding, and overall lower latency.

Meanwhile the display block adds some features such as integrating Radeon image sharpening 2 into the block itself, rather than processing it as a shader effect.

AMD RDNA 4 GPU Architecture at Hot Chips 2025

Kicking off this afternoon’s graphics track at Hot Chips 2025 is AMD. The company launched its RDNA 4 architecture and associated Radeon RX 9000 series video cards earlier this year, releasing two GPUs thus far. As AMD is now well into this generation of Radeon GPUs, the company doesn’t...

www.servethehome.com

GodisanAtheist · Aug 26, 2025

Didn't realize Ryan had a new home...

Vikv1918 · Sep 1, 2025

AMD was supposed to provide driver support for Shader Execution Re-ordering in "Summer 2025". Now maybe AMD considers September to be a summer month, or its just yet another missed deadline.

igor_kavinski · Sep 1, 2025

Probably gonna wait until an actual game supports SER. Developers may already have a working beta driver.

coercitiv · Sep 3, 2025

One year ago AMD announced their shift in strategy, supposedly focused on regaining market share:

So, my number one priority right now is to build scale, to get us to 40 to 50 percent of the market faster. Do I want to go after 10% of the TAM [Total Addressable Market] or 80%? I’m an 80% kind of guy because I don’t want AMD to be the company that only people who can afford Porsches and Ferraris can buy. We want to build gaming systems for millions of users.

AMD’s Jack Huynh

So how does this pan out? AMD market share has increased by... checks notes... -2.1%. Yep, that's a minus.

Q2’25 PC graphics add-in board shipments increased 27.0% from last quarter

igor_kavinski · Sep 3, 2025

coercitiv said:
So how does this pan out? AMD market share has increased by... checks notes... -2.1%. Yep, that's a minus.

Having nothing for laptops will do that.

People (translation: ignoramuses) see only Nvidia in laptops and assume they are the best. Having a sticker on laptops helps a lot.

adroc_thurston · Sep 3, 2025

coercitiv said:
So how does this pan out? AMD market share has increased by... checks notes... -2.1%. Yep, that's a minus.

well, yeah.
As I've said, the only way to gain mss is to *win*.
Until they ship a 800W, 300+ CU monster, they lose. Until their MSS is 0%.

igor_kavinski said:
People (translation: ignoramuses) see only Nvidia in laptops and assume they are the best. Having a sticker on laptops helps a lot.

laptops are a collateral market to AIB DT.
You need to win here to win there.

igor_kavinski · Sep 3, 2025

adroc_thurston said:
You need to win here to win there.

Or AMD marketing is inept.

ToTTenTranz · Sep 3, 2025

coercitiv said:
One year ago AMD announced their shift in strategy, supposedly focused on regaining market share:

AMD’s Jack Huynh

So how does this pan out? AMD market share has increased by... checks notes... -2.1%. Yep, that's a minus.

Q2’25 PC graphics add-in board shipments increased 27.0% from last quarter
View attachment 129581

The 9070XT only released in March, and the 9060XT in June. Nvidia introduced 6 SKUs out of 4 different chips between January and May.
AMD insisted on waiting for being ultra sure about their competitors' prices before even starting to think about launching their own products.

AMD was months late with RDNA4, for playing that game. Stupidest thing was I (and many others) saw a fully functioning RX9070 XT at CES inside a PC running FSR4 on Ratchet and Clank, and some others on display. But then AMD went silent for 2 whole months.

adroc_thurston · Sep 3, 2025

igor_kavinski said:
Or AMD marketing is inept.

You can't market a product that doesn't win.

Either they ship 320CUs@800W and win, or they lose.
Pretty dang simple!

adroc_thurston · Sep 3, 2025

ToTTenTranz said:
The 9070XT only released in March, and the 9060XT in June. Nvidia introduced 6 SKUs out of 4 different chips between January and May.
AMD insisted on waiting for being ultra sure about their competitors' prices before even starting to think about launching their own products.

AMD was months late with RDNA4, for playing that game. Stupidest thing was I (and many others) saw a fully functioning RX9070 XT at CES inside a PC running FSR4 on Ratchet and Clank, and some others on display. But then AMD went silent for 2 whole months.

None of that matters.
They just don't win, and you can't move boards when you don't win.
It's pretty dang simple (but ATi people fail to grasp for 15+ years).

branch_suggestion · Sep 3, 2025

coercitiv said:
One year ago AMD announced their shift in strategy, supposedly focused on regaining market share:

AMD’s Jack Huynh

So how does this pan out? AMD market share has increased by... checks notes... -2.1%. Yep, that's a minus.

Q2’25 PC graphics add-in board shipments increased 27.0% from last quarter
View attachment 129581

Radeon sales are up q/q for the last couple, just that NV sales are bonkers due to releasing built up supply for the low end and bubble effects at the high end.
This cannot last just like both crypto bubbles, eventually there will be a oversupply and a normalisation of things, partially offset by the mid gen refresh.
Still the market is officially peak Intel levels of cooked, but eventually the AIB share will become less useful as laptops are progressively taken over by APUs.
Desktop DIY share is okay for AMD right now, but that is only one part of the market, Laptop/SI/OEM share for NV is >95%.
But when you factor in all GPUs shipped inc consoles and APUs, it is not impossible to challenge such a moat.

igor_kavinski · Sep 3, 2025

adroc_thurston said:
Either they ship 320CUs@800W and win, or they lose.

It's a huge risk they aren't willing to take unless they can rebadge that part as a compelling workstation product in case they don't win because "it doesn't do CUDA" or "no DLSS" or "I just hate the color red!".

adroc_thurston · Sep 3, 2025

igor_kavinski said:
It's a huge risk they aren't willing to take

duh.

igor_kavinski said:
unless they can rebadge that part as a compelling workstation product

AMD share in WS graphics is exactly 0%.

igor_kavinski said:
because "it doesn't do CUDA" or "no DLSS" or "I just hate the color red!".

That's all goalpoast cope over not having to buy the default.

adroc_thurston · Sep 3, 2025

branch_suggestion said:
But when you factor in all GPUs shipped inc consoles and APUs, it is not impossible to challenge such a moat.

yeah it is.
Either they win, or they don't.
And they don't win.

inquiss · Sep 3, 2025

adroc_thurston said:
yeah it is.
Either they win, or they don't.
And they don't win.

AT0 looks like it can win, no?

adroc_thurston · Sep 3, 2025

inquiss said:
AT0 looks like it can win, no?

No, it's a part made for careful SR-IOV slicing into cloud gaming instances.
Gamers(tm) will get a mutilated chop with castrated membw that will be a solid chunk slower than the client halo Rubin.

marees · Sep 3, 2025

ToTTenTranz said:
The 9070XT only released in March, and the 9060XT in June. Nvidia introduced 6 SKUs out of 4 different chips between January and May.
AMD insisted on waiting for being ultra sure about their competitors' prices before even starting to think about launching their own products.

AMD was months late with RDNA4, for playing that game. Stupidest thing was I (and many others) saw a fully functioning RX9070 XT at CES inside a PC running FSR4 on Ratchet and Clank, and some others on display. But then AMD went silent for 2 whole months.

Looks like what Jack Hyunh said was spin

They are not willing to go far an all out war for market share / revenue at the expense of margin

RDNA 5 strategy seems sustainable

AT0 (xx90xt) — shared with cloud gaming / professional use cases
AT2 (xx70xt) — shared with xbox
AT4 (xx50xt) — shared with Z3E (PC & 'xbox' handhelds )
AT3 (xx60xt) — shared with edge inferencing / premium laptops, tablets, NUCs

Only AT3 looks a bit dodgy. They may have to sacrifice margins there to scale

Once they master chiplets then future archs (RDNA 6 or 7) should be more scalable

adroc_thurston · Sep 3, 2025

marees said:
Looks like what Jack Hyunh said was spin

yeah lmao.

marees said:
RDNA 5 strategy seems sustainable

It's sustainable in a way that no single tapeout is wasted on client discrete graphics yes.
It's a really dang creative approach to building a stack, but also a de-facto surrender in DIY discrete graphics.

marees said:
AT4 (xx50xt) — shared with Z3E (PC & 'xbox' handhelds )

handhelds are too poor for mdsP.

Discussion RDNA4 + CDNA3 Architectures Thread

Golden Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Senior member

Diamond Member

Golden Member

AMD RDNA 4 GPU Architecture at Hot Chips 2025​

Diamond Member

Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Senior member

Diamond Member

Diamond Member

Senior member

Lifer

Diamond Member

Diamond Member

Senior member

Diamond Member

Golden Member

Diamond Member

AMD RDNA 4 GPU Architecture at Hot Chips 2025