Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not that far off!
Usually AMD takes around 3Qs to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe it is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up situation like Frontier, for example).

See here for the GFX940-specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 being available in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

adroc_thurston · Sep 9, 2025

SolidQ said:
RDNA4 refresh?

No.

DAPUNISHER · Sep 11, 2025

I picked up a used Red Devil. I am testing both bios, but I pick silent for my own usage.

ToTTenTranz · Sep 11, 2025

DAPUNISHER said:
I picked up a used Red Devil. I am testing both bios, but I pick silent for my own usage.

I wonder if anyone is picking the "performance bios" setting outside of overclocking and benchmarking.

Like yeah sure, I'm totally on board with making my GPU push 20% more power and make twice the audible noise in exchange for a 1.2% average performance boost.
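
To put rough numbers on that trade-off, here is a minimal sketch of the perf-per-watt arithmetic. It takes the 20% power and 1.2% performance figures above at face value (they are rhetorical, not measured), and the function name is made up for illustration.

```python
def relative_efficiency(perf_gain, power_gain):
    """Perf-per-watt of the new setting relative to the old one (1.0 = unchanged).

    perf_gain and power_gain are fractional increases, e.g. 0.20 for +20%.
    """
    return (1.0 + perf_gain) / (1.0 + power_gain)


# Figures quoted in the post: +1.2% performance for +20% power.
ratio = relative_efficiency(perf_gain=0.012, power_gain=0.20)
print(f"perf/W relative to the silent BIOS: {ratio:.3f}")
print(f"efficiency loss: {(1.0 - ratio) * 100:.1f}%")
```

Under those assumed figures, the performance BIOS would deliver roughly 16% less performance per watt.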

DAPUNISHER · Sep 11, 2025

ToTTenTranz said:
I wonder if anyone is picking the "performance bios" setting outside of overclocking and benchmarking.

Like yeah sure, I'm totally on board with making my GPU push 20% more power and make twice the audible noise in exchange for a 1.2% average performance boost.

Exactly. If it was the difference between silky smooth or not, I'd use it. Hell, I'd overclock it even further if it gets it there. But performance so far compared to the 7900XTX is

Speedway stress test is rock solid. https://www.3dmark.com/swst/1061556

Vikv1918 · Sep 11, 2025

AMD Software: Adrenalin Developer Preview Edition 25.10.07.01 for Microsoft® Agility SDK Support Release Notes

https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-MS-AGILITY-SDK-25-10-07-01.html

Expanded AgilitySDK Support

AMD Radeon™ RX 7000 and 9000 series graphics products will support:
- Tiled Resource Tier 4
- Fence Barriers
- AppSpecificDriverState
- Shader Execution Reordering (“MaybeReorderThreads” does not move threads)

Only AMD Radeon™ RX 7000 series graphics products will support:
- Video Encoding Update to DDI 112 with the following features:
  - Video encode subregion (e.g. slice/tile) notifications
  - Video encode GPU texture input QP map
  - Video encode Dirty map full frame skip
  - Video encode GPU texture/CPU buffer motion vector hints
- AppSpecificDriverState + RecreateAtGPUVA

Only AMD Radeon™ RX 9000 series graphics products will support:
- Cooperative Vectors 1.0

marees · Sep 14, 2025

On RDNA4, AMD added a “Radeon Image Sharpening” filter, letting the display engine sharpen the final image. Using dedicated hardware at the display engine instead of the GPU’s programmable shaders means that the sharpening filter won’t impact performance and can be carried out with better power efficiency.

RDNA4’s scalar unit gains a few floating point instructions, expanding scalar offload opportunities. This capability debuted on RDNA3.5, but RDNA4 brings it to discrete GPUs.

Split Barriers

GPUs use barriers to synchronize threads and enforce memory ordering. For example, an s_barrier instruction on older AMD GPUs would cause a thread to wait until all of its peers in the workgroup also reached the s_barrier instruction. Barriers degrade performance because any thread that happened to reach the barrier faster would have to stall until its peers catch up.
RDNA4 splits the barrier into separate “signal” and “wait” actions. Instead of s_barrier, RDNA4 has s_barrier_signal and s_barrier_wait. A thread can “signal” the barrier once it produces data that other threads might need. It can then do independent work, and only wait on the barrier once it needs to use data produced by other threads. The s_barrier_wait will then stall the thread until all other threads in the workgroup have signalled the barrier.
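
The signal/wait split described above can be sketched on the CPU with ordinary threads. This is only an analogy, not GPU code: the `SplitBarrier` class and `run_workgroup` function are hypothetical names, the barrier is single-use, and Python threads stand in for the threads of a workgroup.

```python
import threading


class SplitBarrier:
    """Toy single-use split barrier: signal() records arrival without blocking,
    wait() blocks until every party has signalled."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._cond = threading.Condition()

    def signal(self):
        # Loosely analogous to s_barrier_signal: announce "my data is ready".
        with self._cond:
            self._arrived += 1
            if self._arrived == self._parties:
                self._cond.notify_all()

    def wait(self):
        # Loosely analogous to s_barrier_wait: stall until all peers have signalled.
        with self._cond:
            self._cond.wait_for(lambda: self._arrived == self._parties)


def run_workgroup(num_threads=4):
    barrier = SplitBarrier(num_threads)
    shared = [None] * num_threads    # data that other threads will read
    results = [None] * num_threads

    def worker(tid):
        shared[tid] = tid * 10           # produce data peers might need
        barrier.signal()                 # signal right away, do not stall yet
        private = sum(range(tid + 1))    # independent work overlaps with slower peers
        barrier.wait()                   # only now wait for everyone's data
        results[tid] = private + sum(shared)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_workgroup(4))
```

With a classic barrier, the `private` computation would have to sit either before the barrier (delaying the peers that need `shared`) or after it (idle time while waiting). Splitting the two halves lets it fill the gap.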

AMD found that difficult workloads like raytracing benefit from the larger L2. Raytracing involves pointer chasing during BVH traversal, and it’s not surprising that it’s more sensitive to accesses getting serviced from the slower Infinity Cache as opposed to L2.

AMD’s RDNA4 GPU Architecture at Hot Chips 2025

RDNA4 is AMD’s latest graphics-focused architecture, and fills out their RX 9000 line of discrete GPUs.

chipsandcheese.com

marees · Sep 14, 2025

Gideon said:
Speaking of the devil:

Micron SOCAMM Memory Powers Next-Gen NVIDIA Grace GB300 Servers

This is the new Micron SOCAMM with LPDDR5X memory that will power next-generation NVIDIA Grace GB300 servers

www.servethehome.com

So it seems SOCAMMs work for next-gen Nvidia Grace GB300 servers:

View attachment 120651

View attachment 120650

While they're not classical sockets (no CAMM modules are), they surely seem replaceable. Yeah, it won't give GDDR7-level bandwidth, but judging by Strix Halo, M4 Max, etc., it should be enough for even some mid-range GPUs.

NVIDIA Cancels SOCAMM1, Moves to SOCAMM2 — Opportunity for Samsung and SK

https://twitter.com/x/status/1967145562966544733

soresu · Sep 14, 2025

marees said:
NVIDIA Cancels SOCAMM1, Moves to SOCAMM2 — Opportunity for Samsung and SK

https://twitter.com/x/status/1967145562966544733

nVidia really need to think things through a little more.

Clearly the new PCIe power connector debacle taught them nothing.

adroc_thurston · Sep 14, 2025

soresu said:
nVidia really need to think things through a little more.

They're not AMD, have no experience in pushing memory standards out into the market.

basix · Sep 15, 2025

soresu said:
nVidia really need to think things through a little more.

Clearly the new PCIe power connector debacle taught them nothing.

As far as I understand, it is mechanically compatible but runs at 9600 MT/s instead of 8533 MT/s. So not a huge difference. Your host chip, PCB and wiring need to be capable of the faster speeds, but that seems to be it. Not sure if that is enough to warrant the version bump to "2".

SOCAMM2 will probably also get JEDEC approval, whereas SOCAMM has not. SOCAMM2 might add some additional requirements, both to make a clear distinction from version 1 and to ensure proper compatibility, I don't know. Or it could be super simple: there are already CAMM2 and LPCAMM2 JEDEC standards out in the wild, so it may have been "renamed" to SOCAMM2 just to keep the naming in line with those.
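
For scale, here is a quick sketch of what the 9600 MT/s versus 8533 MT/s figures above work out to. The 128-bit module width is an assumption used only for illustration (it is not stated in this thread), and the function names are made up.

```python
def transfer_rate_uplift(old_mts, new_mts):
    """Fractional increase in per-pin transfer rate."""
    return new_mts / old_mts - 1.0


def peak_bandwidth_gbs(mts, bus_width_bits):
    """Theoretical peak bandwidth in GB/s for a given transfer rate and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return mts * 1e6 * bytes_per_transfer / 1e9


ASSUMED_BUS_WIDTH_BITS = 128  # assumption for illustration, not from the thread

print(f"uplift: {transfer_rate_uplift(8533, 9600) * 100:.1f}%")
for rate in (8533, 9600):
    gbs = peak_bandwidth_gbs(rate, ASSUMED_BUS_WIDTH_BITS)
    print(f"{rate} MT/s x {ASSUMED_BUS_WIDTH_BITS}-bit: {gbs:.1f} GB/s")
```

The uplift is about 12.5% regardless of the assumed bus width, which matches the "not a huge difference" reading.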

Vikv1918 · Sep 15, 2025

Some people are claiming they got the INT8 version of FSR4 working, which was accidentally leaked last month by AMD on Github.

https://www.reddit.com/r/radeon/comments/1nhkkr8/fsr_sdk_leak_contained_fsr_4_files_that_work_on

It works even on Nvidia GPUs lmao. On a 7800 XT it takes 2 ms to upscale 1440p balanced, not bad. And it works on Windows, unlike the emulated Linux method.

Edit: They're also claiming it should work well on RDNA2, as the performance shown here is without WMMA acceleration, which means this performance should be identical to something like a 6800 XT. That would be great for RDNA2 users if true.
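
As a rough sanity check on what a 2 ms upscale pass costs, here is a small frame-time sketch. It assumes the upscaler runs serially after rendering and adds its full cost to every frame, which is a simplification. The function names and the example frame rates are hypothetical.

```python
def fps_with_upscaler(render_fps, upscale_ms):
    """Output frame rate if a fixed upscale cost is added serially to each frame."""
    frame_ms = 1000.0 / render_fps + upscale_ms
    return 1000.0 / frame_ms


def budget_share(target_fps, upscale_ms):
    """Fraction of the per-frame time budget consumed by the upscale pass."""
    return upscale_ms / (1000.0 / target_fps)


UPSCALE_MS = 2.0  # figure quoted in the post for a 7800 XT at 1440p balanced

for internal_fps in (60, 100, 144):  # hypothetical internal-resolution frame rates
    out = fps_with_upscaler(internal_fps, UPSCALE_MS)
    print(f"{internal_fps} fps before upscaling -> {out:.1f} fps after")

print(f"share of a 60 fps budget: {budget_share(60, UPSCALE_MS) * 100:.0f}%")
```

The fixed cost matters more the higher the frame rate gets, so whether 2 ms is "not bad" depends on how much time the lower internal resolution saves in the first place.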

soresu · Sep 16, 2025

Vikv1918 said:
Some people are claiming they got the INT8 version of FSR4 working, which was accidentally leaked last month by AMD on Github.

https://www.reddit.com/r/radeon/comments/1nhkkr8/fsr_sdk_leak_contained_fsr_4_files_that_work_on

It works even on Nvidia GPUs lmao. On a 7800 XT it takes 2 ms to upscale 1440p balanced, not bad. And it works on Windows, unlike the emulated Linux method.

Edit: They're also claiming it should work well on RDNA2, as the performance shown here is without WMMA acceleration, which means this performance should be identical to something like a 6800 XT. That would be great for RDNA2 users if true.

Doubtful that Redstone will be able to run on pre-RDNA4 GPUs tho (with perhaps the exception of the 7900 XTX for its sheer compute powah).

Too much ML work and those µArchs simply aren't optimal for it any more than Pascal and Kepler are.

Thunder 57 · Sep 16, 2025

I wouldn't be happy if I bought RDNA3.

marees · Sep 16, 2025

soresu said:
Doubtful that Redstone will be able to run on pre-RDNA4 GPUs tho (with perhaps the exception of the 7900 XTX for its sheer compute powah).

Too much ML work and those µArchs simply aren't optimal for it any more than Pascal and Kepler are.

ML2CODE framework converts inferencing code to shader binary

Any sufficiently powerful RDNA 2 or 3 gpu should be able to do path tracing using Redstone

soresu · Sep 16, 2025

marees said:
ML2CODE framework converts inferencing code to shader binary

Any sufficiently powerful RDNA 2 or 3 gpu should be able to do path tracing using Redstone

We'll see.

For now I'm patiently pessimistic (and not a little smug now I have a 9060 😅).

soresu · Sep 16, 2025

Thunder 57 said:
I wouldn't be happy if I bought RDNA3.

My general impression of it was of the RDNA equivalent to Vega 10.

Interesting new directions, but largely botched implementation.

marees · Sep 16, 2025

soresu said:
We'll see.

For now I'm patiently pessimistic (and not a little smug now I have a 9060 😅).

I expect 2 different path tracing branches: one for RDNA 4 & above,
another for RDNA 3 & below.

I would suppose Chris Hall knows what he is talking about
👇👇

marees said:
A Japanese tech journalist came to the USA and interviewed Chris Hall, AMD's software head, on ROCm & FSR 4 Redstone

It appears the Redstone path tracing software can run on any GPU, since it is compiled to shader code using the ML2CODE framework

View attachment 130158

https://twitter.com/x/status/1933748231139098877

soresu · Sep 16, 2025

marees said:
NVIDIA Cancels SOCAMM1, Moves to SOCAMM2 — Opportunity for Samsung and SK

https://twitter.com/x/status/1967145562966544733

Should be noted that this really doesn't belong in this thread as it's extremely unlikely to affect any RDNA4 or CDNA3 SKUs.

soresu · Sep 16, 2025

marees said:
I expect 2 different path tracing branches: one for RDNA 4 & above,
another for RDNA 3 & below.

I would suppose Chris Hall knows what he is talking about
👇👇

Emphasis on the Japanese journalist part.

I wouldn't entirely trust the translation unless it came from a human, as Google has tripped over translating basic stuff from non-European languages in my experience.

Trying to find the meaning of basic repeated Japanese terms from Naruto ends up spitting out meaningless words in English.

marees · Sep 16, 2025

soresu said:
Emphasis on the Japanese journalist part.

I wouldn't entirely trust the translation unless it came from a human, as Google has tripped over translating basic stuff from non-European languages in my experience.

Trying to find the meaning of basic repeated Japanese terms from Naruto ends up spitting out meaningless words in English.

It is widely quoted in Japanese forums & AMD hasn't refuted it yet.

My read is that this is what AMD is trying to do with Redstone: get it to work on base PS5.

soresu · Sep 16, 2025

marees said:
Get it to work on base PS5

Yes, but that is mainly down to Sony, and it's not like they are slouching on this.

On AMD's end it's far more likely about getting it all working on their Zen4 and 5 APUs given they don't have any RDNA4 APUs as yet, and probably won't until Medusa Halo (if that even happens).

marees · Sep 16, 2025

soresu said:
Yes, but that is mainly down to Sony, and it's not like they are slouching on this.

On AMD's end it's far more likely about getting it all working on their Zen4 and 5 APUs given they don't have any RDNA4 APUs as yet, and probably won't until Medusa Halo (if that even happens).

As I said, I would take Chris Hall at his word unless something or someone contradicts it. So far, nothing has.

adroc_thurston · Sep 16, 2025

soresu said:
and probably won't until Medusa Halo (if that even happens).

mdsH is gfx13, not 12.

soresu · Sep 16, 2025

adroc_thurston said:
mdsH is gfx13, not 12.

That's what I hoped, but IMHO Medusa Point should have at least gfx12, if not gfx13 too.

adroc_thurston · Sep 16, 2025

soresu said:
but IMHO Medusa Point should have at least gfx12, if not gfx13 too.

Medusa Point is like 6 different parts.
Which one?
