Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it 😬

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

 

ToTTenTranz

I picked up a used Red Devil. I am testing both bios, but I pick silent for my own usage.


I wonder if anyone is picking the "performance bios" setting outside of overclocking and benchmarking.

Like yeah sure, I'm totally on board with making my GPU push 20% more power and make twice the audible noise in exchange for a 1.2% average performance boost.
 

DAPUNISHER

I wonder if anyone is picking the "performance bios" setting outside of overclocking and benchmarking.

Like yeah sure, I'm totally on board with making my GPU push 20% more power and make twice the audible noise in exchange for a 1.2% average performance boost.
Exactly. If it were the difference between silky smooth and not, I'd use it. Hell, I'd overclock it even further if that got it there. But performance so far compared to the 7900XTX is:

a5si54.jpg

Speedway stress test is rock solid. https://www.3dmark.com/swst/1061556
 

Vikv1918


AMD Software: Adrenalin Developer Preview Edition 25.10.07.01 for Microsoft® Agility SDK Support Release Notes



Expanded AgilitySDK Support
  • AMD Radeon™ RX 7000 and 9000 series graphics products will support:
    • Tiled Resource Tier 4
    • Fence Barriers
    • AppSpecificDriverState
    • Shader Execution Reordering (“MaybeReorderThreads” does not move threads)

  • Only AMD Radeon™ RX 7000 series graphics products will support:
    • Video Encoding Update to DDI 112 with the following features:
      • Video encode subregion (e.g. slice/tile) notifications
      • Video encode GPU texture input QP map
      • Video encode Dirty map full frame skip
      • Video encode GPU texture/CPU buffer motion vector hints
    • AppSpecificDriverState + RecreateAtGPUVA

  • Only AMD Radeon™ RX 9000 series graphics products will support:
    • Cooperative Vectors 1.0
 

marees

On RDNA4, AMD added a “Radeon Image Sharpening” filter, letting the display engine sharpen the final image. Using dedicated hardware at the display engine instead of the GPU’s programmable shaders means that the sharpening filter won’t impact performance and can be carried out with better power efficiency.


RDNA4’s scalar unit gains a few floating point instructions, expanding scalar offload opportunities. This capability debuted on RDNA3.5, but RDNA4 brings it to discrete GPUs.


Split Barriers

GPUs use barriers to synchronize threads and enforce memory ordering. For example, an s_barrier instruction on older AMD GPUs would cause a thread to wait until all of its peers in the workgroup also reached the s_barrier instruction. Barriers degrade performance because any thread that happened to reach the barrier faster would have to stall until its peers catch up.
RDNA4 splits the barrier into separate “signal” and “wait” actions. Instead of s_barrier, RDNA4 has s_barrier_signal and s_barrier_wait. A thread can “signal” the barrier once it produces data that other threads might need. It can then do independent work, and only wait on the barrier once it needs to use data produced by other threads. The s_barrier_wait will then stall the thread until all other threads in the workgroup have signalled the barrier.
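
As a rough CPU-side analogy of the signal/wait split described above (not RDNA4 ISA, and not how the hardware implements it), here is a minimal Python sketch. The `SplitBarrier` class and `run` helper are hypothetical names made up for illustration, and the barrier is one-shot for simplicity.

```python
import threading


class SplitBarrier:
    """One-shot split-phase barrier (illustrative only).

    signal() never blocks; wait() blocks until every party has signalled.
    """

    def __init__(self, parties):
        self._parties = parties
        self._signalled = 0
        self._cond = threading.Condition()

    def signal(self):
        # Announce "my data is ready" without stalling the caller.
        with self._cond:
            self._signalled += 1
            if self._signalled == self._parties:
                self._cond.notify_all()

    def wait(self):
        # Stall only until all parties have signalled.
        with self._cond:
            self._cond.wait_for(lambda: self._signalled >= self._parties)


def run(n_threads=4):
    barrier = SplitBarrier(n_threads)
    shared = [None] * n_threads
    results = [None] * n_threads

    def worker(i):
        shared[i] = i * i            # produce data that peers will need
        barrier.signal()             # analogous to s_barrier_signal
        _independent = sum(range(1000))  # independent work overlaps with slower peers
        barrier.wait()               # analogous to s_barrier_wait
        results[i] = sum(shared)     # now safe to read every peer's data

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(4))
```

With a single combined barrier, the independent work would have to sit either before the data is published or after the stall. Splitting the two lets it fill the time a fast thread would otherwise spend waiting.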


AMD found that difficult workloads like raytracing benefit from the larger L2. Raytracing involves pointer chasing during BVH traversal, and it’s not surprising that it’s more sensitive to accesses getting serviced from the slower Infinity Cache as opposed to L2.


 

marees

Speaking of the devil:


So it seems SOCAMMs work for next-gen Nvidia Grace GB300 servers:

View attachment 120651

View attachment 120650

While they're not classical sockets (no CAMM modules are), they surely seem replaceable. Yeah, it won't give GDDR7-level bandwidth, but judging by Strix Halo, M4 Max, etc., it should be enough for even some mid-range GPUs.
NVIDIA Cancels SOCAMM1, Moves to SOCAMM2 — Opportunity for Samsung and SK

 

basix

nVidia really need to think things through a little more.

Clearly the new PCIe power connector debacle taught them nothing.

As far as I understand, it is mechanically compatible but runs at 9600 MT/s instead of 8533 MT/s. So not a huge difference. Your host chip, PCB and wiring need to be capable of the faster speeds, but that seems to be it. Not sure if that is enough to warrant the version bump to "2".
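
To put a number on "not a huge difference", here is a quick back-of-the-envelope calculation in Python using only the two transfer rates quoted above. The function name is made up for illustration, and this assumes the bus width is unchanged between versions, in which case peak bandwidth scales by the same ratio.

```python
def percent_increase(old_mts, new_mts):
    """Relative speed-up of new_mts over old_mts, in percent."""
    return (new_mts / old_mts - 1.0) * 100.0


if __name__ == "__main__":
    gain = percent_increase(8533, 9600)
    print(f"9600 MT/s is {gain:.1f}% faster than 8533 MT/s")
```

That works out to roughly a 12.5% uplift in transfer rate.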

SOCAMM2 will probably also get JEDEC approval, whereas SOCAMM has not. SOCAMM2 might add some additional requirements, and the version bump might be there to make a clear distinction from version 1 and ensure proper compatibility; I don't know. Maybe it is simpler than that: there are already CAMM2 and LPCAMM2 JEDEC standards out in the wild, so it may have been "renamed" to SOCAMM2 to keep the naming close to those.
 

Vikv1918

Some people are claiming they got the INT8 version of FSR4 working, which was accidentally leaked last month by AMD on GitHub.


It works even on Nvidia GPUs lmao. On a 7800 XT it takes 2 ms to upscale 1440p balanced, not bad. And it works on Windows, unlike the emulated Linux method.
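
For a sense of scale, here is a small Python sketch of what a 2 ms upscale pass costs as a share of the frame budget. The 2 ms figure is the one claimed above; the frame rates and the function name are just illustrative assumptions.

```python
def upscale_share(upscale_ms, fps):
    """Fraction of one frame's time budget spent on the upscale pass."""
    frame_ms = 1000.0 / fps
    return upscale_ms / frame_ms


if __name__ == "__main__":
    for fps in (60, 120):
        share = upscale_share(2.0, fps)
        print(f"{fps} fps: {share * 100:.1f}% of the frame budget")
```

So 2 ms is about 12% of a 60 fps frame and about 24% of a 120 fps frame, before counting whatever is saved by rendering at the lower internal resolution.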

Edit: They're also claiming it should work well on RDNA2, as the performance shown here is without WMMA acceleration, which means this performance should be identical to something like a 6800 XT. That would be great for RDNA2 users if true.
 

soresu

Some people are claiming they got the INT8 version of FSR4 working, which was accidentally leaked last month by AMD on GitHub.


It works even on Nvidia GPUs lmao. On a 7800 XT it takes 2 ms to upscale 1440p balanced, not bad. And it works on Windows, unlike the emulated Linux method.

Edit: They're also claiming it should work well on RDNA2, as the performance shown here is without WMMA acceleration, which means this performance should be identical to something like a 6800 XT. That would be great for RDNA2 users if true.
Doubtful that Redstone will be able to run on pre-RDNA4 GPUs tho (with perhaps the exception of 7900 XTX for its sheer compute powah)

Too much ML work and those µArchs simply aren't optimal for it any more than Pascal and Kepler are.
 

marees

Doubtful that Redstone will be able to run on pre-RDNA4 GPUs tho (with perhaps the exception of 7900 XTX for its sheer compute powah)

Too much ML work and those µArchs simply aren't optimal for it any more than Pascal and Kepler are.
ML2CODE framework converts inferencing code to shader binary

Any sufficiently powerful RDNA 2 or 3 gpu should be able to do path tracing using Redstone
 

soresu

ML2CODE framework converts inferencing code to shader binary

Any sufficiently powerful RDNA 2 or 3 gpu should be able to do path tracing using Redstone
We'll see.

For now I'm patiently pessimistic (and not a little smug now I have a 9060 😅).
 

marees

We'll see.

For now I'm patiently pessimistic (and not a little smug now I have a 9060 😅).
I expect two different path tracing branches: one for RDNA 4 & above,
another for RDNA 3 & below.

I would suppose Chris Hall knows what he is talking about
👇👇

A Japanese tech journalist came to the USA & interviewed Chris Hall, AMD's software head, on ROCm & FSR 4 Redstone.

It appears Redstone path tracing software can run on any GPU, since it is compiled to shader code using the ML2CODE framework.


View attachment 130158
 

soresu

I expect two different path tracing branches: one for RDNA 4 & above,
another for RDNA 3 & below.

I would suppose Chris Hall knows what he is talking about
👇👇
Emphasis on the Japanese journalist part.

I wouldn't entirely trust the translation unless it came from a human, as Google has tripped over translating basic stuff from non-European languages in my experience.

Trying to find meaning of basic repeated Japanese terms from Naruto ends up spitting out meaningless words in English.
 

marees

Emphasis on the Japanese journalist part.

I wouldn't entirely trust the translation unless it came from a human, as Google has tripped over translating basic stuff from non-European languages in my experience.

Trying to find meaning of basic repeated Japanese terms from Naruto ends up spitting out meaningless words in English.
It is widely quoted in Japanese forums & AMD hasn't refuted it yet.

I would say this is what AMD is trying to do with Redstone: get it to work on base PS5.
 

soresu

Get it to work on base PS5
Yes, but that is mainly down to Sony, and it's not like they are slouching on this.

On AMD's end it's far more likely about getting it all working on their Zen4 and 5 APUs given they don't have any RDNA4 APUs as yet, and probably won't until Medusa Halo (if that even happens).
 

marees

Yes, but that is mainly down to Sony, and it's not like they are slouching on this.

On AMD's end it's far more likely about getting it all working on their Zen4 and 5 APUs given they don't have any RDNA4 APUs as yet, and probably won't until Medusa Halo (if that even happens).
As I said, I would take Chris Hall at his word unless something or someone contradicts it. So far, nothing has.