Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe this is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project (github.com)

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix (www.phoronix.com)

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

gaav87 · Apr 29, 2024

adroc_thurston said:
It doesn't.

That's not the point.
It's just a very opportunistic throughput hack (for w64, VOPD w32 is memey).

FLOPs don't matter.
Just do clock by shader core count, designs are very convergent anyway.

Not happening.
It's not that good.
Still a cool part.

Well, what if the 2 SIMDs do not need to have equal results and still can use parallel SIMD wave slots? That would double gaming throughput. Equal results = bad for gaming.

adroc_thurston · Apr 29, 2024

gaav87 said:
and still can use parallel SIMD wave slots?

?
The scheduler is shared between the baseline and the castrated SIMDs.

gaav87 said:
That would double gaming throughput

?
Already does for w64.
Again, flops don't matter.
FLOPS are cheap.
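
For anyone following along, the "clock by shader core count" rule of thumb can be sketched roughly as below. The function names and the example inputs are purely illustrative assumptions, not something from AMD documentation. The factor of 2 assumes one FMA is counted as two FLOPs, and `issue_width` is a hypothetical knob for the best-case dual-issue scenario.

```python
def throughput_proxy(shader_cores: int, clock_ghz: float) -> float:
    """Rule-of-thumb figure: shader core count times clock (arbitrary units)."""
    return shader_cores * clock_ghz


def peak_fp32_tflops(shader_cores: int, clock_ghz: float, issue_width: int = 1) -> float:
    """Paper FP32 peak, counting one FMA as 2 FLOPs per lane per clock.

    issue_width=2 models the best case where every instruction dual-issues.
    """
    return 2 * shader_cores * clock_ghz * issue_width / 1000.0


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the thread).
    cores, clock = 6144, 2.5
    print("proxy:", throughput_proxy(cores, clock))
    print("single-issue TFLOPS:", peak_fp32_tflops(cores, clock, issue_width=1))
    print("dual-issue TFLOPS:", peak_fp32_tflops(cores, clock, issue_width=2))
```

On paper the dual-issue figure is exactly double the single-issue one, which is why the paper number alone says little about delivered performance.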

gaav87 · Apr 29, 2024

adroc_thurston said:
?
The scheduler is shared between the baseline and the castrated SIMDs.

?
Already does for w64.
Again, flops don't matter.
FLOPS are cheap.

In w64, the SIMD will try to execute a 64-wide wavefront in a single cycle only if the instruction can be dual-issued. But w32 waves have to have equal results; they can carry two packages at once, but only to neighbors. What if they did not have to have equal results, limiting the data dependency with a larger wave slot count?

adroc_thurston · Apr 29, 2024

gaav87 said:
But w32 waves have to have equal results; they can carry two packages at once, but only to neighbors

Not really equal results. read the manual.
It compiles shaders to w64 for throughput-bound segments anyway.

gaav87 said:
What if they did not have to have equal results, limiting the data dependency with a larger wave slot count?

?

again, flops don't matter.

Aapje · Apr 29, 2024

I just hope that it doesn't FLOP.

gaav87 · Apr 29, 2024

adroc_thurston said:
Not really equal results. read the manual.
It compiles shaders to w64 for throughput-bound segments anyway.

?

again, flops don't matter.

"Not really equal results. read the manual.
It compiles shaders to w64 for throughput-bound segments anyway"
You sure about that?
In RDNA3, there exists a second set of ALUs. However, these additional ALUs are not primarily utilized to enhance the execution speed of 64-element waves, as described in section 2.1. Instead, they come into play under specific conditions: executing two distinct instructions in parallel within a single wave, as outlined in section 7.6. The dual-issue VALU feature is only accessible in wave32.
This implies that the ALUs could potentially contribute to more efficient wave64 execution. But I am not sure.

32 waves operating on 32 ALUs. Each wave can execute within a single clock cycle. The scalar processors are doubled up, allowing them to handle an instruction per clock. There are only two waves in flight per CU in RDNA. What if they could have FOUR waves?

Games are often optimized for wave32 execution. In this mode, efficiency improves significantly, completion times are faster, and fewer resources are consumed for data access. However, when operating in wave64 mode, the likelihood decreases that all elements within a wave will need to execute the same instruction. Consequently, clearing out a wave in wave64 mode takes more clock cycles, leading to performance drops.
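
A quick back-of-the-envelope sketch of the divergence point, under a loudly stated assumption: each lane independently takes a branch with the same probability. That is a toy model (real lanes are neighboring pixels or threads and are strongly correlated), and the function name is made up for illustration.

```python
def prob_wave_uniform(p_taken: float, wave_size: int) -> float:
    """Probability that every lane in a wave takes the same side of a branch.

    Toy assumption: each lane independently takes the branch with
    probability p_taken. Real workloads are far more correlated.
    """
    return p_taken ** wave_size + (1.0 - p_taken) ** wave_size


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        w32 = prob_wave_uniform(p, 32)
        w64 = prob_wave_uniform(p, 64)
        print(f"p={p}: wave32 uniform {w32:.4f}, wave64 uniform {w64:.4f}")
```

Under this model the wider wave is always less likely to stay uniform, so it is more likely to pay for both sides of the branch. How much that matters in practice depends on how coherent the shader actually is.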

Consider the trade-offs: as the number of elements increases, so does the cache size and die area. A larger low-level cache can result in higher latency. If they enhance how wave32 operates, as I suggested, it could potentially lead to substantial gaming performance improvements.

Anyway, I'm off to sleep.

Aapje · Apr 29, 2024

gaav87 said:
Games are often optimized for wave32 execution.

This is not really true, as shaders are compiled exactly so that the final code can be optimized for the architecture of the card. And Nvidia and AMD can handcode parts of the shader for certain games to optimize it further.

An issue is that the compiler for dual issue was really poor and probably only made modest gains. See the compiler section in: https://chipsandcheese.com/2023/01/07/microbenchmarking-amds-rdna-3-graphics-architecture/

Mopetar · Apr 29, 2024

Glo. said:
Best perf/dollar GPU of current offerings.

Unfortunately in a market segment that almost no one here cares about. Also not that surprising for the cheapest GPU to have the best performance per dollar.

Also FWIW TPU's latest GPU review (of a 7900 GRE) has the Arc A580 at $165 as a better value than an RX 6600 at $200, but I don't know how accurate those prices are.

There are a few other cards (5700 XT at $200 and RX 580 (!!!) at $90) that are also listed above it, but those are older so the comparison isn't quite as fair.

MrTeal · Apr 30, 2024

At least according to pcpartpicker, the cheapest 6600 in the US is $190 while the cheapest A580 is $160. Moving to Intel is an even more adventurous step for the $150-$200 buyer than choosing AMD.

marees · Apr 30, 2024

MrTeal said:
At least according to pcpartpicker, the cheapest 6600 in the US is $190 while the cheapest A580 is $160. Moving to Intel is an even more adventurous step for the $150-$200 buyer than choosing AMD.

If AMD is still selling 6600 (remember it is TSMC 7nm that is supposed to be replaced by 6nm 7600) then AMD must have screwed up big time by overproducing them during the crypto boom, as I am guessing production must have stopped long ago

Does this mean
a) RDNA 4 will be delayed until RDNA 2 is emptied out
b) RDNA 4 will debut at a higher launch price & reduce street price gradually until old RDNA 2 (& RDNA 3) stocks are cleared out ?

blckgrffn · Apr 30, 2024

marees said:
If AMD is still selling 6600 (remember it is TSMC 7nm that is supposed to be replaced by 6nm 7600) then AMD must have screwed up big time by overproducing them during the crypto boom, as I am guessing production must have stopped long ago

Does this mean
a) RDNA 4 will be delayed until RDNA 2 is emptied out
b) RDNA 4 will debut at a higher launch price & reduce street price gradually until old RDNA 2 (& RDNA 3) stocks are cleared out ?

That would make sense, but it doesn't look like that is corroborated by anything outside some forum posts in Chinese.

Certainly the 6700 XT and vanilla 6600 seem to have stuck around quite a while; it's possible that as N7 prices came down and RDNA3 didn't play out quite as expected, they decided to keep those alive to fill in their lineup holes. It's as much speculation as anything, but it doesn't seem like AMD really stopped making them in ~October 2023 as widely reported.

jpiniero · Apr 30, 2024

blckgrffn said:
but it doesn't seem like AMD really stopped making them in ~October 2023 as widely reported.

I just kind of assume that they've been piecemealing RDNA2 dies since Crypto crashed and AIBs are producing new boards with years old chips since then. And it's taken this long to get rid of them.

GodisanAtheist · May 1, 2024

Oh Lord, look at what you guys are doing to the internet.

AMD's cancelled RDNA 4 GPU could have doubled the 7900 XTX's performance

Data-mined code that recently emerged on the Anandtech forums indicates that AMD was working on an RDNA 4 GPU at one point, doubling most of the RX...

www.techspot.com

Also, which one of you is Daniel Sims?

Saylick · May 1, 2024

LOL

Welcome to the current state of tech journalism. Where the only thing that matters is making sure you’re the first to regurgitate the news, and if you’re not first you better make sure you’re regurgitating what others have regurgitated.

SolidQ · May 1, 2024

Fun part
"Another user claimed that the number of compute units would have increased from 96 to 200."

Real

modern journalism (c)

RnR_au · May 1, 2024

Saylick said:
LOL

Welcome to the current state of tech journalism. Where the only thing that matters is making sure you’re the first to regurgitate the news, and if you’re not first you better make sure you’re regurgitating what others have regurgitated.

igor_kavinski · May 1, 2024

RnR_au said:
View attachment 98193

You got skillz, man!

Timorous · May 1, 2024

RnR_au said:
View attachment 98193

That's the real hype train right there.

MrTeal · May 1, 2024

AMD 8970 XTXTX
2 GCD - 144 CU each, 288 CU total
3500MHz boost clock
8 MCD - 32GB GDDR7 28Gbps
$799

AMD 8990 XOXO
Dual 8970XTXTX in Crossfire
64GB total VRAM
$1399

Launching April 1, 2025.

Let's goooo...

SolidQ · May 1, 2024

@Kepler_L2
Saw a tweet about new RT in RDNA4.
So will it compete with Lovelace or the future Blackwell?

Kepler_L2 · May 1, 2024

SolidQ said:
@Kepler_L2
Saw a tweet about new RT in RDNA4.
So will it compete with Lovelace or the future Blackwell?

It just looks very different from RDNA2 and 3 RT.
gfx10/11:

gfx12:

SolidQ · May 1, 2024

Kepler_L2 said:
It just looks very different from RDNA2 and 3 RT.

No BVH8 in code yet? The PS5 Pro has it.

Kepler_L2 · May 1, 2024

SolidQ said:
No BVH8 in code yet? The PS5 Pro has it.

There is no mention of BVH8 yet. But all BVH4 code is gone.

Hans Gruber · May 1, 2024

Blackwell is going to smoke AMD and Intel. The price/performance that RDNA 4 offers is what will make RDNA 4 compelling. The AMD of old was masterful at providing value in the price/performance category. AMD is doing quite well with their GPU drivers in my opinion. They need to spend more money and time dialing in their drivers.

The rumor that AMD is going to use 18Gb/s GDDR6 is not good news. Nvidia is going to be on GDDR7 while AMD will continue to use GDDR6. Why would AMD use slow GDDR6 modules when faster GDDR6 is available?
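
To put rough numbers on the memory speed gap, here is a small sketch of the usual peak-bandwidth arithmetic (per-pin rate times bus width, divided by 8 bits per byte). The 256-bit bus width and the 28 Gb/s GDDR7 rate are assumptions picked for illustration only; neither is a confirmed spec for any card.

```python
def memory_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return pin_rate_gbps * bus_width_bits / 8.0


if __name__ == "__main__":
    BUS_WIDTH_BITS = 256  # hypothetical bus width, not a confirmed spec
    configs = (
        ("GDDR6 @ 18 Gb/s", 18.0),  # the rumored rate mentioned above
        ("GDDR7 @ 28 Gb/s", 28.0),  # assumed GDDR7 rate for comparison
    )
    for label, rate in configs:
        bandwidth = memory_bandwidth_gb_s(rate, BUS_WIDTH_BITS)
        print(f"{label}: {bandwidth:.0f} GB/s")
```

With those assumed inputs the gap is 576 GB/s versus 896 GB/s, though cache size and bus width can change the practical picture a lot.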

Ghostsonplanets · May 1, 2024

Because it's cheaper for their BOM. I also wouldn’t count on AMD undercutting Nvidia. They have their margins to maintain, and people have shown them that they will pay up when it comes to Nvidia GPU prices. So they want the same.
