Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
1655034287489.png
1655034259690.png

1655034485504.png

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
AMD usually takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, that is a lot of commits. Maybe it is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again I believe MI300 could launch before it :grimacing:

This is nuts, MI100/200/300 cadence is impressive.

1655034362046.png

Previous thread on CDNA2 and RDNA3 here

 

gaav87

Member
Apr 27, 2024
120
159
76
It doesn't.

That's not the point.
It's just a very opportunistic throughput hack (for w64, VOPD w32 is memey).

FLOPs don't matter.
Just do clock by shader core count, designs are very convergent anyway.

Not happening.
It's not that good.
Still a cool part.
Well, what if the two SIMDs did not need to have equal results and could still use parallel SIMD wave slots? That would double gaming throughput. Equal results = bad for gaming.
 

gaav87

Member
Apr 27, 2024
120
159
76
?
The scheduler is shared between the baseline and the castrated SIMDs.

?
Already does for w64.
Again, flops don't matter.
FLOPS are cheap.
In w64, the SIMD will try to execute a 64-wide wavefront in a single cycle only if the instruction can be dual-issued. But in w32 they have to have equal results; they can carry two packages at once, but only to neighbors. What if they did not have to have equal results, limiting the data dependency with a larger wave slot count?
 

gaav87

Member
Apr 27, 2024
120
159
76
Not really equal results. read the manual.
It compiles shaders to w64 for throughput-bound segments anyway.

?

again, flops don't matter.
"Not really equal results. read the manual.
It compiles shaders to w64 for throughput-bound segments anyway"
You sure about that?
In RDNA3, there exists a second set of ALUs. However, these additional ALUs are not primarily utilized to enhance the execution speed of 64-element waves, as described in section 2.1. Instead, they come into play under specific conditions: executing two distinct instructions in parallel within a single wave, as outlined in section 7.6. The dual-issue VALU feature is only accessible in wave32.
This implies that the ALUs could potentially contribute to more efficient wave64 execution. But I am not sure.

32 waves operating on 32 ALUs. Each wave can execute within a single clock cycle. The scalar processors are doubled up, allowing them to handle an instruction per clock. There are only two waves in flight per CU in RDNA. What if they could have FOUR waves?

Games are often optimized for wave32 execution. In this mode, efficiency improves significantly, completion times are faster, and fewer resources are consumed for data access. However, when operating in wave64 mode, the likelihood decreases that all elements within a wave will need to execute the same instruction. Consequently, clearing out a wave in wave64 mode takes more clock cycles, leading to performance drops.
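The divergence argument above can be put into rough numbers. This is only a toy sketch: it assumes every lane takes a given branch independently with the same probability, which real shaders do not do (neighboring pixels tend to branch together), and the function name and probabilities are made up for illustration.

```python
def uniform_wave_probability(p_taken: float, wave_size: int) -> float:
    """Chance that all lanes in a wave take the same side of a branch.

    Assumption (not from any AMD documentation): each lane takes the branch
    independently with probability p_taken.
    """
    all_taken = p_taken ** wave_size
    none_taken = (1.0 - p_taken) ** wave_size
    return all_taken + none_taken


if __name__ == "__main__":
    # Hypothetical per-lane branch probabilities, chosen only for illustration.
    for p in (0.99, 0.999):
        w32 = uniform_wave_probability(p, 32)
        w64 = uniform_wave_probability(p, 64)
        print(f"p={p}: wave32 stays uniform {w32:.3f}, wave64 stays uniform {w64:.3f}")
```

Under this model a wave64 is always less likely than a wave32 to stay uniform, so it more often has to run both sides of the branch. How much that costs in practice depends on how correlated the lanes really are.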

Consider the trade-offs: as the number of elements increases, so does the cache size and die area. A larger low-level cache can result in higher latency. If they enhance how wave32 operates, as I suggested, it could potentially lead to substantial gaming performance improvements.

Anyway, I'm off to sleep.
 

Aapje

Golden Member
Mar 21, 2022
1,508
2,061
106
Games are often optimized for wave32 execution.
This is not really true, as shaders are compiled exactly so that the final code can be optimized for the architecture of the card. And Nvidia and AMD can handcode parts of the shader for certain games to optimize it further.

An issue is that the compiler for dual issue was really poor and probably only made modest gains. See the compiler section in: https://chipsandcheese.com/2023/01/07/microbenchmarking-amds-rdna-3-graphics-architecture/
 

Mopetar

Diamond Member
Jan 31, 2011
8,110
6,754
136
Best perf/dollar GPU of current offerings.

Unfortunately in a market segment that almost no one here cares about. Also not that surprising for the cheapest GPU to have the best performance per dollar.

Also FWIW TPU's latest GPU review (of a 7900 GRE) has the Arc A580 at $165 as a better value than an RX 6600 at $200, but I don't know how accurate those prices are.

There are a few other cards (5700 XT at $200 and RX 580 (!!!) at $90) that are also listed above it, but those are older so the comparison isn't quite as fair.
 

MrTeal

Diamond Member
Dec 7, 2003
3,611
1,813
136
At least according to pcpartpicker, the cheapest 6600 in the US is $190 while the cheapest A580 is $160. Moving to Intel is an even more adventurous step for the $150-$200 buyer than choosing AMD.
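As a quick sanity check on the value comparison, the sketch below works out how much faster the RX 6600 would have to be than the A580 to match it on performance per dollar, using only the prices quoted in this thread. No benchmark numbers are assumed, and the function name is made up for illustration.

```python
def breakeven_speedup(price_expensive: float, price_cheap: float) -> float:
    """Speedup the pricier card needs to equal the cheaper card's perf/dollar."""
    return price_expensive / price_cheap


if __name__ == "__main__":
    # Prices as quoted in the thread (TPU review and pcpartpicker).
    for source, rx6600, a580 in (("TPU", 200, 165), ("pcpartpicker", 190, 160)):
        ratio = breakeven_speedup(rx6600, a580)
        print(f"{source}: RX 6600 must be {100 * (ratio - 1):.1f}% faster to tie on value")
```

Either way the 6600 needs to be roughly a fifth faster to break even, so small price differences between sources do not change the picture much.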
 

marees

Senior member
Apr 28, 2024
426
496
96
At least according to pcpartpicker, the cheapest 6600 in the US is $190 while the cheapest A580 is $160. Moving to Intel is an even more adventurous step for the $150-$200 buyer than choosing AMD.
If AMD is still selling 6600 (remember it is TSMC 7nm that is supposed to be replaced by 6nm 7600) then AMD must have screwed up big time by overproducing them during the crypto boom, as I am guessing production must have stopped long ago

Does this mean
a) RDNA 4 will be delayed until RDNA 2 is emptied out
b) RDNA 4 will debut at a higher launch price & reduce street price gradually until old RDNA 2 (& RDNA 3) stocks are cleared out ?
 

blckgrffn

Diamond Member
May 1, 2003
9,294
3,436
136
www.teamjuchems.com
If AMD is still selling 6600 (remember it is TSMC 7nm that is supposed to be replaced by 6nm 7600) then AMD must have screwed up big time by overproducing them during the crypto boom, as I am guessing production must have stopped long ago

Does this mean
a) RDNA 4 will be delayed until RDNA 2 is emptied out
b) RDNA 4 will debut at a higher launch price & reduce street price gradually until old RDNA 2 (& RDNA 3) stocks are cleared out ?

That would make sense, but it doesn't look like that is corroborated by anything outside some forum posts in Chinese.

Certainly the 6700 XT and vanilla 6600 seem to have stuck around quite a while. It's possible that as N7 prices came down and RDNA3 didn't play out quite as expected, they decided to keep those alive to fill in their lineup holes. It's as much speculation as anything, but it doesn't seem like AMD really stopped making them in ~October 2023 as widely reported.
 

SolidQ

Senior member
Jul 13, 2023
521
605
96
Fun part
"Another user claimed that the number of compute units would have increased from 96 to 200."

Real
71857daf8f9069af490b3f590d46a8d9.png


modern journalism (c)
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,303
1,216
136
Blackwell is going to smoke AMD and Intel. The price/performance that RDNA 4 offers is what will make RDNA 4 compelling. The AMD of old was masterful at providing value in the price/performance category. AMD is doing quite well with their GPU drivers in my opinion. They need to spend more money and time dialing in their drivers.

The rumor that AMD is going to use 18Gb/s is not good news. Nvidia is going to be on GDDR7 while AMD will continue to use GDDR6. Why would AMD use slow GDDR6 modules when faster GDDR6 is available?
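To see what the memory speed means in bandwidth terms, here is a minimal sketch of the usual formula (per-pin data rate times bus width, divided by 8 bits per byte). The 256-bit bus width and the 20 Gbps comparison point are assumptions chosen for illustration, not figures from the rumor.

```python
def gddr_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from per-pin rate (Gbps) and bus width (bits)."""
    return pin_rate_gbps * bus_width_bits / 8


if __name__ == "__main__":
    BUS_WIDTH_BITS = 256  # hypothetical bus width, not stated in the thread
    for rate in (18, 20):  # 18 Gbps is the rumored speed; 20 Gbps is an example faster GDDR6 bin
        bandwidth = gddr_bandwidth_gb_s(rate, BUS_WIDTH_BITS)
        print(f"{rate} Gbps on a {BUS_WIDTH_BITS}-bit bus: {bandwidth:.0f} GB/s")
```

On the same bus, bandwidth scales linearly with pin rate, so the gap between 18 and 20 Gbps is about 11% regardless of the bus width picked.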
 

Ghostsonplanets

Senior member
Mar 1, 2024
688
1,113
96
Because it's cheaper for their BOM. I also wouldn't count on AMD undercutting Nvidia. They have their margins to maintain, and people have shown them that they will pay up when it comes to Nvidia GPU prices. So they want the same.
 