Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,584
5,685
136
1655034287489.png
1655034259690.png

1655034485504.png

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again I believe MI300 could launch before it :grimacing:

This is nuts, MI100/200/300 cadence is impressive.

1655034362046.png

Previous thread on CDNA2 and RDNA3 here

 

MrTeal

Diamond Member
Dec 7, 2003
3,514
1,597
136
On the Nvidia side I can only think of the 750Ti recently. Mobile's always weird, but putting out GA107 as the 2050 a year after the 3050/3050 Ti came out is definitely an outlier and not really the same as launching new gen into current gen. It was updating the previous gen out of the current gen.

Bonaire is even weirder, because they released it as the 7790 in March 2013. They'd already refreshed GCN 1 (and TeraScale) as the 8000 series earlier in 2013 for OEMs though, so by the time there was a GCN1 7790 there was already a GCN1 8000 series and then the Bonaire 8770 in Sept 2013.

AFAIK there's never really been a full launch of a new architecture reusing the old naming scheme. Naming shenanigans almost always work the other way, with existing cards getting a +1 to the name to sell updated products. It's not impossible of course, but I'd be pretty surprised.
 

Bigos

Member
Jun 2, 2019
123
269
136
There is also the Radeon 285 (later 380). In the Radeon 200 series I believe there were gfx6 (like the 280), gfx7 (like the 290) and gfx8 (the 285) together.

But I think by that point AMD was struggling to develop many dies for each uarch (Bulldozer effect). I don't think they would do that again now.
 

Glo.

Diamond Member
Apr 25, 2015
5,622
4,356
136
I've done quite a bit of thinking about that AMD patent, and have read it twice already.

What AMD has done is a masterpiece.

There is nothing in that patent that limits it to specific dies. You can scale the geometry between multiple GPUs, regardless of their size, and effectively see them as one unit. So if you have an APU with this architecture and a dGPU with this architecture, they can still be seen as ONE graphics compute unit or, more correctly from a technical perspective, as an extension of the integrated GPU.

You can also design monolithic APUs with this architecture and pair them together to create AMD's equivalent of the M2 Ultra, but without any of the M2 Ultra's drawbacks (lack of gaming performance scaling).

So you can have with this architecture: Dual GPU chiplet APUs, Dual monolithic APUs, APU+dGPU.

If my understanding is correct for this, Smart Access Memory makes absolute sense and is taken to another, another level.
 

PJVol

Senior member
May 25, 2020
505
422
106
What AMD has done is a masterpiece.
On paper. Only one small thing is left: make sure it doesn't eat watts like there's no tomorrow.
In various implementations, the crosslink 260 is a bus, an interconnect chip such as a high density crosslink (HDCL) die interposer, or other inter-chiplet communications mechanism.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
If the bigger RDNA4 performs under N31, let's say managing 90% of it, then it would be 38% faster than the RX 7800XT.
Supposedly it's pretty small: 200-250 mm².
Not sure if it's N4P or N3.

Now here comes speculation time. :D
As many of you remember, Navi 4C was canned. It had 9 SEDs (3x3) and, according to @adroc_thurston, ~270 CUs, which would mean 30 per SED.
I can imagine that other chiplet products could have used 3 (1x3) and 6 (2x3) SEDs, meaning 90 and 180 CUs.
Monoliths could have 30 and 60 CUs.
30 CUs per SED looks a bit weird, but maybe a single WGP is no longer made of 2 CUs, but actually 3!
| | Stream Processors (Dual-issue) | Compute units | Workgroup processors | Shader arrays | Shader engines |
| --- | --- | --- | --- | --- | --- |
| RDNA 4 SED | 1920 | 30 | 15 | 3 | 1 |
| N33 (comparison) | 2048 | 32 | 16 | 4 | 2 |

And here is how the whole product stack could have looked:
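| | Number of SEDs | Stream Processors (Dual-issue) | Compute units | Workgroup processors | Shader arrays | Shader engines |
| --- | --- | --- | --- | --- | --- | --- |
| 8950XT | 9 | 17280 | 270 | 135 | 27 | 9 |
| 8900XT | 8 | 15360 | 240 | 120 | 24 | 8 |
| 8850XT | 6 | 11520 | 180 | 90 | 18 | 6 |
| 8800XT | 5 | 9600 | 150 | 75 | 15 | 5 |
| 8750XT | 3 | 5760 | 90 | 45 | 9 | 3 |
| 8700XT | 3 | 4608 | 72 | 36 | 9 | 3 |
| 8650XT | 2 (monolith) | 3840 | 60 | 30 | 6 | 2 |
| 8600XT | 2 (monolith) | 3072 | 48 | 24 | 6 | 2 |
| 8550XT | 1 (monolith) | 1920 | 30 | 15 | 3 | 1 |
| 8500XT | 1 (monolith) | 1536 | 24 | 12 | 3 | 1 |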
Honestly, this table is not the best: I can't make nice cut-down versions, and in some cases the difference is too big or too small.
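
The unit counts in the table above follow mechanically from the per-SED figures. Here is a minimal Python sketch, assuming 64 stream processors per CU, 2 CUs per WGP, and 3 shader arrays plus 1 shader engine per SED (the ratios implied by the RDNA 4 SED row). The helper name `stack_row` is made up for illustration, and cut-down parts are modelled simply by passing a lower CU count.

```python
# Hypothetical helper: derive the table's unit counts from SED count and enabled CUs.
SP_PER_CU = 64        # 1920 SPs / 30 CUs in the RDNA 4 SED row
CU_PER_WGP = 2        # 30 CUs / 15 WGPs in the same row
SA_PER_SED = 3        # shader arrays per SED, per the table
SE_PER_SED = 1        # shader engines per SED, per the table
FULL_CU_PER_SED = 30  # ~270 CUs / 9 SEDs


def stack_row(seds, cus=None):
    """Return the unit counts for a part built from `seds` SEDs.

    `cus` is the number of enabled CUs; it defaults to the full configuration.
    """
    if cus is None:
        cus = seds * FULL_CU_PER_SED
    return {
        "seds": seds,
        "stream_processors": cus * SP_PER_CU,
        "compute_units": cus,
        "workgroup_processors": cus // CU_PER_WGP,
        "shader_arrays": seds * SA_PER_SED,
        "shader_engines": seds * SE_PER_SED,
    }


full = stack_row(9)             # the 9-SED (3x3) configuration
cut_down = stack_row(3, cus=72)  # a 3-SED part with 72 of 90 CUs enabled
print(full)
print(cut_down)
```

Changing `CU_PER_WGP` to 3 is a quick way to see what the "3 CUs per WGP" idea would do to the WGP column.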

The 8650XT with 60 CUs would need ~40% higher clock speed to manage 90% of N31, or some combination of IPC + clocks, or a different number of units.
What do you think?
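
As a rough check of the ~40% figure, here is a sketch under two assumptions that are not stated in the post: performance scales linearly with CU count times per-CU throughput (clock x IPC), and full N31 has 96 CUs (the RX 7900 XTX configuration). The function name is illustrative only.

```python
N31_CUS = 96  # assumption: CU count of full Navi 31 (RX 7900 XTX)


def required_per_cu_uplift(target_fraction, cus, ref_cus=N31_CUS):
    """Per-CU throughput (clock x IPC) multiplier needed to reach
    `target_fraction` of the reference GPU, under naive linear scaling."""
    return target_fraction * ref_cus / cus


uplift = required_per_cu_uplift(0.90, 60)
print(f"Needed per-CU uplift: {(uplift - 1) * 100:.0f}%")
```

With these inputs the multiplier comes out at about 1.44, in the same ballpark as the ~40% quoted. Real GPUs rarely scale linearly, so treat it as a sanity check rather than a prediction.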

Edit: after being corrected about the number of CUs and SA/SE, I edited my table.
 

SolidQ

Member
Jul 13, 2023
163
193
76
Please also post the content; not everyone wants to make an account to view what is posted on Twitter.
I thought you could view it without registering. Or has it been closed off since Musk bought Twitter?

Screenshot for you
9add7f620f9ea627c5bc6dee52a0aca2.png
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I thought you could view it without registering. Or has it been closed off since Musk bought Twitter?

Screenshot for you
9add7f620f9ea627c5bc6dee52a0aca2.png
Thanks.
Then my guess is 30 CUs and 60 CUs. It will have significantly higher clocks, at least 3GHz.

edit: and I think the big one will perform comparably to an 88 CU (44+44) RDNA GPU. :D

@PJVol that's also possible, but then 3SA/SE can't be true.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
Why must that be a challenge? One is twice the performance of the other. If not, it's a stupid waste of resources to have something twice as large and not be as performant. But this is RGT so maybe we can look forward to stupidity?
Read it once more. It's about the WGP/CU/SP count for a single chip.
Can you guess it correctly? That's what is challenging.