Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3Qs to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project


github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Timorous · Nov 29, 2023

TESKATLIPOKA said:
When did a new architecture use the naming of the previous generation?

7790 used the Bonaire chip on GCN 2.0

jpiniero · Nov 29, 2023

TESKATLIPOKA said:
When did a new architecture use the naming of the previous generation?

The 2050 Laptop?

soresu · Nov 29, 2023

Timorous said:
7790 used the Bonaire chip on GCN 2.0

Definitely an odd example - that gen only had 2 discrete GPUs, with the top end Hawaii coming in as the R9 290X only a few months later.

The big change in naming for Hawaii suggests they recognised 7790 as a misstep later.

MrTeal · Nov 29, 2023

On the Nvidia side I can only think of the 750Ti recently. Mobile's always weird, but putting out GA107 as the 2050 a year after the 3050/3050 Ti came out is definitely an outlier and not really the same as launching new gen into current gen. It was updating the previous gen out of the current gen.

Bonaire is even weirder, because they released it as the 7790 in March 2013. They'd already refreshed GCN 1 (and TeraScale) as the 8000 series earlier in 2013 for OEMs though, so by the time there was a GCN1 7790 there was already a GCN1 8000 series and then the Bonaire 8770 in Sept 2013.

AFAIK there's never really been a full launch of a new architecture reusing the old naming scheme. Naming shenanigans almost always work the other way, with existing cards getting a +1 to the name to sell updated products. It's not impossible of course, but I'd be pretty surprised.

Bigos · Nov 29, 2023

There is also Radeon 285 (later 380). In Radeon 200 series I believe there was gfx6 (like 280), gfx7 (like 290) and gfx8 (285) together.

But I think by that point AMD was struggling to develop many dies for each uarch (Bulldozer effect). I don't think they would do that again now.

Glo. · Nov 29, 2023

I've done quite a bit of thinking about that AMD patent, and have read it twice already.

What AMD has done is a masterpiece.

There is nothing in that patent that limits it to specific dies. You can scale the geometry between multiple GPUs, regardless of their size, and effectively see them as one unit, so if you have an APU with this architecture, and a dGPU with this architecture, they still can be seen as ONE graphics compute unit, or more correctly, from technical perspective - as an extension of integrated GPU.

You can also design monolithic APUs with this architecture, and pair them together to create something like AMD's M2 Ultra chips, but without any of the M2 Ultra's drawbacks (lack of gaming performance scaling).

So you can have with this architecture: Dual GPU chiplet APUs, Dual monolithic APUs, APU+dGPU.

If my understanding of this is correct, Smart Access Memory makes absolute sense and is taken to a whole other level.

PJVol · Nov 29, 2023

Glo. said:
What AMD has done is a masterpiece.

On paper. Only a small matter remains: (make sure it doesn't eat watts like there's no tomorrow)

In various implementations, the crosslink 260 is a bus, an interconnect chip such as a high density crosslink (HDCL) die interposer, or other inter-chiplet communications mechanism.

TESKATLIPOKA · Nov 29, 2023

If the bigger RDNA4 performs under N31, let's say managing 90%, then it would mean 38% higher than RX 7800XT.
Supposedly it's pretty small -> 200-250mm².
Not sure if it's N4P or N3?
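
As a quick sanity check, the 90% and 38% figures above together imply how far N31 is assumed to sit above the RX 7800XT. A minimal sketch using only those two figures from the post; the function name is hypothetical and just for illustration.

```python
def implied_n31_over_7800xt(rdna4_vs_n31: float, rdna4_vs_7800xt: float) -> float:
    """Return N31 performance relative to RX 7800XT implied by the two ratios.

    rdna4_vs_n31:    bigger RDNA4 as a fraction of N31 (0.90 in the post)
    rdna4_vs_7800xt: bigger RDNA4 relative to RX 7800XT (1.38 in the post)
    """
    return rdna4_vs_7800xt / rdna4_vs_n31


ratio = implied_n31_over_7800xt(rdna4_vs_n31=0.90, rdna4_vs_7800xt=1.38)
print(f"Implied N31 / RX 7800XT ratio: {ratio:.2f}x")
```

So the estimate only holds if N31 is taken to be roughly one and a half times an RX 7800XT; a different baseline changes the 38% accordingly.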

Now here comes speculation time. 😀
As many of you remember, Navi 4C was canned. It had 9 SEDs (3x3). According to @adroc_thurston it had ~270CU, which would mean 30 per SED.
I can imagine that other chiplet products could have had 3 (1x3) and 6 (2x3) SEDs, meaning 90 and 180CUs.
Monoliths could have 30 and 60CU.
30CU per SED looks a bit weird, but maybe a single WGP is no longer made of 2CUs, but actually 3!

| Die | Stream Processors (Dual-issue) | Compute units | Workgroup processors | Shader arrays | Shader engines |
|---|---|---|---|---|---|
| RDNA 4 SED | 1920 | 30 | 15 | 3 | 1 |
| N33 (comparison) | 2048 | 32 | 16 | 4 | 2 |

And here is what the whole product stack could have looked like:

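| Model | Number of SEDs | Stream Processors (Dual-issue) | Compute units | Workgroup processors | Shader arrays | Shader engines |
|---|---|---|---|---|---|---|
| 8950XT | 9 | 17280 | 270 | 135 | 27 | 9 |
| 8900XT | 8 | 15360 | 240 | 120 | 24 | 8 |
| 8850XT | 6 | 11520 | 180 | 90 | 18 | 6 |
| 8800XT | 5 | 9600 | 150 | 75 | 15 | 5 |
| 8750XT | 3 | 5760 | 90 | 45 | 9 | 3 |
| 8700XT | 3 | 4608 | 72 | 36 | 9 | 3 |
| 8650XT | 2 (monolith) | 3840 | 60 | 30 | 6 | 2 |
| 8600XT | 2 (monolith) | 3072 | 48 | 24 | 6 | 2 |
| 8550XT | 1 (monolith) | 1920 | 30 | 15 | 3 | 1 |
| 8500XT | 1 (monolith) | 1536 | 24 | 12 | 3 | 1 |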
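
The rows above all follow from the SED count and the CU count, so they can be regenerated with a few lines. This is only a sketch of the speculation in this post: the per-unit constants are read off the table (64 stream processors per CU, 2 CUs per WGP, 3 shader arrays and 1 shader engine per SED), and `derive_row` is a hypothetical helper name.

```python
SP_PER_CU = 64   # from the table: 1920 SP / 30 CU
CU_PER_WGP = 2   # from the table: 30 CU / 15 WGP
SA_PER_SED = 3   # from the table: 3 shader arrays per SED
SE_PER_SED = 1   # from the table: 1 shader engine per SED


def derive_row(seds: int, compute_units: int) -> dict:
    """Derive the remaining table columns from SED count and CU count."""
    return {
        "stream_processors": compute_units * SP_PER_CU,
        "compute_units": compute_units,
        "workgroup_processors": compute_units // CU_PER_WGP,
        "shader_arrays": seds * SA_PER_SED,
        "shader_engines": seds * SE_PER_SED,
    }


# A few rows of the speculative stack: (model name from the post, SEDs, CUs)
stack = [("8950XT", 9, 270), ("8700XT", 3, 72), ("8500XT", 1, 24)]
for name, seds, cus in stack:
    print(name, derive_row(seds, cus))
```

Cut-down parts keep the shader array and shader engine counts of the full configuration, which is why only the CU-derived columns change between, say, the 90CU and 72CU rows.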

Honestly, this table is not the best, I can't make nice cutdown versions and in some cases the difference is too big or small.

8650XT with 60CU would need to have ~40% higher clockspeed to manage 90% of N31 or have some combination of IPC + clocks, or also a different number of units.
What do you think?
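
The ~40% clock figure for a 60CU part can be sanity-checked with a simple scaling model. This is a rough sketch under stated assumptions: performance is taken to scale linearly with CU count x clock x IPC, and N31 is taken to have 96 CUs (the full Navi 31 configuration, which is not stated in this post). The function name is hypothetical.

```python
N31_CUS = 96  # assumption: full Navi 31 CU count, not stated in the post


def required_clock_uplift(target_fraction: float, cus: int,
                          ref_cus: int = N31_CUS, ipc_gain: float = 1.0) -> float:
    """Clock ratio versus the reference GPU needed to reach target_fraction
    of its performance, assuming perf ~ CUs * clock * IPC."""
    return target_fraction * ref_cus / (cus * ipc_gain)


uplift = required_clock_uplift(0.90, 60)
print(f"Clock needed vs N31 with no IPC gain: {uplift:.2f}x")

# Same target with a hypothetical 10% IPC improvement
uplift_ipc = required_clock_uplift(0.90, 60, ipc_gain=1.10)
print(f"Clock needed vs N31 with 10% IPC gain: {uplift_ipc:.2f}x")
```

Under these assumptions the pure-clock requirement lands in the same ballpark as the ~40% quoted above, and any IPC gain reduces it proportionally.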

Edit: after being corrected about the number of CUs and SA/SE, I edited my table.

Glo. · Nov 29, 2023

AMD could always have made the vALU part work, and we would have 256 ALUs per WGP instead of 128.

Glo. · Nov 29, 2023

PJVol said:
On paper. Only a small matter remains: (make sure it doesn't eat watts like there's no tomorrow)

Correct.

It's all about overcoming the technical difficulties and powweeer draw of such a solution.

SolidQ · Nov 29, 2023

TESKATLIPOKA said:
8650XT with 60CU would need to have ~40% higher clockspeed to manage 90% of N31 or have some combination of IPC + clocks, or also a different number of units.
What do you think?

There was info about a reworked WGP, and also GDDR7, so it's hard to draw a conclusion for now.

branch_suggestion · Nov 29, 2023

TESKATLIPOKA said:
30CU per SED looks a bit weird, but maybe a single WGP is no longer made of 2CUs, but actually 3!

We've been through this earlier, each SED was 3SA/SE. Not sure whether all RDNA4 parts will make this change but each WGP remains 2CU.

SolidQ · Dec 3, 2023

@Kepler_L2
Is it a puzzle for us?

https://twitter.com/x/status/1731070137938452984

Joe NYC · Dec 3, 2023

Ian showing a 4-way MI300A system by Gigabyte:

igor_kavinski · Dec 3, 2023

Joe NYC said:
Ian showing a 4-way MI300A system by Gigabyte:

Can it run Crysis?

Joe NYC · Dec 3, 2023

igor_kavinski said:
Can it run Crysis?

I am guessing that it can boot like any x86 system, and it has a PCIe slot for a GPU.

igor_kavinski · Dec 3, 2023

Joe NYC said:
it has a PCIe slot for a GPU.

Why would it need a dGPU? CDNA3 doesn't support OpenGL/Vulkan?

TESKATLIPOKA · Dec 3, 2023

SolidQ said:
@Kepler_L2
Is it a puzzle for us?

https://twitter.com/x/status/1731070137938452984

Please also post the content; not everyone wants to make an account to view what is posted on Twitter.

SolidQ · Dec 3, 2023

TESKATLIPOKA said:
Please also post the content; not everyone wants to make an account to view what is posted on Twitter.

I thought you could see it without registering, or has it been closed off since Musk bought Twitter?

Screenshot for you

TESKATLIPOKA · Dec 3, 2023

SolidQ said:
I thought you could see it without registering, or has it been closed off since Musk bought Twitter?

Screenshot for you

Thanks.
Then my guess is 30CU and 60CUs. It will have significantly higher clocks, at least 3GHz.

edit: and I think the big one will perform comparably to a 88CU(44+44) RDNA GPU. 😀

@PJVol that's also possible, but then 3SA/SE can't be true.

PJVol · Dec 3, 2023

TESKATLIPOKA said:
Thanks.
Then my guess is 30CU and 60CUs. It will have significantly higher clocks, at least 3GHz.

Hmm... RGT said "he's heard from multiple sources" that the 32 and 20 WGP configs are set in stone.

SolidQ · Dec 3, 2023

TESKATLIPOKA said:
I think the big one will perform comparably to a 88CU(44+44) RDNA GPU.

There's another

TESKATLIPOKA · Dec 3, 2023

SolidQ said:
There's another

I like challenges. 😀
But this is hard even for me.😛

P.S. with this post I just became a Platinum member.
How do I feel? Great
Why? Because I am not at work 🙂

igor_kavinski · Dec 3, 2023

TESKATLIPOKA said:
I like challenges. 😀
But this is hard even for me.😛

Why must that be a challenge? One is twice the performance of the other. If not, it's a stupid waste of resources to have something twice as large and not be as performant. But this is the Radeon group so maybe we can look forward to stupidity?

TESKATLIPOKA · Dec 3, 2023

igor_kavinski said:
Why must that be a challenge? One is twice the performance of the other. If not, it's a stupid waste of resources to have something twice as large and not be as performant. But this is RGT so maybe we can look forward to stupidity?

Read it once more. It's about the WGP/CU/SP count for a single chip.
Can you guess it correctly? That's what is challenging.
