Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2 the window they push to add support for new devices is much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because US Govt is starting to prepare the SW environment for El Capitan (Maybe to avoid slow bring up situation like Frontier for example)

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time like Hopper probably only a couple of months later!
Although I believe Hopper had problems not having a host CPU capable of doing PCIe 5 in the very near future therefore it might have gotten pushed back a bit until SPR and Genoa arrives later in 2022.
If PVC slips again I believe MI300 could launch before it

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Saylick · Dec 6, 2023

Interesting. Although they didn't show off MI300C, it's readily obvious how they did it.

Screenshot_2023-12-06-13-01-44-42_40deb401b9ffe8e1df2f1cc5ba480b12.jpg

Saylick · Dec 6, 2023

Computerbase has the whole slide deck. Enjoy!

AMD MI300A & MI300X: Die neue Instinct-Serie ist ein Meilenstein in vielen Bereichen

Mit der neuen Produktreihe Instinct MI300A/MI300X geht AMD den Platzhirsch Nvidia an und will dank Modularität mitunter sogar mehr bieten.

www.computerbase.de

Edit: LOL, I just realized @Abwx posted it already like two posts prior. D'oh!

SteinFG · Dec 7, 2023

Timorous said:
GDDR7 will also have 3gb modules so 192 bit can support 18GB from 6 chips and 128 but could support 12GB from 4 chips.

But that's not coming right away. 2GB GDDR6 chips came out two years after 1GB GDDR6. I wouldn't expect 3GB GDDR7 on rtx 50 or rx 8000, probably a gen after that

adroc_thurston · Dec 7, 2023

SteinFG said:
But that's not coming right away

It is, same timeline as 24Gbit ICs for HBM3e.

SteinFG · Dec 7, 2023

adroc_thurston said:
It is, same timeline as 24Gbit ICs for HBM3e.

That would be great

Timorous · Dec 7, 2023

SteinFG said:
But that's not coming right away. 2GB GDDR6 chips came out two years after 1GB GDDR6. I wouldn't expect 3GB GDDR7 on rtx 50 or rx 8000, probably a gen after that

Micron should be making them available at launch unless things have changed.

TPU Source

Mid 2024ish according to this.

eek2121 · Dec 7, 2023

adroc_thurston said:
Yea that's the plan.

Yep. AMD would absolutely love to sell you a tiny GPU at insane clocks. Smaller chips mean higher margins. Tiny chip + overbuilt board cost less than multiple large chips + overbuilt board.

…when it works.

Would you care if your GPU had 40CUs at 5ghz or 60CUs at 2.5? Hypothetical example only.

moinmoin · Dec 7, 2023

eek2121 said:
Would you care if your GPU had 40CUs at 5ghz or 60CUs at 2.5? Hypothetical example only.

Personally I'd always pick the more efficient product among the same class of performance, so I'd prefer the latter. But DIY desktop users in general seem to be perfectly fine with the former, and the development of CPU and GPU TDPs over the past decade reflects just that.

adroc_thurston · Dec 7, 2023

moinmoin said:
Personally I'd always pick the more efficient product among the same class of performance, so I'd prefer the latter

That only ever worked back when xtors were getting cheaper which is just no more.

moinmoin said:
and the development of CPU and GPU TDPs over the past decade reflects just that.

Well that's just dennard scaling dying off.
Server CPUs had even granted power envelope bumps.
Romley Sandy-EP Xeons were like 95W and Turin will be 5 times that in a bit over a decade.

TESKATLIPOKA · Dec 7, 2023

eek2121 said:
Yep. AMD would absolutely love to sell you a tiny GPU at insane clocks. Smaller chips mean higher margins. Tiny chip + overbuilt board cost less than multiple large chips + overbuilt board.

…when it works.

Would you care if your GPU had 40CUs at 5ghz or 60CUs at 2.5? Hypothetical example only.

I would care because that 40CU at 5GHz would be faster than 60CU at 2.5GHz. 😉

If you fixed It to the same TFlops 40CU at 5GHz vs 80CU at 2.5GHz then from a performance point I wouldn't really care, but as @moinmoin mentioned, the later should be more power efficient, but because It's bigger It would be also more expensive to make.

adroc_thurston · Dec 7, 2023

TESKATLIPOKA said:
If you fixed It to the same TFlops 40CU at 5GHz vs 80CU at 2.5GHz then from a performance point I wouldn't really care, but as @moinmoin mentioned, the later should be more power efficient, but because It's bigger It would be also more expensive to make.

It's not quite that.
RDNA2 clocked higher while being more efficient.

TESKATLIPOKA · Dec 7, 2023

adroc_thurston said:
It's not quite that.
RDNA2 clocked higher while being more efficient.

Yeah, but that was because It was a different architecture.
In this case we are talking about the same architecture, but should have used realistic clocks.
The comparison should have been 40CU at 2.5GHz vs 80CU at 1.25GHz.

adroc_thurston · Dec 7, 2023

TESKATLIPOKA said:
The comparison should have been 40CU at 2.5GHz vs 80CU at 1.25GHz.

Wide and slow isn't a thing anymore.
Even GPGPU sticks like MI300 run 2.1GHz.

gdansk · Dec 7, 2023

Slow is just fast now and fast fast is 4GHz

Mopetar · Dec 7, 2023

Abwx said:
That s not even enough in respect of a 12 CUs APU that is 178mm2, let alone one that has 32CUs as well, indeed German etailer Mindfactory sales numbers are indicative that the 7600 lack perfs to be competitive, at 40CUs it would have sold much better while at 32 it s buried within a plethorous offering.

In this segment people get either a RX 6600 wich is cheaper and perform not that much worse or RTX 3060/4060/4060TI wich are either price competitive for the weakest or somewhat faster for the others.

https://twitter.com/x/status/1731450376677622246

The real story is that when you offer a great value product like the 7800XT you actually get a lot of people to buy your cards.

AMD could have definitely had more success with their other products, but they decided to chase extra $$. However, it's hard to blame them when it's probably much more profitable to sell under half as many cards if you can get an extra $50 on each of them.

branch_suggestion · Dec 8, 2023

N44 1SE/2SA/16WGP 24MB MALL 96-bit GDDR7 9GB
N48 2SE/4SA/32WGP 48MB MALL 192-bit GDDR7 18GB
N44 looks sad, but will be stronger than N33 so it should be a bargain.
N48 on the other hand could be sold for 7700XT prices while beating N32/N21 and giving cutdown N31 a real scare.

gdansk · Dec 8, 2023

Let's get the hypetrain rolling on 3.5GHz+

branch_suggestion · Dec 8, 2023

gdansk said:
Let's get the hypetrain rolling on 3.5GHz+

That is the goal, as clocks are still untapped potential in GPU design.
The traditional way of scaling GPU performance and cost structure, Moore's Law, is dead.
Increasing clocks, dark Si and trimming off all the fat to make a very performance dense part is the winning strategy, on client at least.
Now chiplets will lead to lower costs per equivalent yielded area, but packaging does incur another hit. Still compute demands will require beyond reticle designs to increase compute density, this is the way in DC and halo client. Memory is the really big thing that needs to try to keep up with compute demands, along with interconnects and scale out fabrics.

TESKATLIPOKA · Dec 8, 2023

branch_suggestion said:
N44 1SE/2SA/16WGP 24MB MALL 96-bit GDDR7 9GB
N48 2SE/4SA/32WGP 48MB MALL 192-bit GDDR7 18GB
N44 looks sad, but will be stronger than N33 so it should be a bargain.
N48 on the other hand could be sold for 7700XT prices while beating N32/N21 and giving cutdown N31 a real scare.

N44 in my opinion will have 128-bit 8-12GB. 9GB is just too exotic.
I think 32MB IC(MALL) is more realistic than only 24MB, which doesn't even have 50% hitrate at FullHD.

TESKATLIPOKA · Dec 8, 2023

gdansk said:
Let's get the hypetrain rolling on 3.5GHz+

Is It really just a hypetrain?

7700XT shows very high OC.

And that is with +15% power limit. With higher limit, I think you could clock It higher.
The architecture is capable to go >3GHz, but they need to fix power consumption.
The question is If they fixed It in RDNA3.5 and RDNA4 or not.

gdansk · Dec 8, 2023

TESKATLIPOKA said:
Is It really just a hypetrain?

7700XT shows very high OC.
View attachment 90028
And that is with +15% power limit. With higher limit, I think you could clock It higher.
The architecture is capable to go >3GHz, but they need to fix power consumption.
The question is If they fixed It in RDNA3.5 and RDNA4 or not.

I don't know if hypetrain is the right word but N48 will be a real flop if it isn't hitting such high clocks.

soresu · Dec 8, 2023

gdansk said:
I don't know if hypetrain is the right word but N48 will be a real flop if it isn't hitting such high clocks.

Being a flop depends more on value per dollar vs the competition.

As long as AMD prices it right they should do fine as most of the revenue from consumer space comes from mid to lower end cards.

adroc_thurston · Dec 8, 2023

soresu said:
As long as AMD prices it right they should do fine as most of the revenue from consumer space comes from mid to lower end cards.

They won't sell anything without a halo part.

HurleyBird · Dec 8, 2023

Abwx said:
Computerbase released all slides with numerous comparisons, they say thay had them since quite a time but were under NDA, as usual.

AMD MI300A & MI300X: Die neue Instinct-Serie ist ein Meilenstein in vielen Bereichen

Mit der neuen Produktreihe Instinct MI300A/MI300X geht AMD den Platzhirsch Nvidia an und will dank Modularität mitunter sogar mehr bieten.

www.computerbase.de

It's missing the Mi300X/H100 comparison chart. Might be missing other slides.

Abwx · Dec 8, 2023

HurleyBird said:
It's missing the Mi300X/H100 comparison chart. Might be missing other slides.

There are these ones at Computerbase :

AMD MI300A & MI300X: Die neue Instinct-Serie ist ein Meilenstein in vielen Bereichen

Mit der neuen Produktreihe Instinct MI300A/MI300X geht AMD den Platzhirsch Nvidia an und will dank Modularität mitunter sogar mehr bieten.

www.computerbase.de

Discussion RDNA4 + CDNA3 Architectures Thread

Golden Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Senior member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Senior member

Platinum Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Lifer