Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2 the window they push to add support for new devices is much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because US Govt is starting to prepare the SW environment for El Capitan (Maybe to avoid slow bring up situation like Frontier for example)

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time like Hopper probably only a couple of months later!
Although I believe Hopper had problems not having a host CPU capable of doing PCIe 5 in the very near future therefore it might have gotten pushed back a bit until SPR and Genoa arrives later in 2022.
If PVC slips again I believe MI300 could launch before it

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

SteinFG · May 6, 2024

igor_kavinski said:
Yes, please God! Smite 8GB!

*monkey paw curles* Now 9GB at 96bit is standard

igor_kavinski · May 6, 2024

SteinFG said:
*monkey paw curles* Now 9GB at 96bit is standard

Heck, I invite Jensen to do the 970 tiered VRAM crap again, with 8GB fast VRAM and 3 or 4GB slower VRAM. Still better than going out to system RAM.

Ghostsonplanets · May 6, 2024

Isn't the Navi 44 rumored to also be sold with 8GB offering?

Mahboi · May 6, 2024

What else would it have?
130mm² tiny thing, it's a 8500xt that they'll probably brand as 8600 xt.

I really doubt that we'll have 12Go as the baseline until GDDR7 becomes commonplace.

SolidQ · May 6, 2024

Ghostsonplanets said:
Isn't the Navi 44 rumored to also be sold with 8GB offering?

8gb, but question is, will be there 16gb model?

Timorous · May 6, 2024

Ghostsonplanets said:
Isn't the Navi 44 rumored to also be sold with 8GB offering?

It is rumoured to have a 128bit GDDR6 bus. That means 8GB or 16GB unless you cut the bus.

8GB has a ceiling price which if N44 is less than is fine but if it does perform between the 6700XT and 7700XT (so 4060Ti ISH) then AMD can probably up the price by more than the cost of the VRAM.

A cut version that performs like the 6700XT could also charge more than the extra vram if AMD chose to go 96bit + 12GB but 128bit + 8GB at $220 would probably be okay.

Abwx · May 6, 2024

igor_kavinski said:
Though the 7600 should've had 10GB minimum. AMD should've known better.

7600XT wich use the same chip has 16GB, now when it comes to perf/price a 4060 has only 10% advantage in raster but also half the RAM.
There s graphs at Computerbase for perf/price at each resolution with and without RT, you can check there.

Grafikkarten-Preis/Leistungs: So viele FPS pro Euro gibt es

Welche Gaming-Grafikkarte hat das beste Preis-Leistungs-Verhältnis gemessen in FPS pro Euro? Ein Überblick mit Benchmarks (Januar 2026).

www.computerbase.de

SolidQ · May 6, 2024

Is there something real? Except CU number

Mahboi · May 6, 2024

2 Shader Engines, 4 Shader Arrays, what's the 8-7 about?
And yes Kepler reposted it on his Twitter.

Ghostsonplanets · May 6, 2024

SolidQ said:
Is there something real? Except CU number

Everything should be real. It comes straight from Sony documentation.

Saylick · May 6, 2024

Mahboi said:
2 Shader Engines, 4 Shader Arrays, what's the 8-7 about?
And yes Kepler reposted it on his Twitter.

8-7-8-7 is the number of WGP in each Shader Array: two of the SAs have 8 WGPs and the other two have 7 WGPs.

blckgrffn · May 6, 2024

Saylick said:
8-7-8-7 is the number of WGP in each Shader Array: two of the SAs have 8 WGPs and the other two have 7 WGPs.

I trust Sony will have a way to squeeze everything out of this but isn’t the lack of symmetry in this type of hardware a little odd? Is there a purpose to this that’s obvious to those in the know?

Kepler_L2 · May 6, 2024

blckgrffn said:
I trust Sony will have a way to squeeze everything out of this but isn’t the lack of symmetry in this type of hardware a little odd? Is there a purpose to this that’s obvious to those in the know?

It's disabling 1 WGP per SE for yield purposes. Same for PS5, Xbox Series X, 6800 XT, 6700, etc.

DisEnchantment · May 6, 2024

SolidQ said:
Is there something real? Except CU number

GL0V --> sounds odd. Somebody just wrote that, I doubt it is from a specification because it I don't see this on RDNA3, unless the GPU is not based on RDNA3
RDNA2-->RDNA3 doubled the L0, so, 32K. But it is not global it is local to a CU, also not even shareable at WGP level.
RDNA2-->RDNA3 doubled the L1, so, 256K

Kepler_L2 · May 6, 2024

DisEnchantment said:
GL0V --> sounds odd. Somebody just wrote that, I doubt it is from a specification because it I don't see this on RDNA3, unless the GPU is not based on RDNA3
RDNA2-->RDNA3 doubled the L0, so, 32K. But it is not global it is local to a CU, also not even shareable at WGP level.
RDNA2-->RDNA3 doubled the L1, so, 256K
View attachment 98490

GL0V = Graphics L0 Vector cache, as opposed to Instruction or Scalar cache.

DisEnchantment · May 6, 2024

Kepler_L2 said:
GL0V = Graphics L0 Vector cache

I was imagining they were referring to Global L0, like how GL1 is referred to as. But I don't see that GL0 being used in GPUOpen documentation much, at least not the ISA docs.
But it does look like it carries some RDNA3 traits.

ToTTenTranz · May 7, 2024

DisEnchantment said:
RDNA2-->RDNA3 doubled the L1, so, 256K

256K per shader array. There are 8 SAs on the PS5 Pro, so 1MB total L1.
Then there's 64K L0V per WGP so total 1.9MB.

TESKATLIPOKA · May 7, 2024

ToTTenTranz said:
256K per shader array. There are 8 SAs on the PS5 Pro, so 1MB total L1.
Then there's 64K L0V per WGP so total 1.9MB.

If 256K per SA, then It would mean 256*8=2048K or 2MB.

ToTTenTranz · May 7, 2024

TESKATLIPOKA said:
If 256K per SA, then It would mean 256*8=2048K or 2MB.

I thought so too at first, but those specs mention only 4 SA with 7 or 8 active WGP each.

I would have thought a 32 WGP RDNA3-derivative would have 4 Shader Engines, not just 2.

CouncilorIrissa · May 7, 2024

ToTTenTranz said:
I thought so too at first, but those specs mention only 4 SA with 7 or 8 active WGP each.

I would have thought a 32 WGP RDNA3-derivative would have 4 Shader Engines, not just 2.

Those chips go into consoles whose manufacturers are notoriously stingy wrt/ die space.
Same reason why these seemingly gaming-oriented chips are equipped with only 8MB L3$ instead of 32MB for desktop equivalents, despite cache being very valuable for gaming.

Mopetar · May 7, 2024

I'm kind of surprised they don't use v-cache as a way to separate baseline and "better" model. Xbox used separate chips for their S and X models, but v-cache seems like another way to separate them as binning for hardware capability doesn't differentiate the parts enough.

Console APUs are practically perfect since they want to keep the TPD down anyway which means lower voltages.

adroc_thurston · May 7, 2024

Mopetar said:
I'm kind of surprised they don't use v-cache as a way to separate baseline and "better" model. Xbox used separate chips for their S and X models, but v-cache seems like another way to separate them as binning for hardware capability doesn't differentiate the parts enough

You're out of your mind if you think we can ship SoIC-X at console volumes.

Mopetar · May 7, 2024

Depends on the mix. If you want 85% of your sales to be the top model, then no. If it's 15%, much more doable.

At the end of the day it's really just a matter of money. What isn't feasible suddenly is for the right price.

It also depends on what you mean by console volumes. Apparently those haven't been all that great recently, or ever if you're Microsoft this generation.

CouncilorIrissa · May 7, 2024

Mopetar said:
Depends on the mix. If you want 85% of your sales to be the top model, then no. If it's 15%, much more doable.

At the end of the day it's really just a matter of money. What isn't feasible suddenly is for the right price.

It also depends on what you mean by console volumes. Apparently those haven't been all that great recently, or ever if you're Microsoft this generation.

As a console manufacturer, you don't exactly count on your volumes to be non-existent, that defeats the entire purpose.

Mahboi · May 7, 2024

Video game consoles lifetime sales worldwide 2026| Statista

One particular console outsold all others, a silent legend that conquered living rooms worldwide

www.statista.com

I didn't realise how poorly the PS5 has sold.
We're below XBONE.
It's between the NES and SNES, which is ridiculous considering how much bigger the market is than then.

This is pretty awful since the console is great. Sign of the economy, or sign of the future of consoles?

Discussion RDNA4 + CDNA3 Architectures Thread

Golden Member

Senior member

Lifer

Senior member

Golden Member

Golden Member

Golden Member

Lifer

Golden Member

Golden Member

Senior member

Diamond Member

Diamond Member

Golden Member

Golden Member

Golden Member

Golden Member

Golden Member

Platinum Member

Golden Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Senior member

Golden Member