Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2 the window they push to add support for new devices is much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because US Govt is starting to prepare the SW environment for El Capitan (Maybe to avoid slow bring up situation like Frontier for example)

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time like Hopper probably only a couple of months later!
Although I believe Hopper had problems not having a host CPU capable of doing PCIe 5 in the very near future therefore it might have gotten pushed back a bit until SPR and Genoa arrives later in 2022.
If PVC slips again I believe MI300 could launch before it

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Ajay · Jul 24, 2023

TESKATLIPOKA said:
especially If they made 32Gbit chips.

I don't think it's a matter of when, not if. SS, at least, has to get power consumption down to increase the density; otherwise the dram will be tough to cool.

Joe NYC · Jul 24, 2023

udaemonia said:
Rick Bergman and David Wang stated that cost was a major factor in avoiding direct competition with the 4090. They're going to commit to the design and improve everywhere they can.

AMD: We Didn't Make RDNA 3 as Fast as RTX 4090 to Keep Costs and Power Down | Hardware Times

In an interview with Japanese outlet ITMedia, AMD Executives Rick Bergman and David Wang shared Team Radeon’s opinions on the state of the next-gen GPU market. As noted in our review of the Radeon RX 7900 XT, the RDNA 3 heavyweight is substantially slower than the GeForce RTX 4080 in ray-traced...

www.hardwaretimes.com

RDNA 3 was relatively cost-effective model for 2022.

Given all the changes and possibilities in packaging technologies, not advancing RDNA 4 design at all, and producing 2022 replica in 2025 would be very disappointing.

Ajay · Jul 24, 2023

Joe NYC said:
RDNA 3 was relatively cost-effective model for 2022.

Given all the changes and possibilities in packaging technologies, not advancing RDNA 4 design at all, and producing 2022 replica in 2025 would be very disappointing.

Yes it would be. Hopefully, AMD is at least willing to make an RX 8900 XTX for $1200 and actually beat the RTX 5080. There seem to be a reasonable number of people who shell $1600+ out for Nvidia's 4090. I mean, take the fight to Nvidia AMD!!

[rant]And I hope they fix the Win11 on AM4 stuttering problem for RDNA4- seriously, that’s 100% your platform. I wound up going back to NV because I was so miffed over it.[/rant]

GodisanAtheist · Jul 24, 2023

I can't believe AMD executives actually have the balls to say "We could have completed but we didn't [cause our tech wasn't performing to snuff]" on record where people can hear them.

Joe NYC · Jul 24, 2023

Ajay said:
Yes it would be. Hopefully, AMD is at least willing to make an RX 8900 XTX for $1200 and actually beat the RTX 5080. There seem to be a reasonable number of people who shell $1600+ out for Nvidia's 4090. I mean, take the fight to Nvidia AMD!!

[rant]And I hope they fix the Win11 on AM4 stuttering problem for RDNA4- seriously, that’s 100% your platform. I wound up going back to NV because I was so miffed over it.[/rant]

If they could make a modular GPU, where 1 module would be 8600, to 3 modules being current Navi31 size, they could add a 4th module.

That would be one way to compete on the highest end without having to divert inordinate resources for it. I think that is the only hope for people expecting AMD competition on the high end.

Same way AMD did not make 500mm2 RDNA 3 GPU now will apply to RDNA 4 models next year, if RDNA 4 is not modular.

Ajay · Jul 24, 2023

Joe NYC said:
Same way AMD did not make 500mm2 RDNA 3 GPU now will apply to RDNA 4 models next year, if RDNA 4 is not modular.

If they don't make RDNA4 GCDs modular, they will have to go bigger. That, or just cede the high end to Nvidia again.
If AMD hasn't gotten an ultrawide interconnect system working for RDNA4, I hope they got the perf/watt up high enough to make a larger Die.

GodisanAtheist said:
I can't believe AMD executives actually have the balls to say "We could have completed but we didn't [cause our tech wasn't performing to snuff]" on record where people can hear them.

Yeah, I wasn't sure what to make of the short statements. Like 'hey, people are better off spending the extra money on a better CPU' - from the GPU division. So.... if Intel is 'better' at the moment, you'd rather gamers buy a better Intel CPU then pay more for a better AMD GPU??? Anyway, just weird all round. It's like someone just like Raja took over the GPU division when he left; but without all the boasts.

beginner99 · Jul 25, 2023

SteinFG said:
Imagine this size of PCB, but for something like 4080. With VRAM on-package. something like this, here's a quick paint mockup:

But it's pointless because you still need a large cooler so you gain nothing really from a small PCB.

Mopetar · Jul 25, 2023

Yeah, you're not going to be able to cool a 4080 or even really a 4060 with a card that small.

If anything I'd anticipate it making low profile cards easier so we might see more of those.

adroc_thurston · Jul 25, 2023

DisEnchantment said:
Using LDS to augment L0 (depicted as L1 in pic) when unused and sharing L0 across SIMDs in a WGP appeared in a patent below.

This proposal is a whole lot more insane than what you've said, since it calls for eschewing CU/SM-private memories altogether, instead opting for a fat, shared, segmented regfile that dynamically allocates parts of itself as L1 and LDS/shmem.
Also given who filled it and when it's probably inside MI300 already.

Tuna-Fish · Jul 26, 2023

Ajay said:
As an aside - some people think the next round of consumer GPUs won’t hit till 2025. I have two opinions on this. First, I don’t see AMD or NV doing much to change the release schedules of the next generation of AIBs - in part because AIB manufacturers will want something new and shiny to sell, especially given the lackluster sales of this gen. Second, AMD really needs an improvement over the RDNA3 GPUs and, IMHO, can’t afford a delay. Nvidia surely isn’t going to sit still and likely be put in the rear view mirror when AMD top RDNA4 GPU is released.

nV has a lot of room for improving their product stack by rebranding every chip into one tier below it but in the 50-series. So much so, that unless AMD really, really succeeds with RDNA4, they'd have the better product line.

eek2121 · Jul 27, 2023

AMD will need to have a perf/$ leader (in all metrics, including RT) in order to succeed, and they will need to do it across multiple generations to win back mindshare/marketshare. Just being the “me too” second choice won’t get them anywhere.

Thus far they have shown no interest in doing this.

I honestly don’t think they need the absolute fastest card, but the cards they do release should be much faster (and more efficient) than anything else in that price bracket.

If they could get MCM properly working with multiple GCDs, they could probably do this pretty easily, they just need to stop letting NVIDIA set the price/performance tiers.

Oh and they need to invest more in their first party cards. Early cards this gen had cooling issues. NVIDIA’s FE cards are high quality. The 4090FE is the smallest and quietist card you can get. AMD needs a similar high quality design.

adroc_thurston · Jul 27, 2023

eek2121 said:
AMD will need to have a perf/$ leader

Not happening.
The moment they're winning it's pocket all the margin time.

eek2121 said:
to win back mindshare/marketshare

You win it by building the biggest halo part out there and they just gotta be willing.

eek2121 said:
I honestly don’t think they need the absolute fastest card

That's the only way to win in children-driven markets like client dGP.

eek2121 said:
If they could get MCM properly working with multiple GCDs

MCM GPU means abhorrently expensive parts like MI300, not better value.

Ajay · Jul 27, 2023

eek2121 said:
AMD will need to have a perf/$ leader (in all metrics, including RT) in order to succeed, and they will need to do it across multiple generations to win back mindshare/marketshare. Just being the “me too” second choice won’t get them anywhere.

Thus far they have shown no interest in doing this.

Yea, though parity with RT is sufficient, imho. Having competitive or better raster is the real game change still, as RT adoption is still limited.

eek2121 said:
I honestly don’t think they need the absolute fastest card, but the cards they do release should be much faster (and more efficient) than anything else in that price bracket.

NV has shown the benefits, at least marketing wise, over and over again. And yet, as you mention above, for some reason AMD doesn’t care enough to really go for it. Maybe it is because because getting a high speed fabric on package is really hard - plus trying to limit the cross traffic between the GCDs isn’t working well with the current RDNA architecture.

eek2121 said:
If they could get MCM properly working with multiple GCDs, they could probably do this pretty easily, they just need to stop letting NVIDIA set the price/performance tiers.

Not easily. Trying to coalesce data and instructions (particularly in cache) to one GCD as much as possible will be hard. Finding the right partitioning system to minimize cross communication while maximizing performance is tricky stuff. It's the only way to max out performance in a multi-GCD GPU. As fast as the interconnects can be, it just not the same as on-die performance. It's already hard enough to do this on one die with one WGP needing access to cache allocated to a different WGP (@DisEnchantment pointed this out).

eek2121 said:
Oh and they need to invest more in their first party cards. Early cards this gen had cooling issues. NVIDIA’s FE cards are high quality. The 4090FE is the smallest and quietist card you can get. AMD needs a similar high quality design.

I disagree. I think that it's fine that AMD prioritizes their AIBs. They have a smaller market share so 'stealing' sales from the AIBs would be counter-productive. IMHO.

adroc_thurston · Jul 27, 2023

Ajay said:
Maybe it is because because getting a high speed fabric on package is really hard - plus trying to limit the cross traffic between the GCDs isn’t working well with the current RDNA architecture.

That's not really the hard part.

Ajay · Jul 27, 2023

adroc_thurston said:
That's not really the hard part.

Which, the first part or the second? Also, what do you think is the hardest part of making multi-GCDs work?

adroc_thurston · Jul 27, 2023

Ajay said:
Also, what do you think is the hardest part of making multi-GCDs work?

The graphics APIs themselves.
They have those weirdly serial parts that never quite evolved from their mid 90s origins.

Ajay · Jul 27, 2023

adroc_thurston said:
The graphics APIs themselves.
They have those weirdly serial parts that never quite evolved from their mid 90s origins.

Geez, never would have thought of that. I’d be a bit annoyed if a couple old games I occasionally play broke honestly. But I’d get over it in return for big jumps in performance in future cards/games. Guess we’ve all been spoiled.

DAPUNISHER · Jul 27, 2023

Ajay said:
I’d be a bit annoyed if a couple old games I occasionally play broke honestly.

Plan ahead to keep/have a retro PC to handle those titles.

Saylick · Jul 27, 2023

Ajay said:
Geez, never would have thought of that. I’d be a bit annoyed if a couple old games I occasionally play broke honestly. But I’d get over it in return for big jumps in performance in future cards/games. Guess we’ve all been spoiled.

If an MCM approach doesn't scale with retro games due to the use of an old API, I don't see why there couldn't just be a fallback mode where only a single GCD is active. I can't imagine a scenario where even a single GCD couldn't give you the grunt you need to play a retro title.

adroc_thurston · Jul 27, 2023

Ajay said:
Geez, never would have thought of that. I’d be a bit annoyed if a couple old games I occasionally play broke honestly. But I’d get over it in return for big jumps in performance in future cards/games. Guess we’ve all been spoiled.

It's not about retro games.
It's about the fundamental ordering rules APIs have to this very day which make multidie scaling hard .

Ajay · Jul 27, 2023

adroc_thurston said:
It's not about retro games.
It's about the fundamental ordering rules APIs have to this very day which make multidie scaling hard .

Gah, so all the way up to Dx12 - swell. Thanks.

Joe NYC · Jul 27, 2023

eek2121 said:
If they could get MCM properly working with multiple GCDs, they could probably do this pretty easily, they just need to stop letting NVIDIA set the price/performance tiers.

I think they would use a single N6 die with 2 MCDs + Infinity Cache. with die size of 100-150 mm2

Then they would stack GCD of approximately same size (N5, N4, N3) on top of it. Which would be Navi43 (or Navi53). This part should be bordering non-brainer feasible, even in 2024.

The tricky pat would be to connect multiple of these stacked MCD/GCD combos (2, 3, 4) with extremely high bandwidth low latency link that would also have to be stacked and using Hybrid Bond. I have seen active silicon bridges (in AMD patents) spanning 2 dies, but it is unclear if this technology is feasible.

It could look something like this, it could have potentially only a 3 types of dies (or potentially 4 if they added an I/O die. I think that is the only way would approach competing on highest end, competing with 500-800 mm2 NVidia monolithic GPUs.

Joe NYC · Jul 27, 2023

DAPUNISHER said:
Plan ahead to keep/have a retro PC to handle those titles.

Different versions of DirectX on the same PC already handle this.

adroc_thurston · Jul 27, 2023

Ajay said:
so all the way up to Dx12 - swell

DX12 too.

Joe NYC said:
It could look something like this

It doesn't.

Joe NYC · Jul 27, 2023

adroc_thurston said:
It doesn't.

How many generations do you think AMD will then stick to monolithic GCDs? RDNA4, 5, 6?

Discussion RDNA4 + CDNA3 Architectures Thread

Golden Member

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Super Moderator CPU Forum Mod and Elite Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member