Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan early (perhaps to avoid a slow bring-up situation like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 arriving in the very near future, so it might get pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts; the MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 

Joe NYC

Diamond Member
Jun 26, 2021
Rick Bergman and David Wang stated that cost was a major factor in avoiding direct competition with the 4090. They're going to commit to the design and improve everywhere they can.

RDNA 3 was a relatively cost-effective model for 2022.

Given all the changes and possibilities in packaging technologies, not advancing the RDNA 4 design at all and producing a 2022 replica in 2025 would be very disappointing.
 

Ajay

Lifer
Jan 8, 2001
RDNA 3 was a relatively cost-effective model for 2022.

Given all the changes and possibilities in packaging technologies, not advancing the RDNA 4 design at all and producing a 2022 replica in 2025 would be very disappointing.
Yes it would be. Hopefully, AMD is at least willing to make an RX 8900 XTX for $1200 and actually beat the RTX 5080. There seem to be a reasonable number of people who shell out $1600+ for Nvidia's 4090. I mean, take the fight to Nvidia, AMD!!

[rant]And I hope they fix the Win11-on-AM4 stuttering problem for RDNA4 - seriously, that's 100% your platform. I wound up going back to NV because I was so miffed over it.[/rant]
 

GodisanAtheist

Diamond Member
Nov 16, 2006
I can't believe AMD executives actually have the balls to say "We could have competed but we didn't [cause our tech wasn't performing to snuff]" on record where people can hear them.
 

Joe NYC

Diamond Member
Jun 26, 2021
Yes it would be. Hopefully, AMD is at least willing to make an RX 8900 XTX for $1200 and actually beat the RTX 5080. There seem to be a reasonable number of people who shell out $1600+ for Nvidia's 4090. I mean, take the fight to Nvidia, AMD!!

[rant]And I hope they fix the Win11-on-AM4 stuttering problem for RDNA4 - seriously, that's 100% your platform. I wound up going back to NV because I was so miffed over it.[/rant]
If they could make a modular GPU, where one module would be an 8600-class part and three modules would be roughly the current Navi31, they could add a fourth module.

That would be one way to compete at the highest end without having to divert inordinate resources to it. I think that is the only hope for people expecting AMD to compete on the high end.

The same reason AMD did not make a 500 mm² RDNA 3 GPU now will apply to RDNA 4 models next year, if RDNA 4 is not modular.
 

Ajay

Lifer
Jan 8, 2001
The same reason AMD did not make a 500 mm² RDNA 3 GPU now will apply to RDNA 4 models next year, if RDNA 4 is not modular.
If they don't make RDNA4 GCDs modular, they will have to go bigger. That, or just cede the high end to Nvidia again.
If AMD hasn't gotten an ultrawide interconnect system working for RDNA4, I hope they got perf/watt up high enough to make a larger die viable.

I can't believe AMD executives actually have the balls to say "We could have competed but we didn't [cause our tech wasn't performing to snuff]" on record where people can hear them.
Yeah, I wasn't sure what to make of the short statements. Like 'hey, people are better off spending the extra money on a better CPU' - from the GPU division. So... if Intel is 'better' at the moment, you'd rather gamers buy a better Intel CPU than pay more for a better AMD GPU??? Anyway, just weird all round. It's as if someone just like Raja took over the GPU division when he left, but without all the boasts.
 

Mopetar

Diamond Member
Jan 31, 2011
Yeah, you're not going to be able to cool a 4080, or even really a 4060, with a card that small.

If anything, I'd anticipate it making low-profile cards easier, so we might see more of those.
 

adroc_thurston

Diamond Member
Jul 2, 2023
Using LDS to augment L0 (depicted as L1 in pic) when unused and sharing L0 across SIMDs in a WGP appeared in a patent below.
This proposal is a whole lot more insane than what you've said, since it calls for eschewing CU/SM-private memories altogether, instead opting for a fat, shared, segmented regfile that dynamically allocates parts of itself as L1 and LDS/shmem.
Also, given who filed it and when, it's probably inside MI300 already.
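For anyone not fluent in the jargon: LDS (AMD) / shared memory (NVIDIA) is the scratchpad a kernel allocates and manages explicitly, while L0/L1 is the hardware-managed cache behind ordinary loads. A minimal CUDA sketch of the two paths the patent would pool together (the kernel and sizes are my own illustration, not anything taken from the patent):

[code]
// Same 3-point stencil written two ways; launch either with a block size of 256.
// "LDS/shmem" = the explicitly allocated __shared__ tile below;
// "L0/L1"     = the hardware cache that backs the plain global loads.

__global__ void stencil_shmem(const float* in, float* out, int n) {
    __shared__ float tile[258];                     // 256 elements plus 2 halo slots
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    int l = threadIdx.x + 1;                        // index into the shared tile

    if (g < n) tile[l] = in[g];                     // explicit staging into LDS/shmem
    if (threadIdx.x == 0 && g > 0)                  tile[0]     = in[g - 1];
    if (threadIdx.x == blockDim.x - 1 && g + 1 < n) tile[l + 1] = in[g + 1];
    __syncthreads();

    if (g > 0 && g + 1 < n)
        out[g] = 0.25f * tile[l - 1] + 0.5f * tile[l] + 0.25f * tile[l + 1];
}

// Same stencil with no staging: data reuse has to come out of the L0/L1 cache.
__global__ void stencil_cached(const float* in, float* out, int n) {
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g > 0 && g + 1 < n)
        out[g] = 0.25f * in[g - 1] + 0.5f * in[g] + 0.25f * in[g + 1];
}
[/code]

Today the __shared__ tile and the cache are separate, fixed-capacity structures per CU/SM; the proposal above would carve both out of one shared pool on demand.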
 

Tuna-Fish

Golden Member
Mar 4, 2011
As an aside - some people think the next round of consumer GPUs won't hit till 2025. I have two opinions on this. First, I don't see AMD or NV doing much to change the release schedules of the next generation of AIBs - in part because AIB manufacturers will want something new and shiny to sell, especially given the lackluster sales of this gen. Second, AMD really needs an improvement over the RDNA3 GPUs and, IMHO, can't afford a delay. Nvidia surely isn't going to sit still and likely be put in the rear-view mirror when AMD's top RDNA4 GPU is released.

nV has a lot of room to improve their product stack in the 50-series just by rebranding every chip one tier below where it sits now. So much so that unless AMD really, really succeeds with RDNA4, nV would have the better product line.
 

eek2121

Diamond Member
Aug 2, 2005
AMD will need to have a perf/$ leader (in all metrics, including RT) in order to succeed, and they will need to do it across multiple generations to win back mindshare/marketshare. Just being the “me too” second choice won’t get them anywhere.

Thus far they have shown no interest in doing this.

I honestly don’t think they need the absolute fastest card, but the cards they do release should be much faster (and more efficient) than anything else in that price bracket.

If they could get MCM properly working with multiple GCDs, they could probably do this pretty easily, they just need to stop letting NVIDIA set the price/performance tiers.

Oh, and they need to invest more in their first-party cards. Early cards this gen had cooling issues. NVIDIA's FE cards are high quality; the 4090 FE is the smallest and quietest card you can get. AMD needs a similarly high-quality design.
 

adroc_thurston

Diamond Member
Jul 2, 2023
AMD will need to have a perf/$ leader
Not happening.
The moment they're winning, it's pocket-all-the-margin time.
to win back mindshare/marketshare
You win it by building the biggest halo part out there and they just gotta be willing.
I honestly don’t think they need the absolute fastest card
That's the only way to win in children-driven markets like client dGPU.
If they could get MCM properly working with multiple GCDs
An MCM GPU means abhorrently expensive parts like MI300, not better value.
 

Ajay

Lifer
Jan 8, 2001
AMD will need to have a perf/$ leader (in all metrics, including RT) in order to succeed, and they will need to do it across multiple generations to win back mindshare/marketshare. Just being the “me too” second choice won’t get them anywhere.

Thus far they have shown no interest in doing this.

Yea, though parity in RT is sufficient, IMHO. Having competitive or better raster is the real game changer still, as RT adoption is still limited.
I honestly don’t think they need the absolute fastest card, but the cards they do release should be much faster (and more efficient) than anything else in that price bracket.

NV has shown the benefits, at least marketing-wise, over and over again. And yet, as you mention above, for some reason AMD doesn't care enough to really go for it. Maybe it is because getting a high-speed fabric on package is really hard - plus trying to limit the cross traffic between the GCDs isn't working well with the current RDNA architecture.
If they could get MCM properly working with multiple GCDs, they could probably do this pretty easily, they just need to stop letting NVIDIA set the price/performance tiers.

Not easily. Trying to coalesce data and instructions (particularly in cache) onto one GCD as much as possible will be hard. Finding the right partitioning scheme to minimize cross-communication while maximizing performance is tricky stuff, and it's the only way to max out performance in a multi-GCD GPU. As fast as the interconnects can be, it's just not the same as on-die performance. It's already hard enough to do this on one die, with one WGP needing access to cache allocated to a different WGP (@DisEnchantment pointed this out).
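To make the cross-traffic point concrete, here is a toy host-side sketch (hypothetical tile grid and ownership schemes, nothing AMD-specific): assign screen tiles to two GCDs and count how many neighbouring-tile accesses stay on-die versus crossing the interconnect.

[code]
// Toy sketch: split a 16x16 tile grid across 2 GCDs two different ways and
// count on-die vs. cross-die neighbour accesses. Purely illustrative numbers.
#include <cstdio>

int main() {
    const int W = 16, H = 16, GCDS = 2;
    auto owner_interleaved = [&](int x, int y) { return (x + y * W) % GCDS; };
    auto owner_split       = [&](int x, int y) { (void)y; return x < W / 2 ? 0 : 1; };

    for (int pass = 0; pass < 2; ++pass) {
        long local = 0, cross = 0;
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                int self = pass ? owner_split(x, y) : owner_interleaved(x, y);
                const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {       // each tile touches its 4 neighbours
                    int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
                    int nbr = pass ? owner_split(nx, ny) : owner_interleaved(nx, ny);
                    if (self == nbr) ++local; else ++cross;
                }
            }
        std::printf("%s: %ld on-die, %ld cross-die neighbour accesses\n",
                    pass ? "left/right split" : "fine interleave", local, cross);
    }
    return 0;
}
[/code]

The fine interleave balances load perfectly but turns every horizontal neighbour access into inter-die traffic; the left/right split keeps accesses local but can leave one GCD idle when all the action is on one side of the screen. That tension, plus caches wanting data to live where it is used, is the partitioning problem in miniature.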
Oh, and they need to invest more in their first-party cards. Early cards this gen had cooling issues. NVIDIA's FE cards are high quality; the 4090 FE is the smallest and quietest card you can get. AMD needs a similarly high-quality design.
I disagree. I think that it's fine that AMD prioritizes their AIBs. They have a smaller market share so 'stealing' sales from the AIBs would be counter-productive. IMHO.
 

adroc_thurston

Diamond Member
Jul 2, 2023
Maybe it is because getting a high-speed fabric on package is really hard - plus trying to limit the cross traffic between the GCDs isn't working well with the current RDNA architecture.
That's not really the hard part.
 

Ajay

Lifer
Jan 8, 2001
The graphics APIs themselves.
They have those weirdly serial parts that never quite evolved from their mid 90s origins.
Geez, I never would have thought of that. I'd be a bit annoyed if a couple of old games I occasionally play broke, honestly. But I'd get over it in return for big jumps in performance in future cards/games. Guess we've all been spoiled.
 

Saylick

Diamond Member
Sep 10, 2012
Geez, I never would have thought of that. I'd be a bit annoyed if a couple of old games I occasionally play broke, honestly. But I'd get over it in return for big jumps in performance in future cards/games. Guess we've all been spoiled.
If an MCM approach doesn't scale with retro games due to the use of an old API, I don't see why there couldn't just be a fallback mode where only a single GCD is active. I can't imagine a scenario where even a single GCD couldn't give you the grunt you need to play a retro title.
 

adroc_thurston

Diamond Member
Jul 2, 2023
Geez, I never would have thought of that. I'd be a bit annoyed if a couple of old games I occasionally play broke, honestly. But I'd get over it in return for big jumps in performance in future cards/games. Guess we've all been spoiled.
It's not about retro games.
It's about the fundamental ordering rules the APIs have to this very day, which make multi-die scaling hard.
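One concrete example of such a rule: alpha blending. The APIs guarantee that overlapping primitives blend in submission order, and the blend operator is not commutative, so fragments produced on different dies can't simply be merged whenever each die happens to finish. A tiny standalone sketch (my own example, just the standard "over" operator):

[code]
// The "over" blend is not commutative, so the API's submission-order guarantee
// genuinely constrains how overlapping work can be split across dies.
#include <cstdio>

struct Rgba { float r, g, b, a; };

// dst' = src.a * src + (1 - src.a) * dst   (straight alpha, for brevity)
Rgba over(Rgba src, Rgba dst) {
    auto mix = [&](float s, float d) { return src.a * s + (1.0f - src.a) * d; };
    return { mix(src.r, dst.r), mix(src.g, dst.g), mix(src.b, dst.b),
             src.a + (1.0f - src.a) * dst.a };
}

int main() {
    Rgba dst{0, 0, 0, 1};                          // opaque black framebuffer
    Rgba red{1, 0, 0, 0.5f}, blue{0, 0, 1, 0.5f};  // two translucent overlapping draws

    Rgba in_order  = over(blue, over(red, dst));   // red submitted first, then blue
    Rgba reordered = over(red, over(blue, dst));   // what a racing second die might produce

    std::printf("in order:  %.2f %.2f %.2f\n", in_order.r, in_order.g, in_order.b);
    std::printf("reordered: %.2f %.2f %.2f\n", reordered.r, reordered.g, reordered.b);
    return 0;
}
[/code]

The two results differ (0.25/0.00/0.50 vs 0.50/0.00/0.25), so whatever gets rendered on a second die still has to be folded back in the original submission order.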
 

Joe NYC

Diamond Member
Jun 26, 2021
If they could get MCM properly working with multiple GCDs, they could probably do this pretty easily, they just need to stop letting NVIDIA set the price/performance tiers.

I think they would use a single N6 die with 2 MCDs + Infinity Cache, with a die size of 100-150 mm².

Then they would stack a GCD of approximately the same size (N5, N4, or N3) on top of it, which would be Navi43 (or Navi53). This part should be bordering on no-brainer feasible, even in 2024.

The tricky part would be connecting multiple of these stacked MCD/GCD combos (2, 3, or 4) with an extremely high-bandwidth, low-latency link that would also have to be stacked using hybrid bonding. I have seen active silicon bridges spanning 2 dies (in AMD patents), but it is unclear if this technology is feasible.

It could look something like this, with potentially only 3 types of dies (or 4 if they added an I/O die). I think that is the only way they would approach competing at the highest end, against 500-800 mm² NVidia monolithic GPUs.
