Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2 the window they push to add support for new devices is much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because US Govt is starting to prepare the SW environment for El Capitan (Maybe to avoid slow bring up situation like Frontier for example)

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time like Hopper probably only a couple of months later!
Although I believe Hopper had problems not having a host CPU capable of doing PCIe 5 in the very near future therefore it might have gotten pushed back a bit until SPR and Genoa arrives later in 2022.
If PVC slips again I believe MI300 could launch before it

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

soresu · Sep 27, 2023

Joe NYC said:
How much memory are you talking about, that would be needed? It seems increasing memory would have more of the benefit.

Probably ye it would.

Hopefully GDDR follows suit from the new 32 Gbit DDR5 modules for the RDNA4 generation, but I won't hold my breath.

Jaskalas · Sep 27, 2023

Ajay said:
Unfortunately. 8K is a fools errand on small(ish) monitors. It's really only useful for xtra large TVs (85" and above). But - the spec wars continue undaunted. Same with 500Hz refresh rate gaming monitors. Nobody's visual cortex processes images that fast.

I did not think I would appreciate 1440p as much as I have.
You'll have to drag me kicking and screaming back to 1080p.

SolidQ · Sep 27, 2023

PJVol said:
Well... n43 is dead, long live n48? )

You mean this?

Ajay · Sep 27, 2023

SolidQ said:
You mean this?

Geez, what a totally blown generation if true. Too bad AMD couldn't salvage one 60/64 WGP die for a decent performance gain - maybe even over the 7900XT at a lower price. Though, there is that AMD GPU engineer that posted a protest.

SolidQ · Sep 27, 2023

Ajay said:
Too bad AMD couldn't salvage one 60/64 WGP die for a decent performance gain

Maybe there will surprise, at least i'm hope for 256B/16GB

Frenetic Pony · Sep 27, 2023

For anyone thinking these leaks are true, you may have missed what appears to be a totally legitimate AMD engineer calling BS on all this and saying they were testing the new top end card as he was typing. So yeah, total BS.

Anway, just realized, Sony PS5 Pro leak put it at 60Cus (not 64). And the 3.5 leak puts it at 40CUs. So, erm, updated guessed specs because now it's 20CU per?:

Edit- Hmmm
(?)160CU N3E 384bit 32gbps 2.5ghz 450w 70% faster than a 4090.
100CU N3E 256bit 2.8ghz 350w 20% faster than a 4090
60CU N4P 192bit 3.0ghz 250w 4070ti < this < 4080
40CU N4P 128bit 3.1ghz 175w basically a 7700

jpiniero · Sep 27, 2023

Frenetic Pony said:
Anway, just realized, Sony PS5 Pro leak put it at 60Cus (not 64). And the 3.5 leak puts it at 40CUs. So, erm, updated guessed specs because now it's 20CU per?:

To be thorough the PS5 Pro does probably have 60 physically on the die but they will cut it down some bit.

TESKATLIPOKA · Sep 28, 2023

Frenetic Pony said:
Edit- Hmmm
(?)160CU N3E 384bit 32gbps 2.5ghz 450w 70% faster than a 4090.
100CU N3E 256bit 2.8ghz 350w 20% faster than a 4090
60CU N4P 192bit 3.0ghz 250w 4070ti < this < 4080
40CU N4P 128bit 3.1ghz 175w basically a 7700

Some of your performance projections are wrong.

7900 XTX vs RTX 4090
100% vs 124% (TPU at 4K raster)
This 160CU 2.5GHz GPU has only 67% more CU, at best It would be 67% faster than 7900XTX or 35% faster than RTX 4090, in reality less because of scaling.

100CU 2.8GHz also won't be 20% faster than 4090. Extra 4CU and 12% higher frequency would at best make it 17% faster, but in reality less. So still under RTX 4090.

60CU 3GHz would be at best a few % faster than 4070Ti, but 4080 would be 20% faster.

6700XT vs 7700XT
100% vs 128%
40CU 3.1GHz GPU could be more or less on the level of 7700XT, but fast GDDR7 is a must.

Timorous · Sep 28, 2023

Frenetic Pony said:
For anyone thinking these leaks are true, you may have missed what appears to be a totally legitimate AMD engineer calling BS on all this and saying they were testing the new top end card as he was typing. So yeah, total BS.

Anway, just realized, Sony PS5 Pro leak put it at 60Cus (not 64). And the 3.5 leak puts it at 40CUs. So, erm, updated guessed specs because now it's 20CU per?:

Edit- Hmmm
(?)160CU N3E 384bit 32gbps 2.5ghz 450w 70% faster than a 4090.
100CU N3E 256bit 2.8ghz 350w 20% faster than a 4090
60CU N4P 192bit 3.0ghz 250w 4070ti < this < 4080
40CU N4P 128bit 3.1ghz 175w basically a 7700

New top end card does not necessarily mean new enthusiast tier card. If the stack is truncated like it was with RDNA 1 then the top end card may be mid range.

What I would say though is that the supposed MI300 like GPU would be expensive. Like $2K + expensive to make it even worth existing and with that it would need to be by far the fastest thing money can buy so likely a 450W 295X2 style product without the crossfire drawbacks. I also do not see this kind of part having the perf/W to make the performance it needs viable at closer to 300W. It would need to be 2x the perf/W of RDNA 3 to stand a chance and that is just unrealistic IMO.

With that in mind it still leaves a 350W and below product stack so if we split the parts by power tiers we get something like 350W, 300W, 250W, 200W and 150W or there abouts.

a 50% perf/w gain at 350W would give you 4090 + 20% and given the perf/W miss on RDNA 3 I expect 50% is the minimum target. Based on TPU data the 7900XTX has 37% more perf/W than the 6900XT. AMD must be trying to make up for lost ground so I would not be surprised if the internal target must be around a 65% perf/W increase over RDNA 3 which would then net them a 125% perf/w gain over RDNA 2 which would put them back on track.

That would give the following as a rough performance stack vs current and an estimate of 5000 series.

5090 - peerless
350W ~ 4090 + 20% ~ 5080
300W ~ 4090 ~ 5070 ti
250W ~ 4080 / 7900XTX ~ 5070
200W ~ 4070Ti / 7900XT
175W ~ 4070 / 7800XT ~ 5060Ti
150W ~ 7700XT ~ 5060

Somewhat competitive depending on price, RT performance and feature set but nothing to compete with the 5090 imo.

eek2121 · Sep 28, 2023

Frenetic Pony said:
For anyone thinking these leaks are true, you may have missed what appears to be a totally legitimate AMD engineer calling BS on all this and saying they were testing the new top end card as he was typing. So yeah, total BS.

Anway, just realized, Sony PS5 Pro leak put it at 60Cus (not 64). And the 3.5 leak puts it at 40CUs. So, erm, updated guessed specs because now it's 20CU per?:

Edit- Hmmm
(?)160CU N3E 384bit 32gbps 2.5ghz 450w 70% faster than a 4090.
100CU N3E 256bit 2.8ghz 350w 20% faster than a 4090
60CU N4P 192bit 3.0ghz 250w 4070ti < this < 4080
40CU N4P 128bit 3.1ghz 175w basically a 7700

Engineers sign NDAs just like everyone else.

He likely was saying that to kill the conversation. It is a popular tactic used by lawyers when they aren’t in a court room to shut someone up.

Ajay · Sep 28, 2023

eek2121 said:
Engineers sign NDAs just like everyone else.

He likely was saying that to kill the conversation. It is a popular tactic used by lawyers when they aren’t in a court room to shut someone up.

He didn't disclose any IP, market strategy, financials, etc. I think he was venting. The question is, did AMD 1) realize the problems early enough to develop and alternative to the original design of 'BIG' RDNA4 to get 1 or two 60 WGP tiles into and enthusiast card, or 2) Did AMD already have a fall back plan given the know difficulty of fully implementing the multi-tile/chiplet architecture of RDNA4. In the case of the first alternative, the higher end cards will probably take a bit longer to get out. In the case of the second alternative, it may come out in a more timely manner if there were enough engineers working on it. Anyway, this is all 100% speculation until we get some update from AMD in a quarterly report or AMD public event.

Timorous · Sep 28, 2023

Ajay said:
He didn't disclose any IP, market strategy, financials, etc. I think he was venting. The question is, did AMD 1) realize the problems early enough to develop and alternative to the original design of 'BIG' RDNA4 to get 1 or two 60 WGP tiles into and enthusiast card, or 2) Did AMD already have a fall back plan given the know difficulty of fully implementing the multi-tile/chiplet architecture of RDNA4. In the case of the first alternative, the higher end cards will probably take a bit longer to get out. In the case of the second alternative, it may come out in a more timely manner if there were enough engineers working on it. Anyway, this is all 100% speculation until we get some update from AMD in a quarterly report or AMD public event.

Option 3 is the super RDNA 4 card was always a super halo tier product because anything that complex and using that much silicon needs to sell for a lot of money to make the margin remotely worthwhile and as such the rest of the stack below this tier was also planned and is still on schedule.

PJVol · Sep 28, 2023

eek2121 said:
He likely was saying that to kill the conversation.

He IS that "totally legitimate engineer"

Frenetic Pony said:
totally legitimate AMD engineer calling BS on all this and saying they were testing the new top end card as he was typing. So yeah, total BS

PJVol · Sep 28, 2023

Timorous said:
Option 3 is the super RDNA 4 card was always a super halo tier product because anything that complex and using that much silicon needs to sell for a lot of money to make the margin remotely worthwhile and as such the rest of the stack below this tier was also planned and is still on schedule.

I think the incentive to cancel such product is not margins related (don't see an issue selling it for 1.5K if the rest is assured), but yeah, the time (human resources) + money spent on R&D. It's too expensive atm in this sense.

jpiniero · Sep 28, 2023

PJVol said:
He IS that "totally legitimate engineer"

I could see that he is working on RDNA5 and doesn't realize people are talking about RDNA4.

Ajay · Sep 28, 2023

jpiniero said:
I could see that he is working on RDNA5 and doesn't realize people are talking about RDNA4.

Wish I could find his original statement. Search has failed me.

PJVol · Sep 28, 2023

jpiniero said:
I could see that he is working on RDNA5

By “he” I mean the author of the post I quoted, based on “So yeah, total BS”.
Sorry for not being clear )

Tigerick · Sep 28, 2023

Timorous said:
AMD must be trying to make up for lost ground so I would not be surprised if the internal target must be around a 65% perf/W increase over RDNA 3 which would then net them a 125% perf/w gain over RDNA 2 which would put them back on track.

AMD totally REDACTED up the design of N31, with such a huge memory bandwidth increasement, they only increase 20% CU. No wonder they have to price it below $1,000. With GDDR7 support on upcoming RDNA5, hopefully AMD will bump up amount of CU to align with the improvement of bandwidth. Let's assume the clock speeds are the same throughout generation, my rough estimate of CU cores needed to feed the bandwidth is slightly higher than your 125% figures, then it hit me that 210 CU might be the number cause there was rumor about 270CU before, we shall see...

Timorous said:
That would give the following as a rough performance stack vs current and an estimate of 5000 series.

5090 - peerless
350W ~ 4090 + 20% ~ 5080
300W ~ 4090 ~ 5070 ti
250W ~ 4080 / 7900XTX ~ 5070
200W ~ 4070Ti / 7900XT
175W ~ 4070 / 7800XT ~ 5060Ti
150W ~ 7700XT ~ 5060

Somewhat competitive depending on price, RT performance and feature set but nothing to compete with the 5090 imo.

Yeap, based on my estimate 5070Ti or whatever name NV given, would perform similarly with current 4090 at much lower price points. This model going to offer many incentives for current RTX3080 users to upgrade if NV prices it at $799.

7700XT and 7800XT are surprisely competitive against NV offerings, that's why I think AMD will keep selling these models for 2 years until N52/N53 comes up...

Ajay · Sep 28, 2023

Tigerick said:
Yeap, based on my estimate 5070Ti or whatever name NV given, would perform similarly with current 4090 at much lower price points. This model going to offer many incentives for current RTX3080 users to upgrade if NV prices it at $799.

Going by recent trends, if the 5070Ti has performance that good, it'll be $1000 - $1200 US. And, it'll probably on have 16GB of RAM.

blackangus · Sep 28, 2023

Ajay said:
Going by recent trends, if the 5070Ti has performance that good, it'll be $1000 - $1200 US. And, it'll probably on have 16GB of RAM.

Naa 8GB-12GB is enough, I mean its an x070 class GPU. <sarcasm>

adroc_thurston · Sep 28, 2023

Tigerick said:
AMD totally fucked up the design of N31, with such a huge memory bandwidth increasement, they only increase 20% CU

Because the thing is missing 30% the clockrate.

Tigerick said:
hopefully AMD will bump up amount of CU to align with the improvement of bandwidth

mmmmmmore speeeeeeeeeed.

Tigerick said:
Yeap, based on my estimate 5070Ti or whatever name NV given, would perform similarly with current 4090 at much lower price points.

N3e is barely a shrink off N4, especially for SRAM-heavy designs like modern GPU.

Mopetar · Sep 28, 2023

AMD put the infinity cache on the MCMs, which are using N6 so they aren't eating a lot of extra cost for little scaling. I don't know if it's necessary, but using v-cache on those seems like a good way to increase the cache size and give users a reason to drop extra $$$ on a model that's got extra bells and whistles.

adroc_thurston · Sep 28, 2023

Mopetar said:
but using v-cache on those seems like a good way to increase the cache size and give users a reason to drop extra $$$ on a model that's got extra bells and whistles.

yea but Navi31X3D is loooong dead.

Mopetar · Sep 28, 2023

I don't see much reason for them to move memory chiplets off of N6 for any future products that might use them.

The MCM is pretty much all IO and cache and isn't going to scale much or at all with a node shrink.

Joe NYC · Sep 28, 2023

Looking at the RDNA 4c design, AMD moved on from MCM in the future chiplet design. There will not be individual MCMs.

In RDNA 4c, the base die took over the functionality of 2 MCMs, including their Infinity / MALL caches.

The concept is still similar to MCM, but implementation will be different (in RDNA 5, chiplet based). And the odds are, it will continue to be on TSMC N6.

Discussion RDNA4 + CDNA3 Architectures Thread

Golden Member

Diamond Member

Lifer

Golden Member

Lifer

Golden Member

Senior member

Lifer

Platinum Member

Golden Member

Diamond Member

Lifer

Golden Member

Senior member

Senior member

Lifer

Lifer

Senior member

Senior member

Lifer

Senior member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member