Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,584
5,685
136
With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the near term, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts; the MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 
Last edited:

beginner99

Diamond Member
Jun 2, 2009
5,205
1,579
136
If the revenue from datacenter graphics cards is going to be 100 billion, there has to be 10x that, or 1 trillion in revenue generated by the buyers of these cards, and of that, 100 billion in profit to be able to afford next year's worth of cards.

I don't see a trillion in new revenue being generated...
I agree. But: the big buyers are likely cloud providers, including Google, MS and Amazon. MS, for example, for all their Copilot offerings. As far as I'm aware they are all also offering, or starting to offer, LLM instances which you can fine-tune with your own documents. So many companies will pay for it. So it's entirely possible MS and co can make a lot of money from this. The question is how long it takes for the service consumers to decide whether the price is worth it or not. That is very hard to actually assess, as LLMs can make a lot of stuff simpler, like writing emails and making presentations, and not only save time but also make workers happier by reducing busy-work. Just having modern tools can also attract better talent, which helps in other ways. It's really hard to quantify how much value AI/LLMs will actually generate.

As a non-native English speaker, and let's not forget most of the world is in that bracket, support in writing "difficult" emails can save a ton of time.
 
  • Like
Reactions: Joe NYC

soresu

Platinum Member
Dec 19, 2014
2,525
1,711
136
In the meantime, Huawei is claiming to have an A100-level home-grown card.
Performance-wise, possibly, but certainly not in the same perf/watt envelope unless TSMC or Samsung are fabbing it.

Perhaps if they have some sort of chiplet/MCM design that uses more silicon running at lower clocks/voltage it might be feasible though.
 
  • Like
Reactions: Kaluan

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
The dual-issue SIMDs aren't really straight-up dual issue. The ability to dual issue is limited to certain conditions. I hope RDNA5 improves on this. NV has had a well-functioning dual-issue ALU unit for a while.

~70% of game code is FP and ~30% is INT - NV uses one FP unit plus a mixed FP/INT unit (IIRC) to cash in on this fact.
Yeah. AMD needs to improve dual-issue; it brings very little to performance. Nvidia's approach brought them ~25% more performance, and it helped a lot in fighting off RDNA2's higher clocks.
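
Back-of-the-envelope, here is why that mix favors a second FP-capable port. A toy issue model in Python with idealized assumptions (no dependencies, perfect scheduling, the made-up 70/30 split from above):

Code:
# Idealized issue model over a 100-instruction window: ~70% FP, ~30% INT.
fp, integer = 70, 30

# Turing-style split ports: one FP-only port + one INT-only port.
cycles_split = max(fp, integer)                    # FP port is the bottleneck -> 70 cycles

# Ampere-style ports: one FP-only port + one shared FP/INT port.
# All INT goes to the shared port; leftover FP is balanced across both ports.
cycles_shared = max(integer, (fp + integer) / 2)   # -> 50 cycles

print(cycles_split / cycles_shared)                # 1.4x theoretical ceiling

Real games land well below that 1.4x ceiling (hence the ~25% figure above) because of dependencies, occupancy and memory limits, but it shows why a second port that can also do FP is worth more than a strict FP+INT split.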
 

Joe NYC

Golden Member
Jun 26, 2021
1,864
2,147
106
Performance wise possibly, but certainly not in their perf/watt envelope unless TSMC or Samsung are fabbing it.

Perhaps if they have some sort of chiplet/MCM design that uses more silicon running at lower clocks/voltage it might be feasible though.
A100 was fabbed at TSMC exclusively, not Samsung.

Whatever the Chinese chip is capable of, the US sanctions have secured a big and growing market for it.

A new round of sanctions on AI cards was announced today. After countries that used to be the staunchest US allies (Saudi Arabia, UAE, Egypt) moved from the US toward the China/Russia side, the US announced sanctions on "certain" Mid-East countries.

In other words, the US just shrank the TAM that NVidia, AMD and Intel can sell to and expanded the TAM for the Chinese firms.

If somebody doubts this is going to happen: Saudi Arabia has announced a contract with a Chinese company to build a nuclear power plant. So this is not a one-off, but a long-term trend.

One after another, these "neutral" countries will start switching to Chinese suppliers. Simple reason: China is not stupid enough to sanction its own customers.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
A100 was fabbed at TSMC exclusively, not Samsung.

Whatever the Chinese chip is capable of, the US sanctions have secured a big and growing market for it.

A new round of sanctions on AI cards was announced today. After countries that used to be the staunchest US allies (Saudi Arabia, UAE, Egypt) moved from the US toward the China/Russia side, the US announced sanctions on "certain" Mid-East countries.

In other words, the US just shrank the TAM that NVidia, AMD and Intel can sell to and expanded the TAM for the Chinese firms.

If somebody doubts this is going to happen: Saudi Arabia has announced a contract with a Chinese company to build a nuclear power plant. So this is not a one-off, but a long-term trend.

One after another, these "neutral" countries will start switching to Chinese suppliers. Simple reason: China is not stupid enough to sanction its own customers.
Why don't you make another thread for this?

After RDNA3 not ending well for AMD, I think they will have to be extra careful with RDNA4.
I wonder what we will see.
Higher clocks -> 3.5 GHz?
Better dual-issue -> more HW used for it?
How much VRAM? -> 12 GB would be low; 16 GB would be good, but also not spectacular.
What about GDDR7?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
See, I think if they are sticking to 200 mm² for the bigger one, it will basically be an N33 replacement. Maybe even something like taking N33, just cramming in however much RDNA4 can fit with the N4 logic shrink, and adding GDDR7.
There are supposedly 2 chips. I would expect the smaller one to have 12GB and the bigger to have more.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
See, I think if they are sticking to 200 mm² for the bigger one, it will basically be an N33 replacement. Maybe even something like taking N33, just cramming in however much RDNA4 can fit with the N4 logic shrink, and adding GDDR7.
Heard they're not, at least not as such. Packaging was the problem with the 7600 getting a chiplet arch, and TSMC has been aggressively expanding that capacity.

I can see RDNA4 being 40 and 60 CUs each. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in raytracing and maybe a minor bump in clockspeed thanks to tweaks in scheduler/node/cache (2.8 GHz boost, 3?).

So, call it 192-bit GDDR7, 12 GB, 60 CU. If it can reach 4070 Ti performance (RT included), instead of 4070 performance, for the same $500 price point, that would be incredibly compelling for a mid-ranger. What I don't see right now (RDNA3) is AMD really pushing the power/clocks. Push this hypothetical card to double-stacked SRAM (96 MB), 300 watts and a 3 GHz boost, and you've got a hypothetical 4080 competitor you can charge $600 for. Then the cut-down version competes with a 4070 Ti for $500.

Which leaves the 40 CU to beat the 4060 Ti at $400 with 12 GB, and a cut-down 128-bit 8 GB $300 version after that.

Note: AMD's "RDNA4 improves RT, not efficiency that much" lends credence to recent leaks about RDNA4 being partly skipped to hit GFX5 sooner (2025 as an outside chance?). If so, I'm looking forward to actual competition again. RDNA2 was a bit overly maligned, but the only thing RDNA3 managed to accomplish of note is getting costs down.
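
For what it's worth, the 192-bit/12 GB and 128-bit/8 GB pairings fall straight out of the bus math. A quick sketch assuming 16 Gbit (2 GB) GDDR7 chips at ~32 Gbps per pin, which is my assumption, not a confirmed spec:

Code:
# Capacity and bandwidth from bus width. Chip density and per-pin speed are
# assumptions (16 Gbit = 2 GB chips, ~32 Gbps GDDR7), not confirmed specs.
def board_memory(bus_width_bits, gb_per_chip=2, gbps_per_pin=32):
    chips = bus_width_bits // 32                  # one 32-bit channel per chip
    capacity_gb = chips * gb_per_chip
    bandwidth_gbs = bus_width_bits * gbps_per_pin / 8
    return capacity_gb, bandwidth_gbs

print(board_memory(192))   # (12, 768.0) -> the 12 GB card
print(board_memory(128))   # (8, 512.0)  -> the cut-down 8 GB card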
 

Joe NYC

Golden Member
Jun 26, 2021
1,864
2,147
106
Why don't you make another thread for this?

After RDNA3 not ending well for AMD, I think they will have to be extra careful with RDNA4.
I wonder what we will see.
Higher clocks -> 3.5 GHz?
Better dual-issue -> more HW used for it?
How much VRAM? -> 12 GB would be low; 16 GB would be good, but also not spectacular.
What about GDDR7?

My guess would be that the low-end card - Navi 44 - will have a similar number of CUs to Navi 33, but its performance will be unclogged, with higher utilization of resources and higher clock speeds.

If Navi 33 is 200 mm² on N6, Navi 44 on N4 could be tiny, somewhere between 100 and 150 mm².

And the bigger card, Navi 43, would probably be 2x Navi 44.

Also, proper performance-per-watt scaling - all that RDNA3 was supposed to deliver, and then some on top of that.

In effect, something like RDNA 3.5 or RDNA 3.75. If launched early enough, early 2024, they could achieve some success in the market.
 
  • Like
Reactions: Kaluan

jpiniero

Lifer
Oct 1, 2010
14,398
5,114
136
There are supposedly 2 chips. I would expect the smaller one to have 12GB and the bigger to have more.

See, I think the basic chiplet model was where the "have more" comes in.

My guess would be that the low-end card - Navi 44 - will have a similar number of CUs to Navi 33, but its performance will be unclogged, with higher utilization of resources and higher clock speeds.

If Navi 33 is 200 mm² on N6, Navi 44 on N4 could be tiny, somewhere between 100 and 150 mm².

Remember SRAM/IO scaling blows on N4.
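
Rough numbers to illustrate the point (a sketch with assumed area splits and N6->N4 scaling factors, not actual Navi 3x floorplan data):

Code:
# Split the ~200 mm^2 N6 die quoted above into logic vs SRAM/IO,
# then shrink each part separately. All splits and factors are assumptions.
die_n6 = 200.0                 # mm^2, figure from the post above
logic_share = 0.55             # assumed fraction of the die that is logic
sram_io_share = 1.0 - logic_share

logic_scale = 0.60             # assumed N6 -> N4 shrink for logic
sram_io_scale = 0.90           # SRAM/IO barely shrink, hence the caveat above

die_n4 = die_n6 * (logic_share * logic_scale + sram_io_share * sram_io_scale)
print(round(die_n4))           # ~147 mm^2 with these assumptions

So even with a healthy logic shrink, the poorly scaling SRAM/IO keeps the hypothetical die near the top of that 100-150 mm² range rather than the bottom.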
 
  • Like
Reactions: Tlh97 and Ajay

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
I can see RDNA4 being 40 and 60 CUs each. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in raytracing and maybe a minor bump in clockspeed thanks to tweaks in scheduler/node/cache (2.8 GHz boost, 3?).
LOL - "we really screwed up on power efficiency with RDNA3, but it's okay because gamers don't really care." While partly true, it's also revisionist.
In effect something like RDNA 3.5 or RDNA 3.75. If launched early enough, early 2024, they could achieve some success in the market.
Well, that's not happening. Curious why you mentioned it.
 

jpiniero

Lifer
Oct 1, 2010
14,398
5,114
136
I'll add that if it does come out about a year from now (not a bad guesstimate), the best we are talking about from NV is basically the same Ada products as today but with GDDR7.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
I'll add that if it does come out about a year from now (not a bad guesstimate), the best we are talking about from NV is basically the same Ada products as today but with GDDR7.
I figure Nvidia will stick with whatever their schedule is and release the follow-on to Ada. Undoubtedly, NV didn't know how badly AMD was going to falter on RDNA4 before they were ready for TO themselves, or nearly so. I suppose they could delay part of their lineup to milk Ada a bit, but not forever. RDNA5 had better get it right.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I can see RDNA4 being 40 and 60 CUs each. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in raytracing and maybe a minor bump in clockspeed thanks to tweaks in scheduler/node/cache (2.8 GHz boost, 3?).

So, call it 192-bit GDDR7, 12 GB, 60 CU. If it can reach 4070 Ti performance (RT included), instead of 4070 performance, for the same $500 price point, that would be incredibly compelling for a mid-ranger. What I don't see right now (RDNA3) is AMD really pushing the power/clocks. Push this hypothetical card to double-stacked SRAM (96 MB), 300 watts and a 3 GHz boost, and you've got a hypothetical 4080 competitor you can charge $600 for. Then the cut-down version competes with a 4070 Ti for $500.

Which leaves the 40 CU to beat the 4060 Ti at $400 with 12 GB, and a cut-down 128-bit 8 GB $300 version after that.
I can see either 40 CU (2 SE) and 60 CU (3 SE) with 20 CU/SE, or 32 CU (2 SE) and 64 CU (4 SE) with 16 CU/SE.

You are underestimating the performance difference.
The RX 7800 XT at 2.45 GHz is at roughly RX 6800 XT raster level of performance.
The RTX 4070 Ti is 21% faster.
The RTX 4080 is 52% faster.
Unless RDNA4 offers higher IPC, the bigger chip will have to run at higher clocks.
To compete against the 4070 Ti it needs 3 GHz.
To compete against the 4080 it needs 3.75 GHz.
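
Those clock targets just assume raster performance scales roughly linearly with clock at the same CU count and IPC (a simplification that ignores bandwidth limits):

Code:
# Starting point from above: RX 7800 XT at ~2.45 GHz ~= RX 6800 XT in raster.
base_clock = 2.45                       # GHz

for target, gap in [("RTX 4070 Ti", 1.21), ("RTX 4080", 1.52)]:
    # Assume performance scales ~linearly with clock, same CU count and IPC.
    print(target, round(base_clock * gap, 2), "GHz")
# -> RTX 4070 Ti 2.96 GHz, RTX 4080 3.72 GHz (the ~3 / ~3.75 GHz figures above)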

Let's say they manage it somehow, RT included; then your prices are totally unrealistic.
The 4070 Ti is $799, so why would AMD ask only $499?
The 4080 is $1199, so why would AMD ask only $599?

I agree, the 40 CU part should perform well against the 4060 Ti.

edit: fixed a mistake
edit2: now it's fixed.
 
Last edited:
  • Like
Reactions: Tlh97

Tigerick

Senior member
Apr 1, 2022
490
409
96
I can see either 40 CU (2 SE) and 60 CU (3 SE) with 20 CU/SE, or 32 CU (2 SE) and 64 CU (4 SE) with 16 CU/SE.

You are underestimating the performance difference.
The RX 7800 XT at 2.45 GHz is at roughly RX 6800 XT raster level of performance.
The RTX 4070 Ti is 21% faster.
The RTX 4080 is 52% faster.
Unless RDNA4 offers higher IPC, the bigger chip will have to run at higher clocks.
To compete against the 4070 Ti it needs 3 GHz.
To compete against the 4080 it needs 3.75 GHz.

Let's say they manage it somehow, RT included; then your prices are totally unrealistic.
The 4070 Ti is $799, so why would AMD ask only $499?
The 4080 is $1199, so why would AMD ask only $599?

I agree, the 40 CU part should perform well against the 4060 Ti.
I think you are calculating backwards; if N43 performs 20% better due to a real dual-issue design, then Nvidia will respond by dropping prices, like with the current 4060 Ti 16GB, or bring out a Super version to counter the N43 offering. And remember, by this time next year the 7900 XT should have dropped to around $700, so N43 is pretty much geared toward replacing the current N32 position.

Anyhow, I suspect AMD will position N43 and N44 as mobile-first; with a monolithic design and lower clocks, AMD could potentially limit N43's TDP to around 150W, a saving of about 100W compared to N32, which is crucial for the notebook platform but not for desktop... we shall see
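
The 150 W idea is less of a stretch than it sounds if you assume dynamic power ~ C·V²·f and that voltage drops roughly along with frequency near the top of the V/f curve, so power falls roughly with the cube of the clock. A sketch with assumed numbers, not actual N32/N43 data:

Code:
# Dynamic power ~ C * V^2 * f; with V scaling roughly with f near the top
# of the V/f curve, power falls roughly with the cube of the clock.
desktop_power = 250.0      # W, assumed desktop-class operating point
target_power = 150.0       # W, the mobile target discussed above

clock_ratio = (target_power / desktop_power) ** (1 / 3)
print(round(clock_ratio, 2))            # ~0.84 -> roughly 16% lower clocks
print(round(clock_ratio * 2.4, 2))      # e.g. a 2.4 GHz part -> ~2.02 GHz

So, in this crude model, a ~40% power cut costs only ~15-20% of clock speed, which is why mobile-first positioning isn't crazy.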
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I think you are calculating backwards; if N43 performs 20% better due to a real dual-issue design, then Nvidia will respond by dropping prices, like with the current 4060 Ti 16GB, or bring out a Super version to counter the N43 offering. And remember, by this time next year the 7900 XT should have dropped to around $700, so N43 is pretty much geared toward replacing the current N32 position.

Anyhow, I suspect AMD will position N43 and N44 as mobile-first; with a monolithic design and lower clocks, AMD could potentially limit N43's TDP to around 150W, a saving of about 100W compared to N32, which is crucial for the notebook platform but not for desktop... we shall see
We don't know the performance or specs of N4*, and by the time it's released, maybe Nvidia will have released its next gen.

Maybe it will end up as a 60 CU GPU but with 50% higher clocks; that's already very close to the 7900 XTX in performance.
Or it will have fewer CUs at lower clocks, who knows.
We only know that it's a monolith and the size is supposedly 200-250 mm².

In laptops, you don't have 250W GPUs. You are limited to 175W for Nvidia, and AMD stopped at 165W with RDNA2.
Mobile N32, if released, will be the same, ending up at a comparable TDP. Of course, performance will drop because of the lower clocks.
 
Last edited:

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
We don't know the performance or specs of N4*, and by the time it's released, maybe Nvidia will have released its next gen.

Maybe it will end up as a 60 CU GPU but with 50% higher clocks; that's already very close to the 7900 XTX in performance.
Or it will have fewer CUs at lower clocks, who knows.
We only know that it's a monolith and the size is supposedly 200-250 mm².

In laptops, you don't have 250W GPUs. You are limited to 175W for Nvidia, and AMD stopped at 165W with RDNA2.
Mobile N32, if released, will be the same, ending up at a comparable TDP. Of course, performance will drop because of the lower clocks.
I doubt that 50% higher clocks was the plan; that would have meant murderous power consumption on the real enthusiast cards. What I do wonder is where people are getting this 200 mm² number for the larger GPU die. Seems like we shouldn't know at all this far out.
 

jpiniero

Lifer
Oct 1, 2010
14,398
5,114
136
What I do wonder is where people are getting this 200 mm² number for the larger GPU die. Seems like we shouldn't know at all this far out.

The idea is that the chiplet models were going to handle the higher tiers. They could ship a bigger die... but with the AI hype, it's not hard to think they would rather use their resources to get MI300/400 out sooner.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I doubt that 50% higher clocks was the plan; that would have meant murderous power consumption on the real enthusiast cards. What I do wonder is where people are getting this 200 mm² number for the larger GPU die. Seems like we shouldn't know at all this far out.
It looks like a much higher clock (3-3.4 GHz) was intended for RDNA3 (TweakTown), so it wouldn't be surprising if RDNA4 aimed even higher.
[Image: AMD Radeon RX 7000 RDNA 3 Infinity Links at 9.2 Gb/s, 10x higher than Ryzen/EPYC]


If the architecture is designed to allow those clocks, then power consumption doesn't need to be murderous. OK, in the case of that cancelled chiplet GPU (N4X) the power consumption would have been very high, but so would the performance.

@adroc_thurston said the size is 200-250 mm². Don't know if it's true or not.
 
Last edited: