Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,584
5,685
136
With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the near term, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts; the MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 
Last edited:

beginner99

Diamond Member
Jun 2, 2009
5,205
1,579
136
If the revenue from datacenter graphics cards is going to be 100 billion, there has to be 10x that, or 1 trillion in revenue generated by the buyers of these cards, and of that, 100 billion in profit to be able to afford next year's worth of cards.

I don't see a trillion in new revenue being generated...
I agree. But: the big buyers are likely cloud providers, including Google, MS and Amazon. MS, for example, for all their Copilot offerings. As far as I'm aware they are all also offering, or starting to offer, LLM instances which you can fine-tune with your own documents. So many companies will pay for it. So it's entirely possible MS and co can make a lot of money from this. The question is how long it takes for the service consumers to decide whether the price is worth it or not. That is very hard to actually assess, as LLMs can make a lot of stuff simpler, like writing emails and making presentations, and not only save time but also make workers happier by reducing busy-work. Just having modern tools can also attract better talent, which helps in other ways. It's really hard to quantify how much value AI/LLMs will actually generate.

As a non-native English speaker, and let's not forget most of the world is in that bracket, support in writing "difficult" emails can save a ton of time.
 
  • Like
Reactions: Joe NYC

soresu

Platinum Member
Dec 19, 2014
2,525
1,711
136
In the meantime, Huawei is claiming to have an A100-level home-grown card.
Performance-wise, possibly, but certainly not in the same perf/watt envelope unless TSMC or Samsung are fabbing it.

Perhaps if they have some sort of chiplet/MCM design that uses more silicon running at lower clocks/voltage it might be feasible though.
 
  • Like
Reactions: Kaluan

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
The dual-issue SIMDs aren't really straight-up dual issue. The ability to dual issue is limited to certain conditions. I hope RDNA5 improves on this. NV has had a well-functioning dual-issue ALU unit for a while.

~70% of game code is FP and ~30% is INT - NV uses one FP unit plus a mixed FP/INT unit (IIRC) to cash in on this fact.
Yeah. AMD needs to improve dual-issue; it brings very little to performance. Nvidia's approach brought them ~25% more performance, and it helped a lot in fighting off RDNA2's higher clocks.
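
Back-of-the-envelope, here is why that mix favors a second FP-capable port. A toy issue model in Python with idealized assumptions (no dependencies, perfect scheduling, the made-up 70/30 split from above):

Code:
# Idealized issue model over a 100-instruction window: ~70% FP, ~30% INT.
fp, integer = 70, 30

# Turing-style split ports: one FP-only port + one INT-only port.
cycles_split = max(fp, integer)                    # FP port is the bottleneck -> 70 cycles

# Ampere-style ports: one FP-only port + one shared FP/INT port.
# All INT goes to the shared port; leftover FP is balanced across both ports.
cycles_shared = max(integer, (fp + integer) / 2)   # -> 50 cycles

print(cycles_split / cycles_shared)                # 1.4x theoretical ceiling

Real games land well below that 1.4x ceiling (hence the ~25% figure above) because of dependencies, occupancy and memory limits, but it shows why a second port that can also do FP is worth more than a strict FP+INT split.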
 

Joe NYC

Golden Member
Jun 26, 2021
1,864
2,147
106
Performance wise possibly, but certainly not in their perf/watt envelope unless TSMC or Samsung are fabbing it.

Perhaps if they have some sort of chiplet/MCM design that uses more silicon running at lower clocks/voltage it might be feasible though.
A100 was fabbed at TSMC exclusively, not Samsung.

Whatever the Chinese chip is capable of, the US sanctions have secured a big and growing market for it.

A new round of sanctions on AI cards was announced today. After countries that used to be the staunchest US allies (Saudi Arabia, UAE, Egypt) moved from the US toward the China/Russia side, the US announced sanctions on "certain" Mid-East countries.

In other words, the US just shrank the TAM that NVidia, AMD and Intel can sell to and expanded the TAM for the Chinese firms.

If somebody doubts this is going to happen: Saudi Arabia has announced a contract with a Chinese company to build a nuclear power plant. So this is not a one-off, but a long-term trend.

One after another, these "neutral" countries will start switching to Chinese suppliers. Simple reason: China is not stupid enough to sanction its own customers.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
A100 was fabbed at TSMC exclusively, not Samsung.

Whatever the Chinese chip is capable of, the US sanctions have secured a big and growing market for it.

A new round of sanctions on AI cards was announced today. After countries that used to be the staunchest US allies (Saudi Arabia, UAE, Egypt) moved from the US toward the China/Russia side, the US announced sanctions on "certain" Mid-East countries.

In other words, the US just shrank the TAM that NVidia, AMD and Intel can sell to and expanded the TAM for the Chinese firms.

If somebody doubts this is going to happen: Saudi Arabia has announced a contract with a Chinese company to build a nuclear power plant. So this is not a one-off, but a long-term trend.

One after another, these "neutral" countries will start switching to Chinese suppliers. Simple reason: China is not stupid enough to sanction its own customers.
Why don't you make another thread for this?

After RDNA3 not ending well for AMD, I think they will have to be extra careful with RDNA4.
I wonder what we will see.
Higher clocks -> 3.5 GHz?
Better dual-issue -> more HW used for it?
How much VRAM? -> 12 GB would be low; 16 GB would be good, but also not spectacular.
What about GDDR7?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
See, I think if they are sticking to 200 mm² for the bigger one, it will basically be an N33 replacement. Maybe even something like taking N33, just cramming in however much RDNA4 can fit with the N4 logic shrink, and adding GDDR7.
There are supposedly 2 chips. I would expect the smaller one to have 12GB and the bigger to have more.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
See, I think if they are sticking to 200 mm² for the bigger one, it will basically be an N33 replacement. Maybe even something like taking N33, just cramming in however much RDNA4 can fit with the N4 logic shrink, and adding GDDR7.
Heard they're not, at least not as such. Packaging was the problem with the 7600 getting a chiplet arch, and TSMC has been aggressively expanding that capacity.

I can see RDNA4 being 40 and 60 CUs each. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in raytracing and maybe a minor bump in clockspeed thanks to tweaks in scheduler/node/cache (2.8 GHz boost, 3?).

So, call it 192-bit GDDR7, 12 GB, 60 CU. If it can reach 4070 Ti performance (RT included), instead of 4070 performance, for the same $500 price point, that would be incredibly compelling for a mid-ranger. What I don't see right now (RDNA3) is AMD really pushing the power/clocks. Push this hypothetical card to double-stacked SRAM (96 MB), 300 watts and a 3 GHz boost, and you've got a hypothetical 4080 competitor you can charge $600 for. Then the cut-down version competes with a 4070 Ti for $500.

Which leaves the 40 CU to beat the 4060 Ti at $400 with 12 GB, and a cut-down 128-bit 8 GB $300 version after that.

Note: AMD's "RDNA4 improves RT, not efficiency that much" lends credence to recent leaks about RDNA4 being partly skipped to hit GFX5 sooner (2025 as an outside chance?). If so, I'm looking forward to actual competition again. RDNA2 was a bit overly maligned, but the only thing RDNA3 managed to accomplish of note is getting costs down.
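
For what it's worth, the 192-bit/12 GB and 128-bit/8 GB pairings fall straight out of the bus math. A quick sketch assuming 16 Gbit (2 GB) GDDR7 chips at ~32 Gbps per pin, which is my assumption, not a confirmed spec:

Code:
# Capacity and bandwidth from bus width. Chip density and per-pin speed are
# assumptions (16 Gbit = 2 GB chips, ~32 Gbps GDDR7), not confirmed specs.
def board_memory(bus_width_bits, gb_per_chip=2, gbps_per_pin=32):
    chips = bus_width_bits // 32                  # one 32-bit channel per chip
    capacity_gb = chips * gb_per_chip
    bandwidth_gbs = bus_width_bits * gbps_per_pin / 8
    return capacity_gb, bandwidth_gbs

print(board_memory(192))   # (12, 768.0) -> the 12 GB card
print(board_memory(128))   # (8, 512.0)  -> the cut-down 8 GB card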
 

Joe NYC

Golden Member
Jun 26, 2021
1,864
2,147
106
Why don't you make another thread for this?

After RDNA3 not ending well for AMD, I think they will have to be extra careful with RDNA4.
I wonder what we will see.
Higher clocks -> 3.5 GHz?
Better dual-issue -> more HW used for it?
How much VRAM? -> 12 GB would be low; 16 GB would be good, but also not spectacular.
What about GDDR7?

My guess would be that the low-end card - Navi 44 - will have a similar number of CUs to Navi 33, but its performance will be unclogged, with higher utilization of resources and higher clock speeds.

If Navi 33 is 200 mm² on N6, Navi 44 on N4 could be tiny, somewhere between 100 and 150 mm².

And the bigger card, Navi 43, would probably be 2x Navi 44.

Also, proper performance-per-watt scaling - all that RDNA3 was supposed to deliver, and then some on top of that.

In effect, something like RDNA 3.5 or RDNA 3.75. If launched early enough, early 2024, they could achieve some success in the market.
 
  • Like
Reactions: Kaluan

jpiniero

Lifer
Oct 1, 2010
14,398
5,114
136
There are supposedly 2 chips. I would expect the smaller one to have 12GB and the bigger to have more.

See, I think the basic chiplet model was where the "have more" comes in.

My guess would be that the low-end card - Navi 44 - will have a similar number of CUs to Navi 33, but its performance will be unclogged, with higher utilization of resources and higher clock speeds.

If Navi 33 is 200 mm² on N6, Navi 44 on N4 could be tiny, somewhere between 100 and 150 mm².

Remember SRAM/IO scaling blows on N4.
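
Rough numbers to illustrate the point (a sketch with assumed area splits and N6->N4 scaling factors, not actual Navi 3x floorplan data):

Code:
# Split the ~200 mm^2 N6 die quoted above into logic vs SRAM/IO,
# then shrink each part separately. All splits and factors are assumptions.
die_n6 = 200.0                 # mm^2, figure from the post above
logic_share = 0.55             # assumed fraction of the die that is logic
sram_io_share = 1.0 - logic_share

logic_scale = 0.60             # assumed N6 -> N4 shrink for logic
sram_io_scale = 0.90           # SRAM/IO barely shrink, hence the caveat above

die_n4 = die_n6 * (logic_share * logic_scale + sram_io_share * sram_io_scale)
print(round(die_n4))           # ~147 mm^2 with these assumptions

So even with a healthy logic shrink, the poorly scaling SRAM/IO keeps the hypothetical die near the top of that 100-150 mm² range rather than the bottom.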
 
  • Like
Reactions: Tlh97 and Ajay

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
I can see RDNA4 being 40 and 60 CUs each. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in raytracing and maybe a minor bump in clockspeed thanks to tweaks in scheduler/node/cache (2.8 GHz boost, 3?).
LOL - "we really screwed up on power efficiency with RDNA3, but it's okay because gamers don't really care." While partly true, it's also revisionist.
In effect something like RDNA 3.5 or RDNA 3.75. If launched early enough, early 2024, they could achieve some success in the market.
Well, that's not happening. Curious why you mentioned it.
 

jpiniero

Lifer
Oct 1, 2010
14,398
5,114
136
I'll add that if it does come out about a year from now (not a bad guesstimate), the best we are talking about from NV is basically the same Ada products as today but with GDDR7.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
I'll add that if it does come out about a year from now (not a bad guesstimate), the best we are talking about from NV is basically the same Ada products as today but with GDDR7.
I figure Nvidia will stick with whatever their schedule is and release the follow-on to Ada. Undoubtedly, NV didn't know how badly AMD was going to falter on RDNA4 before they were ready for TO themselves, or nearly so. I suppose they could delay part of their lineup to milk Ada a bit, but not forever. RDNA5 had better get it right.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I can see RDNA4 being 40 and 60 CUs each. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in raytracing and maybe a minor bump in clockspeed thanks to tweaks in scheduler/node/cache (2.8 GHz boost, 3?).

So, call it 192-bit GDDR7, 12 GB, 60 CU. If it can reach 4070 Ti performance (RT included), instead of 4070 performance, for the same $500 price point, that would be incredibly compelling for a mid-ranger. What I don't see right now (RDNA3) is AMD really pushing the power/clocks. Push this hypothetical card to double-stacked SRAM (96 MB), 300 watts and a 3 GHz boost, and you've got a hypothetical 4080 competitor you can charge $600 for. Then the cut-down version competes with a 4070 Ti for $500.

Which leaves the 40 CU to beat the 4060 Ti at $400 with 12 GB, and a cut-down 128-bit 8 GB $300 version after that.
I can see either 40 CU (2 SE) and 60 CU (3 SE) with 20 CU/SE, or 32 CU (2 SE) and 64 CU (4 SE) with 16 CU/SE.

You are underestimating the performance difference.
The RX 7800 XT at 2.45 GHz is at roughly RX 6800 XT raster level of performance.
The RTX 4070 Ti is 21% faster.
The RTX 4080 is 52% faster.
Unless RDNA4 offers higher IPC, the bigger chip will have to run at higher clocks.
To compete against the 4070 Ti it needs 3 GHz.
To compete against the 4080 it needs 3.75 GHz.
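
Those clock targets just assume raster performance scales roughly linearly with clock at the same CU count and IPC (a simplification that ignores bandwidth limits):

Code:
# Starting point from above: RX 7800 XT at ~2.45 GHz ~= RX 6800 XT in raster.
base_clock = 2.45                       # GHz

for target, gap in [("RTX 4070 Ti", 1.21), ("RTX 4080", 1.52)]:
    # Assume performance scales ~linearly with clock, same CU count and IPC.
    print(target, round(base_clock * gap, 2), "GHz")
# -> RTX 4070 Ti 2.96 GHz, RTX 4080 3.72 GHz (the ~3 / ~3.75 GHz figures above)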

Let's say they manage it somehow, RT included; then your prices are totally unrealistic.
The 4070 Ti is $799, so why would AMD ask only $499?
The 4080 is $1199, so why would AMD ask only $599?

I agree, the 40 CU part should perform well against the 4060 Ti.

edit: fixed a mistake
edit2: now it's fixed.
 
Last edited:
  • Like
Reactions: Tlh97

Tigerick

Senior member
Apr 1, 2022
490
409
96
I can see either 40 CU (2 SE) and 60 CU (3 SE) with 20 CU/SE, or 32 CU (2 SE) and 64 CU (4 SE) with 16 CU/SE.

You are underestimating the performance difference.
The RX 7800 XT at 2.45 GHz is at roughly RX 6800 XT raster level of performance.
The RTX 4070 Ti is 21% faster.
The RTX 4080 is 52% faster.
Unless RDNA4 offers higher IPC, the bigger chip will have to run at higher clocks.
To compete against the 4070 Ti it needs 3 GHz.
To compete against the 4080 it needs 3.75 GHz.

Let's say they manage it somehow, RT included; then your prices are totally unrealistic.
The 4070 Ti is $799, so why would AMD ask only $499?
The 4080 is $1199, so why would AMD ask only $599?

I agree, the 40 CU part should perform well against the 4060 Ti.
I think you are calculating backwards; if N43 performs 20% better due to a real dual-issue design, then Nvidia will respond by dropping prices, like with the current 4060 Ti 16GB, or bring out a Super version to counter the N43 offering. And remember, by this time next year the 7900 XT should have dropped to around $700, so N43 is pretty much geared toward replacing the current N32 position.

Anyhow, I suspect AMD will position N43 and N44 as mobile-first; with a monolithic design and lower clocks, AMD could potentially limit N43's TDP to around 150W, a saving of about 100W compared to N32, which is crucial for the notebook platform but not for desktop... we shall see
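
The 150 W idea is less of a stretch than it sounds if you assume dynamic power ~ C·V²·f and that voltage drops roughly along with frequency near the top of the V/f curve, so power falls roughly with the cube of the clock. A sketch with assumed numbers, not actual N32/N43 data:

Code:
# Dynamic power ~ C * V^2 * f; with V scaling roughly with f near the top
# of the V/f curve, power falls roughly with the cube of the clock.
desktop_power = 250.0      # W, assumed desktop-class operating point
target_power = 150.0       # W, the mobile target discussed above

clock_ratio = (target_power / desktop_power) ** (1 / 3)
print(round(clock_ratio, 2))            # ~0.84 -> roughly 16% lower clocks
print(round(clock_ratio * 2.4, 2))      # e.g. a 2.4 GHz part -> ~2.02 GHz

So, in this crude model, a ~40% power cut costs only ~15-20% of clock speed, which is why mobile-first positioning isn't crazy.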
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I think you are calculating backwards; if N43 performs 20% better due to a real dual-issue design, then Nvidia will respond by dropping prices, like with the current 4060 Ti 16GB, or bring out a Super version to counter the N43 offering. And remember, by this time next year the 7900 XT should have dropped to around $700, so N43 is pretty much geared toward replacing the current N32 position.

Anyhow, I suspect AMD will position N43 and N44 as mobile-first; with a monolithic design and lower clocks, AMD could potentially limit N43's TDP to around 150W, a saving of about 100W compared to N32, which is crucial for the notebook platform but not for desktop... we shall see
We don't know the performance or specs of N4*, and by the time it's released, maybe Nvidia will have released its next gen.

Maybe it will end up as a 60 CU GPU but with 50% higher clocks; that's already very close to the 7900 XTX in performance.
Or it will have fewer CUs at lower clocks, who knows.
We only know that it's a monolith and the size is supposedly 200-250 mm².

In laptops, you don't have 250W GPUs. You are limited to 175W for Nvidia, and AMD stopped at 165W with RDNA2.
Mobile N32, if released, will be the same, ending up at a comparable TDP. Of course, performance will drop because of the lower clocks.
 
Last edited:

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
We don't know the performance or specs of N4*, and by the time it's released, maybe Nvidia will have released its next gen.

Maybe it will end up as a 60 CU GPU but with 50% higher clocks; that's already very close to the 7900 XTX in performance.
Or it will have fewer CUs at lower clocks, who knows.
We only know that it's a monolith and the size is supposedly 200-250 mm².

In laptops, you don't have 250W GPUs. You are limited to 175W for Nvidia, and AMD stopped at 165W with RDNA2.
Mobile N32, if released, will be the same, ending up at a comparable TDP. Of course, performance will drop because of the lower clocks.
I doubt that 50% higher clocks was the plan; that would have meant murderous power consumption on the real enthusiast cards. What I do wonder is where people are getting this 200 mm² number for the larger GPU die. Seems like we shouldn't know at all this far out.
 

jpiniero

Lifer
Oct 1, 2010
14,398
5,114
136
What I do wonder is where people are getting this 200 mm² number for the larger GPU die. Seems like we shouldn't know at all this far out.

The idea is that the chiplet models were going to handle the higher tiers. They could ship a bigger die... but with the AI hype, it's not hard to think they would rather use their resources to get MI300/400 out sooner.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
I doubt that 50% higher clocks was the plan; that would have meant murderous power consumption on the real enthusiast cards. What I do wonder is where people are getting this 200 mm² number for the larger GPU die. Seems like we shouldn't know at all this far out.
It looks like a much higher clock (3-3.4 GHz) was intended for RDNA3 (TweakTown), so it wouldn't be surprising if RDNA4 aimed even higher.
[Image: AMD Radeon RX 7000 RDNA 3 Infinity Links at 9.2 Gb/s, 10x higher than Ryzen/EPYC]


If the architecture is designed to allow those clocks, then power consumption doesn't need to be murderous. OK, in the case of that cancelled chiplet GPU (N4X) the power consumption would have been very high, but so would the performance.

@adroc_thurston said the size is 200-250 mm². Don't know if it's true or not.
 
Last edited: