Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,647
6,076
146

Joe NYC

Platinum Member
Jun 26, 2021
2,008
2,420
106
Anyone else think Angstronomics' supposed N32 leak makes little sense?

Why would they suddenly change the GPU hardware allocation from N31 to N32 (and, funnily enough, N33 goes back to the N31-style layout as well)?

I would be quite shocked if N32 really is 30 WGP with 2560 SIMD per Shader Engine, when N31 and N33 are both 2048 SIMD per SE.

Also, why does everyone believe V-stacked RDNA3 won't use specialized SRAM libraries on the MCDs? Doesn't 16+32MB make more sense than 16+16MB? That's how ZenX3D does it, at least. Why the departure?

The part about stacking such a tiny die, half the size of the base die, does not make sense to me either.

I am guessing that TSMC would be using Wafer-on-Wafer stacking for this, as opposed to stacking the tiny parts individually, to get the cost down. And if it is Wafer-on-Wafer stacking, why leave more than half of the top wafer area unused?
 
  • Like
Reactions: Kaluan

SteinFG

Senior member
Dec 29, 2021
425
478
106
The part about stacking such a tiny die, half the size of the base die, does not make sense to me either.

I am guessing that TSMC would be using Wafer-on-Wafer stacking for this, as opposed to stacking the tiny parts individually, to get the cost down. And if it is Wafer-on-Wafer stacking, why leave more than half of the top wafer area unused?
Because taping out 1 design is cheaper than 2. Each mask set costs millions; the fewer designs, the better.
Look at first-gen Ryzen for the extreme example. AMD used 5 different ways to package their first Zen 1 die: Naples, Whitehaven, Summit Ridge, and Snowy Owl (in 1-die and 2-die variants).
 

Joe NYC

Platinum Member
Jun 26, 2021
2,008
2,420
106
Because taping out 1 design is cheaper than 2. Each mask set costs millions; the fewer designs, the better.
Look at first-gen Ryzen for the extreme example. AMD used 5 different ways to package their first Zen 1 die: Naples, Whitehaven, Summit Ridge, and Snowy Owl (in 1-die and 2-die variants).

It is interesting where the cost crossover would be, over the life of RDNA3, between making a new mask set and just stacking 2 identical MCD dies.

An SRAM-only wafer should be quite a bit cheaper to make the masks for, since there should be fewer metal layers compared to logic.

And, importantly for the user, there would be more performance from doubling the stacked SRAM.
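
As a very rough way to frame that crossover, you can compare the one-time cost of an extra tape-out against the per-unit cost of stacking a full second MCD. All the numbers below are made-up placeholders just to show the shape of the calculation, not leaked figures:

```python
# Hypothetical break-even between (a) taping out a dedicated SRAM-only top die
# and (b) reusing the existing MCD design and stacking two identical dies.
# Every number here is a placeholder assumption for illustration only.

mask_set_cost = 20e6        # assumed one-time cost of an extra N6-class tape-out, in $
dedicated_die_cost = 10.0   # assumed per-unit cost of a small dedicated SRAM die, in $
reused_mcd_cost = 18.0      # assumed per-unit cost of a full second MCD stacked on top, in $

extra_per_unit = reused_mcd_cost - dedicated_die_cost
break_even_units = mask_set_cost / extra_per_unit
print(f"Break-even at ~{break_even_units:,.0f} stacked GPUs over the product's life")
```

Below that volume, reusing the MCD wins; above it, the dedicated die pays for its own masks.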
 

Kaluan

Senior member
Jan 4, 2022
500
1,071
96
Because taping out 1 design is cheaper than 2. Each mask set costs millions; the fewer designs, the better.
Look at first-gen Ryzen for the extreme example. AMD used 5 different ways to package their first Zen 1 die: Naples, Whitehaven, Summit Ridge, and Snowy Owl (in 1-die and 2-die variants).
If that were the case, AMD may as well go ahead and market the "7950XT" as a 768-bit GPU 😅
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,357
1,564
136
I don't understand how it can't be ready yet, since N32 and N31 share the same arch?

There is substantial physical design work in bringing a product to market even if the logical design (the arch) is the same. AMD's GPU side, to the best of my knowledge, has one physical design team that does all the products in the lineup in series, so they get completed one at a time.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,852
7,227
136
That would be dumb, because then they would compete for media attention.

- They will probably tease one in the presentation of the other though.

AMD's marketing team seems to have grown the minimum required 3rd neuron to have what can be considered a brain over the last couple of years, so I anticipate they'll roll the products out over as long a period as possible to keep themselves in the news.

Edit: So long as they have competitive products. You do a simultaneous launch when you want to hide a weaker product behind a more competitive one.
 

Kaluan

Senior member
Jan 4, 2022
500
1,071
96
So
7900XT 20GB/320bit cut N31
7900XTX 24GB/384bit full(?) N31
7950XT 24GB/384bit full(?) N31+3D
Or?

BTW, I've seen some leaked specs claim 10752 SIMD on (one of) the cut N31 SKUs. Where does the "leak" originate from, and what do people think of its validity?
Weird cut at first glance, but I suppose it's based on 1 WGP being cut or defective from each Shader Engine, i.e. 1 WGP per 2 Shader Arrays (or 1 CU per SA)?
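
For what it's worth, the number does line up with that guess if you assume the commonly rumored N31 layout (6 Shader Engines × 2 Shader Arrays × 4 WGPs, 2 CUs per WGP, 128 SPs per CU); a quick sanity check:

```python
# Sanity check of the rumored 10752-SP cut against an assumed N31 layout.
SE, SA_PER_SE, WGP_PER_SA = 6, 2, 4
CU_PER_WGP, SP_PER_CU = 2, 128

full_sp = SE * SA_PER_SE * WGP_PER_SA * CU_PER_WGP * SP_PER_CU
cut_sp = full_sp - SE * CU_PER_WGP * SP_PER_CU  # one WGP disabled per Shader Engine

print(full_sp, cut_sp)  # 12288 10752 -> matches the rumored cut SKU
```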
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,364
2,855
136
The theoretical performance is 2x, but it's not necessarily reflected in game framerates.
Performance in a synthetic benchmark according to Greymon55.

Performance is not bad, but more shaders bring only a limited improvement.
WGP: +20%
Frequency: +25-30% (2.9-3 GHz)
Just this would mean at best 100*1.2*(1.25 or 1.3) = 150-156
200/(150 or 156) = 1.28-1.33, or 28-33% better performance per WGP.
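
A minimal sketch of that arithmetic (the WGP and frequency multipliers are the rumored figures above, and 200 stands for the ~2x synthetic result attributed to Greymon55):

```python
# Rough per-WGP scaling estimate from the rumored figures above.
baseline = 100                 # N21 reference score
wgp_scale = 1.20               # +20% WGPs
freq_scales = (1.25, 1.30)     # +25-30% frequency (~2.9-3.0 GHz)
observed = 200                 # ~2x synthetic score claimed for N31

for f in freq_scales:
    naive = baseline * wgp_scale * f       # what pure WGP + clock scaling would give
    per_wgp_gain = observed / naive - 1.0  # leftover gain attributed to per-WGP improvements
    print(f"freq x{f}: naive = {naive:.0f}, per-WGP uplift ~ {per_wgp_gain:.0%}")
```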
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
14,643
5,272
136
I should point out that the 4090 is theoretically 2x the 3090 Ti, well, except for the bandwidth. I'm interested to see how much better the V-Cache model would be over the normal one.
 
  • Like
Reactions: Tlh97

maddie

Diamond Member
Jul 18, 2010
4,757
4,713
136

Performance in a synthetic benchmark according to Greymon55.

Performance is not bad, but more shaders bring only a limited improvement.
WGP: +20%
Frequency: +25-30% (2.9-3 GHz)
Just this would mean at best 100*1.2*(1.25 or 1.3) = 150-156
200/(150 or 156) = 1.28-1.33, or 28-33% better performance per WGP.
Do we ignore the > 50% perf/W increase?

As a very high-level estimate, this subsumes all of the internal improvements into a single box and avoids any blind analysis of the specific architectural changes.
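
As a rough illustration of that "box" argument, the >50% perf/W claim plus a board-power ratio already gives a performance estimate without knowing anything about the internals. The power figures below are assumptions for illustration, not confirmed specs:

```python
# High-level performance estimate from the ">50% perf/W" claim alone.
perf_per_watt_gain = 1.5     # the ">50%" claim taken at its floor
old_power = 335.0            # assumed last-gen flagship board power, W
new_power = 355.0            # assumed/rumored top N31 board power, W

perf_gain = perf_per_watt_gain * (new_power / old_power)
print(f"Implied performance: ~{perf_gain:.2f}x the previous flagship")  # ~1.59x
```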
 

Kepler_L2

Senior member
Sep 6, 2020
347
1,271
106

Performance in a synthetic benchmark according to Greymon55.

Performance is not bad, but more shaders bring only a limited improvement.
WGP: +20%
Frequency: +25-30% (2.9-3 GHz)
Just this would mean at best 100*1.2*(1.25 or 1.3) = 150-156
200/(150 or 156) = 1.28-1.33, or 28-33% better performance per WGP.
FMA throughput is around 3.5x
Pixel fillrate is around 2x
L2/L3 cache bandwidth is around 2x
Memory bandwidth is around 2x

If they can't hit 2x performance in games, it's due to driver overhead/CPU bottleneck, or there's a scaling issue in the architecture.
 

alexruiz

Platinum Member
Sep 21, 2001
2,836
556
126
If the performance is indeed that spectacular, AMD will price them accordingly.
My take:

Price
RX 7900XTX: ~$1400
RX 7900XT: ~$1100
RX 7800XT: ~$900
RX 7700XT: ~$600
RX 7600XT: ~$400

Performance
RX 7900XTX: 180% of RX 6950XT
RX 7900XT: 150% of RX 6950XT
RX 7800XT: 130% of RX 6950XT
RX 7700XT: 100% of RX 6950XT
RX 7600XT: 70% of RX 6950XT (In between 6750XT and 6800)
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,364
2,855
136
Do we ignore the > 50% perf/W increase?

As a very high-level estimate, this subsumes all of the internal improvements into a single box and avoids any blind analysis of the specific architectural changes.
How does the improved perf/W affect what I wrote?

FMA throughput is around 3.5x
Pixel fillrate is around 2x
L2/L3 cache bandwidth is around 2x
Memory bandwidth is around 2x
If FMA throughput is ~3.5x better, then considering N31 has 12288 SP, it would also mean a ~45% higher clockspeed.
That would be ~3350 MHz.
ROPs could be 192 (+50%), and with this clockspeed that would be ~2.18x pixel fillrate.

I don't think ROPs are a limiting factor. My bet is memory, and then I have to ask why they didn't add more cache but instead regressed.
It will be interesting to see N33 vs N23 when you set the same clockspeed for both of them.
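
That clockspeed estimate falls out directly from the FMA ratio, assuming 5120 SPs for N21 vs 12288 for N31 and an N21 reference clock of roughly 2.3 GHz (the reference clock is an assumption here):

```python
# Re-deriving the clockspeed and fillrate implied by "FMA throughput ~3.5x" (N21 -> N31).
n21_sp, n31_sp = 5120, 12288
n21_clock_ghz = 2.3          # assumed N21 reference game clock
fma_ratio = 3.5

clock_ratio = fma_ratio / (n31_sp / n21_sp)   # ~1.46 -> ~46% higher clock
n31_clock_ghz = n21_clock_ghz * clock_ratio   # ~3.35 GHz

rop_ratio = 192 / 128                         # +50% ROPs (rumored)
fillrate_ratio = rop_ratio * clock_ratio      # ~2.19x pixel fillrate

print(f"Implied N31 clock ~{n31_clock_ghz:.2f} GHz, pixel fillrate ~{fillrate_ratio:.2f}x")
```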
 
Last edited:
  • Like
Reactions: Saylick

Tup3x

Senior member
Dec 31, 2016
967
953
136
They can't. They have neither CUDA nor DLSS 3. AMD's noise suppression is also betaware in quality, at least from what I have read.
I think it comes down to ray tracing performance. If it's lacking, then it obviously has to be cheaper. If it is a no-compromise product this time around, then it will show in the price. In that case I hope for some kind of price war instead of a price cartel.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,615
5,868
136
FMA throughput is around 3.5x
Pixel fillrate is around 2x
L2/L3 cache bandwidth is around 2x
Memory bandwidth is around 2x

If they can't hit 2x performance in games, it's due to driver overhead/CPU bottleneck, or there's a scaling issue in the architecture.
Lots of resources doubled or more, but are there any architectural gains?
With so much rework around the CU/WGP and SoC architecture, I don't think they would have a regression there architecturally, but we will know in a few days.
 
  • Like
Reactions: Leeea

Tup3x

Senior member
Dec 31, 2016
967
953
136
Lots of resources doubled or more, but are there any architectural gains?
With so much rework around the CU/WGP and SoC architecture, I don't think they would have a regression there architecturally, but we will know in a few days.
I'd be surprised if the chiplet approach doesn't have any negative impact. It will be interesting to see how things turn out.
 

Yosar

Member
Mar 28, 2019
28
136
76
They can't. They have neither CUDA nor DLSS 3. AMD's noise suppression is also betaware in quality, at least from what I have read.

DLSS 3 is pure crap (any technology giving such horrible artifacts is crap). If anyone cares, better to buy a TV with this 'revolutionary' technology. It will be cheaper and works on every game from the start. Either way your latency goes to hell, so who cares.
CUDA is worthless on gaming cards because it is gimped in drivers by nVidia. nVidia prefers to sell professional cards with the same configuration as gaming cards, but much more expensive, just by not gimping CUDA in the drivers.

Not that I want more expensive cards from AMD. On the contrary, but the only argument for these cards to be cheaper than nVidia's is the unlimited mindshare nVidia has, and how AMD chooses to deal with it (if at all).
Of course, assuming they are on par with nVidia's cards or even better.
 

RnR_au

Golden Member
Jun 6, 2021
1,715
4,200
106
DLSS 3 is pure crap (any technology giving such horrible artifacts is crap). If anyone cares, better to buy a TV with this 'revolutionary' technology. It will be cheaper and works on every game from the start. Either way your latency goes to hell, so who cares.
DLSS 3 is fine for single-player games where latency is not so important. And not everyone wants a TV to game on. As for artifacts, I'm sure nvidia will release DLSS 3.1 with fixes.

DLSS 3 is not for me since I play twitchy games, but I can see the appeal for those who want to immerse themselves with everything slid to max.

CUDA is worthless on gaming cards because it is gimped in drivers by nVidia. nVidia prefers to sell professional cards with the same configuration as gaming cards, but much more expensive, just by not gimping CUDA in the drivers.
I have a friend who runs an nvidia consumer gaming card, and CUDA support is fine for his needs. It enables him to do his Stable Diffusion AI art thing to his heart's content.

Edit: found a recent set of instructions for running it on AMD GPUs - https://www.travelneil.com/stable-diffusion-windows-amd.html

...anecdotal observations that this seems to be anywhere from 3x to 8x slower than it is for people on similar-specced Nvidia hardware.

This is the reason why I have argued that AMD needs to be price-conscious. They need market share more than anything else in this GPU space.