Question Speculation: RDNA2 + CDNA Architectures thread

Page 122

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,980
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit he was the first to say 505mm^2 for Navi21, which other people have backed up. Still, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
Chip developers use simulations to avoid getting into exactly such situations. The only way it then can still happen is bad management (Koduri) or over-promised capabilities for the node used (Ampere on Samsung). AMD's track record with TSMC is pretty flawless so far, and RDNA2 isn't even on a new node, so simulations should have become even better and not worse than with RDNA1.
MS and Sony presumably did similar performance simulations when deciding on how to configure the PS5 + XSX SoCs, and they both opted for "normal-sized" bus bandwidths given the GPU computing power despite the resulting packaging (additional memory controllers + PHYs) being more costly than something with a narrower bus width, so they probably found good reasons to justify their choices.
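For a sense of scale, the console bus choices work out like this (a quick sketch; the 14 Gbps GDDR6 data rate and bus widths are the publicly stated figures for both consoles):

```python
# Peak GDDR6 bandwidth = bus width (bits) / 8 * per-pin data rate (Gbps)
def gddr6_bandwidth_gbs(bus_bits: int, gbps: float) -> float:
    return bus_bits / 8 * gbps

# Xbox Series X uses a 320-bit bus, PS5 a 256-bit bus, both at 14 Gbps
xsx = gddr6_bandwidth_gbs(320, 14.0)  # 560.0 GB/s
ps5 = gddr6_bandwidth_gbs(256, 14.0)  # 448.0 GB/s
print(xsx, ps5)
```

Those "normal-sized" buses mean extra memory controllers and PHYs on the die, which is exactly the packaging cost trade-off described above.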
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I reject the argument that it's a 64 CU (or less) card because Lisa did imply Big Navi, though she DID leave that open to interpretation...I haven't looked at the fine print, so I don't know whether it offers any hints confirming the card to be 'Big Navi' or the middle tier chip. As a bonus, the 72 CU card may not have even been running at stock speeds.

What I can tell you with absolute certainty is that anyone claiming that Navi2X had an IPC regression should be kicked off this forum, because IMO they are trolling. The cards also don't have memory bandwidth issues. This much is evident because they DROPPED the bus size for the 40CU card from 256 bit to 192 bit. The only reason they'd do that is if the card performed too close to the next card up the stack...the 52 CU card. Yes, there IS a 52 CU card. I'm still curious if I'll end up being pretty close on the specs I estimated a while ago...or if the speculation at ChipHell will be.

Big Navi is Navi 21 with 80CU and a die size of > 500 sq mm. Navi 21 is confirmed to have at least 3 SKUs (some say 4).
Navi 21 XTX - RX 6900XT - 80CU,
Navi 21 XT - RX 6900 - 72 CU,
Navi 21 XL - RX 6800XT - 60-64 CU
Navi 21 XE - RX 6800 - ? CU

Navi 22 is the next die with 40 CU and 192 bit GDDR6 which will power RX 6700XT and RX 6700. Navi 23 with 32 CU and 128 bit GDDR6 will power RX 6600XT and RX6600. My RX 6800XT guess is based on multiple information sources and my understanding of RDNA2 power efficiency from Xbox Series X.
 

Saylick

Diamond Member
Sep 10, 2012
3,170
6,400
136
The more I think about it, the more it makes sense that AMD went with 3-4 SKUs using the same large N21 die. The most cut-down version is probably 52-56 CUs or roughly 2/3rds the CU count of a full die, which is crazy cut-down because typically most cut-down SKUs are 75% of the full die if not more. The benefit of having 4 products based off of the largest die means that you get a ton of yield from the wafer, which is imperative since GPU dies make AMD less money than CPU dies. Financially, they have to maximize their TSMC wafer allocation to high margin products, so for a large die GPU to even be financially worthwhile, it implies little to no waste in wafers. Having two dies (40 CUs and 80 CUs) cover the vast majority of the market space is actually pretty smart as it also minimizes mask costs while having the side benefit of being able to say your teaser GPU is based off of Big Navi, which for all intents and purposes now covers a huge performance range.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
This much is evident because they DROPPED the bus size for the 40CU card from 256 bit to 192 bit.

The 5600XT is basically a 5700 even with a cut-down bus, so I guess even Navi 1x wasn't so bandwidth-starved, and they may still have introduced optimizations in Navi 2x regarding BW use
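The comparison above is easy to put in numbers (a quick sketch; both Navi 10 parts have 36 CUs, and the 5600 XT shipped at 12 Gbps before the BIOS update raised many cards to 14 Gbps):

```python
# Bandwidth per CU for the two Navi 10 parts compared above
def bw_gbs(bus_bits, gbps):
    # Peak GDDR6 bandwidth in GB/s
    return bus_bits / 8 * gbps

print(bw_gbs(256, 14) / 36)  # RX 5700: ~12.4 GB/s per CU
print(bw_gbs(192, 12) / 36)  # RX 5600 XT at launch: 8.0 GB/s per CU
print(bw_gbs(192, 14) / 36)  # RX 5600 XT post-update: ~9.3 GB/s per CU
```

Despite roughly a third less bandwidth per CU, the 5600 XT lands close to the 5700, which is the point being made here.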
 

PhoBoChai

Member
Oct 10, 2017
119
389
106
The more I think about it, the more it makes sense that AMD went with 3-4 SKUs using the same large N21 die. The most cut-down version is probably 52-56 CUs or roughly 2/3rds the CU count of a full die, which is crazy cut-down because typically most cut-down SKUs are 75% of the full die if not more. The benefit of having 4 products based off of the largest die means that you get a ton of yield from the wafer, which is imperative since GPU dies make AMD less money than CPU dies. Financially, they have to maximize their TSMC wafer allocation to high margin products, so for a large die GPU to even be financially worthwhile, it implies little to no waste in wafers. Having two dies (40 CUs and 80 CUs) cover the vast majority of the market space is actually pretty smart as it also minimizes mask costs while having the side benefit of being able to say your teaser GPU is based off of Big Navi, which for all intents and purposes now covers a huge performance range.

Only makes sense when yields are poor.

7N "enhanced" is very mature at this stage. Published defect rates are well below average.

When you have good yields, more binning only reduces the ASP for that die, as you are forced to sell perfectly good dies as cut down SKUs.
 

ModEl4

Member
Oct 14, 2019
71
33
61
If the rumours regarding 4 SKUs, Big Navi at 80CU, etc. are true, then the SKUs will probably be the following:
1) Top limited-edition model: full-die 80CU 16GB part overclocked around 3-5% (like the 5700XT Anniversary Edition before it)
2) Standard full-die 80CU 16GB model
3) 72CU 16GB part clocked lower than the no2 model (at whatever frequency brings a 10% performance difference between no2 and no3). Don't forget Vega64 had +5% higher clocks and 14.3% more shaders vs Vega56 (+20% more theoretical shading/texel power in total), but in reality the 4K difference was only 10%; check also 5700XT vs 5700, etc.
4) 64CU part, cut-down die with a 192-bit bus and 12GB of memory. There is no way it is 16GB, since performance differentiation against the already lower-clocked 72CU part would be very difficult, and it makes sense for AMD to have a 12GB part since it has no competition from Nvidia regarding memory (10GB 3080, 8GB 3070), and AMD will shave off the cost of 4GB, being better prepared to compete at a lower-tier price point.
OK, with these assumptions in mind and looking at the performance of the part that AMD previewed at the Zen3 event, we have the following:
According to TechPowerUp, the 3080 would be +9.1% faster at 4K than the model they previewed. By my calculations this model would be only -7% relative to the 3080; I will use mine.
By not saying which model it was, AMD somehow implied that this was not their top-of-the-line part, because otherwise it would be plainly obvious that AMD had knowingly used their top product even though it can't compete with the 3080, and by withholding that info from consumers they would plainly just be blindsiding them...
If the part that they previewed was no2 we would probably have:
$699 3080 10GB perf=100
$699 no1 80CU Big Navi 16GB perf=96 (+$50 vs no2, just a limited edition OC part)
$649 no2 80CU Big Navi 16GB perf=93 (-7% vs 3080, same performance/$ but 16GB)
$579 no3 72CU Big Navi 16GB perf=84 (10% performance difference in relation with no2)
$499 no4 64CU Big Navi 12GB perf=74 (around 2080Ti performance level)
If the part that they previewed was no3 we would probably have:
$799 no1 80CU Big Navi 16GB perf=106 (+$50 vs no2, just a limited edition OC part)
$749 no2 80CU Big Navi 16GB perf=103 (little chance for AMD to be faster than the 3080, have 6GB more memory, and still price it at $699, the same as the 3080; it's not Navi1 with no raytracing etc)
$699 3080 10GB perf=100
$649 no3 72CU Big Navi 16GB perf=93 (-7% vs 3080, same performance/$ but 16GB)
$549 no4 64CU Big Navi 12GB perf=82 (around +10% vs 2080Ti)
If the part that they previewed was no4 (no chance in Hell!) we would probably have:
$999 no1 80CU Big Navi 16GB perf=119.5 (+6% vs 3090, just a limited edition OC part)
$899 no2 80CU Big Navi 16GB perf=116 (+3% vs 3090, +10% vs 72CU no3 Big Navi)
$749 no3 72CU Big Navi 16GB perf=105.5 (5.5% faster than 3080 nearly same performance/$ but 16GB)
$649 no4 64CU Big Navi 12GB perf=93 (-7% vs 3080, same performance/$ but 12GB)
I wonder which scenario is better for the consumer.
Lastly, I didn't really want to do this, but we had a slow week regarding leaks and predictions from the usual Twitter and YouTube suspects😉
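The scenario tables above are easy to sanity-check; here is a quick sketch of the implied performance-per-dollar for the middle scenario (all SKU names, prices, and perf numbers are the guesses from the post, not confirmed specs):

```python
# Scenario where the previewed part was the no3 72CU card.
# perf is normalized to RTX 3080 = 100, per the table above.
scenario = [
    ("no1 80CU 16GB", 799, 106),
    ("no2 80CU 16GB", 749, 103),
    ("RTX 3080 10GB", 699, 100),
    ("no3 72CU 16GB", 649, 93),
    ("no4 64CU 12GB", 549, 82),
]
for name, price, perf in scenario:
    print(f"{name}: {perf / price * 100:.2f} perf per $100")
# The no3 card lands within ~0.2% of the 3080's perf/$, matching the
# "same performance/$ but 16GB" claim in the table.
```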
 
  • Like
Reactions: Tlh97

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Only makes sense when yields are poor.

7N "enhanced" is very mature at this stage. Published defect rates are well below average.

When you have good yields, more binning only reduces the ASP for that die, as you are forced to sell perfectly good dies as cut down SKUs.

Define yield. A chip can function 100% but only at low clocks or with high power use. If you make more chips, more of them will stay within a certain power budget at a certain high frequency with all CUs enabled (= the halo SKU).

The worse dies with higher leakage (= worse power use) can be cut down, while that higher leakage should allow higher clocks. So you cut your 80 CU part down to, say, 56 CU but increase the clocks.

In essence, a similar advantage to what you get with chiplets.

Or a completely different option:

Simply add less redundancy in the design. This will lower yields but also make the die smaller. But you don't care that much about yields, as you want to offer many cut-down SKUs anyway.
 

PhoBoChai

Member
Oct 10, 2017
119
389
106
Define yield. A chip can function 100% but only at low clocks or with high power use. If you make more chips, more of them will stay within a certain power budget at a certain high frequency with all CUs enabled (= the halo SKU).

The worse dies with higher leakage (= worse power use) can be cut down, while that higher leakage should allow higher clocks. So you cut your 80 CU part down to, say, 56 CU but increase the clocks.

In essence, a similar advantage to what you get with chiplets.

Or a completely different option:

Simply add less redundancy in the design. This will lower yields but also make the die smaller. But you don't care that much about yields, as you want to offer many cut-down SKUs anyway.

All of this is hypothetical once you get into semantics... there's a good reason why there are generally not so many cut-down SKUs from the same die: good yield also includes the power and clock-vs-voltage curve; it's the entire package. We're not dealing with a bad node here.
 
  • Like
Reactions: Tlh97 and Mopetar

leoneazzurro

Senior member
Jul 26, 2016
930
1,464
136
It's not that easy. These chips should be massive (if the 505-536 mm^2 die size is real), so even if the defect rate is below average, that does not mean most of the dies are fully OK (it depends on redundancy, and so on). Moreover, regarding power/clock/voltage, having a "good die" does not mean that all good dies are top dogs, only that they are good enough to be sold. Some dies may be full and clock high, others not full and clock high, others full but clock low, and others low-clock and not full. It also depends on target frequencies, and of course you should keep a safety margin in frequency/thermals. This was already seen with Zen 2 and its different SKUs: there was a big silicon-lottery game when trying to overclock, especially with early parts. But yields were really good: so even with a "good process" you can easily have only a minority of parts able to be sold as the "top" SKU.
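To put rough numbers on that, here's a Poisson yield sketch (the defect density D0 is an assumed figure for a mature N7-class node, and the top-bin rate is purely illustrative, not measured data):

```python
import math

# Poisson model: fraction of dies with zero defects = exp(-area * D0)
def defect_free_fraction(area_mm2: float, d0_per_cm2: float) -> float:
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

good = defect_free_fraction(505, 0.09)  # ~0.63 for a 505 mm^2 die
# A defect-free die can still miss the clock/voltage target
# (the "silicon lottery"); assume half of good dies bin for top clocks.
top_sku = good * 0.5
print(f"defect-free: {good:.0%}, full-spec top SKU: {top_sku:.0%}")
```

Even with a healthy process, the full-spec top bin ends up being a minority of all dies once the frequency lottery is stacked on top of defects.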
 
Last edited:
  • Like
Reactions: Tlh97 and Saylick

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
MS and Sony presumably did similar performance simulations when deciding on how to configure the PS5 + XSX SoCs, and they both opted for "normal-sized" bus bandwidths given the GPU computing power despite the resulting packaging (additional memory controllers + PHYs) being more costly than something with a narrower bus width, so they probably found good reasons to justify their choices.
While that's the case, you also need to keep in mind that these are completely different products. With consoles you have one spec fixed for eternity. To let developers make the most of it over the years, you ensure there are no bottlenecks in any part of that fixed design. Consoles are built to be run and exploited at maximum capability most of the time.

Not so with GPU chips, which work with variable workloads on PC and get little hand-tuning in software. The SKU segmentation, with lower SKUs having far fewer CUs, already ensures there will be some over-provisioning going on, be it in the IMC or the number of CUs, depending on the SKU. Usually such designs are done by ensuring that fully enabled chips run reasonably well at specific workloads without any grave bottlenecks. SKUs with less than fully enabled chips then profit from a tweaked balance that allows for more headroom and fewer bottlenecks. This is what AMD already does with its many-core Zen chips built around a fixed IOD.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
The more I think about it, the more it makes sense that AMD went with 3-4 SKUs using the same large N21 die. The most cut-down version is probably 52-56 CUs or roughly 2/3rds the CU count of a full die, which is crazy cut-down because typically most cut-down SKUs are 75% of the full die if not more. The benefit of having 4 products based off of the largest die means that you get a ton of yield from the wafer, which is imperative since GPU dies make AMD less money than CPU dies. Financially, they have to maximize their TSMC wafer allocation to high margin products, so for a large die GPU to even be financially worthwhile, it implies little to no waste in wafers. Having two dies (40 CUs and 80 CUs) cover the vast majority of the market space is actually pretty smart as it also minimizes mask costs while having the side benefit of being able to say your teaser GPU is based off of Big Navi, which for all intents and purposes now covers a huge performance range.

I am willing to bet that they only have 2 SKUs per die. As others have said, 7nm is very mature at this point, and having 2 parts will give a 90+% success rate. You have to realize that the cost of a Big Navi die is much higher, even if you bin. AMD would love to sell tiny dies all day long, so you can bet that they will only use large dies where they need them competitively. All the mainstream stuff will be on smaller chips.

In the future we will have chiplet based designs which will make this a non-issue.
 
  • Like
Reactions: Tlh97 and PhoBoChai

Glo.

Diamond Member
Apr 25, 2015
5,711
4,558
136
It implies Raja knew about GA102 and therefore already had plans to counter with RDNA. Raja being Raja, of course, likes running his mouth like a coyote, with Vega being a prime example. If Raja didn't know about GA102 he wouldn't have tweeted about it on the same launch day. Remember RDNA1 had a plethora of issues? I reckon it was done in the same manner as Vega: good ideas, but with a low budget and poor execution. People in the industry talk with each other, quite a lot actually, but I'm sure you are fully aware of this already.
The only GPU arch that was designed, possibly, or rather had its groundwork laid by Koduri, was Navi 1.

Plenty of Zen engineers were added to the team designing RDNA2, for optimizing the physical design (which is a pretty key aspect of ANY architecture), long after Koduri left AMD.

Do not mistake RDNA1 for RDNA2. Raja could've had a hand in RDNA1. There is a large chance that RDNA2 is vastly different from what he himself knew.

And here we are talking only about consumer GPUs. Knowing Raja, he builds more compute-focused GPUs (Vega, GCN, Xe from Intel).
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I am willing to bet that they only have 2 SKUs per die. As others have said, 7nm is very mature at this point, and having 2 parts will give a 90+% success rate. You have to realize that the cost of a Big Navi die is much higher, even if you bin. AMD would love to sell tiny dies all day long, so you can bet that they will only use large dies where they need them competitively. All the mainstream stuff will be on smaller chips.

In the future we will have chiplet based designs which will make this a non-issue.

AMD has more than 4 SKUs for Renoir based on a tiny die at roughly 150 sq mm.

R7 4800U - 8C/16T, 8 CU
R7 4700U - 8C/8T, 7 CU
R5 4600U - 6C/12T, 6 CU
R5 4500U - 6C/6T, 6 CU
R3 4300U - 4C/4T, 5 CU

What's better, they have 4 SKUs for even a tiny 75 sq mm chiplet.

R7 3800X - 8C/16T
R5 3600X - 6C/12T
R3 3300X - 4C/8T (4+0)
R3 3100 - 4C/8T (2+2)

So anybody who says 4 SKUs for a > 500 sq mm die is too many does not understand that even a mature process like TSMC 7nm is not perfect. Moreover, larger dies have more defects, and hence there are more options for SKUs.
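The die-size effect is easy to illustrate with a Poisson yield sketch across the three dies mentioned in this post (the defect density is an assumed figure for a mature 7nm-class node, not an official TSMC number):

```python
import math

# Expected defects per die scale linearly with area, so the
# zero-defect fraction falls exponentially with die size.
def defect_free(area_mm2, d0_per_cm2=0.09):
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

for name, area in [("Zen 2 chiplet", 75), ("Renoir", 150), ("Big Navi", 505)]:
    print(f"{name} ({area} mm^2): ~{defect_free(area):.0%} defect-free")
```

Even if small dies yield in the 90s, a 500+ mm^2 die collects several times more defects per die, which is exactly why more harvesting bins make sense there.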
 
Last edited:

Kuiva maa

Member
May 1, 2014
181
232
116
AMD has more than 4 SKUs for Renoir based on a tiny die at roughly 150 sq mm.

R7 4800U - 8C/16T, 8 CU
R7 4700U - 8C/8T, 7 CU
R5 4600U - 6C/12T, 6 CU
R5 4500U - 6C/6T, 6 CU
R3 4300U - 4C/4T, 5 CU

What's better, they have 4 SKUs for even a tiny 75 sq mm chiplet.

R7 3800X - 8C/16T
R5 3600X - 6C/12T
R3 3300X - 4C/8T (4+0)
R3 3100 - 4C/8T (2+2)

So anybody who says 4 SKUs for a > 500 sq mm die is too many does not understand that even a mature process like TSMC 7nm is not perfect. Moreover, larger dies have more defects, and hence there are more options for SKUs.

The counter-argument to Renoir is that "harvesting" a tiny, cheaper die is more cost-effective than disabling 30% of the shader array on a 500mm2+ chip. That being said, I agree with you. Without knowing the economies of scale, I would expect AMD to cover the gap between 40 and 80CU with defective (or just cut-down) big chips, instead of developing a middle-sized solution from scratch, with all the costs and market headaches that involves.
 

jpiniero

Lifer
Oct 1, 2010
14,605
5,225
136
So anybody who says 4 SKUs for a > 500 sq mm die is too many does not understand that even a mature process like TSMC 7nm is not perfect. Moreover, larger dies have more defects, and hence there are more options for SKUs.

At the same time, the further cut-down dies could be OEM only. Maybe retail gets 72 and 64 CU, with 80 being a limited edition and 56 or 48 being OEM only.
 
  • Like
Reactions: Tlh97 and raghu78

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
The only GPU arch that was designed, possibly, or rather had its groundwork laid by Koduri, was Navi 1.

Plenty of Zen engineers were added to the team designing RDNA2, for optimizing the physical design (which is a pretty key aspect of ANY architecture), long after Koduri left AMD.

Do not mistake RDNA1 for RDNA2. Raja could've had a hand in RDNA1. There is a large chance that RDNA2 is vastly different from what he himself knew.

And here we are talking only about consumer GPUs. Knowing Raja, he builds more compute-focused GPUs (Vega, GCN, Xe from Intel).
RDNA2 was in design while Raja was still there, so he would have known about the general architecture, performance goals, new features, etc. Not that it matters much.
What matters is who the chief designer was (aka the Radeon equivalent of Mike Clark).
 

Veradun

Senior member
Jul 29, 2016
564
780
136
Why are we surprised at 128 ROPs? All RDNA2 cards have RB+ (unlike the previous gens, where only the APUs had it), and with that they can rasterize 8 pixels per render backend. The leaks put Navi 21 at 16 render backends, so the total ROP count would be 128.
Actually, I'm surprised that rumors in just the last month have told us Navi 21 "undoubtedly" has 64, 96, and 128 ROPs.
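The RB+ arithmetic here is straightforward (the 16-render-backend count is the rumored Navi 21 figure, not a confirmed spec):

```python
# With RB+, each render backend rasterizes 8 pixels, so ROPs = RBs * 8
def total_rops(render_backends: int, pixels_per_rb: int = 8) -> int:
    return render_backends * pixels_per_rb

print(total_rops(16))  # 128 for the rumored 16-RB Navi 21
print(total_rops(8), total_rops(12))  # 64 and 96 for cut-down configs
```

The 64/96/128 spread in the rumors would correspond to 8, 12, and 16 enabled render backends under this scheme.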