
Question Speculation: RDNA2 + CDNA Architectures thread


TESKATLIPOKA

Senior member
May 1, 2020
Quote:
> You can't measure a dedicated GPU's die size from a custom APU.
> The Xbox's APU lacks many units that a dedicated GPU has. It has a less sophisticated display engine, no PCIe root complex, probably no FP64 units, less complex encode/decode engines, a memory controller without ECC support, and probably less cache.
Let's be honest: adding 16 CUs shouldn't increase the size by more than 50 mm². Keep in mind that this GPU has no GDDR6 PHY and memory controller (just an HBM2 PHY) and no 8 CPU cores with L3 cache and other SoC blocks, yet it is supposedly 60 mm² bigger than the Xbox. Does the Xbox Series X have FP64 capability or not?
Either the Xbox has much higher density, or the RDNA2 GPU has more than 72 CUs, or it doesn't use HBM2.
 

Timorous

Senior member
Oct 27, 2008
TESKATLIPOKA said:
> Let's be honest: adding 16 CUs shouldn't increase the size by more than 50 mm². Keep in mind that this GPU has no GDDR6 PHY and memory controller (just an HBM2 PHY) and no 8 CPU cores with L3 cache and other SoC blocks, yet it is supposedly 60 mm² bigger than the Xbox. Does the Xbox Series X have FP64 capability or not?
> Either the Xbox has much higher density, or the RDNA2 GPU has more than 72 CUs, or it doesn't use HBM2.
Xbox Series X is around 42M xtors/mm^2.
The Zen 2 chiplet is around 50M xtors/mm^2 with 32MB cache.
RDNA is around 41M xtors/mm^2.
Renoir is around 63M xtors/mm^2.

Given a rumoured 505 mm² die size for 'Big Navi', which comfortably doubles up on everything in Navi 10 without increasing density, the extra 16 PCIe lanes that such a doubling would include (and that aren't needed) are probably large enough to offset the ray-tracing hardware.

If density for RDNA2 increases to Renoir levels, though, a 505 mm² die gets you around 31B transistors, which is 3x what Navi 10 has and means that 120 CUs would fit pretty comfortably even with a 512-bit GDDR6 memory bus.
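The transistor-budget arithmetic above can be reproduced in a few lines. This is a minimal sketch: the densities are the approximate figures quoted in the post (million transistors per mm²), Navi 10's 251 mm² and 10.3B transistors are AMD's published figures, and the helper name `transistor_budget_b` is just illustrative.

```python
# Back-of-envelope transistor-budget arithmetic for the 505 mm^2 'Big Navi' rumour.
# Densities are the rough figures quoted in the thread (million transistors per mm^2).

NAVI10_AREA_MM2 = 251.0       # published Navi 10 die size
NAVI10_TRANSISTORS_B = 10.3   # published Navi 10 transistor count (billions)

DENSITY_MTR_PER_MM2 = {
    "Xbox Series X": 42.0,
    "Zen 2 chiplet": 50.0,
    "RDNA (Navi 10)": 41.0,
    "Renoir": 63.0,
}


def transistor_budget_b(area_mm2: float, density_mtr_per_mm2: float) -> float:
    """Transistors (in billions) that fit in a die of the given area and density."""
    return area_mm2 * density_mtr_per_mm2 / 1000.0


if __name__ == "__main__":
    rumoured_area_mm2 = 505.0
    print(f"Navi 10 doubled at the same density: {2 * NAVI10_AREA_MM2:.0f} mm^2")
    for name, density in DENSITY_MTR_PER_MM2.items():
        budget = transistor_budget_b(rumoured_area_mm2, density)
        print(f"505 mm^2 at {name} density: {budget:.1f}B transistors "
              f"({budget / NAVI10_TRANSISTORS_B:.1f}x Navi 10)")
```

At the Renoir figure this gives about 31.8B transistors (roughly 3.1x Navi 10), while at Navi 10's own density the same 505 mm² holds only about 20.7B, which is why the density assumption dominates the CU-count estimate.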
 

Timorous

Senior member
Oct 27, 2008
Quote:
> Another leak is talking about only 72 CUs and 427 mm², and in my opinion that's a big size for such specs.
That adds up at current RDNA density. Perhaps AMD can really increase the clock speeds (PS5 shows this might be possible), making a larger die unnecessary.
 

TESKATLIPOKA

Senior member
May 1, 2020
I did some calculations based on DisEnchantment's measurements of Navi 10.
An RDNA1 GPU with 72 CUs, 96 ROPs and a 384-bit bus would be 401 mm² (L1 cache, ACE/HWS, L2 cache and raster/primitive units doubled).
With this in mind, 427 mm² for RDNA2 could be correct if GDDR6 is used rather than HBM2, but then I don't understand the size of the Xbox Series X.
If I made an RDNA1 GPU with 56 CUs, 64 ROPs and a 320-bit bus, it would be 322 mm² (L1 cache, ACE/HWS, L2 cache and raster/primitive units increased by 50%).
That leaves only ~38 mm² (360 - 322) for the rest of the SoC, and that's with RDNA1, not RDNA2, which should be bigger.
 

Glo.

Diamond Member
Apr 25, 2015
What if it is 427 mm² for the HBM2 version, and 505 mm² for the GDDR6 version of the die?

This is my speculation based on what YOU guys speculate.

IMO, we will see N21 - 500 mm², N22 - 340 mm², N23 - 240 mm².

The transistor density will be between 50 and 60 million transistors/mm². I won't speculate on CU counts.
 

TESKATLIPOKA

Senior member
May 1, 2020
Glo. said:
> What if it is 427 mm² for the HBM2 version, and 505 mm² for the GDDR6 version of the die?
A 384-bit memory controller and PHY should be ~86 mm² based on RDNA1 measurements, and the difference between the two die sizes is 78 mm², which would mean the HBM2 PHY is only 8 mm². That's very unlikely.
BTW, I would expect the larger one to use HBM2: that would leave more room for execution units and save board power (TBP) for more performance, which would allow a higher selling price to minimize or offset the cost of HBM2.
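The 8 mm² figure in the first paragraph comes from one line of arithmetic. It assumes, as the post implicitly does, that the rumoured 427 mm² and 505 mm² dies differ only in their memory interface, and it uses the ~86 mm² RDNA1-derived estimate for a 384-bit GDDR6 controller plus PHY:

```latex
A_{\text{HBM2 PHY}} \approx A_{\text{GDDR6 MC+PHY}} - \left(A_{\text{GDDR6 die}} - A_{\text{HBM2 die}}\right) = 86 - (505 - 427) = 86 - 78 = 8\ \text{mm}^2
```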
 

TESKATLIPOKA

Senior member
May 1, 2020
Glo. said:
> This is my speculation based on what YOU guys speculate.
>
> IMO, we will see N21 - 500 mm², N22 - 340 mm², N23 - 240 mm².
>
> The transistor density will be between 50 and 60 million transistors/mm². I won't speculate on CU counts.
That's ~22-46% higher density than RDNA1; I would love that. Such a high density would explain the Xbox Series X, but it would mean a monster GPU.
If I used the higher density and made an RDNA1 GPU with 160 CUs, 128 ROPs and a 512-bit memory controller (basically 4x Navi 10, except for the ROPs and memory controller), it would be ~470-564 mm².
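A minimal sketch of the density arithmetic in this post, assuming the thread's ~41 million transistors/mm² for RDNA1. The ~28.2B transistor count for the hypothetical 160-CU design is not a published figure: it is back-derived from the quoted 470-564 mm² range (564 x 50 = 470 x 60 = 28,200 million). The function names are illustrative.

```python
# Density-gain and die-area arithmetic for a hypothetical 160 CU RDNA1-style GPU.
# Assumption: ~28.2B transistors, back-derived from the 470-564 mm^2 range in the post.

RDNA1_DENSITY = 41.0          # million transistors per mm^2 (thread figure)
HYPOTHETICAL_BUDGET_B = 28.2  # billions of transistors (back-derived, illustrative)


def density_gain(new_density: float, base_density: float = RDNA1_DENSITY) -> float:
    """Fractional density increase of new_density over base_density."""
    return new_density / base_density - 1.0


def die_area_mm2(transistors_b: float, density_mtr_per_mm2: float) -> float:
    """Die area needed to hold a transistor budget at a given density."""
    return transistors_b * 1000.0 / density_mtr_per_mm2


if __name__ == "__main__":
    for density in (50.0, 60.0):
        print(f"{density:.0f} MTr/mm^2: +{density_gain(density):.0%} vs RDNA1, "
              f"~{die_area_mm2(HYPOTHETICAL_BUDGET_B, density):.0f} mm^2")
```

This puts the 50-60 range at roughly 22-46% above RDNA1 and reproduces the two end points: ~564 mm² at 50 and ~470 mm² at 60 million transistors/mm².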
 

raghu78

Diamond Member
Aug 23, 2012
Glo. said:
> What if it is 427 mm² for the HBM2 version, and 505 mm² for the GDDR6 version of the die?
>
> This is my speculation based on what YOU guys speculate.
>
> IMO, we will see N21 - 500 mm², N22 - 340 mm², N23 - 240 mm².
>
> The transistor density will be between 50 and 60 million transistors/mm². I won't speculate on CU counts.
We got the 505 mm² die size for Navi 21 from multiple sources in Nov 2019: AquarisZi from a Chinese forum and Charlie from SemiAccurate.

This user AquarisZi also gave the die sizes for the remaining two dies: N22 - 340 mm² and N23 - 240 mm². He mentioned the three dies are within a range of ±5 mm².


Moore's Law is Dead and Coreteks are speculating without having the full picture or doing some basic analysis. Given the 12 TF Xbox Series X with a 360 mm² die running at an estimated 115-120 W (according to my calculations based on the Series X PSU rating), the area and power efficiency of RDNA2 is very good. The only remaining question is whether AMD designed HBM2 and GDDR6 memory controllers on the same die. It's a very reasonable decision if AMD wanted to use GDDR6 for the heavily cut Navi 21:

Radeon 6800XT - 72 CU, 3SE, 6SA, 96 ROPs, 10-12 GB GDDR6, 720-768 GB/s (16-18 Gbps GDDR6)

and HBM2E for the top two SKUs:

Radeon 6900XT - 80 CU, 4SE, 8SA, 128 ROPs, 16 GB HBM2E, 920 GB/s (Hynix 3.6 Gbps HBM2E)
Radeon 6950XT - 96 CU, 4SE, 8SA, 128 ROPs, 16 GB HBM2E, 920 GB/s (Hynix 3.6 Gbps HBM2E)
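The bandwidth figures in this speculative SKU list follow from bus width x per-pin data rate / 8. The sketch below checks them under assumptions that are not in the post: 10 GB and 12 GB of GDDR6 imply 320-bit and 384-bit buses (one 8 Gb chip per 32-bit channel), and 16 GB of HBM2E means two 8 GB stacks with a 1024-bit interface each.

```python
# Peak memory bandwidth check for the speculative SKUs listed above.
# Bus widths are assumptions inferred from the capacities, not from the post.


def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8.0


CONFIGS = [
    ("10 GB GDDR6, 320-bit @ 18 Gbps", 320, 18.0),
    ("12 GB GDDR6, 384-bit @ 16 Gbps", 384, 16.0),
    ("16 GB HBM2E, 2 x 1024-bit @ 3.6 Gbps", 2 * 1024, 3.6),
]

if __name__ == "__main__":
    for label, width, rate in CONFIGS:
        print(f"{label}: {bandwidth_gbs(width, rate):.1f} GB/s")
```

The 320-bit/18 Gbps and 384-bit/16 Gbps pairings reproduce the 720-768 GB/s range, and two HBM2E stacks at 3.6 Gbps give about 921.6 GB/s, matching the ~920 GB/s quoted.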

Radeon Pro WX GPUs for Windows would also use HBM2E and so would Apple if they want a massively powerful GPU for their Mac Pro workstations.

In conclusion, my expectation is that AMD has a product/architecture capable of taking the GPU crown from Nvidia, and I fully expect them to do that.
 

Kenmitch

Diamond Member
Oct 10, 1999
raghu78 said:
> In conclusion, my expectation is that AMD has a product/architecture capable of taking the GPU crown from Nvidia, and I fully expect them to do that.
Right now we have two hype trains heading full speed ahead on the same track. Until they collide, we're just gonna have to wait and see how things look once the dust settles.

Interesting times ahead. It would be a mighty feat if AMD does pull it off.
 

raghu78

Diamond Member
Aug 23, 2012
Kenmitch said:
> Right now we have two hype trains heading full speed ahead on the same track. Until they collide, we're just gonna have to wait and see how things look once the dust settles.
>
> Interesting times ahead. It would be a mighty feat if AMD does pull it off.
This time around we have actual specifications for the RDNA2-based Xbox Series X GPU, so we can arrive at a reasonable estimate of Navi 21 performance. I think AMD is sandbagging big time.

Navi 10 board power - 225 W (9 TF at 1755 MHz game clock): 9000 GFLOPS / 225 W = 40 GFLOPS/W
Series X GPU with GDDR6 memory - 140-150 W (12 TF at 1825 MHz fixed clock): 12000 GFLOPS / 150 W = 80 GFLOPS/W
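The TFLOPS and perf-per-watt figures above can be rebuilt from CU counts and clocks. This sketch assumes 64 FP32 lanes per CU, each doing one fused multiply-add (2 FLOPs) per clock; the CU counts (40 for Navi 10, 52 enabled on Series X) are published specs, while the Series X GPU power range is the poster's estimate, not an official figure.

```python
# Rebuild peak FP32 throughput and GFLOPS/W from CU count, clock and board power.
# Assumption: 64 FP32 lanes per CU, one FMA (2 FLOPs) per lane per clock.

LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2


def fp32_gflops(cus: int, clock_mhz: float) -> float:
    """Peak FP32 GFLOPS = CUs x lanes x 2 FLOPs x clock (GHz)."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK * clock_mhz / 1000.0


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Efficiency metric used in the post."""
    return gflops / watts


if __name__ == "__main__":
    navi10 = fp32_gflops(40, 1755)    # RX 5700 XT game clock
    series_x = fp32_gflops(52, 1825)  # Series X fixed GPU clock
    print(f"Navi 10: {navi10:.1f} GFLOPS, {gflops_per_watt(navi10, 225):.1f} GFLOPS/W at 225 W")
    for watts in (150, 140):  # poster's estimated range for the Series X GPU
        print(f"Series X: {series_x:.1f} GFLOPS, "
              f"{gflops_per_watt(series_x, watts):.1f} GFLOPS/W at {watts} W")
```

With unrounded TFLOPS this gives about 39.9 GFLOPS/W for Navi 10 and roughly 81-87 GFLOPS/W for the Series X estimate, so the ~2x ratio holds, but only as far as the estimated power range does.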

There is sufficient data to prove RDNA2 is very area- and power-efficient. The fact that Nvidia is pushing 350 W on GA102 (built on Samsung 8nm) is a clue as to how badly they are trying to keep the GPU crown. But I don't think that will save them.
 

maddie

Diamond Member
Jul 18, 2010
raghu78 said:
> This time around we have actual specifications for the RDNA2-based Xbox Series X GPU, so we can arrive at a reasonable estimate of Navi 21 performance. I think AMD is sandbagging big time.
>
> Navi 10 board power - 225 W (9 TF at 1755 MHz game clock): 9000 GFLOPS / 225 W = 40 GFLOPS/W
> Series X GPU with GDDR6 memory - 140-150 W (12 TF at 1825 MHz fixed clock): 12000 GFLOPS / 150 W = 80 GFLOPS/W
>
> There is sufficient data to prove RDNA2 is very area- and power-efficient. The fact that Nvidia is pushing 350 W on GA102 (built on Samsung 8nm) is a clue as to how badly they are trying to keep the GPU crown. But I don't think that will save them.
All the talk of RDNA2 being larger to achieve the power-efficiency improvement ignores the other way to reduce power consumption: less circuitry in use, which likely also allows higher clocks. Fewer transistors to switch, simpler and less power-hungry clock trees, etc.
 

Stuka87

Diamond Member
Dec 10, 2010
maddie said:
> All the talk of RDNA2 being larger to achieve the power-efficiency improvement ignores the other way to reduce power consumption: less circuitry in use, which likely also allows higher clocks. Fewer transistors to switch, simpler and less power-hungry clock trees, etc.
While true, cards are normally designed for a performance window, not an efficiency window.

Chips end up with a window where they both perform well and are efficient. If you clock them past that window, power consumption skyrockets, and you will ultimately hit a clock-speed wall. AMD has a history of having to factory-overclock cards to meet performance targets, which results in high power consumption. It's in their best interest to design a larger GPU that can run at a lower clock, so that it's both fast and efficient.
 

TESKATLIPOKA

Senior member
May 1, 2020
raghu78 said:
> The only remaining question is whether AMD designed HBM2 and GDDR6 memory controllers on the same die. It's a very reasonable decision if AMD wanted to use GDDR6 for the heavily cut Navi 21
Isn't the HBM2 memory controller actually part of the memory stack, with the GPU containing only the PHY? If so, then I don't think having an additional GDDR6 controller and PHY is such a great idea; it takes up a lot of space on the chip.
 

TESKATLIPOKA

Senior member
May 1, 2020
Quote:
> Still 64 ROPs. Wonder if that is implying a large clockspeed increase to make up for the lack of units.
Even if you increase the clock speed, the ROP-to-CU ratio stays the same, and it would be half that of Navi 10. I am quite sceptical about only 64 ROPs (actually only 16 backends) unless they are more capable than the previous ones.

An RDNA1 GPU with 80 CUs, 64 ROPs and a 384-bit memory controller would be 409 mm² at the same transistor density as Navi 10.
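The ratio argument can be made concrete: a higher clock scales CU throughput and ROP throughput by the same factor, so it cancels out of the ratio. The sketch assumes 4 ROPs per render backend (which is how 64 ROPs become the 16 backends mentioned above); the 2200 MHz clock is purely hypothetical and the function names are illustrative.

```python
# ROP-to-CU ratio for Navi 10 versus the rumoured 80 CU / 64 ROP configuration.
# Assumption: 4 ROPs per render backend (RB), so 64 ROPs = 16 RBs.

ROPS_PER_RB = 4


def rops_per_cu(rops: int, cus: int, clock_mhz: float = 1.0) -> float:
    """Pixel throughput per unit of shader throughput; the shared clock cancels out."""
    return (rops * clock_mhz) / (cus * clock_mhz)


def render_backends(rops: int) -> int:
    """Number of render backends implied by a ROP count."""
    return rops // ROPS_PER_RB


if __name__ == "__main__":
    navi10 = rops_per_cu(64, 40)
    rumoured = rops_per_cu(64, 80, clock_mhz=2200)  # hypothetical higher clock
    print(f"Navi 10: {navi10:.2f} ROPs/CU, rumoured: {rumoured:.2f} ROPs/CU "
          f"({rumoured / navi10:.0%} of Navi 10), {render_backends(64)} RBs")
```

Whatever clock is passed in, the rumoured configuration ends up at 0.8 ROPs per CU versus Navi 10's 1.6, i.e. half, unless each backend does more work per clock.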
 

maddie

Diamond Member
Jul 18, 2010
Stuka87 said:
> While true, cards are normally designed for a performance window, not an efficiency window.
>
> Chips end up with a window where they both perform well and are efficient. If you clock them past that window, power consumption skyrockets, and you will ultimately hit a clock-speed wall. AMD has a history of having to factory-overclock cards to meet performance targets, which results in high power consumption. It's in their best interest to design a larger GPU that can run at a lower clock, so that it's both fast and efficient.
I'm not talking about the overall die size but about the individual CUs and functional blocks: less impedance. You can then use the savings to place more of them on the die for your end goal.
 

raghu78

Diamond Member
Aug 23, 2012
TESKATLIPOKA said:
> Isn't the HBM2 memory controller actually part of the memory stack, with the GPU containing only the PHY? If so, then I don't think having an additional GDDR6 controller and PHY is such a great idea; it takes up a lot of space on the chip.
You are right. So the Navi 21 die is likely to have a GDDR6 memory controller + GDDR6 PHY + HBM2E PHY (if it supports both memory types).
 

DisEnchantment

Senior member
Mar 3, 2017
raghu78 said:
> You are right. I corrected that in a later post. So the Navi 21 die is likely to have a GDDR6 memory controller + GDDR6 PHY + HBM2E PHY
Sienna Cichlid has HBM. Navy Flounder has GDDR6. Mesa indicates both Sienna and Navy to be GFX1030/Navi21.
These two are not the same die. They have different SMU configurations.
 

Veradun

Senior member
Jul 29, 2016
TESKATLIPOKA said:
> Even if you increase the clock speed, the ROP-to-CU ratio stays the same, and it would be half that of Navi 10. I am quite sceptical about only 64 ROPs (actually only 16 backends) unless they are more capable than the previous ones.
>
> An RDNA1 GPU with 80 CUs, 64 ROPs and a 384-bit memory controller would be 409 mm² at the same transistor density as Navi 10.
Also, Navi 10 has 2 SEs and 16 RBEs; having 4 SEs and still 16 RBEs seems a strange move. Something might be off here.

Edit: my fresh Tapatalk installation wanted to post its signature thing.
 

jpiniero

Diamond Member
Oct 1, 2010
9,187
1,812
126
DisEnchantment said:
> Sienna Cichlid has HBM. Navy Flounder has GDDR6. Mesa indicates both Sienna and Navy to be GFX1030/Navi21.
> These two are not the same die. They have different SMU configurations.
It would be pretty strange to make two different dies with the same shader counts, except that one uses HBM2 and the other GDDR6. I would say that having both on one die is more likely, although that would be pretty unusual.
 

DisEnchantment

Senior member
Mar 3, 2017
jpiniero said:
> It would be pretty strange to make two different dies with the same shader counts, except that one uses HBM2 and the other GDDR6. I would say that having both on one die is more likely, although that would be pretty unusual.
There is more to it than just shader count, though. Sienna has XGMI and dual VCN, among other things, which Navy Flounder does not have.
 
