Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,634
5,962
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
No. The problem is that people exaggerate the performance of RTX 3070.

It won't be faster than RTX 2080 Ti, already. Neither of the two GPUs will be.

And yes, target for 40CU GPU is 35% more performance over RX 5700 XT.

How come? Higher IPC+ higher clock speeds.

This is not going to happen, sorry, because then the XBOX SX would be faster than the RTX2080Ti, which it's not.

The 60CU die will compete against GA104 (RTX3070) and the 80CU die will compete against the GA102 (RTX3080/90)
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,556
136
This is not going to happen, sorry, because then the XBOX SX would be faster than the RTX2080Ti, which it's not.

The 60CU die will compete against GA104 (RTX3070) and the 80CU die will compete against the GA102 (RTX3080/90)
Oh, you have got hands on the GPU tech in the Xbox Series X, with specifically optimized titles for that console to know how it performs, eh? ;).

BTW. I would love to see your breakdown on performance per CU of RDNA2 architecture.

You know why?

AMD claims that they have achieved increase in IPC with RDNA2. If 60 CU GPU will compete only with RTX 3070, then they achieved another BULLDOZER, and IPC regression.

Which is not the case, based on what Microsoft has shown in their materials.
 

dr1337

Senior member
May 25, 2020
337
566
106
This is oversimplifying the changes that were made. That's like saying Bulldozer doubled its INT performance because it could not handle two INT threads at the same time.

There are only very specific work loads that can actually make use of the single precision FP32 capabilities. Benchmarks have not shown these changes to increase gaming performance. The 3080 has vastly more hardware than the 2080 Ti, but is not vastly faster. Why then would a 3070 with LESS hardware than the 2080 Ti suddenly be able to catch it?
Hence why I used the word capabilities instead of "performance". It's not oversimplifying, it's just making it convenient to discuss. The facts are that Ampere has the ability to double FP32 performance, however we can see in benchmarks that performance isn't anywhere close to being doubled. This is of course due to how resource utilization works, and is compounded by the fact that doubling FP32 ALUs doesn't impact performance that much without also doubling the rest of the supporting architecture.

I'm not saying Nvidia has actually doubled performance, but they have made a massive change to the architecture and their performance gains are mostly because of it. With RDNA2, unless we assume that the desktop product architecture is different from the consoles, so far there haven't been any major changes in compute ability whatsoever outside of clock speed and CU count. This is contrasted by Ampere directly increasing performance per SM, rather than RDNA2's apparent increase of global performance with clocks and node improvements.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,556
136
Hence why I used the word capabilities instead of "performance". It's not oversimplifying, it's just making it convenient to discuss. The facts are that Ampere has the ability to double FP32 performance, however we can see in benchmarks that performance isn't anywhere close to being doubled. This is of course due to how resource utilization works, and is compounded by the fact that doubling FP32 ALUs doesn't impact performance that much without also doubling the rest of the supporting architecture.

I'm not saying Nvidia has actually doubled performance, but they have made a massive change to the architecture and their performance gains are mostly because of it. With RDNA2, unless we assume that the desktop product architecture is different from the consoles, so far there haven't been any major changes in compute ability whatsoever outside of clock speed and CU count. This is contrasted by Ampere directly increasing performance per SM, rather than RDNA2's apparent increase of global performance with clocks and node improvements.
And it never will achieve its potential.

Nvidia simply just GCN'd their gaming architecture. It behaves EXACTLY like GCN in games, and EXACTLY like GCN in compute. Monstrous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming, for the exact same reasons why GCN never reflected its performance in games.
 

CastleBravo

Member
Dec 6, 2019
119
271
96
And it never will achieve its potential.

Nvidia simply just GCN'd their gaming architecture. It behaves EXACTLY like GCN in games, and EXACTLY like GCN in compute. Monstrous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming, for the exact same reasons why GCN never reflected its performance in games.

The improved capabilities don't reflect improved performance in current games, but there is still some chance that future games will make proper use of it. The NV marbles demo showed a massive improvement.
 

dr1337

Senior member
May 25, 2020
337
566
106
And it never will achieve its potential.

Nvidia simply just GCN'd their gaming architecture. It behaves EXACTLY like GCN in games, and EXACTLY like GCN in compute. Monstrous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming, for the exact same reasons why GCN never reflected its performance in games.
Haha, yeah, that's my interpretation as well, I'm not debating any of this either lol. But frankly, what we know about RDNA2 currently makes it look like a half generation compared to Ampere. I enjoy speculation, but with the knowledge that most performance changes have come from clocks and power consumption, it's not logical to assume that AMD has somehow managed to exceed 'IPC' parity with Nvidia with only these changes vs. the amount of overhaul Nvidia has undertaken with Ampere. The 2070S has as many SPs and SMs as the 5700 XT, but it's 5-10% faster. With Ampere being 20% faster than Turing on average, we'd have to assume AMD somehow managed to increase performance by more than 25% per CU. This assumption is entirely baseless, and so far everything seems to point towards AMD focusing on efficiency rather than performance.
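For what it's worth, here is a rough sketch of where a figure like "more than 25% per CU" can come from, taking the percentages in the post above at face value. It assumes (my assumption, the post doesn't spell it out) that the 2070S lead and the Ampere-over-Turing gain simply multiply. The function name is made up for illustration.

```python
def required_per_cu_gain(turing_lead: float, ampere_gain: float) -> float:
    """Per-CU uplift RDNA2 would need over RDNA1 to match Ampere per SM.

    Assumption: the 2070S lead over the 5700 XT and Ampere's average gain
    over Turing compound multiplicatively.
    """
    return (1 + turing_lead) * (1 + ampere_gain) - 1


# Figures from the post: 2070S is 5-10% ahead, Ampere is ~20% over Turing.
for lead in (0.05, 0.10):
    gain = required_per_cu_gain(lead, 0.20)
    print(f"2070S lead {lead:.0%} -> required per-CU gain {gain:.0%}")
```

Under those assumptions the required gain lands at roughly 26% to 32%, which is in line with the "more than 25%" in the post.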
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Oh, you have got hands on the GPU tech in the Xbox Series X, with specifically optimized titles for that console to know how it performs, eh? ;).

BTW. I would love to see your breakdown on performance per CU of RDNA2 architecture.

You know why?

AMD claims that they have achieved increase in IPC with RDNA2. If 60 CU GPU will compete only with RTX 3070, then they achieved another BULLDOZER, and IPC regression.

Which is not the case, based on what Microsoft has shown in their materials.

NAVI 10 (RX5700XT) has 10.3 billion transistors with a die size of 251mm2

GA104 (RTX3070) has 17.4 billion transistors with a die size of 392mm2

You are saying that IF a 270-280mm2 40 CU RDNA2 die (estimated ~12B transistors) does not reach the same performance as the GA104, then AMD will have another Bulldozer?


The 40 CU die will compete with GA106 (RTX3060)

The 60 CU die will compete with the GA104 (RTX3070)

And the 80 CU die will compete against the GA102 (RTX3080/90)

You can have two/three different cards from each die.

Simple as that.
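To put the numbers above side by side, a quick sketch of the implied transistor densities. The Navi 10 and GA104 figures are the ones quoted in the post; the 40 CU RDNA2 rows use the post's own ~12B estimate at 270mm2 and 280mm2, so they are guesses on top of guesses. The helper name is hypothetical.

```python
def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / area_mm2


# (transistors in billions, die area in mm^2), as given in the post.
chips = {
    "Navi 10 (RX 5700 XT)": (10.3, 251),
    "GA104 (RTX 3070)": (17.4, 392),
    "40 CU RDNA2 @ 270mm2 (estimate)": (12.0, 270),
    "40 CU RDNA2 @ 280mm2 (estimate)": (12.0, 280),
}

for name, (billions, area) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(billions, area):.1f} MTr/mm^2")
```

On these figures Navi 10 sits at about 41 MTr/mm2 and GA104 at about 44 MTr/mm2, with the estimated 40 CU die somewhere between 43 and 44.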
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,556
136
Haha, yeah, that's my interpretation as well, I'm not debating any of this either lol. But frankly, what we know about RDNA2 currently makes it look like a half generation compared to Ampere.
And how did Maxwell compare to Kepler?
NAVI 10 (RX5700XT) has 10.3 billion transistors with a die size of 251mm2

GA104 (RTX3070) has 17.4 billion transistors with a die size of 392mm2

You are saying that IF a 270-280mm2 40 CU RDNA2 die (estimated ~12B transistors) does not reach the same performance as the GA104, then AMD will have another Bulldozer?


The 40 CU die will compete with GA106 (RTX3060)

The 60 CU die will compete with the GA104 (RTX3070)

And the 80 CU die will compete against the GA102 (RTX3080/90)

You can have two/three different cards from each die.

Simple as that.
First of all, you have zero idea what transistor density AMD was able to get from RDNA2.

Secondly, AMD with RDNA1 already had CU-to-SM performance parity with Turing.

How, in your opinion, would they suddenly regress from that, considering they will increase both performance per clock and the clocks themselves on those GPUs?

It's not about making more with more. RDNA2 is making more with less, again, just like RDNA1 was making more with fewer CUs compared to the previous generation of GPUs.

It's really that simple.
 

Konan

Senior member
Jul 28, 2017
360
291
106
80 CUs with just 256-bit memory? Not even if hell froze over.

Also, I cannot see any 40CU die (less than 300mm2) coming close to 3070 (GA104 400mm2)

Yeah, I'm unsure about the 256-bit on the 80CU. Maybe that's for a potential SKU without HBM, and an HBM variant has 512?

I agree about the 40CU
 

Head1985

Golden Member
Jul 8, 2014
1,864
689
136
This is not going to happen, sorry, because then the XBOX SX would be faster than the RTX2080Ti, which it's not.

The 60CU die will compete against GA104 (RTX3070) and the 80CU die will compete against the GA102 (RTX3080/90)
PS5 runs at 2.2GHz. All AMD needs to do is run the 40CU Navi at 2.3GHz, and this alone will be enough for 2080/Super performance. The reference 5700XT runs at only 1850MHz.
2300/1850 = 1.24, i.e. a 24% faster clock.
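The clock arithmetic above, as a minimal sketch. Both clock values are the ones quoted in the post; the 2300MHz figure is the poster's proposal, not a confirmed spec, and the function name is just for illustration.

```python
def clock_uplift(new_mhz: float, old_mhz: float) -> float:
    """Relative clock increase of new_mhz over old_mhz (0.25 means +25%)."""
    return new_mhz / old_mhz - 1


# 1850MHz reference 5700 XT vs. a proposed 2300MHz 40CU Navi.
uplift = clock_uplift(2300, 1850)
print(f"Clock uplift: {uplift:.1%}")
```

Note this is only the clock ratio. Whether performance scales 1:1 with clock is a separate question.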
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
And how did Maxwell compare to Kepler?

First of all, you have zero idea what transistor density AMD was able to get from RDNA2.

Secondly, AMD with RDNA1 already had CU-to-SM performance parity with Turing.

How, in your opinion, would they suddenly regress from that, considering they will increase both performance per clock and the clocks themselves on those GPUs?

It's not about making more with more. RDNA2 is making more with less, again, just like RDNA1 was making more with fewer CUs compared to the previous generation of GPUs.

It's really that simple.

I don't expect much more than RDNA 1 transistor density

Ampere SM is not the same as Turing; the RDNA 2 CU is almost the same as the RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die is as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU would compete against the GA102 (RTX3080/90)

So in that scenario the 80CU is not a gaming chip.
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
I don't expect much more than RDNA 1 transistor density

Ampere SM is not the same as Turing; the RDNA 2 CU is almost the same as the RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die is as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU would compete against the GA102 (RTX3080/90)

So in that scenario the 80CU is not a gaming chip.
Can you elaborate on your sources for the CUs being the same? I was under the impression that the MS & PS5 chips have much different layouts than RDNA1
 
May 17, 2020
122
233
86
I don't expect much more than RDNA 1 transistor density

Ampere SM is not the same as Turing; the RDNA 2 CU is almost the same as the RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die is as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU would compete against the GA102 (RTX3080/90)

So in that scenario the 80CU is not a gaming chip.
The non-gaming chips are CDNA with Arcturus, so why make an RDNA2-based GPU a non-gaming chip?
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Can you elaborate on your sources for the CUs being the same? I was under the impression that the MS & PS5 chips have much different layouts than RDNA1

XBOX SX is RDNA 1 with ray tracing and some other changes taken from RDNA 2 for higher efficiency

You can compare the two CUs below

861-cu-diagram.jpg


xboxseries-xhot-chips-RDNA-2-Dual-CU.jpg
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,556
136
XBOX SX is RDNA 1 with ray tracing and some other changes taken from RDNA 2 for higher efficiency

You can compare the two CUs below

861-cu-diagram.jpg


xboxseries-xhot-chips-RDNA-2-Dual-CU.jpg
Where do you have exact data on the sizes of the caches in each CU, that would warrant saying that RDNA1 and RDNA2 CUs are exactly the same?

P.S. RDNA1 CUs launch 5 instructions per clock. RDNA2 CUs launch 7 instructions per clock ;).
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Where do you have exact data on the sizes of the caches in each CU, that would warrant saying that RDNA1 and RDNA2 CUs are exactly the same?

P.S. RDNA1 CUs launch 5 instructions per clock. RDNA2 CUs launch 7 instructions per clock ;).

I said they are almost the same.
I don't expect more than 10% higher IPC per CU in RDNA 2 vs RDNA 1
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,556
136
I said they are almost the same.
I don't expect more than 10% higher IPC per CU in RDNA 2 vs RDNA 1
If the caches have been redesigned, then no, they are not the same.

P.S. If AMD achieved a 10% IPC uplift and 25% higher clock speeds, why is it unbelievable that a 40 CU GPU will compete with the RTX 3070?

It's EXACTLY the performance level I was told. +35% over RX 5700 XT/10% above RTX 2080 Super.
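A quick sanity check of that P.S., as a sketch. It assumes the IPC and clock gains multiply and that performance scales perfectly with both, which is a best case and not something anyone in the thread has shown. The function name is made up.

```python
def combined_uplift(ipc_uplift: float, clock_uplift: float) -> float:
    """Best-case performance gain if IPC and clock gains multiply cleanly."""
    return (1 + ipc_uplift) * (1 + clock_uplift) - 1


# 10% IPC and 25% clocks, the figures from the post above.
gain = combined_uplift(0.10, 0.25)
print(f"Best-case combined uplift: {gain:.1%}")
```

Under perfect scaling that works out to +37.5%, so the +35% over the RX 5700 XT claimed above leaves only a small margin for imperfect scaling.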
 

CastleBravo

Member
Dec 6, 2019
119
271
96
I don't expect much more than RDNA 1 transistor density

Ampere SM is not the same as Turing; the RDNA 2 CU is almost the same as the RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die is as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU would compete against the GA102 (RTX3080/90)

So in that scenario the 80CU is not a gaming chip.

The 40CU die might be clocked a lot higher than the 60CU or 80CU.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
rdna1vsrdna2.jpeg

Navi14 = Yellow and Navi23 Dimgrey Cavefish in Lockhart(Van Gogh/Mero) = Green

They are physically designed differently. Navi1x has the PRFs? vertical while Navi2x has the PRFs? horizontal.

Most top-left, top-right, bottom-left, bottom-right are the 2x SIMD32 units. TMUs and ray-tracing units are in the center for Navi23/Navi14(No RT).
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
This is not going to happen, sorry, because then the XBOX SX would be faster than the RTX2080Ti, which it's not.

The 60CU die will compete against GA104 (RTX3070) and the 80CU die will compete against the GA102 (RTX3080/90)
Series X isn't representative of desktop RDNA2 performance because it has 7WGPs per SA.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
If the caches have been redesigned, then no, they are not the same.

P.S. If AMD achieved a 10% IPC uplift and 25% higher clock speeds, why is it unbelievable that a 40 CU GPU will compete with the RTX 3070?

It's EXACTLY the performance level I was told. +35% over RX 5700 XT/10% above RTX 2080 Super.

Are you sure that increasing clocks will gain you 1-to-1 performance?

I'm expecting the 40CU RDNA 2 card to be on the same level as the RTX2080/Super and RTX3060