Question Speculation: RDNA2 + CDNA Architectures thread

Page 80

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have since backed up. Even still, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Konan

Senior member
Jul 28, 2017
360
291
106
That 40 CU die will be a formidable opponent for the RTX 3070. VERY formidable ;).
Navi 23 - RTX 2080 Super performance levels +10%, @150W TBP.

A 2080S is 16% behind a 2080 Ti at 4K, so an N23 hitting the target you said it's got still won't beat a 2080 Ti.
Basically, a 40 CU part is still going to lose comfortably to the 3070. Its target competitor will be more like the 3060 Ti.
Nvidia stated that the 3070 will beat the 2080 Ti (still pending judgement on that one in a few weeks).
The 40 CU 5700 XT is 50% behind the 2080 Ti at 4K; I don't see a 40 CU RDNA2 making at least a 40% leap, sorry.
The N22 is the card that will beat the 3070 (until the S or Ti comes out, that is) but lose to the 3080.
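A rough sketch of the percentage math above, treating the 2080 Ti as the 4K baseline and reading "50% behind" as the Ti being 50% faster than the 5700 XT (both readings are assumptions about the figures quoted here, not measurements):

```python
# Back-of-the-envelope check of the percentages quoted above (hypothetical figures).
rtx_2080ti = 1.00                        # normalize the 2080 Ti to 1.0 at 4K
rtx_2080s = rtx_2080ti * (1 - 0.16)      # "16% behind" -> 0.84
n23_target = rtx_2080s * 1.10            # rumored target: 2080S +10% -> ~0.92, still below the Ti

rx_5700xt = rtx_2080ti / 1.50            # reading "50% behind" as the Ti being 50% faster -> ~0.67
n23_vs_5700xt = n23_target / rx_5700xt   # gain the 40 CU part would need over the 5700 XT

print(f"N23 target vs 2080 Ti: {n23_target:.2f}")            # ~0.92
print(f"Required gain over 5700 XT: {n23_vs_5700xt:.2f}x")   # ~1.39x, i.e. roughly a 40% leap
```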
 
  • Like
Reactions: ozzy702

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
A 2080S is 16% behind a 2080 Ti at 4K, so an N23 hitting the target you said it's got still won't beat a 2080 Ti.
Basically, a 40 CU part is still going to lose comfortably to the 3070. Its target competitor will be more like the 3060 Ti.
Nvidia stated that the 3070 will beat the 2080 Ti (still pending judgement on that one in a few weeks).
The 40 CU 5700 XT is 50% behind the 2080 Ti at 4K; I don't see a 40 CU RDNA2 making at least a 40% leap, sorry.
The N22 is the card that will beat the 3070 (until the S or Ti comes out, that is) but lose to the 3080.
You can't see how it's possible.

Hmmm.

And how is it possible that the 44 SM RTX 3070 can be on par with, or slightly slower than, the 68 SM RTX 2080 Ti?

So why would it be impossible for a 40 CU GPU with much higher clock speeds to compete with the RTX 3070, considering that the RTX 3070 also has roughly the same number of compute units?

AMD is extremely underestimated. Ridiculously underestimated, I'd say, given their recent track record on both the GPU and CPU sides of things.
 

dr1337

Senior member
May 25, 2020
417
691
136
You can't see how it's possible.

Hmmm.

And how is it possible that the 44 SM RTX 3070 can be on par with, or slightly slower than, the 68 SM RTX 2080 Ti?

So why would it be impossible for a 40 CU GPU with much higher clock speeds to compete with the RTX 3070, considering that the RTX 3070 also has roughly the same number of compute units?

AMD is extremely underestimated. Ridiculously underestimated, I'd say, given their recent track record on both the GPU and CPU sides of things.
Nvidia literally doubled the FP32 capability of Turing with Ampere. The 3070 at 44 SMs vs an RDNA2 card with 40 CUs is, on paper, a total knockout. Even if AMD somehow managed to wring 20% more performance per CU out of RDNA2, they're still going to be behind the 3070 due to its four extra SMs.

It would take some as-yet-unannounced technology for AMD to beat a 3070 with fewer compute units. And tbh, if they actually can do that, then it means the 6900 XT might actually be on par with the 3090. All of that said, we don't have a single piece of information that would imply any of this. The only confident speculation is that a 40 CU RDNA2 card will be much more efficient than the 44 SM 3070. But faster? Yeah, nobody except AMD knows that.
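A minimal sketch of that "on paper" comparison, using the unit counts discussed in this thread (44 SMs vs 40 CUs), the widely reported per-unit FP32 lane counts (128 for an Ampere SM, 64 for an RDNA CU), and assumed clocks:

```python
# Hypothetical paper FP32 throughput comparison (unit counts from the thread, assumed clocks).
def fp32_tflops(units: int, lanes_per_unit: int, clock_ghz: float) -> float:
    # 2 FLOPs per lane per clock (fused multiply-add)
    return units * lanes_per_unit * 2 * clock_ghz / 1000.0

ampere_44sm = fp32_tflops(44, 128, 1.73)  # ~19.5 TFLOPS on paper
rdna2_40cu = fp32_tflops(40, 64, 2.25)    # ~11.5 TFLOPS, even at an aggressive 2.25 GHz
print(f"44 SM Ampere: {ampere_44sm:.1f} TFLOPS vs 40 CU RDNA2: {rdna2_40cu:.1f} TFLOPS")
```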
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
If AMD has intentionally planted incorrect firmware info so that leakers don't get the actual specs, then I think even 80 CU might be wrong. I would not be surprised to see 96 CUs and 2048-bit HBM2E for the 505 sq mm flagship die.
Are we sure the 505mm2 die is not Arcturus?
Rumors say they doubled the CUs of the Radeon VII (which is 331mm2) but removed the display engines.
 
  • Like
Reactions: lightmanek

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
If we know this is nuts, then AMD knows this is nuts. So there are only so many scenarios:

1.) They are ceding the high end yet again. :rolleyes:
2.) They have some secret sauce compression or caching scheme up their sleeve.
3.) They have successfully avoided accurate leaks and there is a big-die plus HBM card at the top of the stack.

I'm fervently hoping for #3, but know it will be expensive.
It's only expensive if consumers are unwilling to pay for it.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
A 2080S is 16% behind a 2080 Ti at 4K, so an N23 hitting the target you said it's got still won't beat a 2080 Ti.
Basically, a 40 CU part is still going to lose comfortably to the 3070. Its target competitor will be more like the 3060 Ti.
Nvidia stated that the 3070 will beat the 2080 Ti (still pending judgement on that one in a few weeks).
The 40 CU 5700 XT is 50% behind the 2080 Ti at 4K; I don't see a 40 CU RDNA2 making at least a 40% leap, sorry.
The N22 is the card that will beat the 3070 (until the S or Ti comes out, that is) but lose to the 3080.

It would be a 90% improvement in perf/W over the 5700 XT. He is just making his numbers up. At the same time, Sony can't even ship a 10 TFLOPS console with less than 100W for the GPU and memory...
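For what it's worth, a ~90% perf/W delta is roughly what falls out of the rumored target if you assume the 2080 Super is about 15% faster than the 5700 XT at 4K and use the 5700 XT's 225W TBP; all of these inputs are assumptions for illustration:

```python
# Hypothetical perf/W gain implied by the "2080S +10% at 150W" rumor (assumed inputs).
perf_5700xt = 1.00          # baseline performance
tbp_5700xt = 225.0          # W, reference RX 5700 XT total board power
perf_n23 = 1.15 * 1.10      # assume 2080S ~15% ahead of the 5700 XT, target is 2080S +10%
tbp_n23 = 150.0             # W, rumored TBP

perf_per_watt_gain = (perf_n23 / tbp_n23) / (perf_5700xt / tbp_5700xt)
print(f"Implied perf/W gain: {perf_per_watt_gain:.2f}x")  # ~1.90x, i.e. ~90%
```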
 

Konan

Senior member
Jul 28, 2017
360
291
106
You can't see how it's possible.

Hmmm.

And how is it possible that the 44 SM RTX 3070 can be on par with, or slightly slower than, the 68 SM RTX 2080 Ti?

So why would it be impossible for a 40 CU GPU with much higher clock speeds to compete with the RTX 3070, considering that the RTX 3070 also has roughly the same number of compute units?

AMD is extremely underestimated. Ridiculously underestimated, I'd say, given their recent track record on both the GPU and CPU sides of things.

You said the target of N23 is 2080S +10%; that is already less than the 3070 (if it comes out as officially pitched). Seems like you just exaggerated performance. The real danger is Vega-style exaggeration...

Here is what I think -
N21 80 CU 256-bit bus 16GB HBM2 or G6 > trades with the 3080 or just over (RDNA2 more efficient)
N22 60 CU 256-bit bus 12/16GB G6 > wants to get close to the 3080 (RDNA2 more efficient)
N23 40 CU 192-bit bus 12GB G6 > wants to get close to the 3070 (RDNA2 more efficient)
N24 20 CU 128-bit bus 8GB? G6

I will be very happy if things turn out better, of course.
 
Last edited:

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
You said the target of N23 is 2080S +10%; that is already less than the 3070 (if it comes out as officially pitched). Seems like you just exaggerated performance. The real danger is Vega-style exaggeration...

Here is what I think -
N21 80 CU 256-bit bus 16GB HBM2 or G6 > trades with the 3080 or just over (RDNA2 more efficient)
N22 60 CU 256-bit bus 12/16GB G6 > close to the 3080 (RDNA2 more efficient)
N23 40 CU 192-bit bus 12GB G6 > close to the 3070 (RDNA2 more efficient)
N24 20 CU 128-bit bus 8GB? G6

80 CUs with just a 256-bit memory bus? Not even if hell froze over.

Also, I cannot see any 40 CU die (less than 300mm2) coming close to the 3070 (GA104, ~400mm2).
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
You said the target of N23 is 2080S +10%; that is already less than the 3070 (if it comes out as officially pitched). Seems like you just exaggerated performance. The real danger is Vega-style exaggeration...
No. The problem is that people exaggerate the performance of the RTX 3070.

It won't be faster than the RTX 2080 Ti, for a start. Neither of the two GPUs will be.

And yes, the target for the 40 CU GPU is 35% more performance than the RX 5700 XT.

How come? Higher IPC + higher clock speeds.
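One way a ~35% uplift could decompose into IPC and clocks; the split below (roughly +10% per-clock throughput and a ~2.25 GHz game clock) is purely an illustrative assumption, not a leaked figure:

```python
# Hypothetical decomposition of a ~35% gain over the RX 5700 XT (illustrative assumptions).
ipc_gain = 1.10              # assumed per-clock ("IPC") improvement for RDNA2
clock_gain = 2.25 / 1.85     # assumed game clock vs the 5700 XT's ~1.85 GHz -> ~1.22x
total_gain = ipc_gain * clock_gain
print(f"Combined uplift: {total_gain:.2f}x")  # ~1.34x, in the ballpark of the claimed 35%
```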
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Nvidia literally doubled the FP32 capability of Turing with Ampere.

This is oversimplifying the changes that were made. That's like saying Bulldozer doubled its INT performance because it could now handle two INT threads at the same time.

There are only very specific workloads that can actually make use of the doubled FP32 capability. Benchmarks have not shown these changes to increase gaming performance. The 3080 has vastly more hardware than the 2080 Ti, but is not vastly faster. Why then would a 3070 with LESS hardware than the 2080 Ti suddenly be able to catch it?
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
No. The problem is that people exaggerate the performance of the RTX 3070.

It won't be faster than the RTX 2080 Ti, for a start. Neither of the two GPUs will be.

And yes, the target for the 40 CU GPU is 35% more performance than the RX 5700 XT.

How come? Higher IPC + higher clock speeds.

This is not going to happen, sorry, because then the Xbox Series X would be faster than the RTX 2080 Ti, which it's not.

The 60 CU die will compete against GA104 (RTX 3070) and the 80 CU die will compete against GA102 (RTX 3080/90).
 
  • Like
Reactions: Konan

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
This is not going to happen, sorry, because then the Xbox Series X would be faster than the RTX 2080 Ti, which it's not.

The 60 CU die will compete against GA104 (RTX 3070) and the 80 CU die will compete against GA102 (RTX 3080/90).
Oh, you have got your hands on the GPU tech in the Xbox Series X, with titles specifically optimized for that console, to know how it performs, eh? ;)

BTW, I would love to see your breakdown of performance per CU for the RDNA2 architecture.

You know why?

AMD claims they have achieved an increase in IPC with RDNA2. If a 60 CU GPU will only compete with the RTX 3070, then they have achieved another BULLDOZER, an IPC regression.

Which is not the case, based on what Microsoft has shown in their materials.
 
  • Like
Reactions: spursindonesia

dr1337

Senior member
May 25, 2020
417
691
136
This is oversimplifying the changes that were made. That's like saying Bulldozer doubled its INT performance because it could now handle two INT threads at the same time.

There are only very specific workloads that can actually make use of the doubled FP32 capability. Benchmarks have not shown these changes to increase gaming performance. The 3080 has vastly more hardware than the 2080 Ti, but is not vastly faster. Why then would a 3070 with LESS hardware than the 2080 Ti suddenly be able to catch it?
Hence why I used the word "capabilities" instead of "performance". It's not oversimplifying; it's just making it convenient to discuss. The fact is that Ampere has the ability to double FP32 throughput, yet we can see in benchmarks that performance isn't anywhere close to doubled. This is of course down to how resource utilization works, compounded by the fact that doubling the FP32 ALUs doesn't do much for performance without also doubling the rest of the supporting architecture.

I'm not saying Nvidia has actually doubled performance, but they have made a massive change to the architecture and their performance gains are mostly because of it. With RDNA2, unless we assume the desktop architecture is different from the consoles', so far there haven't been any major changes in compute ability whatsoever outside of clock speed and CU count. This contrasts with Ampere directly increasing performance per SM, rather than RDNA2's apparent increase in overall performance from clocks and node improvements.
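A toy model of that utilization point: if only part of the frame time is limited by FP32 math, doubling the FP32 ALUs only speeds up that part. This is just an Amdahl's-law style sketch with made-up fractions, not a measurement:

```python
# Amdahl-style sketch: doubling FP32 ALUs only helps the FP32-bound portion of a frame.
def frame_speedup(fp32_bound_fraction: float, fp32_speedup: float = 2.0) -> float:
    # The non-FP32-bound part of the frame is unchanged; the FP32-bound part shrinks.
    return 1.0 / ((1.0 - fp32_bound_fraction) + fp32_bound_fraction / fp32_speedup)

for f in (0.3, 0.5, 0.7):
    print(f"{f:.0%} FP32-bound -> {frame_speedup(f):.2f}x")  # 1.18x, 1.33x, 1.54x
```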
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Hence why I used the word "capabilities" instead of "performance". It's not oversimplifying; it's just making it convenient to discuss. The fact is that Ampere has the ability to double FP32 throughput, yet we can see in benchmarks that performance isn't anywhere close to doubled. This is of course down to how resource utilization works, compounded by the fact that doubling the FP32 ALUs doesn't do much for performance without also doubling the rest of the supporting architecture.

I'm not saying Nvidia has actually doubled performance, but they have made a massive change to the architecture and their performance gains are mostly because of it. With RDNA2, unless we assume the desktop architecture is different from the consoles', so far there haven't been any major changes in compute ability whatsoever outside of clock speed and CU count. This contrasts with Ampere directly increasing performance per SM, rather than RDNA2's apparent increase in overall performance from clocks and node improvements.
And it never will achieve its potential.

Nvidia simply GCN'd their gaming architecture. It behaves EXACTLY like GCN in games and EXACTLY like GCN in compute: monstrous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming, for exactly the same reasons GCN's compute performance was never reflected in games.
 

CastleBravo

Member
Dec 6, 2019
120
271
136
And it never will achieve its potential.

Nvidia simply GCN'd their gaming architecture. It behaves EXACTLY like GCN in games and EXACTLY like GCN in compute: monstrous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming, for exactly the same reasons GCN's compute performance was never reflected in games.

The improved capabilities don't translate into improved performance in current games, but there is still some chance that future games will make proper use of them. The NV Marbles demo showed a massive improvement.
 
  • Like
Reactions: lightmanek

dr1337

Senior member
May 25, 2020
417
691
136
And it never will achieve its potential.

Nvidia simply GCN'd their gaming architecture. It behaves EXACTLY like GCN in games and EXACTLY like GCN in compute: monstrous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming, for exactly the same reasons GCN's compute performance was never reflected in games.
Haha, yeah, that's my interpretation as well; I'm not debating any of this either, lol. But frankly, what we know about RDNA2 currently makes it look like a half generation compared to Ampere. I enjoy speculation, but with the knowledge that most of the performance changes have come from clocks and power consumption, it's not logical to assume AMD have somehow managed to exceed 'IPC' parity with Nvidia through those changes alone, versus the amount of overhaul Nvidia has taken on with Ampere. The 2070 Super has as many shaders and SMs as the 5700 XT has shaders and CUs, but it's 5-10% faster. With Ampere being 20% faster than Turing on average, we'd have to assume AMD somehow managed to increase performance by more than 25% per CU. That assumption is entirely baseless, and so far everything seems to point towards AMD focusing on efficiency rather than performance.
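A quick sketch of that chain of rough figures (2070 Super ~7.5% ahead of the 5700 XT at equal unit counts, Ampere ~20% over Turing); both inputs are the approximations quoted above, not measurements:

```python
# Rough per-CU gain a 40 CU RDNA2 part would need to match the 3070 (approximate inputs).
turing_vs_rdna1 = 1.075      # 2070 Super ~5-10% ahead of the 5700 XT at equal unit counts
ampere_vs_turing = 1.20      # Ampere ~20% ahead of Turing on average

required_per_cu_gain = turing_vs_rdna1 * ampere_vs_turing
print(f"Required per-CU uplift: {required_per_cu_gain:.2f}x")  # ~1.29x, i.e. more than 25%
```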
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Oh, you have got your hands on the GPU tech in the Xbox Series X, with titles specifically optimized for that console, to know how it performs, eh? ;)

BTW, I would love to see your breakdown of performance per CU for the RDNA2 architecture.

You know why?

AMD claims they have achieved an increase in IPC with RDNA2. If a 60 CU GPU will only compete with the RTX 3070, then they have achieved another BULLDOZER, an IPC regression.

Which is not the case, based on what Microsoft has shown in their materials.

Navi 10 (RX 5700 XT) has 10.3 billion transistors with a die size of 251mm2.

GA104 (RTX 3070) has 17.4 billion transistors with a die size of 392mm2.

You are saying that IF a 270-280mm2, 40 CU RDNA2 die (estimated ~12B transistors) does not reach the same performance as GA104, then AMD has another Bulldozer???

The 40 CU die will compete with GA106 (RTX 3060).

The 60 CU die will compete with GA104 (RTX 3070).

And the 80 CU die will compete against GA102 (RTX 3080/90).

You can have two or three different cards from each die.

Simple as that.
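A minimal check of that ~12B transistor estimate, assuming RDNA2 lands at roughly Navi 10's transistor density (which, as noted later in the thread, is an assumption):

```python
# Transistor estimate for a hypothetical 270-280mm2 RDNA2 die at Navi 10's density.
navi10_transistors = 10.3e9
navi10_area_mm2 = 251.0
density = navi10_transistors / navi10_area_mm2   # ~41 million transistors per mm2

for area_mm2 in (270.0, 280.0):
    print(f"{area_mm2:.0f}mm2 -> {density * area_mm2 / 1e9:.1f}B transistors")  # ~11.1B to ~11.5B
```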
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Haha, yeah, that's my interpretation as well; I'm not debating any of this either, lol. But frankly, what we know about RDNA2 currently makes it look like a half generation compared to Ampere.
And how did Maxwell compare to Kepler?
Navi 10 (RX 5700 XT) has 10.3 billion transistors with a die size of 251mm2.

GA104 (RTX 3070) has 17.4 billion transistors with a die size of 392mm2.

You are saying that IF a 270-280mm2, 40 CU RDNA2 die (estimated ~12B transistors) does not reach the same performance as GA104, then AMD has another Bulldozer???

The 40 CU die will compete with GA106 (RTX 3060).

The 60 CU die will compete with GA104 (RTX 3070).

And the 80 CU die will compete against GA102 (RTX 3080/90).

You can have two or three different cards from each die.

Simple as that.
First of all, you have zero idea what transistor density AMD was able to get from RDNA2.

Secondly, with RDNA1 AMD already had CU-to-SM performance parity with Turing.

How, in your opinion, will they suddenly go backwards on that, considering they will increase both the performance per clock and the clocks themselves of those GPUs?

It's not about doing more with more; RDNA2 is doing more with less, again, just like RDNA1 did more with fewer CUs compared to the previous generation of GPUs.

It's really that simple.
 

Konan

Senior member
Jul 28, 2017
360
291
106
80 CUs with just a 256-bit memory bus? Not even if hell froze over.

Also, I cannot see any 40 CU die (less than 300mm2) coming close to the 3070 (GA104, ~400mm2).

Yeah, I'm unsure about the 256-bit bus on the 80 CU part; maybe that's for a potential SKU without HBM, and an HBM variant has 512?

I agree about the 40 CU.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
This is not going to happen, sorry, because then the Xbox Series X would be faster than the RTX 2080 Ti, which it's not.

The 60 CU die will compete against GA104 (RTX 3070) and the 80 CU die will compete against GA102 (RTX 3080/90).
The PS5 runs at 2.2GHz. What AMD needs to do is just run a 40 CU Navi at 2.3GHz, and that alone will be enough for 2080/Super performance. The reference 5700 XT runs at only 1850MHz.
2300/1850 = ~24% faster clock.
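A tiny sketch of that clock-scaling argument, assuming (optimistically) that performance scales linearly with clock at a fixed 40 CU count:

```python
# Clock-only scaling estimate for a 40 CU RDNA2 part (assumes perfectly linear scaling).
ref_5700xt_mhz = 1850
target_mhz = 2300
clock_ratio = target_mhz / ref_5700xt_mhz
print(f"Clock uplift: {clock_ratio:.2f}x (~{(clock_ratio - 1) * 100:.0f}%)")  # ~1.24x, ~24%
```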
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
And how did Maxwell compare to Kepler?

First of all, you have zero idea what transistor density AMD was able to get from RDNA2.

Secondly, with RDNA1 AMD already had CU-to-SM performance parity with Turing.

How, in your opinion, will they suddenly go backwards on that, considering they will increase both the performance per clock and the clocks themselves of those GPUs?

It's not about doing more with more; RDNA2 is doing more with less, again, just like RDNA1 did more with fewer CUs compared to the previous generation of GPUs.

It's really that simple.

I don't expect much more than RDNA 1's transistor density.

The Ampere SM is not the same as Turing's; the RDNA 2 CU is almost the same as the RDNA 1 CU, with the addition of RT.

If a 40 CU RDNA 2 die is as fast as GA104 (RTX 3070), then there is no way we will see an 80 CU gaming card, because the 60 CU die would compete against GA102 (RTX 3080/90).

So in that scenario, the 80 CU die is not a gaming chip.