Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
I don't expect much more than RDNA 1 transistor density.

Ampere SM is not the same as Turing; RDNA 2 CU is almost the same as RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die will be as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU will compete against the GA102 (RTX3080/90).

So in that scenario the 80CU is not a gaming chip.
Can you elaborate on your sources for the CUs being the same? I was under the impression that the MS & PS5 chips have much different layouts than RDNA 1.
 
May 17, 2020
123
233
116
I don't expect much more than RDNA 1 transistor density.

Ampere SM is not the same as Turing; RDNA 2 CU is almost the same as RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die will be as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU will compete against the GA102 (RTX3080/90).

So in that scenario the 80CU is not a gaming chip.
The non-gaming chips are CDNA with Arcturus, so why would an RDNA2-based GPU be a non-gaming chip?
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Can you elaborate on your sources for the CUs being the same? I was under the impression that the MS & PS5 chips have much different layouts than RDNA 1.

XBOX SX is RDNA 1 with ray tracing and some other changes taken from RDNA 2 for higher efficiency.

You can compare the two CUs below.

861-cu-diagram.jpg


xboxseries-xhot-chips-RDNA-2-Dual-CU.jpg
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
XBOX SX is RDNA 1 with ray tracing and some other changes taken from RDNA 2 for higher efficiency.

You can compare the two CUs below.

861-cu-diagram.jpg


xboxseries-xhot-chips-RDNA-2-Dual-CU.jpg
Where do you have exact data on the sizes of the caches in each CU, that would warrant saying that RDNA1 and RDNA2 CUs are exactly the same?

P.S. RDNA1 CUs launch 5 instructions per clock. RDNA2 CUs launch 7 instructions per clock ;).
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Where do you have exact data on the sizes of the caches in each CU, that would warrant saying that RDNA1 and RDNA2 CUs are exactly the same?

P.S. RDNA1 CUs launch 5 instructions per clock. RDNA2 CUs launch 7 instructions per clock ;).

I said they are almost the same.
I don't expect more than 10% higher IPC per CU in RDNA 2 vs RDNA 1.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
I said they are almost the same.
I don't expect more than 10% higher IPC per CU in RDNA 2 vs RDNA 1.
If the caches have been redesigned - no they are not the same.

P.S. If AMD achieved a 10% IPC uplift and 25% higher clock speeds, why is it unbelievable that a 40 CU GPU will compete with the RTX 3070?

It's EXACTLY the performance level I was told: +35% over RX 5700 XT / 10% above RTX 2080 Super.
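
As a quick check of the arithmetic here: if the rumoured 10% IPC gain and 25% clock gain both held and compounded perfectly, the result lands slightly above the +35% figure. A minimal sketch; the function name is made up for illustration, and perfect 1:1 scaling with clocks is an assumption, not something AMD has stated:

```python
def combined_uplift(ipc_gain: float, clock_gain: float) -> float:
    """Fractional performance gain if IPC and clock gains multiply perfectly.

    Assumes ideal scaling (no memory or power bottlenecks), which real
    GPUs rarely achieve.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


# Rumoured figures from the post: +10% IPC, +25% clocks.
uplift = combined_uplift(0.10, 0.25)
print(f"{uplift:.1%}")  # 37.5%
```

Anything short of perfect clock scaling pulls that number down toward the quoted +35%.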
 

CastleBravo

Member
Dec 6, 2019
120
271
136
I don't expect much more than RDNA 1 transistor density.

Ampere SM is not the same as Turing; RDNA 2 CU is almost the same as RDNA 1 CU with the addition of RT.

If an RDNA 2 40CU die will be as fast as GA104 (RTX3070), then there is no way we will see an 80CU gaming card, because the 60CU will compete against the GA102 (RTX3080/90).

So in that scenario the 80CU is not a gaming chip.

40CU die might be clocked a lot higher than 60CU or 80CU.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,706
1,233
136
rdna1vsrdna2.jpeg

Navi14 = Yellow and Navi23 Dimgrey Cavefish in Lockhart(Van Gogh/Mero) = Green

They are physically designed differently. Navi1x has the PRFs? vertical while Navi2x has the PRFs? horizontal.

The outermost top-left, top-right, bottom-left, and bottom-right blocks are the 2x SIMD32 units. The TMUs and ray-tracing units are in the center for Navi23/Navi14 (no RT on Navi14).
 

exquisitechar

Senior member
Apr 18, 2017
684
942
136
This is not going to happen, sorry, because then the XBOX SX would be faster than the RTX2080Ti, which it's not.

The 60CU die will compete against GA104 (RTX3070) and the 80CU die will compete against the GA102 (RTX3080/90)
Series X isn't representative of desktop RDNA2 performance because it has 7WGPs per SA.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
If the caches have been redesigned - no they are not the same.

P.S. If AMD achieved a 10% IPC uplift and 25% higher clock speeds, why is it unbelievable that a 40 CU GPU will compete with the RTX 3070?

It's EXACTLY the performance level I was told: +35% over RX 5700 XT / 10% above RTX 2080 Super.

Are you sure that increasing clocks will gain you 1-to-1 performance?

I'm expecting the 40CU RDNA 2 card to be on the same level as the RTX2080/Super and RTX3060.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Are you sure that increasing clocks will gain you 1-to-1 performance?
Depends on the frequency/IPC design that AMD aimed for. If you want your GPUs to clock that high, you don't target the same frequency/IPC curve as your previous generation of GPUs. You want to schedule as much as possible each cycle.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
The improved capabilities don't reflect improved performance in current games, but there is still some chance that future games will make proper use of it. The NV marbles demo showed a massive improvement.

nVidia (and other companies) like to "forcefully" make their demos look better on new hardware: lots of optimizations on the new cards, zero optimizations on the old cards. The demo looks very cool and has nice visuals, but it should never be used to compare performance between two generations of cards.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Are you sure that increasing clocks will gain you 1-to-1 performance?

I'm expecting the 40CU RDNA 2 card to be on the same level as the RTX2080/Super and RTX3060.

If a 2SE,4SA, 20WGP/40CU, 64 ROPs, 192 bit GDDR6 RDNA2 graphics card can match a 5 GPC, 38 SM, 80 ROPs, 256 bit GDDR6 Ampere graphics card then the comparisons up the stack will get brutal as AMD is likely to have better scaling due to the 80 CU being 4SE, 8SA, 40 WGP/80CU, 128 ROPs and most likely 2048 bit HBM2E. All I can say is NV are in for a hard contest this fall.
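
To make the "scaling up the stack" comparison concrete, here is a rough sketch that tabulates the configurations quoted above and computes how much the rumoured 80CU part grows over the 40CU part. All figures are the speculated ones from this post, and the class and function names are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    name: str
    shader_units: int      # CUs on RDNA2, SMs on Ampere
    rops: int
    bus_width_bits: int
    memory: str


# Rumoured/speculated figures quoted in the post, not confirmed specs.
rdna2_40cu = GpuConfig("RDNA2 40CU", 40, 64, 192, "GDDR6")
ampere_38sm = GpuConfig("Ampere 38SM", 38, 80, 256, "GDDR6")
rdna2_80cu = GpuConfig("RDNA2 80CU", 80, 128, 2048, "HBM2E")


def scale_factor(big: GpuConfig, small: GpuConfig) -> dict:
    """Ratio of shader units and ROPs between two configs.

    Bus width is left out on purpose: GDDR6 and HBM2E run at very
    different per-pin speeds, so raw bit widths are not comparable.
    """
    return {
        "shader_units": big.shader_units / small.shader_units,
        "rops": big.rops / small.rops,
    }


scaling = scale_factor(rdna2_80cu, rdna2_40cu)
print(scaling)  # {'shader_units': 2.0, 'rops': 2.0}
```

On paper the 80CU part doubles both shader units and ROPs relative to the 40CU part; whether performance follows depends on clocks and memory bandwidth, which this sketch does not model.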
 

Saylick

Diamond Member
Sep 10, 2012
3,532
7,858
136
P.S. RDNA1 CUs launch 5 instructions per clock. RDNA2 CUs launch 7 instructions per clock ;).
Out of curiosity, do you have a source on the 5 inst/clk for RDNA1? I saw 2-4 inst/clock per CU for RDNA1 from the whitepaper, but no indication of what the average inst/clk is for the WGP.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
If a 2SE,4SA, 20WGP/40CU, 64 ROPs, 192 bit GDDR6 RDNA2 graphics card can match a 5 GPC, 38 SM, 80 ROPs, 256 bit GDDR6 Ampere graphics card then the comparisons up the stack will get brutal as AMD is likely to have better scaling due to the 80 CU being 4SE, 8SA, 40 WGP/80CU, 128 ROPs and most likely 2048 bit HBM2E. All I can say is NV are in for a hard contest this fall.
It's simpler.

RDNA2 CU compares to one SM from Ampere.

But RDNA2 GPUs clock higher.
Out of curiosity, do you have a source on the 5 inst/clk for RDNA1? I saw 2-4 inst/clock per CU for RDNA1 from the whitepaper, but no indication of what the average inst/clk is for the WGP.
Oh, did I make a mistake? I thought I read in the RDNA1 whitepaper that it was 5 instructions per clock, not 4.
 

dr1337

Senior member
May 25, 2020
417
691
136
It's simpler.

RDNA2 CU compares to one SM from Ampere.

But RDNA2 GPUs clock higher.

Oh, did I make a mistake? I thought I read in the RDNA1 whitepaper that it was 5 instructions per clock, not 4.
https://www.amd.com/system/files/documents/rdna-whitepaper.pdf

Can confirm, page 9 of the whitepaper says 2-4 instructions per cycle. It would seem that RDNA2 has an increase vs. RDNA1, unless the Xbox Series X slides are somehow lying or aren't truly accurate.
 

Saylick

Diamond Member
Sep 10, 2012
3,532
7,858
136
Oh, did I make a mistake? I thought I read in the RDNA1 whitepaper that it was 5 instructions per clock, not 4.
No worries.

So it's 4 instructions in RDNA1 vs 7 in RDNA2, with each cycle.

That's a pretty hefty increase in scheduling...

https://www.amd.com/system/files/documents/rdna-whitepaper.pdf

Can confirm, page 9 of the whitepaper says 2-4 instructions per cycle. It would seem that RDNA2 has an increase vs. RDNA1, unless the Xbox Series X slides are somehow lying or aren't truly accurate.

Yeah, so it's effectively 4-8 inst/clk for RDNA 1 at the CU level vs. 7 inst/clk for RDNA 2 per the Xbox Series X presentation. Knowing the actual average inst/clk for RDNA 1 will let us know what the potential improvement in IPC can be.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
No worries.





Yeah, so it's effectively 4-8 inst/clk for RDNA 1 at the WGP level vs. 7 inst/clk for RDNA 2 per the Xbox Series X presentation. Knowing the actual average inst/clk for RDNA 1 will let us know what the potential improvement in IPC can be.
Uhhh, isn't it 7 instructions PER CU, not per WGP?

If it's per CU, and there are 5 CUs per WGP, it means 35 instructions per WGP with each cycle.

O_O
 

Saylick

Diamond Member
Sep 10, 2012
3,532
7,858
136
Uhhh, isn't it 7 instructions PER CU, not per WGP?

If it's per CU, and there are 5 CUs per WGP, it means 35 instructions per WGP with each cycle.

O_O
Yeah, it's at the CU level, my mistake. I tried to ninja edit my post after I realized my folly. ;)

Regardless, I think my point still stands: 2-4 IPC per SIMD or 4-8 IPC per CU for RDNA 1, up to 7 IPC per CU for RDNA 2.

EDIT: It's 2 CUs per WGP, not 5. :p
 

andermans

Member
Sep 11, 2020
151
153
76
I think the quote from the whitepaper is "Each of the four SIMDs can request instructions every cycle and the instruction cache can deliver 32B (typically 2-4 instructions) every clock to each of the SIMDs". So that would be up to a theoretical 8 per CU/16 per WGP.

In practice, if you look at what units it can issue to I believe the numbers are unchanged though (with 1 vector ALU unit per SIMD you're obviously not issuing more than 2 per CU), or at least there is no indication of any changes. Looking at the 7 I'm also pretty sure those are not the only ones that can be issued, there are categories not named (like LDS or NOPs). Pretty sure RDNA1 and RDNA2 have these but they were not included in the 7 explicitly listed on the xbox slides.
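
The per-CU and per-WGP fetch numbers being thrown around in this thread can be worked out as below. This is only a sketch of the arithmetic: the 2-4 instructions per SIMD per clock comes from the whitepaper quote, the 2 SIMDs per CU and 2 CUs per WGP are the counts implied by the posts above, and the constant and function names are mine:

```python
# Counts implied by the discussion above (treat as assumptions).
SIMDS_PER_CU = 2
CUS_PER_WGP = 2

# "32B (typically 2-4 instructions) every clock to each of the SIMDs"
FETCH_PER_SIMD = (2, 4)  # (low, high) instructions per clock


def fetch_bounds(simd_count: int) -> tuple:
    """Low/high instruction fetch per clock across `simd_count` SIMDs."""
    low, high = FETCH_PER_SIMD
    return (low * simd_count, high * simd_count)


per_cu = fetch_bounds(SIMDS_PER_CU)                  # (4, 8)
per_wgp = fetch_bounds(SIMDS_PER_CU * CUS_PER_WGP)   # (8, 16)
print("per CU:", per_cu, "per WGP:", per_wgp)
```

Note that these are fetch bounds from the instruction cache, not issue rates, so they are not directly comparable to the 7 per CU listed on the Xbox slides.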
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
If a 2SE,4SA, 20WGP/40CU, 64 ROPs, 192 bit GDDR6 RDNA2 graphics card can match a 5 GPC, 38 SM, 80 ROPs, 256 bit GDDR6 Ampere graphics card then the comparisons up the stack will get brutal as AMD is likely to have better scaling due to the 80 CU being 4SE, 8SA, 40 WGP/80CU, 128 ROPs and most likely 2048 bit HBM2E. All I can say is NV are in for a hard contest this fall.
HBM just isn't going to show its face on a consumer GPU. Don't know why people keep repeating this idea.