Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,635
5,983
146
All die sizes are within 5mm^2. The poster here has been right about some things in the past afaik, and to his credit he was the first to say 505mm^2 for Navi21, which other people have since backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,361
2,850
106
Let's see.

The 40 CU GPU was on par in performance with the 40 SM Turing GPU, plus or minus the differences in turbo boost clock speeds.

Also, RDNA1 was already on par in IPC with Turing GPUs: https://www.computerbase.de/2019-07.../4/#diagramm-performancerating-navi-vs-turing

The difference is just a 1-2% IPC benefit in favor of the RDNA1 architecture.
......
And here is a different result.
2070 Super: 1879 MHz average clock speed
5700 XT: 1887 MHz average clock speed
The difference in performance is 9-14% in favor of Nvidia, as shown here: TechPowerUp Performance Summary
Maybe Navi is bandwidth starved at higher clocks and that's why the result is different.
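
To compare the two results on equal footing, here's a quick sketch normalizing the raw gap by the clock ratio (numbers are the TechPowerUp averages above; this assumes performance scales linearly with clock):

```python
# Normalizing the raw performance gap by the clock ratio. Numbers are
# the TechPowerUp averages quoted above; assumes linear clock scaling.

clock_2070s = 1879    # MHz, 2070 Super average
clock_5700xt = 1887   # MHz, 5700 XT average

clock_ratio = clock_2070s / clock_5700xt  # ~0.996, essentially parity
for gap in (0.09, 0.14):  # 9-14% raw gap in favor of the 2070S
    per_clock = (1 + gap) / clock_ratio - 1
    print(f"raw gap {gap:+.0%} -> per-clock gap {per_clock:+.1%}")

# The 5700 XT clocks ~0.4% higher here, so per-clock the gap is slightly
# wider than the raw numbers -- the opposite of the ComputerBase result.
```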
 
  • Like
Reactions: ozzy702 and Konan

ModEl4

Member
Oct 14, 2019
71
33
61
We can go the other direction, with TechPowerUp's relative GPU performance chart showing the 2080 Ti being 34% faster than the 5700 XT, a 40 CU RDNA1 card.

The 5700 XT maintains clock speeds around 1900 MHz. Simply increasing the sustained clocks to 2500 MHz could potentially grant a 31% performance increase. Couple that with a minor IPC increase (9% is the figure I've seen flying around?) and you could well have the 40 CU RDNA2 performing at 5700 XT +34% levels.
According to TechPowerUp (from their latest VGA review):
At QHD the difference between the 2080 Ti and 5700 XT is 74/52 (1.42x, i.e. +42%), and at 4K it's 67/44 (1.52x, i.e. +52%).
If you take the QHD scaling, then it is wrong to apply perfect linear scaling to the RDNA2 40 CU part, because the QHD performance advantage vs the 5700 XT would be lower than the 4K equivalent...
So yes, it can match the 2080 Ti at 4K (1.52x) if the clock is 2.5 GHz and the IPC is +16%, AND, most importantly, if there is zero deficit in actual pixel fillrate: a 2.5 GHz 5700 XT on a 192-bit bus would need 24.5 Gbps GDDR6 (hypothetically, if 24.5 Gbps existed) instead of 14 Gbps. So does this 40 CU part also have the magic cache? Let's hope!
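
For concreteness, the arithmetic behind that paragraph as a quick Python sketch (assuming, as above, that performance scales linearly with clock and IPC, and that bandwidth demand scales with clock):

```python
# Back-of-envelope check of the argument above. A sketch, not a claim
# about real silicon: assumes performance scales linearly with clock
# and IPC, and that bandwidth demand scales with clock.

base_clock = 1887     # MHz, 5700 XT average
target_clock = 2500   # MHz, rumored RDNA2 clock
ipc_uplift = 1.16     # the hypothetical +16% IPC from above

scaling = (target_clock / base_clock) * ipc_uplift
print(f"combined scaling: {scaling:.2f}x")  # ~1.54x, vs the 2080 Ti at ~1.52x (4K)

# Bandwidth needed to keep the 5700 XT's bytes-per-unit-of-work ratio
# (256-bit bus at 14 Gbps = 448 GB/s), scaling with clock only:
base_bw = 256 / 8 * 14                       # 448 GB/s
needed_bw = base_bw * (target_clock / base_clock)
print(f"on a 192-bit bus: {needed_bw / (192 / 8):.1f} Gbps")  # ~24.7 Gbps
```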
 
  • Like
Reactions: lightmanek

ModEl4

Member
Oct 14, 2019
71
33
61
Let's see.

The 40 CU GPU was on par in performance with the 40 SM Turing GPU, plus or minus the differences in turbo boost clock speeds.

Also, RDNA1 was already on par in IPC with Turing GPUs: https://www.computerbase.de/2019-07.../4/#diagramm-performancerating-navi-vs-turing

The difference is just a 1-2% IPC benefit in favor of the RDNA1 architecture.

Now let's look at Ampere.

The 68 SM GPU with a 320-bit bus is 25% faster than the RTX 2080 Ti, which has exactly the same number of SMs but a 352-bit bus.

We can safely assume that Ampere brought a 25% IPC increase per SM.

The problem for Ampere is that it clocks similarly to Turing; it didn't bring clock speed increases.

Now, Navi 22, according to macOS data, runs at 2.5 GHz, and according to AMD themselves, RDNA2 GPUs have a 50% performance-per-watt increase, which includes both IPC and clock speeds.

2.5 GHz is a 33% higher clock speed than the 1887 MHz of the 40 CU RX 5700 XT.

And that is without any IPC differences. If RDNA2 brings a 10% IPC uplift over RDNA1, 40 RDNA2 CUs at 1.8 GHz will perform like 44 RDNA1 CUs at 1.8 GHz. Add to that 33% more performance from clocks and you get... RTX 2080 Ti performance levels.

The thing is, what if that 50% performance-per-watt increase is simply IPC and clock speeds, per CU, as Paul from RedGamingTech says?

What if it's actually a 20% IPC increase and 33% higher clock speeds at the same time?

That 40 CU GPU will be 15% faster than the RTX 2080 Ti, just like Coreteks and AdoredTV suggested Big Navi GPUs would perform.

My personal expectation is that RDNA2 brings a 10% IPC uplift over RDNA1. But I will absolutely not be surprised if the cache patents and the discussion that has already happened come to fruition, and RDNA2 actually has a 20% IPC uplift over RDNA1.

Because since we are on a hype train, and we have seen the clock speeds of Navi 22, what the hell could not be possible at this point?
According to my calculations the 2070S is around 8% faster than the 5700 XT (according to TechPowerUp it's nearly +11.5%, 49/44 at 4K).
We also have a 1-2% actual clock difference, so the 40 SM part has around 10% higher IPC than the 40 CU part. Still, this figure is understated; in reality it is higher if you take into account the actual pixel fillrate the two designs can achieve at 448 GB/s and how it skews the results, but I won't go there. In fact I will stop here. OK, let's say the 2.5 GHz 40 CU part is at 2080 Ti level (100): what does this tell us about the 2.2 GHz 80 CU part in relation to the 3090 (150)? Is it going to be, let's say, (165) +10% faster? (With perfect scaling it would be 17.33% faster...)
Let's not examine the +15%-over-2080 Ti rumor ("in select AMD optimized games" was the expression he used) that Coreteks spread, because on his site he clarified that he got this number from a first party (lol, is it Sapphire? I don't know) and that it is for the 80 CU part, and now after the NAAF video he has had second thoughts etc.
I would love it if the 40 CU part were indeed 15% faster than the 2080 Ti, but that would also imply a certain performance level for the 80 CU part (around 165 +15%?), which I refuse to believe🤯
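
The perfect-scaling figure, worked out (a sketch assuming performance scales linearly with CU count and clock, which in practice it won't):

```python
# The perfect-scaling figure, worked out. Assumes performance scales
# linearly with CU count and clock (it won't, hence the caution above).

perf_40cu = 100   # 2.5 GHz 40 CU part pegged at 2080 Ti level
perf_3090 = 150   # 3090 on the same scale

perf_80cu = perf_40cu * (80 / 40) * (2.2 / 2.5)   # = 176
print(f"80 CU @ 2.2 GHz, perfect scaling: {perf_80cu:.0f}")
print(f"vs 3090: {perf_80cu / perf_3090 - 1:+.2%}")  # +17.33%
```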
 
  • Like
Reactions: Elfear

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
According to my calculations the 2070S is around 8% faster than the 5700 XT (according to TechPowerUp it's nearly +11.5%, 49/44 at 4K).
We also have a 1-2% actual clock difference, so the 40 SM part has around 10% higher IPC than the 40 CU part. Still, this figure is understated; in reality it is higher if you take into account the actual pixel fillrate the two designs can achieve at 448 GB/s and how it skews the results, but I won't go there. In fact I will stop here. OK, let's say the 2.5 GHz 40 CU part is at 2080 Ti level (100): what does this tell us about the 2.2 GHz 80 CU part in relation to the 3090 (150)? Is it going to be, let's say, (165) +10% faster? (With perfect scaling it would be 17.33% faster...)
Let's not examine the +15%-over-2080 Ti rumor ("in select AMD optimized games" was the expression he used) that Coreteks spread, because on his site he clarified that he got this number from a first party (lol, is it Sapphire? I don't know) and that it is for the 80 CU part, and now after the NAAF video he has had second thoughts etc.
I would love it if the 40 CU part were indeed 15% faster than the 2080 Ti, but that would also imply a certain performance level for the 80 CU part (around 165 +15%?), which I refuse to believe🤯
Navi 1 requires more bandwidth to scale with clocks. Clock for clock, SM for SM, with sufficient bandwidth RDNA1 gives you 1-2% higher IPC than Turing.

In general I believe that the base Navi 22 die with 40 CUs, a 192-bit bus and 2.5 GHz should be on par with the RTX 2080 Ti in performance. Pessimistic calculations suggest that this die will be around 15% faster than the RTX 2080 Super; optimistic ones put it up to 15% faster than the RTX 2080 Ti, if the IPC really increased by 20% over RDNA1.

It is a pretty mental architecture after all, regardless of the outcome.
 

Kuiva maa

Member
May 1, 2014
181
232
116
And here is a different result.
2070 Super: 1879 MHz average clock speed
5700 XT: 1887 MHz average clock speed
The difference in performance is 9-14% in favor of Nvidia, as shown here: TechPowerUp Performance Summary
Maybe Navi is bandwidth starved at higher clocks and that's why the result is different.


HU did find them to be pretty much on par right now at 1080p/1440p, with the 2070S faster by 5% at 4K.
 

ModEl4

Member
Oct 14, 2019
71
33
61
Navi 1 requires more bandwidth to scale with clocks. Clock for clock, SM for SM, with sufficient bandwidth RDNA1 gives you 1-2% higher IPC than Turing.

In general I believe that the base Navi 22 die with 40 CUs, a 192-bit bus and 2.5 GHz should be on par with the RTX 2080 Ti in performance. Pessimistic calculations suggest that this die will be around 15% faster than the RTX 2080 Super; optimistic ones put it up to 15% faster than the RTX 2080 Ti, if the IPC really increased by 20% over RDNA1.

It is a pretty mental architecture after all, regardless of the outcome.
I agree, Navi 1 indeed requires more bandwidth (much more) to scale; that is the reason I am saying the Turing IPC advantage is greater than 10% when you take the complete design into account.
If you doubled the ROPs in a design like the 2070S, and also on the 5700 XT (a minor increase in transistors in both designs), you would see a much greater difference than the 10% I mentioned, since the SMs/CUs/clocks haven't changed. (Theoretically you would get nearly the same result by cutting the memory speed in half, so you can try it, though I don't know how driver/game optimizations would affect the results.) Keeping the same SM and CU counts and clocks but halving the bandwidth, or doubling the ROPs, gives different results, so comparing two models doesn't always tell the truth about general design assumptions.
If Nvidia wanted to make an "Ampere 2" in two years, just by going to TSMC 7nm, doubling the ROPs and using 24 Gbps GDDR6X, a 2.5 GHz "3070" would be at 3090 level with a minor die size increase relative to the 3070!
For example, in Pascal the 1070 Ti had 64 ROPs at 1683 MHz fed by a 256-bit bus at 8 Gbps, so that's the exact analogy regarding theoretical pixel fillrate vs bandwidth. I'm sure Turing had the same ROP efficiency as Pascal; the main reason they nearly doubled the bandwidth (8 vs 14 Gbps) is ray tracing. The only thing that troubles me is that with Ampere they moved the ROPs inside the GPCs; this certainly has a cost, but I don't know how big it is yet.
Not An Apple Fan made a video about how Ampere is a more compute-focused design and only mentioned the absolutely obvious basic stuff. Lol, these YouTubers; this is the guy Coreteks is quoting and generating all this hype? (Probably Coreteks wanted to help him reach 4,000 likes, I don't know.) Anyway, I like the guy, and I love Ireland, so I wish him the best!
I don't want to sound too negative about RDNA2, so let me add to the hype then 😄 Microsoft in their Hot Chips presentation said that a Series X CU has 25% higher IPC relative to the previous generation. The vast majority of the press covered it as if Microsoft were comparing against RDNA1; some (a minority) thought that by "previous generation" MS meant the Xbox One X (GCN4), implying either no IPC increase worth mentioning vs RDNA1 (like Zen+'s 3%), or, read as a margin (GCN4 at 75%, RDNA2 at 100%, so 33.3% as a mark-up), around +7% relative to RDNA1.
If anyone who saw the presentation can clarify (without a doubt) that MS was referring to RDNA1 with the term "previous generation", that would be great!
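
To put rough numbers on that fillrate/bandwidth analogy, a quick sketch using theoretical peak figures (my arithmetic, peak rates only, not measured data):

```python
# Theoretical peak pixel fillrate vs memory bandwidth for the cards
# mentioned above. Peak figures only; my arithmetic, not measured data.

def fill_vs_bw(rops, clock_mhz, bus_bits, gbps):
    fillrate = rops * clock_mhz / 1000   # GPixel/s
    bandwidth = bus_bits / 8 * gbps      # GB/s
    return fillrate, bandwidth, fillrate / bandwidth

cards = {
    "GTX 1070 Ti": (64, 1683, 256, 8),
    "RTX 2070S":   (64, 1879, 256, 14),
    "RX 5700 XT":  (64, 1887, 256, 14),
}
for name, spec in cards.items():
    f, b, r = fill_vs_bw(*spec)
    print(f"{name}: {f:.0f} GPix/s over {b:.0f} GB/s ({r:.2f} GPix per GB)")

# Pascal fed its ROPs with far less bandwidth per pixel, which is the
# point: Turing's near-doubled bandwidth wasn't there for the ROPs.
```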
 

Hitman928

Diamond Member
Apr 15, 2012
5,316
7,994
136
I don't want to sound too negative about RDNA2, so let me add to the hype then 😄 Microsoft in their Hot Chips presentation said that a Series X CU has 25% higher IPC relative to the previous generation. The vast majority of the press covered it as if Microsoft were comparing against RDNA1; some (a minority) thought that by "previous generation" MS meant the Xbox One X (GCN4), implying either no IPC increase worth mentioning vs RDNA1 (like Zen+'s 3%), or, read as a margin (GCN4 at 75%, RDNA2 at 100%, so 33.3% as a mark-up), around +7% relative to RDNA1.
If anyone who saw the presentation can clarify (without a doubt) that MS was referring to RDNA1 with the term "previous generation", that would be great!

Microsoft wouldn't be relating what's in the XSX to RDNA1; they would be relating it to their own product, the X1X. With that said, we don't know how different the RDNA2 that ends up in consumer cards may be from what's in the XSX. It is probably very similar, but we'll have to wait to find out for sure.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Microsoft wouldn't be relating what's in the XSX to RDNA1; they would be relating it to their own product, the X1X. With that said, we don't know how different the RDNA2 that ends up in consumer cards may be from what's in the XSX. It is probably very similar, but we'll have to wait to find out for sure.

AMD has said that RDNA2 will have higher perf/clock vs RDNA. How much remains to be seen when the final products arrive.

 

PhoBoChai

Member
Oct 10, 2017
119
389
106
Thanks, but I think it's best to wait for actual game review results (which I'm looking forward to) rather than theoretical performance ones.

Probably from an internal AIB brief, which is why the rumors were that Big Navi had faster RT than Turing, but not faster than Ampere. Not that it really made any difference in gaming. :0
 

Hitman928

Diamond Member
Apr 15, 2012
5,316
7,994
136
Probably from an internal AIB brief, which is why the rumors were that Big Navi had faster RT than Turing, but not faster than Ampere. Not that it really made any difference in gaming. :0

Assuming for a moment that the graph is even accurate and ignoring the speed of rasterization, then which would be faster with ray tracing turned on would seem to me to be dependent on the scene in question as Navi would be faster at calculating box intersections but slower at triangle intersections. So if a scene had less complex geometry or fewer ray intersections, Navi could still potentially calculate the ray tracing faster than a 3090. I don't want to get too deep into the consequences of that chart as there are so many things to consider (including its accuracy), just wanted to point out that detail based upon the data presented.
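
As a toy illustration of that scene dependence (the rates below are invented units for two hypothetical GPUs, not real figures for Navi or the 3090):

```python
# Toy model of the scene dependence: total RT cost = box tests / box
# rate + triangle tests / triangle rate. Rates are invented units for
# two hypothetical GPUs, NOT real figures for Navi or the 3090.

def rt_cost(boxes, tris, box_rate, tri_rate):
    return boxes / box_rate + tris / tri_rate

gpu_a = dict(box_rate=4.0, tri_rate=1.0)   # faster at box intersections
gpu_b = dict(box_rate=2.0, tri_rate=2.0)   # faster at triangle intersections

scenes = {
    "simple geometry, few hits": (1000, 100),   # (box tests, tri tests)
    "dense geometry, many hits": (200, 800),
}
for scene, (boxes, tris) in scenes.items():
    a, b = rt_cost(boxes, tris, **gpu_a), rt_cost(boxes, tris, **gpu_b)
    print(f"{scene}: A={a:.0f} B={b:.0f} -> {'A' if a < b else 'B'} wins")
```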
 
  • Like
Reactions: Tlh97 and Elfear

PhoBoChai

Member
Oct 10, 2017
119
389
106
Assuming for a moment that the graph is even accurate and ignoring the speed of rasterization, then which would be faster with ray tracing turned on would seem to me to be dependent on the scene in question as Navi would be faster at calculating box intersections but slower at triangle intersections. So if a scene had less complex geometry or fewer ray intersections, Navi could still potentially calculate the ray tracing faster than a 3090. I don't want to get too deep into the consequences of that chart as there are so many things to consider (including its accuracy), just wanted to point out that detail based upon the data presented.

As I understood it, faster BVH traversal would allow early discard of misses, so you don't waste further cycles on that ray. Or it allows greater ray depth, so distant reflections may have a smaller perf hit.

Funny that this chart assumes Turing had a 1/4 ray-triangle rate too, while Ampere has 1/2.
 

ModEl4

Member
Oct 14, 2019
71
33
61
Microsoft wouldn't be relating what's in the XSX to RDNA1; they would be relating it to their own product, the X1X. With that said, we don't know how different the RDNA2 that ends up in consumer cards may be from what's in the XSX. It is probably very similar, but we'll have to wait to find out for sure.
That's also what my logic says.
I guess I didn't add anything to the hype after all 🤔
Anyway, below is what I think is possible with a hypothetical 15% IPC increase relative to RDNA1, at 4K resolution (although I'm probably wrong, based on how many people on the forum believe the hype):
2.5 GHz, 40 CU/48 RBEs, 12GB, no magic cache, $449, 210W (+40% vs 5700 XT; the 2080 Ti will still be 8% faster)
Then automatically this implies a certain performance level for Big Navi:
2.2 GHz, 80 CU/128 RBEs, 16GB, magic cache, $999-$899, +2% over a $999 3090 12GB
In any case, everything will be clear in a few weeks anyway.
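
For reference, the arithmetic implied by the +40% line above (the +40% itself is my assumption of imperfect scaling without the cache, not a derived number):

```python
# Arithmetic implied by the lines above. The +40% is the post's own
# assumption of imperfect scaling without the cache, not a derived value.

perf_5700xt = 100
perf_2080ti = 152                 # TechPowerUp 4K ratio quoted earlier (67/44)

perf_40cu = perf_5700xt * 1.40    # the hypothetical 40 CU part
print(f"2080 Ti vs 40 CU part: {perf_2080ti / perf_40cu - 1:+.1%}")  # ~+8.6%
```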
 
  • Like
Reactions: kurosaki

ModEl4

Member
Oct 14, 2019
71
33
61
Infinity Cache is an inherent part of the RDNA2 architecture and will be available in all RDNA2 GPUs.

There will not be a desktop GPU without it.
So do they also scale down the cache as you go down in segments (the $250-$150 segment, for example)? If true, I have a feeling the main reason for the cache is ray tracing, not compensating for the 256-bit bus deficit etc.
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
The more I read and attempt to understand IC, the more it seems that AMD is, or will be, using a chiplet design later down the line. We will get compute clusters sharing a large IC, with an even larger IC cache for the chip as a whole. If you have compute units with high levels of cache hits, the process will keep exploiting the initial number of CUs presented. If there are high levels of processing failing to hit the cache, the system will automatically add additional compute units to the workload. This would provide complete CU utilization and increase efficiency across the board.

With the above, I doubt AMD really has much of an IPC increase outside of the cache and high clocks.
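
If I'm reading the idea right, a toy sketch of that scheduling behavior might look like this (entirely speculative; the threshold and step values are made up):

```python
# Entirely speculative sketch of the described behavior: hold the
# initial CU allocation while cache hit rates stay high, widen it when
# work starts missing the cache. Threshold and step sizes are made up.

HIT_RATE_FLOOR = 0.80   # hypothetical hit-rate threshold
CU_STEP = 4             # hypothetical allocation granularity

def rebalance(active_cus: int, max_cus: int, hit_rate: float) -> int:
    """Return the CU count for the next scheduling window."""
    if hit_rate < HIT_RATE_FLOOR:
        # Work is spilling past the cache: widen the CU allocation.
        return min(active_cus + CU_STEP, max_cus)
    return active_cus  # cache is absorbing the traffic; stay put

cus = 40
for hit_rate in (0.95, 0.90, 0.70, 0.65, 0.85):
    cus = rebalance(cus, max_cus=80, hit_rate=hit_rate)
    print(f"hit rate {hit_rate:.0%} -> {cus} CUs active")
```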
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
I recently received some information (third-party source, loosely affiliated with a board partner) that some models will release on Nov. 9th. I'm not claiming this to be accurate at all, but the timing seems right.

EDIT: The reason I'm saying to take it with a grain of salt is that supposedly board partners won't be able to launch their own cards until next year.
 

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,983
146
I recently received some information (third-party source, loosely affiliated with a board partner) that some models will release on Nov. 9th. I'm not claiming this to be accurate at all, but the timing seems right.

EDIT: The reason I'm saying to take it with a grain of salt is that supposedly board partners won't be able to launch their own cards until next year.
Ngl I heard something similar, but no concrete dates. Just a very, very, very, very, very, very limited AIB launch in November.

Truckload of salt at the least. I'm not even sure I believe it.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
Ngl I heard something similar, but no concrete dates. Just a very, very, very, very, very, very limited AIB launch in November.

Truckload of salt at the least. I'm not even sure I believe it.
Considering that AMD is already in full production of those GPUs, but only for themselves, and AIBs still haven't got their hands on them, I can believe this information.

If AIBs had gotten their hands on those GPUs already, we would be seeing an avalanche of leaks.
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
I recently received some information (third-party source, loosely affiliated with a board partner) that some models will release on Nov. 9th. I'm not claiming this to be accurate at all, but the timing seems right.

EDIT: The reason I'm saying to take it with a grain of salt is that supposedly board partners won't be able to launch their own cards until next year.

The reference cooler at least looks worthy. I guess we'll see how it plays out in the end.