Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,622
5,879
146
All die sizes are within 5mm^2. The poster here has been right about some things in the past AFAIK, and to his credit was the first to say 505mm^2 for Navi21, which other people have since backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
A little follow-up.

If RDNA2 achieved a 10% IPC increase, 80 RDNA2 CUs at 1.8 GHz will perform just like 88 RDNA1 CUs clocked at 1.8 GHz.

2.2 GHz is 16% above the RX 5700 XT's clock speed (1887 MHz).

In essence, if AMD has found a way to make a 256-bit memory bus enough for 80 RDNA2 CUs, that GPU in full config at 2.2 GHz should land around 135% above the RX 5700 XT.

That is around 10-15% above the RTX 3090, in 4K.

And this is only with a 10% IPC increase.

This is all theoretical calculation.

So we had better pray that the 256-bit bus is enough to feed those CUs, and that AMD found a way to make those CUs scale in performance similarly to the RDNA1 architecture.
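As a quick check, here is the quoted arithmetic as a minimal sketch; every input is the quoted post's assumption, and the 0.92x scaling factor is purely illustrative:

```python
# Minimal sketch of the quoted estimate; all inputs are the post's assumptions.
ipc_gain = 1.10            # assumed +10% IPC for RDNA2
cu_ratio = 80 / 40         # doubled CUs vs. Navi 10
clock_ratio = 2200 / 1887  # 2.2 GHz vs. the 5700 XT's 1887 MHz (~+16%)

ideal = ipc_gain * cu_ratio * clock_ratio
print(f"ideal scaling: +{(ideal - 1) * 100:.0f}%")  # ~+156%

# With imperfect CU scaling (an illustrative 0.92x factor), this drops
# toward the post's "around 135% above the RX 5700 XT":
print(f"with 0.92x CU scaling: +{(ideal * 0.92 - 1) * 100:.0f}%")  # ~+136%
```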
You're assuming it has increased IPC. Based on the demo, if THAT is their GOD card, the IPC gain is less than 10%, and all gains come from clock speed increases and some extra cache.
We better pray they are sandbagging for the end of the month, otherwise...
 

andermans

Member
Sep 11, 2020
151
153
76
They doubled the number of CUs, so the IPC of the whole GPU should have increased by quite a lot, actually :)

On a more serious note, I think improvements due to better caching are counted in the IPC number? (As in performance = (perf per instruction) * (instructions per clock) * (clock frequency); the component that increases with a better cache would be IPC.)

My stock RX 5700 XT has ~2020 MHz as the top level in the powerplay table (even though it doesn't hold it for more than a few seconds), so against the 2050-2200 MHz top powerplay levels that were leaked for Navi 21 (i.e. like for like), that is a 1.5-9% increase in clock speed (assuming those leaks were accurate anyway, which I doubt).

So if we're looking at ~1.9x scaling in perf and we subtract the 9% clock increase, that is a ~1.75x performance improvement from more CUs and better IPC. With 2x the CUs we always knew it wasn't going to scale linearly, so I think this is actually not horrible, though it obviously isn't great either. Obviously curious how many improvements they actually made to the core vs. just scaling to more CUs.
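In the spirit of that decomposition, a minimal sketch; the 1.9x figure and the clocks are the post's assumptions, not confirmed specs:

```python
# Split an assumed overall gain into clock vs. CUs+IPC contributions.
base_clock = 2020    # MHz, stock 5700 XT top powerplay level (per the post)
navi21_clock = 2200  # MHz, leaked Navi 21 top powerplay level (unconfirmed)
total_scaling = 1.9  # assumed overall perf vs. the 5700 XT

clock_factor = navi21_clock / base_clock      # ~1.09x from clocks alone
cu_ipc_factor = total_scaling / clock_factor  # ~1.75x left for 2x CUs + IPC

print(f"clocks: {clock_factor:.2f}x, CUs + IPC: {cu_ipc_factor:.2f}x")
```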
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
Why wouldn't cache be counted as part of "IPC"? AMD's claim/goal is a 50% perf/watt improvement for RDNA2. Everything that gets it there is fair play.
Something else to speculate on?

Kinda inane article. Why would AMD challenge its own long-time customers in the console field? It's far more likely that Project Quantum, if ever released commercially, is supposed to challenge Intel's gaming NUCs.
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
Kinda inane article. Why would AMD challenge its own long-time customers in the console field? It's far more likely that Project Quantum, if ever released commercially, is supposed to challenge Intel's gaming NUCs.

Tom's probably spun it as a console challenger to get the clicks.

There's a teardown video of the original one. Being dated, it's kind of funny listening to him talk about Fury. The design looks kind of interesting; with today's hardware and an exterior makeover it could make a decent rig. Pricey, most likely.

 

DJinPrime

Member
Sep 9, 2020
87
89
51
AMD claimed RDNA2 is +50% more perf/watt

[Image: AMD RDNA2 perf/watt slide]


It's impossible to tweak N7 for 50% more performance at the same wattage, so how can they gain 50% more perf/watt? Also, look at the word "IPC": this is official, from AMD's slides. AMD claims there is improved perf-per-clock (IPC) over RDNA1, unless they mean that features like VSR/DirectML etc. can improve perf.
AMD might be borrowing a page from NV marketing. Ampere is up to 1.9x Turing perf/watt, after all (at 60 fps in some game). Without knowing at which level of performance AMD is talking about, the % increase is just marketing talk.
 

Saylick

Diamond Member
Sep 10, 2012
3,125
6,294
136
AMD might be borrowing a page from NV marketing. Ampere is up to 1.9x Turing perf/watt, after all (at 60 fps in some game). Without knowing at which level of performance AMD is talking about, the % increase is just marketing talk.
I'm not so sure about that. I probably sound like an AMD loyalist here, but just to lay out the facts: the 5700 XT was 14% faster than Vega 64 and the 5700 was 15% faster than Vega 56 @ 1440p, according to Computerbase. Normalizing for power, the 5700 had 49% higher perf/W than Vega 56 and the 5700 XT had 63% higher perf/W than Vega 64.

So yeah, I have no reason to believe AMD would all of a sudden calculate perf/W at some arbitrary performance level of RDNA 1 and downclock RDNA 2 to hit a much lower power target. Benchmarks will ultimately tell the truth, but given what AMD has shown in the past vs. what reviews of those products have told us, I think I'll give AMD the benefit of the doubt and assume their Zen physical optimization team is doing a fine job.

[Images: Computerbase perf/watt charts for the 5700 XT and 5700]
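The normalization itself is just a ratio of ratios. A minimal sketch, where the power draws are rough assumptions for measured board power (chosen to land near the quoted 63%/49%), not numbers from the post:

```python
# Perf/W gain of a new card vs. an old one, from relative perf and power.
def perf_per_watt_gain(perf_ratio, old_power_w, new_power_w):
    return perf_ratio * (old_power_w / new_power_w) - 1.0

# 5700 XT vs. Vega 64: +14% perf, ~210 W vs. ~300 W measured (assumed)
print(f"5700 XT vs. Vega 64: +{perf_per_watt_gain(1.14, 300, 210):.0%} perf/W")
# 5700 vs. Vega 56: +15% perf, ~177 W vs. ~230 W measured (assumed)
print(f"5700 vs. Vega 56: +{perf_per_watt_gain(1.15, 230, 177):.0%} perf/W")
```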
 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,879
146
AMD might be borrowing a page from NV marketing. Ampere is up to 1.9x Turing perf/watt, after all (at 60 fps in some game). Without knowing at which level of performance AMD is talking about, the % increase is just marketing talk.
AMD's 50% perf/W is measured using the same configuration (40CUs) and same power draw. With those two set, performance is then measured.
 

DJinPrime

Member
Sep 9, 2020
87
89
51
AMD's 50% perf/W is measured using the same configuration (40CUs) and same power draw. With those two set, performance is then measured.
That's no different from what NV was doing either. Same CUs, but at which end of the performance curve? They could also be saying that at 60 fps, RDNA2 uses half the power of RDNA1. That's different from saying that with the same 40 CUs, RDNA2 gets 50% more fps. The fine print does not say 50% better fps; you can see it's less than that: 140, 136, 49, 37, 84 vs. 103, 113, 41, 32, 72 in 3DMark. They also said the 40 CU 5700 XT vs. the 64 CU Vega gives 14% higher performance in something. It's not 50%.
Also, from benchmarks we know the 3080 is about 100% faster than RDNA1. If RDNA2 is really 50% faster per CU, then it would be +100% (40 -> 80 CUs) compounded with +50%. No way it's going to be 50% faster than the 3080. Even accounting for nonlinear scaling of the CUs, it's not realistic.
 

Guru

Senior member
May 5, 2017
830
361
106
You: "The 3080 is 80% faster than the 5700XT showing that AMD must be getting roughly 80% scaling with double the CUs"

Me: "No it seems to show 100% scaling if they can actually pull off 3080 performance *provides evidence"

You:
Well, I disagree that it's 100% faster; I think that is the fundamental difference. I am looking at 1440p though, since that is the resolution most gamers are going to play at, as 4K monitors are way more expensive and target 60 fps at most, while at 1440p you can still target 120 fps gaming.

Looking at various benches, it's about 80 to 86% faster at 1440p on average. From what AMD showed at their Big Navi presentation, it seems Big Navi is going to be about 5% slower than the RTX 3080 on average, and that comes in at just about an 80% performance increase over the RX 5700 XT with double the CUs, and with a much higher clock applied as well. Realistically, without the higher clock and the likely faster memory bandwidth, on a pure CU-for-CU basis we are seeing about 70% scaling.
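Backing the CU scaling out of those numbers, as a minimal sketch; the effective clock uplift is an assumption, since the post doesn't give one:

```python
# Factor out an assumed clock uplift to isolate what the doubled CUs deliver.
total_gain = 1.80   # Big Navi vs. RX 5700 XT at 1440p (post's estimate)
clock_gain = 1.06   # assumed effective in-game clock uplift vs. the 5700 XT

cu_gain = total_gain / clock_gain  # gain attributable to going 40 -> 80 CUs
print(f"CU-for-CU: +{(cu_gain - 1) * 100:.0f}% out of an ideal +100%")  # ~+70%
```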
 

DJinPrime

Member
Sep 9, 2020
87
89
51
I'm not so sure about that. I probably sound like an AMD loyalist here, but just to lay out the facts: the 5700 XT was 14% faster than Vega 64 and the 5700 was 15% faster than Vega 56 @ 1440p, according to Computerbase. Normalizing for power, the 5700 had 49% higher perf/W than Vega 56 and the 5700 XT had 63% higher perf/W than Vega 64.

So yeah, I have no reason to believe AMD would all of a sudden calculate perf/W at some arbitrary performance level of RDNA 1 and downclock RDNA 2 to hit a much lower power target. Benchmarks will ultimately tell the truth, but given what AMD has shown in the past vs. what reviews of those products have told us, I think I'll give AMD the benefit of the doubt and assume their Zen physical optimization team is doing a fine job.

These numbers are not normalized to the CU count, so I don't think they're the same as the 50% figure they're referring to.
 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,879
146
That's no different from what NV was doing either. Same CUs, but at which end of the performance curve? They could also be saying that at 60 fps, RDNA2 uses half the power of RDNA1. That's different from saying that with the same 40 CUs, RDNA2 gets 50% more fps. The fine print does not say 50% better fps; you can see it's less than that: 140, 136, 49, 37, 84 vs. 103, 113, 41, 32, 72 in 3DMark. They also said the 40 CU 5700 XT vs. the 64 CU Vega gives 14% higher performance in something. It's not 50%.
Also, from benchmarks we know the 3080 is about 100% faster than RDNA1. If RDNA2 is really 50% faster per CU, then it would be +100% (40 -> 80 CUs) compounded with +50%. No way it's going to be 50% faster than the 3080. Even accounting for nonlinear scaling of the CUs, it's not realistic.

Nope. Nvidia compared a 3080 to a 2080 Super from the looks of it (roughly 70-80% performance based on the graph here, while the Turing card pulls ~250 W), which is an entirely different comparison: 48 SMs vs. 68 SMs, 3072 ALUs vs. 8704 ALUs. Shock of the day: wide and slow is better efficiency-wise than fast and slim. Who woulda thunk it?

Secondly, you should reread the endnotes. The ones for the slide I posted are RX-325 and RX-362. RX-325 states that the perf/W comparison is done in The Division 2 at 1440p. RX-362 states that the perf/W breakdown on the right is achieved by capping a Vega64 to 40CUs and comparing it to a 5700XT. They then stated that as an average across all 3 resolutions (not just 1440p), the 5700XT performs 14% faster. I'd just ignore this, because if you look at the platform details and the game in question, you can probably understand why that gap isn't larger.

As for the last bits and pieces, who the hell said 50% faster than a 3080? No way in hell. I'm just here to laugh at those who still think AMD isn't pumping out something within ±5% of the 3080, minimum.

Oh, how the tides have turned. We're truly in a reality where AMD uses normal memory, fewer ALUs, and higher clocks to achieve the same performance as Nvidia. Not to mention marketing that doesn't try to overhype people as hard, and doesn't focus too hard on the competition either. What kind of twisted world is this?
 

DJinPrime

Member
Sep 9, 2020
87
89
51
As for the last bits and pieces, who the hell said 50% faster than a 3080? No way in hell. I'm just here to laugh at those who still think AMD isn't pumping out something within ±5% of the 3080, minimum.
Well, Big Navi will be going from 40 to 80 CUs, so that's a 100% increase in CUs. And if each CU is 50% more efficient, then you should get significantly better performance than the 3080. Maybe they're keeping the power extremely low; otherwise the math doesn't make sense.
Also, if you think that endnote makes it clear where the 50% comes from, please do share the math.
If your product is that great, then there's nothing wrong with stating facts. In the past, they hyped up their product when they should have known it was inferior. If they have the better product now, then they should be hyping it up.
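One way to frame the disagreement: +50% perf/W fixes only a ratio, not the fps, until you also pick a board power. A minimal sketch with an imaginary 100 fps / 225 W baseline (the same hypothetical used further down the thread):

```python
# +50% perf/W pins fps-per-watt, not fps; board power decides the rest.
base_fps, base_watts = 100, 225            # imaginary 5700 XT baseline
target_ppw = (base_fps / base_watts) * 1.5

for watts in (150, 225, 300):              # hypothetical board powers
    print(f"at {watts} W: {target_ppw * watts:.0f} fps hits +50% perf/W")
```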
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
You're assuming it has increased IPC. Based on the demo, if THAT is their GOD card, the IPC gain is less than 10%, and all gains come from clock speed increases and some extra cache.
We better pray they are sandbagging for the end of the month, otherwise...
No. If the GPU they demoed is the top dog, AMD achieved regression in IPC with RDNA2.

Illogical. AMD has pointed out in their slides that there is an IPC increase with RDNA2 over RDNA1.
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
No. If the GPU they demoed is the top dog, AMD achieved regression in IPC with RDNA2.

Illogical. AMD has pointed out in their slides that there is an IPC increase with RDNA2 over RDNA1.
So we're at the point where they demonstrated a lesser-tier card? If so, OK. Otherwise it's a regression for an 80 CU card.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Something else to think about when comparing the 320-bit vs. 256-bit bus widths of the Xbox Series X and the rumored N21: the Xbox has an 8-core CPU.

How much bandwidth will this need to be fully utilized?
 

DJinPrime

Member
Sep 9, 2020
87
89
51
I got some sleep and played around with the 50% improvement; I think I'm seeing this clearly now. If not, please correct me.
| Row | CU | Watt | FPS | Perf/Watt/CU |
|----:|---:|-----:|----:|-------------:|
| 1 | 40 | 225 | 100 | 0.0111111111 |
| 2 | 40 | 150 | 100 | 0.0166666667 |
| 3 | 40 | 225 | 150 | 0.0166666667 |
| 4 | 80 | 225 | 300 | 0.0166666667 |
| 5 | 80 | 300 | 400 | 0.0166666667 |
| 6 | 80 | 275 | 366 | 0.0166363636 |
| 7 | 64 | 275 | 293 | 0.0166477273 |

So, row 1 is the baseline 5700 XT with an imaginary 100 fps; the last column is the efficiency normalized per CU. Then, just to convince myself the numbers are correct, rows 2 and 3 show two different ways to get a 50% improvement. Row 2 is if power usage improves by 50% (150 + 75 = 225) with everything else the same: we get 0.0167, which is 50% better than 0.0111. Row 3 is if I keep the power the same and bump performance by 50% (100 + 50 = 150): we see the same 0.0167. Now that I've convinced myself the spreadsheet is valid, let's take AMD's word about a 50% improvement and apply it to what Big Navi could potentially be. Row 4 is Big Navi with 80 CUs at the same wattage; we need 300 fps to get to the 50% improvement. That's really high. But we also know they'll be bumping up the power, so arbitrarily say 300 W: row 5 would require 400 fps to get to the 50% improvement per CU. Row 6 is if we dial the power down to 275 W. All crazy-high fps numbers, so maybe it's not an 80 CU part. Row 7 is a 64 CU card at a reasonable 275 W, which still requires 293 fps to get to the 50% improvement. The 3080 is only around 200 fps... so what's up with the 50% improvement?

Here's what the numbers would look like if we're not normalizing to the CU and instead just look at the entire GPU.

| Row | Watt | FPS | Perf/Watt |
|----:|-----:|----:|----------:|
| 1 | 225 | 100 | 0.4444444444 |
| 2 | 150 | 100 | 0.6666666667 |
| 3 | 225 | 150 | 0.6666666667 |
| 4 | 300 | 200 | 0.6666666667 |
| 5 | 275 | 184 | 0.6690909091 |

Rows 4 and 5 are Big Navi with some watt and fps estimates. These are much more reasonable numbers, but then the 50% is not from IPC improvement, but rather mainly from the 40 -> XX increase in CUs.
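For anyone who wants to poke at the numbers, a minimal sketch reproducing the per-CU normalization above; all watt and fps values are the post's hypotheticals:

```python
# Reproduce the per-CU normalization from the tables above.
def ppw_per_cu(cus, watts, fps):
    return fps / (watts * cus)

baseline = ppw_per_cu(40, 225, 100)  # row 1: imaginary 5700 XT
target = baseline * 1.5              # AMD's claimed +50%

# fps an 80 CU card would need at a given power to hit the target
# (rows 4-6 above: 300, 400, and ~367 fps; the table truncates to 366):
for watts in (225, 300, 275):
    print(f"80 CU @ {watts} W -> {target * watts * 80:.0f} fps")
```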
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
No. If the GPU they demoed is the top dog, AMD achieved regression in IPC with RDNA2.

Illogical. AMD has pointed out in their slides that there is an IPC increase with RDNA2 over RDNA1.

There's a third possibility, where IPC has increased but the limited memory bus essentially leaves the rest of the hardware starved. If whatever magic they're supposedly doing to get by with a smaller bus didn't work out as well as hoped, then I could see performance suffering for it. IPC could still have gone up, but if you can't realistically feed more than 60 of the 80 CUs in most cases, then you will see a performance hit like this.

If that were the case, then it would make sense for AMD to go with GDDR6X memory on the high-end cards. We haven't heard anything about that yet, but it's such an obvious solution to that problem that it makes me think it's more likely we didn't get to see the full 80 CU card. Still, it remains possible that memory bandwidth is a bottleneck for them.
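To put rough numbers on the "starved CUs" idea, a sketch comparing peak bandwidth per CU; the 16 Gbps GDDR6 for Navi 21 is an assumption, not a confirmed spec:

```python
# Peak memory bandwidth per CU: 5700 XT vs. a rumored 256-bit Navi 21.
def bus_bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin  # GB/s

cards = [("RX 5700 XT", 256, 14, 40),         # 448 GB/s over 40 CUs
         ("Navi 21 (rumored)", 256, 16, 80)]  # 512 GB/s over 80 CUs (assumed)

for name, bus, speed, cus in cards:
    bw = bus_bandwidth_gbs(bus, speed)
    print(f"{name}: {bw:.0f} GB/s total, {bw / cus:.1f} GB/s per CU")
```

Per CU that's 11.2 GB/s vs. 6.4 GB/s, which is exactly the gap a large on-die cache would have to paper over.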
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
Kinda inane article. Why would AMD challenge its own long-time customers in the console field? It's far more likely that Project Quantum, if ever released commercially, is supposed to challenge Intel's gaming NUCs.

Yeah, that thing reminds me of the Steam box. I don't think those did particularly well in terms of sales, but I could see AMD making some of their own small form factor PCs as a way to get some additional marketing for their components more than as a serious competitor to the console manufacturers.
 

Saylick

Diamond Member
Sep 10, 2012
3,125
6,294
136

Looks like Planet3D added more review sites to their comparison. On average, about 80-85% faster than the 5700 XT, which leads me to think there's another 10% to be gained, since I am expecting a legit 2x perf increase over Navi 10. If so, the GPU that AMD teased is most likely the 72 CU variant, and another 8 CUs falls right in line with the missing performance for a full 2x increase over RDNA 1.
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
There's a third possibility, where IPC has increased but the limited memory bus essentially leaves the rest of the hardware starved. If whatever magic they're supposedly doing to get by with a smaller bus didn't work out as well as hoped, then I could see performance suffering for it. IPC could still have gone up, but if you can't realistically feed more than 60 of the 80 CUs in most cases, then you will see a performance hit like this.
If it were because of limited memory bandwidth, do you believe AMD would have picked a 256-bit memory bus for this GPU?

It's as if AMD's engineers are so insanely clueless at GPU design that they can clock their GPU to 2.5 GHz but do not understand it will be bottlenecked by a 256-bit memory bus...
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
I got some sleep and played around with the 50% improvement; I think I'm seeing this clearly now. If not, please correct me.
| Row | CU | Watt | FPS | Perf/Watt/CU |
|----:|---:|-----:|----:|-------------:|
| 1 | 40 | 225 | 100 | 0.0111111111 |
| 2 | 40 | 150 | 100 | 0.0166666667 |
| 3 | 40 | 225 | 150 | 0.0166666667 |
| 4 | 80 | 225 | 300 | 0.0166666667 |
| 5 | 80 | 300 | 400 | 0.0166666667 |
| 6 | 80 | 275 | 366 | 0.0166363636 |
| 7 | 64 | 275 | 293 | 0.0166477273 |

So, row 1 is the baseline 5700 XT with an imaginary 100 fps; the last column is the efficiency normalized per CU. Then, just to convince myself the numbers are correct, rows 2 and 3 show two different ways to get a 50% improvement. Row 2 is if power usage improves by 50% (150 + 75 = 225) with everything else the same: we get 0.0167, which is 50% better than 0.0111. Row 3 is if I keep the power the same and bump performance by 50% (100 + 50 = 150): we see the same 0.0167. Now that I've convinced myself the spreadsheet is valid, let's take AMD's word about a 50% improvement and apply it to what Big Navi could potentially be. Row 4 is Big Navi with 80 CUs at the same wattage; we need 300 fps to get to the 50% improvement. That's really high. But we also know they'll be bumping up the power, so arbitrarily say 300 W: row 5 would require 400 fps to get to the 50% improvement per CU. Row 6 is if we dial the power down to 275 W. All crazy-high fps numbers, so maybe it's not an 80 CU part. Row 7 is a 64 CU card at a reasonable 275 W, which still requires 293 fps to get to the 50% improvement. The 3080 is only around 200 fps... so what's up with the 50% improvement?

Here's what the numbers would look like if we're not normalizing to the CU and instead just look at the entire GPU.

| Row | Watt | FPS | Perf/Watt |
|----:|-----:|----:|----------:|
| 1 | 225 | 100 | 0.4444444444 |
| 2 | 150 | 100 | 0.6666666667 |
| 3 | 225 | 150 | 0.6666666667 |
| 4 | 300 | 200 | 0.6666666667 |
| 5 | 275 | 184 | 0.6690909091 |

Rows 4 and 5 are Big Navi with some watt and fps estimates. These are much more reasonable numbers, but then the 50% is not from IPC improvement, but rather mainly from the 40 -> XX increase in CUs.
A 50% performance/watt uplift may simply mean that AMD achieved a 10% IPC increase and a 30% clock speed uplift in an overall SKU package drawing 10% less power.

A 40 CU RDNA2 GPU will clock to 2.5 GHz according to data we have from macOS kexts. 2.5 GHz is 33% above the RX 5700 XT's 1887 MHz.

In essence, who knows? We can only speculate about how they measured this uplift.

Paul from RGT, who broke the name Infinity Cache weeks before it was official, said that AMD achieved closer to a 60% performance uplift, and that is PER CU.
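Compounding those three factors (the post's speculative split) actually lands closer to that ~60% per-CU figure than to a flat 50%:

```python
# Compound the post's speculative contributions to perf/W.
ipc = 1.10    # +10% IPC
clock = 1.30  # +30% clock speed
power = 0.90  # 10% lower power for the SKU

perf_per_watt = ipc * clock / power
print(f"+{(perf_per_watt - 1) * 100:.0f}% perf/W")  # ~+59%
```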
 
Oct 13, 2020
1
0
6
| SKU | CUs | Memory | Die size | TDP |
|----------|----:|-------------|----------|--------|
| 6500 XT | 40 | 6 GB GDDR6 | ~250 mm² | ~175 W |
| 6600 XT | 56 | 8 GB GDDR6 | ~350 mm² | ~215 W |
| 6700 XT | 60 | 8 GB GDDR6 | ~350 mm² | ~235 W |
| 6800 XT¹ | 72 | 12 GB GDDR6 | ~450 mm² | ~255 W |
| 6900 XT | 80 | 16 GB GDDR6 | ~450 mm² | ~275 W |
| 6950 XT² | 96 | 24 GB HBM2E | ~550 mm² | ~300 W |

¹ Not part of the initial line-up; preparatory response to the 3070/80 Ti/Super.
² Marketed as the "ultimate prosumer" card; appears to be RVII 2.0.
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
If it were because of limited memory bandwidth, do you believe AMD would have picked a 256-bit memory bus for this GPU?

It's as if AMD's engineers are so insanely clueless at GPU design that they can clock their GPU to 2.5 GHz but do not understand it will be bottlenecked by a 256-bit memory bus...

It isn't as though they would have done that knowing in advance that it wouldn't work out. If they do have some large cache designed to help offset the smaller bus, the simple explanation is that their approach didn't work out as well in reality as they had planned.

Would it be any different from Vega, which was supposed to have all kinds of great performance-enhancing features? Either they didn't work out or the magic drivers never materialized. But why did AMD go to the effort of trying any of that if they should have known it wouldn't work out?

If it's just a case of being 5-10% on one side of the 3080 instead of the other, it's hard to call it a failure like some of the promised Vega tech. But as I said, if that were the only issue, we'd see GDDR6X to fix the problem.

I only point it out as an alternative explanation for the results we've been shown. We don't have to be in a world where the only two possibilities are IPC regression or a cut-down GPU.
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
It isn't as though they would have done that knowing in advance that it wouldn't work out. If they do have some large cache designed to help offset the smaller bus, the simple explanation is that their approach didn't work out as well in reality as they had planned.

Would it be any different from Vega, which was supposed to have all kinds of great performance-enhancing features? Either they didn't work out or the magic drivers never materialized. But why did AMD go to the effort of trying any of that if they should have known it wouldn't work out?

If it's just a case of being 5-10% on one side of the 3080 instead of the other, it's hard to call it a failure like some of the promised Vega tech. But as I said, if that were the only issue, we'd see GDDR6X to fix the problem.

I only point it out as an alternative explanation for the results we've been shown. We don't have to be in a world where the only two possibilities are IPC regression or a cut-down GPU.
Are we talking about Raja's RTG, or David Wang's/Lisa Su's RTG?

It's not the same RTG anymore, guys.