Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,673
6,229
146
All die sizes are accurate to within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Grooveriding

Diamond Member
Dec 25, 2008
9,108
1,260
126
Where is the claim coming from that the 3080 is 1.9x the perf of the 2080? Normally I would assume that is some cherry-picked far outlier, but the RTX cards were so awful in performance improvement that it seems possible to get maybe a 1.6 or 1.7x real-world improvement.

The 2080 was literally a 1080 Ti with better power consumption and a really low-value RTX feature, and the 2080 Ti was a 30% or so improvement. So it seems possible the 3080 could really be a lot faster than the 2080, because it's really two generations on from the 1080 Ti, so you'd expect about 70-80% more performance.

Unlike 1080 Ti owners, 2080 Ti and 2080 owners looking to sell on the used market are going to get absolutely dumpstered in loss of value. I figure 2080 Tis will be $400-$500 used and 2080s will be $250 or so? Awful depreciation lol
 

Saylick

Diamond Member
Sep 10, 2012
3,228
6,630
136
Where is the claim coming from that the 3080 is 1.9x the perf of the 2080? Normally I would assume that is some cherry-picked far outlier, but the RTX cards were so awful in performance improvement that it seems possible to get maybe a 1.6 or 1.7x real-world improvement.

The 2080 was literally a 1080 Ti with better power consumption and a really low-value RTX feature, and the 2080 Ti was a 30% or so improvement. So it seems possible the 3080 could really be a lot faster than the 2080, because it's really two generations on from the 1080 Ti, so you'd expect about 70-80% more performance.

Unlike 1080 Ti owners, 2080 Ti and 2080 owners looking to sell on the used market are going to get absolutely dumpstered in loss of value. I figure 2080 Tis will be $400-$500 used and 2080s will be $250 or so? Awful depreciation lol
From this plot specifically:
[Chart: NVIDIA slide, Ampere vs Turing performance-per-watt curves]


https://www.anandtech.com/show/1605...re-for-gaming-starting-with-rtx-3080-rtx-3090
The immediate oddity here is that power efficiency is normally measured at a fixed level of power consumption, not a fixed level of performance. With power consumption of a transistor increasing at roughly the cube of the voltage, a “wider” part like Ampere with more functional blocks can clock itself at a much lower frequency to hit the same overall performance as Turing. In essence, this graph is comparing Turing at its worst to Ampere at its best, asking “what would it be like if we downclocked Ampere to be as slow as Turing” rather than “how much faster is Ampere than Turing under the same constraints”. In other words, NVIDIA’s graph is not presenting us with an apples-to-apples performance comparison at a specific power draw.

If you actually make a fixed-wattage comparison, then Ampere doesn't look quite as good in NVIDIA's graph. Whereas Turing hits 60fps at 240W in this example, Ampere's performance curve has it at roughly 90fps. To be sure, this is still a sizable improvement, but it's only a 50% improvement in performance-per-watt. Ultimately the exact improvement in power efficiency is going to depend on where in the graph you sample, but it's clear that NVIDIA's power efficiency improvements with Ampere, as defined by more normal metrics, are not going to be the 90% that NVIDIA's slide claims.

All of which is reflected in the TDP ratings of the new RTX 30 series cards. The RTX 3090 draws a whopping 350 watts of power, and even the RTX 3080 pulls 320W. If we take NVIDIA’s performance claims at their word – that RTX 3080 offers up to 100% more performance than RTX 2080 – then that comes with a 49% hike in power consumption, for an effective increase in performance-per-watt of just 34%. And the comparison for the RTX 3090 is even harsher, with NVIDIA claiming a 50% performance increase for a 25% increase in power consumption, for a net power efficiency gain of just 20%.
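(As a quick check of that arithmetic, taking the RTX 2080's 215W reference TDP as the baseline, which the excerpt implies but doesn't state:)

\[
\frac{2.00}{320\,\mathrm{W}/215\,\mathrm{W}} \approx \frac{2.00}{1.49} \approx 1.34,
\qquad
\frac{1.50}{1.25} = 1.20
\]

That is, roughly +34% and +20% performance-per-watt for the 3080 and 3090 respectively, matching the quoted figures.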
 

Hitman928

Diamond Member
Apr 15, 2012
5,405
8,302
136
Where is the claim coming from that the 3080 is 1.9x the perf of the 2080? Normally I would assume that is some cherry-picked far outlier, but the RTX cards were so awful in performance improvement that it seems possible to get maybe a 1.6 or 1.7x real-world improvement.

The 2080 was literally a 1080 Ti with better power consumption and a really low-value RTX feature, and the 2080 Ti was a 30% or so improvement. So it seems possible the 3080 could really be a lot faster than the 2080, because it's really two generations on from the 1080 Ti, so you'd expect about 70-80% more performance.

Unlike 1080 Ti owners, 2080 Ti and 2080 owners looking to sell on the used market are going to get absolutely dumpstered in loss of value. I figure 2080 Tis will be $400-$500 used and 2080s will be $250 or so? Awful depreciation lol

It comes from an Nvidia slide where they downclocked a 3080 to match a 2080 (Super, most likely) in performance in Control with RTX on, and then compared the power use. It's a fairly worthless comparison. If you look at the slide and compare perf/W at the actual operating power of both cards, it's a ~34% improvement in perf/W. But again, that's in one game, with carefully chosen card models and settings I'm sure.

If you watch the Digital Foundry video and run the numbers from that, it's more like a 20-25% perf/W improvement on average. We'll have to wait for the full reviews to know for sure though, obviously.
 

Saylick

Diamond Member
Sep 10, 2012
3,228
6,630
136
It's Navi 2x that will have the OC headroom, honestly. PS5 hits 2.2GHz+ clocks and Renoir Vega can OC to 2.4-2.5GHz. Ampere is 320W-350W stock, lol.
Yeah, I suspect that RDNA 2 can OC to well over 2200 MHz just going off what we've seen on the PS5. We'll have to wait until reviews come out for the 3080 and 3090 to know what the average OC will be on Ampere in order to make a comparison. If Ampere can routinely OC to 2100 MHz, it will be hard for Big Navi to catch the 3090. If Ampere tops out at 2000 MHz and RDNA 2 can do 2200 MHz, I can see a scenario where an 80 CU RDNA2 part slots in somewhere between a 3080 and 3090.
 

DDH

Member
May 30, 2015
168
168
111
Reviewers can't give us numbers in FPS, so Digital Foundry gave a preview with percentages.
It's 1.7x minimum, with the average more like 1.8x. Various cases are 1.9x, and with RT it's 2.0x. This is with pre-launch drivers.
If it only changes from 1.9x to 2.0x with RT off vs RT on, that says the RT performance uplift between the 3080 and 2080 is only about 10% beyond the raster uplift. Straight away that tells you something funky is going on with the numbers, and I wouldn't bet on them.

The RT performance improvement should be closer to 40%.
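(Spelling out the arithmetic behind that claim:)

\[
\frac{\text{RT-on uplift}}{\text{raster uplift}} = \frac{2.0}{1.9} \approx 1.05
\]

i.e. the RT hardware itself only appears roughly 5-10% faster generation-on-generation, depending on how you count it, far short of the ~40% expected.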
 

Veradun

Senior member
Jul 29, 2016
564
780
136
So this is the state of the rumors?

3 SKUs, all air cooled:
40WGPs
36WGPs
36WGPs

I think a 28WGP SKU could pop up later to fill the obvious gap with the tier below (Navi23? I'm confused tbh), which I believe will be no more than 24WGPs.

Let's see, exciting times ahead.
 

exquisitechar

Senior member
Apr 18, 2017
657
872
136
So this is the state of the rumors?

3 SKUs, all air cooled:
40WGPs
36WGPs
36WGPs

I think a 28WGP SKU could pop up later to fill the obvious gap with the tier below (Navi23? I'm confused tbh), which I believe will be no more than 24WGPs.

Let's see, exciting times ahead.
Navi22 will fill that segment. Navi23 is like Navi10, 40CUs.
 

Timorous

Golden Member
Oct 27, 2008
1,682
2,976
136
So this is the state of the rumors?

3 SKUs, all air cooled:
40WGPs
36WGPs
36WGPs

I think a 28WGP SKU could pop up later to fill the obvious gap with the tier below (Navi23? I'm confused tbh), which I believe will be no more than 24WGPs.

Let's see, exciting times ahead.

Seems so, but with Ampere boasting nearly 60M xtors/mm^2 and Renoir exceeding 60M xtors/mm^2, it seems that if 500mm^2 is the die size for 'big navi' then 40WGPs is far too few, unless the Samsung 10nm-derived 8N process Nvidia are using can achieve higher densities than the TSMC N7(?) process AMD are using.
 

Gideon

Golden Member
Nov 27, 2007
1,678
3,805
136
After Ampere's reveal we know more about what Nvidia has to offer, so it's time to reevaluate what AMD needs to deliver. IMO AMD could still be neck and neck with the RTX 3080, but they have to bring out all of the big guns and really execute well on top of that.

To Recap:
  • RTX 2080 is 123% of the RX 5700 XT performance at 4K.
  • RTX 3080 is 70-80% faster than that (look at the table below; 80% is probably the best case, but 70% looks quite normal).
  • That ends up at roughly 210-220% of RX 5700 XT performance.
In order for the Big Navi to compete it needs to:
  1. Have the rumored 80 CUs (twice of RX 5700 XT)
  2. Use HBM2E or at least a 448-bit wide GDDR6 memory interface
  3. Have actual in-game boost clocks in the neighbourhood of 2.0 GHz
  4. And hopefully some extra tweaks to improve IPC slightly on top of all that

80 CUs is almost certainly a given.

Second point is needed to get the required bandwidth to compete. HBM2E is the preferred choice and hopefully what it will have. If it's GDDR6, then a 448-bit wide interface is the bare minimum that would give comparable bandwidth to the RTX 3080 (784 GB/s vs 760.0 GB/s) at standard clocks. Anything less won't really cut it: it doesn't matter how good the CUs are, it won't compete if it's memory-bandwidth bottlenecked (see how little overclocking helps the 5700 XT).
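(For reference, the bandwidth figures here follow directly from bus width and data rate; a minimal sketch, assuming 14 Gbps GDDR6 and the 3080's 19 Gbps GDDR6X:)

```python
# Peak memory bandwidth in GB/s = bus width (bits) x data rate (Gbps) / 8 bits per byte
def peak_bw(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

print(peak_bw(448, 14))  # 784.0 GB/s -> hypothetical 448-bit GDDR6 Big Navi
print(peak_bw(320, 19))  # 760.0 GB/s -> RTX 3080, 320-bit GDDR6X
print(peak_bw(384, 14))  # 672.0 GB/s -> a 384-bit GDDR6 alternative
print(peak_bw(256, 14))  # 448.0 GB/s -> RX 5700 XT, 256-bit GDDR6
```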

Third point is required because in reality, doubling resources almost never doubles actual performance; they need the additional clock speed to offset that. See the 2080 Ti vs 2060 performance table below as an example. The 2080 Ti is pretty close to a doubling of the RTX 2060's resources, yet it only nets 1.87x the performance. In fact it has ~2.27x the resources, or roughly 2.1x effective throughput once the 9% clock-speed deficit is taken into account, with only memory bandwidth lagging at 1.83x that of the RTX 2060.

Fourth point probably isn't needed to compete with the RTX 3080 per se, but it is definitely needed to outperform it.

                             RTX 2060     RTX 2080 Ti   Change in %
Relative performance @ 4K    100%         187%          +87%
Boost Clock                  1680 MHz     1545 MHz      -9%
Shading Units                1920         4352          +126%
TMUs                         120          272           +126%
ROPs                         48           88            +83%
SM Count                     30           68            +126%
Tensor Cores                 240          544           +126%
RT Cores                     30           68            +126%
L2 Cache                     3 MB         5.5 MB        +83%
Memory Bandwidth             336.0 GB/s   616.0 GB/s    +83%
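(Making the scaling step explicit with the table's own numbers:)

\[
\underbrace{\frac{4352}{1920}}_{\approx 2.27}\times\underbrace{\frac{1545\,\mathrm{MHz}}{1680\,\mathrm{MHz}}}_{\approx 0.92} \approx 2.08,
\qquad
\frac{1.87}{2.08} \approx 0.90
\]

So even after adjusting for clocks, the 2080 Ti extracts only about 90% of its on-paper resource advantage, which is why the extra clock speed in point 3 matters.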

TL;DR:

AMD really needs to execute flawlessly to compete with (let alone win against) the 3080. If they deliver anything less than ~2.1x the RX 5700 XT's performance, they'll be competing with the 3070 (and a probable Ti version) rather than the RTX 3080.

All in all, a very tall order. But I'll stay cautiously optimistic given what we know of PS5/Xbox performance so far.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
AMD really needs to execute flawlessly to compete with (let alone win against) the 3080. If they deliver anything less than ~2.1x the RX 5700 XT's performance, they'll be competing with the 3070 (and a probable Ti version) rather than the RTX 3080.

Does it really matter if they compete with the RTX3080? They could just price it closer to RTX3070 to get some sales.
 

Timorous

Golden Member
Oct 27, 2008
1,682
2,976
136
Why are you assuming AMD require the same bandwidth as Nvidia? Two entirely different uArchs altogether.

Turing seems more bandwidth-friendly than Navi. The computerbase.de 1.5GHz tests between RDNA and Turing had them neck and neck with the same bandwidth. Increase the clocks and keep the bandwidth the same, and Turing starts to pull away, showing there is some sort of bottleneck in the RDNA architecture that is not scaling with clock speed.

I doubt NV have regressed on this, and while AMD may have made progress, I doubt it is enough to surpass NV.
 

Timorous

Golden Member
Oct 27, 2008
1,682
2,976
136
80 CUs is almost certainly a given.

An exact doubling of Navi 10 is 502mm^2 at 41M xtors/mm^2. If you increase that density to something closer to Renoir's, the die size required for the 20.6B xtors is 343mm^2. If there is a 450+ mm^2 Navi 2x die and its transistor density is in the same ballpark as Renoir's, then it will have considerably more than 80 CUs. Possibly up to 120 CUs.
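(A sketch of the arithmetic, taking Navi 10's ~10.3B transistors on 251mm^2 as inputs; the 60M xtors/mm^2 Renoir figure is the assumption from earlier in the thread:)

```python
# Die-size arithmetic: area (mm^2) = transistor count / density (xtors per mm^2)
navi10_xtors = 10.3e9        # Navi 10: ~10.3B transistors
navi10_area = 251            # Navi 10: 251 mm^2

n7_density = navi10_xtors / navi10_area   # ~41M xtors/mm^2 on N7
doubled = 2 * navi10_xtors                # exact 2x Navi 10 = 20.6B xtors

print(doubled / n7_density)  # ~502 mm^2 at Navi 10's density
print(doubled / 60e6)        # ~343 mm^2 at a Renoir-like 60M xtors/mm^2
```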
 

uzzi38

Platinum Member
Oct 16, 2019
2,673
6,229
146
Turing seems more bandwidth-friendly than Navi. The computerbase.de 1.5GHz tests between RDNA and Turing had them neck and neck with the same bandwidth. Increase the clocks and keep the bandwidth the same, and Turing starts to pull away, showing there is some sort of bottleneck in the RDNA architecture that is not scaling with clock speed.

I doubt NV have regressed on this, and while AMD may have made progress, I doubt it is enough to surpass NV.
And your mistake is immediately assuming that the bottleneck is memory bandwidth. A loss in performance on a relative scale can come down to many things, not just memory bandwidth.

Not everything in a GPU uArch scales well with clocks.
 

Gideon

Golden Member
Nov 27, 2007
1,678
3,805
136
Why are you assuming AMD require the same bandwidth as Nvidia? Two entirely different uArchs altogether.
Agreed, though historically Nvidia's memory compression seems to be slightly more efficient (eking out more performance from similar BW).

But never mind the comparison to Turing. All I'm really saying is that AMD will never get anywhere close to a 100% performance uplift (even when doubling CUs and adding clocks) if they only improve bandwidth by, say, 50% (which would be the case with a 384-bit wide GDDR6 interface). Bandwidth needs to be closer to 100% better as well. We know enough from the Xbox presentation to know that AMD didn't radically change the cache structure or add boatloads of L2 to compensate. Hence my hunch that they need at least as much bandwidth as the 3080 has. Just my 2 cents.

An exact doubling of Navi 10 is 502mm^2 at 41M xtors/mm^2. If you increase that density to something closer to Renoir's, the die size required for the 20.6B xtors is 343mm^2. If there is a 450+ mm^2 Navi 2x die and its transistor density is in the same ballpark as Renoir's, then it will have considerably more than 80 CUs. Possibly up to 120 CUs.

I really hope something like that is the case. Yet there have been a lot of leaks claiming 80CUs
 

Veradun

Senior member
Jul 29, 2016
564
780
136
Seems so, but with Ampere boasting nearly 60M xtors/mm^2 and Renoir exceeding 60M xtors/mm^2, it seems that if 500mm^2 is the die size for 'big navi' then 40WGPs is far too few, unless the Samsung 10nm-derived 8N process Nvidia are using can achieve higher densities than the TSMC N7(?) process AMD are using.

?

Isn't Ampere at around 42Mxtors/mm2?

I agree 40WGPs seems a bit low for 500mm^2 anyway; my eye-o-meter "measured" around 50WGPs for that size. UNLESS they used a 512-bit bus.
 

Timorous

Golden Member
Oct 27, 2008
1,682
2,976
136
I really hope something like that is the case. Yet there have been a lot of leaks claiming 80CUs

There have been, but there is no guarantee that is the top chip.

?

Isn't Ampere at around 42Mxtors/mm2?

42M xtors/mm^2 would make GA102 666mm^2. An LTT video I watched said 56M xtors/mm^2, which would put it closer to 500mm^2. May have been a typo on their end.

EDIT: In the AnandTech article they say it looks to be >500mm^2, but NV have not released exact numbers yet and nobody has measured it.
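(Both density figures imply a die size via the same arithmetic, taking GA102's widely reported ~28B transistor count:)

\[
\frac{28\times10^{9}}{42\times10^{6}\,/\,\mathrm{mm^2}} \approx 667\,\mathrm{mm^2},
\qquad
\frac{28\times10^{9}}{56\times10^{6}\,/\,\mathrm{mm^2}} = 500\,\mathrm{mm^2}
\]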
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Does it really matter if they compete with the RTX3080? They could just price it closer to RTX3070 to get some sales.

You can't really rationally justify building a very large, expensive, low-volume GPU just to sell it at something close to cost so you can compete in the mid-range.

It's worth being a bit kind to AMD here - objectively, they're fighting a new architecture + die shrink with ~1 year's refinement work. It'll be a considerable success even if they 'just' keep the gap fairly steady.

The extra money going in now will get them competitive, but it will likely take some time.
 

Timorous

Golden Member
Oct 27, 2008
1,682
2,976
136
And your mistake is immediately assuming that the bottleneck is memory bandwidth. A loss in performance on a relative scale can come down to many things, not just memory bandwidth.

Not everything in a GPU uArch scales well with clocks.

You misread what I wrote. I did not state that bandwidth is the limiting factor for RDNA.