The immediate oddity here is that power efficiency is normally measured at a fixed level of power consumption, not a fixed level of performance. With the power consumption of a transistor scaling at roughly the cube of the voltage (dynamic power goes with the square of the voltage, and clock speed in turn scales roughly with voltage), a “wider” part like Ampere with more functional blocks can clock itself at a much lower frequency to hit the same overall performance as Turing. In essence, this graph compares Turing at its worst to Ampere at its best, asking “what would it be like if we downclocked Ampere to be as slow as Turing?” rather than “how much faster is Ampere than Turing under the same constraints?” NVIDIA’s graph is not presenting an apples-to-apples performance comparison at a specific power draw.
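The cubic relationship above can be sketched numerically. This is an illustrative back-of-the-envelope model, not measured data for any real GPU: it assumes dynamic power scales as frequency times voltage squared, with voltage scaling roughly linearly with frequency.

```python
# Rough sketch of why a wider, lower-clocked chip can look very efficient.
# Assumption: dynamic power P ~ f * V^2, and V scales ~linearly with f,
# so P ~ f^3. All numbers are illustrative, not real GPU measurements.

def relative_power(freq_ratio: float) -> float:
    """Power relative to baseline when frequency (and voltage) scale by freq_ratio."""
    return freq_ratio ** 3

# A hypothetical part with 2x the functional units needs only ~half the
# clock for the same throughput. Per-unit power falls to 0.5^3 = 12.5%,
# so total chip power is 2 * 12.5% = 25% of the narrower part's.
wide_chip_power = 2 * relative_power(0.5)
print(wide_chip_power)  # 0.25
```

This is exactly why a fixed-performance comparison flatters the wider chip: it gets to ride down the steep part of the voltage/frequency curve.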
If you actually make a fixed-wattage comparison, then Ampere doesn’t look quite as good as NVIDIA’s graph implies. Whereas Turing hits 60fps at 240W in this example, at that same 240W Ampere’s performance curve has it at roughly 90fps. To be sure, this is still a sizable improvement, but it’s only a 50% improvement in performance-per-watt. Ultimately the exact improvement in power efficiency will depend on where in the graph you sample, but it’s clear that NVIDIA’s power efficiency gains with Ampere, as defined by more conventional metrics, are not going to be the 90% that NVIDIA’s slide claims.
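The fixed-wattage arithmetic from this example works out as follows, using the article's figures of 60fps for Turing and roughly 90fps for Ampere at the same 240W:

```python
# Fixed-wattage comparison using the figures cited in the article:
# at 240 W, Turing delivers 60 fps and Ampere roughly 90 fps.
turing_fps, ampere_fps, watts = 60, 90, 240

turing_eff = turing_fps / watts   # fps per watt
ampere_eff = ampere_fps / watts   # fps per watt

# Relative perf-per-watt improvement at the same power draw
improvement = ampere_eff / turing_eff - 1
print(f"{improvement:.0%}")  # 50%
```

Since the wattage is identical on both sides, the perf-per-watt gain reduces to the raw fps ratio, 90/60 = 1.5x, i.e. 50%.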
All of which is reflected in the TDP ratings of the new RTX 30 series cards. The RTX 3090 draws a whopping 350 watts, and even the RTX 3080 pulls 320W. If we take NVIDIA’s performance claims at face value – that the RTX 3080 offers up to 100% more performance than the RTX 2080 – then that comes with a 49% hike in power consumption, for an effective increase in performance-per-watt of just 34%. The comparison for the RTX 3090 is even harsher, with NVIDIA claiming a 50% performance increase for a 25% increase in power consumption, for a net power efficiency gain of just 20%.
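These TDP-based figures can be double-checked with the same arithmetic. The 215W baseline for the RTX 2080 is an assumption on my part (its reference TDP), but it is consistent with the article's 49% power-hike figure; the RTX 3090 numbers use the 1.5x performance and 1.25x power ratios stated above.

```python
# Perf-per-watt check against the article's TDP figures and NVIDIA's
# performance claims. The 215 W RTX 2080 baseline is assumed (its
# reference TDP), consistent with the article's "49% hike" figure.

def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Net efficiency gain when performance and power both increase."""
    return perf_ratio / power_ratio - 1

# RTX 3080 vs RTX 2080: claimed up to 2x performance at 320/215 ~= 1.49x power
print(f"{perf_per_watt_gain(2.0, 320 / 215):.0%}")   # ~34%

# RTX 3090: claimed 1.5x performance at 1.25x power
print(f"{perf_per_watt_gain(1.5, 1.25):.0%}")        # 20%
```

In both cases the headline performance gain is largely bought with extra watts, which is the article's core point.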