Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much of a gain is the Samsung 7nm EUV process expected to provide?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing around at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90 fps?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent/uncalled for, just interested in the forum members' thoughts.
 

Hitman928

Diamond Member
Apr 15, 2012
6,187
10,694
136
1.9x is measured at 60 fps; at top performance, the perf/watt is 37% higher.

How do you get the 37% higher perf/watt? Comparing the 3080 to the 2080 I see roughly 80% higher performance across RTX and non-RTX titles for a 48.8% increase in TDP. That would make it 1.8/1.488 = 1.21 or 21% higher perf/watt.

Edit: This is obviously based on TDP listed for each card and not measured game power use which we don't have at this time.
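
A quick sketch of that arithmetic (assuming the listed TDPs of 320 W and 215 W and a ~1.8x performance uplift, per the above):

```python
# Back-of-the-envelope perf/watt comparison from listed TDPs (not measured power).
perf_ratio = 1.8                    # assumed 3080-vs-2080 performance uplift
tdp_3080, tdp_2080 = 320, 215       # listed board TDPs in watts
power_ratio = tdp_3080 / tdp_2080   # ~1.488, i.e. 48.8% more power
perf_per_watt = perf_ratio / power_ratio
print(f"Power ratio: {power_ratio:.3f}x")
print(f"Perf/watt:   {perf_per_watt:.2f}x")   # ~1.21x, i.e. ~21% higher
```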
 
  • Like
Reactions: spursindonesia

blckgrffn

Diamond Member
May 1, 2003
9,299
3,440
136
www.teamjuchems.com
Unless you absolutely need an RTX 30 series card today, I think it's prudent to wait for AMD to drop their Big Navi line-up; hopefully Nvidia will counter with an RTX 3080, or possibly a 3080 Ti, with 20 GB. MSRP $899? I just can't imagine any serious PC enthusiast settling for 10 GB of VRAM on the 3080, as it will likely lead to regretting the purchase within 6 months.

It will be interesting to see how far down the stack AMD pushes more than 8GB of ram. Are we going to see it at the $399 spot?

Whatever the performance numbers are, for the armchair enthusiasts a $400 AMD card having 16 GB of RAM, versus everything under $1,500 on the Nvidia front having "last gen" RAM quantities, is going to be some poor optics.

How many 1080ti owners are going to see that buffer downgrade and take a pause on the 3080? I would, but I already sold my 1080 ti.

Can't wait for the DLSS vs Frame Buffer debates to ensue for people to hypothesize how soon each purchase will be regrettable 😂

I am certain there are people out there who equate bigger VRAM with better and buy accordingly. How many large-VRAM cards with weak GPUs have been built for suckers in the past? I am sure we have all seen them.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
How do you get the 37% higher perf/watt? Comparing the 3080 to the 2080 I see roughly 80% higher performance across RTX and non-RTX titles for a 48.8% increase in TDP. That would make it 1.8/1.488 = 1.21 or 21% higher perf/watt.

Edit: This is obviously based on TDP listed for each card and not measured game power use which we don't have at this time.

RTX3080 110fps / 320W

RTX2080 60fps / 240W

[Attached image: NVIDIA perf/watt slide]
 

Saylick

Diamond Member
Sep 10, 2012
3,532
7,858
136
It will be interesting to see how far down the stack AMD pushes more than 8GB of ram. Are we going to see it at the $399 spot?

Whatever the performance numbers are, for the armchair enthusiasts a $400 AMD card having 16 GB of RAM, versus everything under $1,500 on the Nvidia front having "last gen" RAM quantities, is going to be some poor optics.

How many 1080ti owners are going to see that buffer downgrade and take a pause on the 3080? I would, but I already sold my 1080 ti.

Can't wait for the DLSS vs Frame Buffer debates to ensue for people to hypothesize how soon each purchase will be regrettable 😂

I am certain there are people out there who equate bigger VRAM with better and buy accordingly. How many large-VRAM cards with weak GPUs have been built for suckers in the past? I am sure we have all seen them.
Given how next-gen consoles will have 10 GB+ of VRAM with 10+ TFLOPS of compute, I would hope that any discrete GPU with at least 10 TFLOPS of compute has at least that much VRAM, if not more. I personally think 16 GB is the sweet-spot for high-end discrete GPUs targeting 4K resolution. 8 GB is too little and 24 GB seems excessive.
 
  • Like
Reactions: blckgrffn

Hitman928

Diamond Member
Apr 15, 2012
6,187
10,694
136
RTX3080 110fps / 320W

RTX2080 60fps / 240W

[Attached image: NVIDIA perf/watt slide]

First, that's a single RTX game. Second, the RTX 2080 has a TDP of 215 W, so you're not comparing a 3080 to a 2080 there; it's more likely the 2080 Super at 250 W. Third, that does not look like 110 fps to me, more like 105 fps for Ampere. Lastly, if you look at Digital Foundry's video comparing the 2080 to the 3080 across multiple games, it tells a different story.

Edit: Even if we are very generous to the 3080 and say it is 90% faster than the 2080, which I doubt is the case averaged across many games, you're looking at 27.65% higher performance/watt.
 

GoodRevrnd

Diamond Member
Dec 27, 2001
6,801
581
126
Are Tensor cores used for anything other than DLSS? And if not, doesn't DLSS barely tax them? What is the point, just design consistency from the A series?

Is 10GB really that insufficient for these cards?
 

Asterox

Golden Member
May 15, 2012
1,039
1,823
136
How do you get the 37% higher perf/watt? Comparing the 3080 to the 2080 I see roughly 80% higher performance across RTX and non-RTX titles for a 48.8% increase in TDP. That would make it 1.8/1.488 = 1.21 or 21% higher perf/watt.

Edit: This is obviously based on TDP listed for each card and not measured game power use which we don't have at this time.

Or simply: do we need to calculate the Ampere IPC increase?

- RTX 2080 Ti, 4352 CUDA cores, 1500 MHz

- RTX 3080, 8704 CUDA cores, 1700 MHz
 

Konan

Senior member
Jul 28, 2017
360
291
106
Given how next-gen consoles will have 10 GB+ of VRAM with 10+ TFLOPS of compute, I would hope that any discrete GPU with at least 10 TFLOPS of compute has at least that much VRAM, if not more. I personally think 16 GB is the sweet-spot for high-end discrete GPUs targeting 4K resolution. 8 GB is too little and 24 GB seems excessive.

G6X memory > GDDR6 though??
 

Hitman928

Diamond Member
Apr 15, 2012
6,187
10,694
136
Aren't those potentially apples/oranges CUDA cores though?

Yes. If you actually want to do the comparison that way, then Ampere is a severe reduction in "IPC".

3080/2080 performance = 1.8
3080/2080 cuda cores * freq = 3.32
3080/2080 "IPC" = 0.54 or in other words a 46% reduction in IPC for Ampere.

Clearly this is not the proper comparison.
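
For what it's worth, here is that comparison spelled out as a quick sketch (the core counts are the announced specs; the clocks and the 1.8x figure are assumptions for illustration):

```python
# Naive "IPC": measured performance divided by raw (cores x clock) throughput.
cores_3080, cores_2080 = 8704, 2944   # announced CUDA core counts
clk_3080, clk_2080 = 1700, 1515       # MHz, assumed for illustration
perf_ratio = 1.8                      # assumed 3080-vs-2080 performance uplift

throughput_ratio = (cores_3080 * clk_3080) / (cores_2080 * clk_2080)
naive_ipc = perf_ratio / throughput_ratio
print(f"Cores x clock ratio: {throughput_ratio:.2f}x")   # ~3.32x
print(f'Naive "IPC" ratio:   {naive_ipc:.2f}x')          # ~0.54x
```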
 
  • Love
Reactions: spursindonesia

Hitman928

Diamond Member
Apr 15, 2012
6,187
10,694
136
As one of the threads mentioned, it's possible that the integer core that was part of the SM can now do both int and fp32.

eg: here's the Turing block diagram:

[Attached image: Turing SM block diagram]

Turing could already do both INT & FP32 concurrently. The theory is that Ampere can now do INT & FP32 OR 2x FP32. This won't be 2x performance in games, though, as you aren't doubling the rest of the pipeline to feed the 2x FP32, and games don't do just FP32 calculations; they need a significant amount of INT calculations as well.
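
To put a rough number on it, here is a minimal toy model of per-partition issue throughput (not an official description of either architecture; the INT fractions are purely assumed for illustration):

```python
# Toy issue-rate model per SM partition.
# Turing: one FP32 datapath + one INT32 datapath.
# Ampere (per the theory above): one FP32 datapath + one datapath that does FP32 OR INT32.
# int_frac is the assumed share of INT instructions in a shader's instruction mix.

def ampere_vs_turing(int_frac):
    fp_frac = 1.0 - int_frac
    turing = min(1.0 / int_frac, 1.0 / fp_frac)        # 1 INT-capable lane, 1 FP-capable lane
    ampere = min(1.0 / int_frac, 2.0 / fp_frac, 2.0)   # 1 INT-capable lane, 2 FP-capable lanes
    return ampere / turing

for f in (0.05, 0.25, 0.35):
    print(f"INT share {f:.0%}: ~{ampere_vs_turing(f):.2f}x per-clock issue gain")
```

With almost no INT work the model approaches 2x, but with a more game-like INT share the gain shrinks well below that, and that is before memory bandwidth and the rest of the pipeline are considered.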
 
  • Like
Reactions: spursindonesia

Saylick

Diamond Member
Sep 10, 2012
3,532
7,858
136
Are Tensor cores used for anything other than DLSS? And if not, doesn't DLSS barely tax them? What is the point, just design consistency from the A series?

Is 10GB really that insufficient for these cards?
Tensor cores are used for denoising the ray-traced output and also for AI-based upscaling. In other words, you can run RTX at native 4K without upscaling, as an example, and it will use the tensor cores, or you can run DLSS without ray tracing and it will also use the tensor cores; a game with both RTX and DLSS will tax them with both workloads.
 
  • Like
Reactions: sxr7171

MrTeal

Diamond Member
Dec 7, 2003
3,614
1,816
136
As one of the threads mentioned, it's possible that the integer core that was part of the SM can now do both int and fp32.

eg: here's the Turing block diagram:
Nvidia itself is quoting the 36 TFLOP FP32 number, so that would seem to be the case. I'm actually quite surprised, but that is a lot of compute horsepower if there are no gotchas. 29.8 TF of FP32 for $700 is a crazy value compared to 13.4 TF @ $1200 for the 2080 Ti or 13.8 TF @ $700 for the Radeon VII.
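
For reference, the headline figure falls straight out of cores x 2 FLOPs per clock (FMA) x boost clock; a quick check, assuming the announced 1.71 GHz boost for the 3080:

```python
# Theoretical FP32 throughput: CUDA cores x 2 FLOPs/clock (one FMA) x boost clock.
cores = 8704          # announced RTX 3080 CUDA core count
boost_ghz = 1.71      # announced boost clock, assumed here for the check
tflops = cores * 2 * boost_ghz / 1000
print(f"RTX 3080: ~{tflops:.1f} TFLOPS FP32")   # ~29.8
```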
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Nvidia itself is quoting the 36 TFLOP FP32 number, so that would seem to be the case. I'm actually quite surprised, but that is a lot of compute horsepower if there are no gotchas. 29.8 TF of FP32 for $700 is a crazy value compared to 13.4 TF @ $1200 for the 2080 Ti or 13.8 TF @ $700 for the Radeon VII.
They quote SHADER FLOPS.

Not FP32 ;). It's similar to how they claimed that FP16 in their A100 chip, enhanced by all that GEMM stuff, performs "the same" as native FP32.

In reality, native FP32 will be exactly that: native FP32.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
First, that's a single RTX game. Second, the RTX 2080 has a TDP of 215 W, so you're not comparing a 3080 to a 2080 there; it's more likely the 2080 Super at 250 W. Third, that does not look like 110 fps to me, more like 105 fps for Ampere. Lastly, if you look at Digital Foundry's video comparing the 2080 to the 3080 across multiple games, it tells a different story.

Edit: Even if we are very generous to the 3080 and say it is 90% faster than the 2080, which I doubt is the case averaged across many games, you're looking at 27.65% higher performance/watt.

It's the slide where they mentioned 1.9x perf/watt, and according to the official NVIDIA slide:

Turing: 60 fps / 240 W = 0.25 fps/W
vs
Ampere: 105 fps / 320 W = 0.328 fps/W

That is roughly 31% higher perf/watt; again, this is according to the official NVIDIA slide.
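
Spelling that out with the fps and board-power figures read off the slide:

```python
# fps-per-watt from the numbers on Nvidia's slide (as quoted above).
fps_turing, watts_turing = 60, 240
fps_ampere, watts_ampere = 105, 320
eff_turing = fps_turing / watts_turing   # 0.250 fps/W
eff_ampere = fps_ampere / watts_ampere   # ~0.328 fps/W
print(f"Perf/watt gain: {eff_ampere / eff_turing:.2f}x")   # ~1.31x
```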
 

MrTeal

Diamond Member
Dec 7, 2003
3,614
1,816
136
They quote SHADER FLOPS.

Not FP32 ;). It's similar to how they claimed that FP16 in their A100 chip, enhanced by all that GEMM stuff, performs "the same" as native FP32.

In reality, native FP32 will be exactly that: native FP32.
Maybe? The nvidia website itself says 2x FP32 in the SM, though they don't directly list the TF number there.
[Attached image: NVIDIA website screenshot listing CUDA cores and 2x FP32]

Edit: You're correct, though, and I was wrong. They don't directly say 36 TFLOPS; they just say 10496/8704 CUDA cores and 2x FP32 throughput.
 

Hitman928

Diamond Member
Apr 15, 2012
6,187
10,694
136
It's the slide where they mentioned 1.9x perf/watt, and according to the official NVIDIA slide:

Turing: 60 fps / 240 W = 0.25 fps/W
vs
Ampere: 105 fps / 320 W = 0.328 fps/W

That is roughly 31% higher perf/watt; again, this is according to the official NVIDIA slide.

Again, that is one game, currently the most RTX-intensive game available; look at the fine print. It is also comparing a 3080 to an essentially overclocked 2080, so that will also tilt the scale in favor of the 3080 in terms of perf/W.

If you watch the Digital Foundry video, you get more like a 20-25% perf/W improvement compared to Nvidia's media slide.
 

Saylick

Diamond Member
Sep 10, 2012
3,532
7,858
136
Nvidia itself is quoting the 36 TFLOP FP32 number, so that would seem to be the case. I'm actually quite surprised, but that is a lot of compute horsepower if there are no gotchas. 29.8 TF of FP32 for $700 is a crazy value compared to 13.4 TF @ $1200 for the 2080 Ti or 13.8 TF @ $700 for the Radeon VII.
They quote SHADER FLOPS.

Not FP32 ;). It's similar to how they claimed that FP16 in their A100 chip, enhanced by all that GEMM stuff, performs "the same" as native FP32.

In reality, native FP32 will be exactly that: native FP32.
Yeah, I agree with Glo here. They used the term "Shader-FLOPS" not "FP32 FLOPS". The INT cores actually do single-precision math (i.e. 32-bit) but just not floating point specifically. Going off of the SM diagram for A100, they don't list the INT cores as being capable of doing FP math either so my guess is either Nvidia tuned Ampere for graphics so that the the pipelines for an SM are (2) x 16-wide FP or 16-wide INT + 16-wide FP, or they are just listing Shader-FLOPS as a catch all term for all the concurrent single-precision math the entire GPU can do, INT and FP included. My money is on the latter.