
Question 'Ampere'/Next-gen gaming uarch speculation thread


AtenRa

Lifer
Feb 2, 2009
13,581
2,639
126
But come on, a new node, a new architecture, and a new crazy-high TDP, only for the 3090 to be 50% faster than the 2080 Ti? I don't buy that.

If the RTX 3090 Ampere is a conservative 30% more efficient than the RTX 2080 Ti Turing, that alone will translate into a 65% performance uplift over Turing based on the rumored TDP. So 1.5 x 1.65 = 2.475, which is nearly 150% faster than the 5700 XT. But this is just at rasterization; Nvidia may (or may not) also hold a very commanding lead in RT capabilities.
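As a quick sanity check of that arithmetic (both input factors are the rumored numbers above, not measured data):

```python
# Speculative uplift math from the post above; every input is a rumor.
turing_vs_5700xt = 1.5   # assumed: RTX 2080 Ti ~50% faster than 5700 XT
ampere_vs_turing = 1.65  # assumed: 65% uplift from efficiency + rumored TDP

ampere_vs_5700xt = turing_vs_5700xt * ampere_vs_turing
print(f"Ampere vs 5700 XT: {ampere_vs_5700xt:.3f}x")  # 2.475x, i.e. ~150% faster
```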
I strongly believe the Ampere architecture is more optimized for RT and ML than raster performance.
The new Tensor Cores in Ampere, with native INT8/INT4, will perhaps need 2x the transistor count vs Turing's Tensor Cores. So if they need 2x the transistor count, and Nvidia also increases the Tensor Core count vs Turing, then Tensor Cores will occupy a larger area percentage in Ampere than in Turing. Add extra RT cores, extra bandwidth lanes for communication, and a few other things, and a lot of the die size goes to RT and ML performance increases, not raster.

For those reasons I believe the RTX 3090 will need to increase clocks well above what they were originally aiming for in order to reach +60% over the RTX 2080 Ti. And that is why we have the TDP increase over the Turing RTX 2080 Ti.
 

Ajay

Diamond Member
Jan 8, 2001
8,896
3,591
136
I strongly believe the Ampere architecture is more optimized for RT and ML than raster performance.
The new Tensor Cores in Ampere, with native INT8/INT4, will perhaps need 2x the transistor count vs Turing's Tensor Cores. So if they need 2x the transistor count, and Nvidia also increases the Tensor Core count vs Turing, then Tensor Cores will occupy a larger area percentage in Ampere than in Turing. Add extra RT cores, extra bandwidth lanes for communication, and a few other things, and a lot of the die size goes to RT and ML performance increases, not raster.
Are the registers in Turing's tensor cores 16b? Is Ampere going to 32b registers?
 

AtenRa

Lifer
Feb 2, 2009
13,581
2,639
126
Are the registers in Turing's tensor cores 16b? Is Ampere going to 32b registers?
I don't know about the registers, but:

You need a huge transistor allocation to have half the Tensor Cores per SM (4 Tensor Cores per SM in Ampere vs 8 in Turing) and at the same time 2x the FP16/FP32 throughput.
That means each Tensor Core in Ampere has 4 times the performance of each Tensor Core in Turing.

From the NVIDIA Ampere architecture whitepaper:

The A100 SM diagram is shown in Figure 7. Volta and Turing have eight Tensor Cores per SM, with each Tensor Core performing 64 FP16/FP32 mixed-precision fused multiply-add (FMA) operations per clock. The A100 SM includes new third-generation Tensor Cores that each perform 256 FP16/FP32 FMA operations per clock. A100 has four Tensor Cores per SM, which together deliver 1024 dense FP16/FP32 FMA operations per clock, a 2x increase in computation horsepower per SM compared to Volta and Turing.
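The per-core and per-SM arithmetic in that quote checks out; a minimal sketch using only the numbers stated in the whitepaper text above:

```python
# FP16/FP32 mixed-precision FMA throughput per Tensor Core per clock.
turing_fma_per_core = 64    # Volta/Turing: 8 Tensor Cores per SM
ampere_fma_per_core = 256   # A100: 4 Tensor Cores per SM

turing_sm_total = 8 * turing_fma_per_core  # 512 FMA/clock per SM
ampere_sm_total = 4 * ampere_fma_per_core  # 1024 FMA/clock per SM

print(ampere_fma_per_core / turing_fma_per_core)  # 4.0x per Tensor Core
print(ampere_sm_total / turing_sm_total)          # 2.0x per SM
```

So halving the Tensor Core count per SM while doubling per-SM throughput does imply a 4x stronger individual core, as claimed.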
 

Krynj

Platinum Member
Jun 21, 2006
2,816
6
81
As somebody that is currently waiting to pull the trigger on something in the $400/$500 range for my next build, should I expect any product announcement in that price range? Or will we just see price cuts to the current 20xx cards? The GPU is basically the last component I need before I pull the trigger on my new parts, so I was just looking for a bit of insight on what to expect. I haven't bought an Nvidia card in about 17 years, or built a system in about 8, so I'm a bit out of the loop to say the least.
 

Konan

Senior member
Jul 28, 2017
360
291
106
As somebody that is currently waiting to pull the trigger on something in the $400/$500 range for my next build, should I expect any product announcement in that price range? Or will we just see price cuts to the current 20xx cards? The GPU is basically the last component I need before I pull the trigger on my new parts, so I was just looking for a bit of insight on what to expect. I haven't bought an Nvidia card in about 17 years, so I'm not too familiar with their product launches.
This time next week we have a strong chance of learning official pricing and some product info from Nvidia at their event on Sept. 1.
Could be an RTX 3070/3060 in that price range.
 

Stuka87

Diamond Member
Dec 10, 2010
5,434
1,224
136
The TW on the chip indicates Taiwan, so TSMC? Or maybe diffused in Korea and assembled in Taiwan?
'S TW' is for Samsung. Not TSMC.

EDIT: Correction, that's just where the final assembly took place. It bears no relation to where the chip was diffused. Nvidia often has that on chips, but also sometimes has 'B KOREA' on the chip.
 

Glo.

Diamond Member
Apr 25, 2015
4,807
3,425
136
I think it will still be closer to a 40-45% rasterization performance uplift over the previous generation.
 

CakeMonster

Golden Member
Nov 22, 2012
1,026
96
91
35% for the steps listed there is very disappointing for a 2-year wait, especially considering the price speculation.

I don't really believe it, but my qualifications are close to zero.
 

Glo.

Diamond Member
Apr 25, 2015
4,807
3,425
136
The thing that baffles me is...

No way Jensen will let it be this slow. Not compared to Navi 2. Nvidia's engineers know (well, they know 90-95% at this point...) how Navi 2 will behave. They can clock it to hell.

Why only 35%?

I don't buy it. It will be faster. Around 40-45% at 1080p, and faster at 4K.
 

Ajay

Diamond Member
Jan 8, 2001
8,896
3,591
136
I don't know about the registers, but:

You need a huge transistor allocation to have half the Tensor Cores per SM (4 Tensor Cores per SM in Ampere vs 8 in Turing) and at the same time 2x the FP16/FP32 throughput.
That means each Tensor Core in Ampere has 4 times the performance of each Tensor Core in Turing.

From the NVIDIA Ampere architecture whitepaper
Thanks! Looking over that documentation, I think it's pretty clear that for GA102, the total CUDA core count will go down, as will the FP64 FMAs and the number of Tensor Cores. I don't think consumer GPUs will need FP64 either (unless GA102 is also aimed at engineering workstations). So I think some reasonable cuts can be made, especially if NV is aiming for higher clocks. Hopefully we'll see actual products soon and know for sure.
 
