Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much gain is the Samsung 7nm EUV process expected to provide?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices, while offering 'beefed-up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent/uncalled for, just interested in the forum members' thoughts.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Pascal had 128 FP32 cores per SM, and was unable to issue them in the same clock as INT ops. During the Turing gen, NV revealed that sampled workloads have an average mix of ~100 FP32 + 36 INT ops.
So it was logical that Turing moved to an SM with 64 CUDA cores, which has 64 FP32 and 64 INT units and can issue to both. This was done to avoid idling FP32 cores while INT ops are being issued.

With the Ampere gaming arch, there is an additional block of FP32 resources, and the SM now has a total of 128 CUDA cores: 128 FP32 and 64 INT units. So each clock it can execute either 128 FP32 ops or 64 FP32 + 64 INT ops.

Looks like a great fit for the mix revealed during the Turing gen? Except for the fact that it can't do 128 FP32 + 64 INT ops per clock, so when a mix is involved, "peak" throughput per clock is the same as Turing's.

It obviously is a very good tradeoff, as FP32 resource utilization will still be very good for the ~100 + 36 average mix, but to achieve peak throughput on gaming Ampere, the fewer INT ops in the mix, the better.
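For illustration, here is a minimal back-of-the-envelope sketch (my own model, not from NVIDIA's materials) of ideal per-SM issue cycles under those rules, using that ~100 FP32 + 36 INT mix. It assumes perfect scheduling and ignores register file, cache, and memory limits:

```python
# Minimal sketch: ideal cycles for a workload of FP32 + INT32 ops under
# Turing-style and Ampere-style per-clock issue rules. Assumes perfect
# scheduling; real utilization is lower (registers, caches, memory).

def turing_cycles(fp, i32):
    # Turing SM: 64 FP32 + 64 INT32 lanes, both issuable every clock.
    return max(fp / 64, i32 / 64)

def ampere_cycles(fp, i32):
    # Ampere gaming SM: each clock is either 128 FP32, or 64 FP32 + 64 INT32.
    mixed = i32 / 64                    # mixed-mode cycles to drain the INT ops
    fp_left = max(fp - mixed * 64, 0)   # FP32 already issued alongside them
    return mixed + fp_left / 128        # finish the rest in pure-FP32 mode

fp, i32 = 100, 36                       # the average mix NV quoted
t, a = turing_cycles(fp, i32), ampere_cycles(fp, i32)
print(f"Turing: {t:.3f} cycles, Ampere: {a:.3f} cycles, speedup {t/a:.2f}x")
# -> Turing: 1.562 cycles, Ampere: 1.062 cycles, speedup 1.47x
```

At zero INT ops the gap grows to 2x, and at a 50/50 mix it vanishes, which is exactly the "the fewer INT ops, the better" point.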
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Do you have a source for this info? I am not doubting you but would like to read it.

It is from
They even go on to explain that the perf gain will obviously differ depending on the instruction mix.

And there are many more factors that matter for utilization of the CUDA core resources, like register file size and ports, and L1/L2 bandwidth and sizes. In fact, the 3080 even regressed in L2 cache versus the 2080 Ti. It is as if NV ran out of die area and power budget to make it a proper 128 FP + 128 INT SM monster.
 
  • Like
Reactions: Mopetar

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
No, nVidia has warps with 32 threads. They would have to double everything within an SM. So with Ampere they doubled the FP32 throughput with relatively few transistors.
 

lixlax

Member
Nov 6, 2014
183
150
116
Is Nvidia going to get sued for selling "fake" (CUDA) cores like AMD was for Bulldozer marketing? As I understand it, there is still only half the number of actual CUDA cores compared to what is marketed; it's just that each core can do up to 2 operations per cycle!??
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,773
3,150
136
Is Nvidia going to get sued for selling "fake" (CUDA) cores like AMD was for Bulldozer marketing? As I understand it, there is still only half the number of actual CUDA cores compared to what is marketed; it's just that each core can do up to 2 operations per cycle!??
No.
Understand that there is a difference between:
Data paths
Execution units
Instruction dispatch/retire

Ampere has all the execution units claimed (128 FP and 64 INT) but not the other resources to sustain them every cycle.
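To make that concrete, here is a hedged sketch (my own model; the shared dispatch budget is an assumption for illustration, not a published spec) of how units can exist without being fed every cycle:

```python
# Illustrative model: retired ops per clock are capped by the narrowest
# resource in the pipeline, not by the execution unit count alone.
# All numbers here are assumptions chosen to match the behavior above.

def sustained_ops_per_clock(wanted_fp, wanted_int,
                            fp_units=128, int_units=64, dispatch_width=128):
    """Clamp a requested per-clock mix by the unit counts and a shared
    dispatch budget that covers both pipes."""
    int_issued = min(wanted_int, int_units, dispatch_width)
    fp_issued = min(wanted_fp, fp_units, dispatch_width - int_issued)
    return fp_issued, int_issued

print(sustained_ops_per_clock(128, 0))   # (128, 0): pure FP32 hits the peak
print(sustained_ops_per_clock(128, 64))  # (64, 64): INT eats dispatch, FP32 halves
```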
 

Bouowmx

Golden Member
Nov 13, 2016
1,138
550
146
Curious, why do you say that? Do you think they plan on getting Hopper out the door by late 2021?
I'm very unsure about Hopper GeForce, but Hopper A100 successor is a possibility.
Coming in 2021: Intel Xe-HP with MCM, 4x4096 cores at 1.3 GHz for ~42 TFLOPS. The NVIDIA A100 has 6912 cores at 1.4 GHz for 19 TFLOPS, or, if all 8192 cores were enabled, 23 TFLOPS. I assume NVIDIA wants to get ahead of this with its own MCM architecture on TSMC 5 nm.
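Those figures all drop out of the standard FMA formula, FP32 TFLOPS = cores × 2 ops per FMA × clock in GHz / 1000; a quick check:

```python
def fp32_tflops(cores, ghz):
    # Each core retires one FMA (2 FLOPs) per clock at the given frequency.
    return cores * 2 * ghz / 1e3

print(fp32_tflops(4 * 4096, 1.3))  # Xe-HP 4-tile:          ~42.6
print(fp32_tflops(6912, 1.4))      # A100 as shipped:       ~19.4
print(fp32_tflops(8192, 1.4))      # full GA100, all cores: ~22.9
```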
 
  • Like
Reactions: nnunn

CakeMonster

Golden Member
Nov 22, 2012
1,391
498
136
Probably way too early to speculate about next gen. I would like to know, but realistically we won't. If it takes 24 months or more, the 3090 will look better and the 3080 worse.
 

amenx

Diamond Member
Dec 17, 2004
3,906
2,123
136

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
What a disaster this would be if true...
Maybe we can get some temporary leeway on the no profanity rule in the tech forums?
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
What a disaster this would be if true...

Karma rearing its ugly head. /s
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
Curious, why do you say that? Do you think they plan on getting Hopper out the door by late 2021?

I have a feeling that the Samsung process has a sizable share of the blame for the massive power numbers we're seeing for this card. Sure, it's fair to say that with all of the other stuff they added, driving PPW wasn't a priority, but it's still a big jump.

If that is the case, NVidia wants to get to 5nm as soon as they can. Just look at how badly AMD was hamstrung by GF being a node behind Intel for so long.

Assuming AMD finally has a card worth all of the usual hype that the community builds up before a launch, that doesn't leave NVidia with a lot of room. Ideally it means we get even better prices than we already have as both companies butt heads for market share.

I wouldn't be surprised if Hopper is more conservative at pushing RT or new features, but offers a massive boost in efficiency. Obviously they get a bigger uplift if the Samsung process really is at fault, but I have no doubt that there are architecture gains to be made, especially on newer technology.
 
  • Like
Reactions: A///

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,560
14,514
136
What a disaster this would be if true...
I hate miners for what they have done to the video card industry.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
I'm intrigued. You would think being able to accurately measure the GPU power would be something nVidia wouldn't want you to be able to do.

nVidia likes to only show GPU (the chip) power, and exclude all other power consumption (rest of board, memory, etc). It's why their TDP numbers are typically off.
 

Konan

Senior member
Jul 28, 2017
360
291
106

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Actually, for newer cards, it only shows % of power limit. At least in HWMonitor.

Try HWiNFO. The amount of data provided is impressive.
