Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices while offering 'beefed-up RTX' options at the top?)
Will the top card be capable of more than 60 fps at 4K, ideally at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent/uncalled for; just interested in forum members' thoughts.
 

Hitman928

Diamond Member
Apr 15, 2012
6,187
10,696
136
Under accessories it lists a "sag holder". Does that mean they'll come with one of those kick-stand type brackets that prop up the card to keep it from sagging when mounted horizontally?
 
  • Like
Reactions: lightmanek

Tarkin77

Member
Mar 10, 2018
75
163
106
Couldn't Vega pull upwards of 450 W with unlocked power? 500 W is believable with OCing. Not sure if an AIB would release a card with a 500 W TDP, though.

Sent from my SM-N975F using Tapatalk
I own a Radeon VII, watercooled, +50% power target @ 2,075 MHz, and under max gaming load (Doom Eternal) or ETH mining it pulls around 400 W constantly. Furmark is different ... my PSU shuts down within a second (470 W+ load). I have a Seasonic Prime Titanium 650W; I think it's too weak for this kind of load :D So if the new RTX 3090 pulls around 400+ W overclocked, one REALLY needs a VERY GOOD and expensive PSU!
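As a rough sanity check on why a 650 W unit trips under that kind of load, here is a minimal back-of-the-envelope sketch; the CPU and rest-of-system figures are assumptions for illustration, not measurements from the post:

```python
# Quick PSU headroom estimate. Only the GPU and PSU numbers come from the post;
# the CPU and rest-of-system draws are assumed values for illustration.
gpu_peak_w   = 470   # Furmark-style peak load reported in the post
cpu_peak_w   = 150   # assumed high-end CPU under load
rest_w       = 50    # assumed fans, drives, motherboard, RAM
psu_rating_w = 650   # Seasonic Prime Titanium 650W from the post

total_draw = gpu_peak_w + cpu_peak_w + rest_w
headroom   = psu_rating_w - total_draw

print(f"Estimated peak system draw: {total_draw} W")   # 670 W
print(f"Headroom on a 650 W unit:   {headroom} W")      # -20 W, i.e. shutdown territory
```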
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Who's gonna bet that Ampere is but a die-shrunk Turing with more CUDA cores at each tier, with improved RT performance but similar rasterization performance to Turing? It sure seems that way to me, since the rumors point to only 10-15% more performance in the RTX 3080 over the RTX 2080 Ti.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
Doubt it.

AMD could really hit Nvidia where it hurts, and that's great. They'll take turns lowering prices, and I can get a sweet NV card for cheap. I'm going to say AMD manages to get within 85% of all of NV's upcoming lineup.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,038
136
I don't think AMD is so keen on a serious price war with Nvidia. AMD has both CPUs and GPUs on a 7nm process and a limited allocation of wafers. Their CPUs compete much better against the competition than their GPUs do, so manufacturing priority should go to them; because of that, AMD will want to maximize margins instead of chasing market share in the GPU market. Besides, they have their GPUs in the consoles, so it's not as if game developers can ignore AMD and optimize mainly for Nvidia.
 
  • Like
Reactions: ozzy702

Gideon

Golden Member
Nov 27, 2007
1,774
4,145
136
Who's gonna bet that Ampere is but a die-shrunk Turing with more CUDA cores at each tier, with improved RT performance but similar rasterization performance to Turing? It sure seems that way to me, since the rumors point to only 10-15% more performance in the RTX 3080 over the RTX 2080 Ti.

Nvidia has added a boatload of transistors and pushed the TBP from 250W to 350W. The shader count has only increased 20% and clocks are roughly the same.

They must have used the extra transistors and TDP somewhere (we know the caches have been resized considerably, for instance). Sure, extra tensor cores and RT cores are part of it, but they cannot account for all of it.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
I don't think AMD is so keen on a serious price war with Nvidia. AMD has both CPUs and GPUs on a 7nm process and a limited allocation of wafers. Their CPUs compete much better against the competition than their GPUs do, so manufacturing priority should go to them; because of that, AMD will want to maximize margins instead of chasing market share in the GPU market. Besides, they have their GPUs in the consoles, so it's not as if game developers can ignore AMD and optimize mainly for Nvidia.
They'll price accordingly and aim for a larger margin than they'd normally go for. Not that I know myself, but I doubt their royalty per gaming unit sold is much. It's a lot of hardware being sold for a rumored $500. You could probably get a vague idea from their quarterly reports if they list semi-custom sales.
 

Krteq

Senior member
May 22, 2015
993
672
136
Nvidia has added a boatload of transistors and pushed the TBP from 250W to 350W. The shader count has only increased 20% and clocks are roughly the same.

They must have used the extra transistors and TDP somewhere (we know the caches have been resized considerably, for instance). Sure, extra tensor cores and RT cores are part of it, but they cannot account for all of it.
Was that transistor count confirmed?
 

Gideon

Golden Member
Nov 27, 2007
1,774
4,145
136
Was that transistor count confirmed?
AFAIK it hasn't been.

Yet the die size is 627 mm^2 and it's on a 7nm process (too many pictures around for these to be fake).

Even if it's using a "fake Samsung 7nm" process as some have claimed (which I highly doubt), that still means a considerable transistor budget increase. Even more so if it's TSMC 7nm or Samsung 7nm EUV.

EDIT:
By the way, we know that the 826 mm^2 A100 is 54 billion transistors on TSMC 7nm. If GA102 is 627 mm^2, 34.5B seems quite plausible.
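As a rough sanity check on that estimate, scaling A100's density down to a 627 mm^2 die gives an upper bound; a less dense node (e.g. a Samsung process) would land well below it, which is what pushes the estimate into the low-to-mid 30s of billions. A minimal sketch, using only the figures from the post:

```python
# Back-of-the-envelope density scaling. Inputs: A100 = 54B transistors on
# 826 mm^2 (TSMC 7nm), GA102 die = 627 mm^2. The actual process/density of
# GA102 is unknown, so this only bounds the estimate from above.
a100_transistors_b = 54.0
a100_area_mm2 = 826
ga102_area_mm2 = 627

a100_density = a100_transistors_b / a100_area_mm2     # ~0.065 B/mm^2 (~65 MTr/mm^2)
upper_bound = a100_density * ga102_area_mm2           # ~41B if GA102 matched A100 density

print(f"A100 density: {a100_density * 1000:.1f} MTr/mm^2")
print(f"GA102 upper bound at A100 density: {upper_bound:.1f}B transistors")
```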
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,038
136
They'll price accordingly and aim for a larger margin than they'd normally go for. Not that I know myself, but I doubt their royalty per gaming unit sold is much. It's a lot of hardware being sold for a rumored $500. You could probably get a vague idea from their quarterly reports if they list semi-custom sales.
And where was I talking about royalty per SoC in consoles? I was talking about something totally different.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
And where was I talking about royalty per SoC in consoles? I was talking about something totally different.
Then why talk about margins? You can't ignore a large segment. AMD has had APUs/CPUs/GPUs in consoles for a long time. Remember? Where did that get AMD? Or do you mean the semi-custom Zen 2 processor being used? Because I still fail to see how that'll translate well into the PC market. The gaming industry is slow to evolve. You'll be stuck waiting half a decade before a majority of developers regularly use eight or more cores.
 
Last edited:

Gideon

Golden Member
Nov 27, 2007
1,774
4,145
136
Remember when AMD increased the transistor budget for Vega in order to hit high clocks? This could be the same for Ampere if it's using Samsung's process.
1. We know the clock speeds. They are almost unchanged from Turing (yet the TDP is up 100W), so it can't be that.
2. AMD ended up using way too many transistors. In the end the added clock speeds were very disappointing, hitting power/heat limits way too soon vs Pascal (see next point). Probably that was down to AMD's shoestring budgets at the time.
3. Pascal managed to push clocks way higher (even the 14nm Samsung versions) with far fewer transistors used to achieve it (vs Maxwell). They did a lot of work on the low-level layout side of things on both Maxwell and Pascal.
4. Based on the Xbox Series X specs, RDNA2 seems to be to AMD what Maxwell was to Nvidia: a huge perf/watt increase and higher clocks from almost minimal extra transistors.

TL;DR:
  • Spending a bunch of transistors to get the clock speed up almost never works and is rarely a good idea. See Pentium 4, Bulldozer and Vega as examples. I don't believe NVIDIA is doing it.
  • Getting clock speeds up through a better physical implementation works far better (see Pascal, Renoir's Vega, possibly RDNA2)
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Who's gonna bet that Ampere is but a die-shrunk Turing with more CUDA cores at each tier, with improved RT performance but similar rasterization performance to Turing? It sure seems that way to me, since the rumors point to only 10-15% more performance in the RTX 3080 over the RTX 2080 Ti.

Take A100, remove two GPCs, remove the FP64 from each SM, swap the HBM memory controllers for GDDR6X, and you've got GA102.
The RT and Tensor core architecture is the same as A100's. That means 4x higher throughput per Tensor Core vs Turing. Although we don't have anything on the RT cores, I will say they should have at least double the throughput of Turing's RT cores.
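For a rough sense of what "same Tensor Core architecture as A100" would imply per SM, here is a minimal sketch using the published Turing and A100 per-core figures; applying them to GA102 is purely the assumption made above, not a confirmed spec:

```python
# Dense FP16 FMA throughput per clock. Turing and A100 figures are from
# NVIDIA's whitepapers; GA102 inheriting the A100 layout is an assumption.
turing = {"tensor_cores_per_sm": 8, "fma_per_clock_per_core": 64}
a100   = {"tensor_cores_per_sm": 4, "fma_per_clock_per_core": 256}

def fma_per_sm(cfg):
    return cfg["tensor_cores_per_sm"] * cfg["fma_per_clock_per_core"]

print("Turing:", fma_per_sm(turing), "FMA/clk/SM")   # 512
print("A100:  ", fma_per_sm(a100), "FMA/clk/SM")     # 1024 (2x per SM)
print("Per Tensor Core ratio:",
      a100["fma_per_clock_per_core"] / turing["fma_per_clock_per_core"])  # 4.0
```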
 
  • Like
Reactions: lightmanek

Gideon

Golden Member
Nov 27, 2007
1,774
4,145
136
Take A100, remove two GPCs, remove the FP64 from each SM, swap the HBM memory controllers for GDDR6X, and you've got GA102.
The RT and Tensor core architecture is the same as A100's. That means 4x higher throughput per Tensor Core vs Turing. Although we don't have anything on the RT cores, I will say they should have at least double the throughput of Turing's RT cores.
Seems about right.

I'm still not 100% convinced they'll go with the full 4x throughput increase for the tensor cores. It looks like overkill for consumer-class hardware (much as 1/2-rate FP64 would be). Unless they have some totally new features (other than DLSS) that also require tensor cores, it seems like a waste of transistors when a 2x increase would still be massive.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
1. We know the clock speeds. They are almost unchanged from Turing (yet the TDP is up 100W), so it can't be that.
2. AMD ended up using way too many transistors. In the end the added clock speeds were very disappointing, hitting power/heat limits way too soon vs Pascal (see next point). Probably that was down to AMD's shoestring budgets at the time.
3. Pascal managed to push clocks way higher (even the 14nm Samsung versions) with far fewer transistors used to achieve it (vs Maxwell). They did a lot of work on the low-level layout side of things on both Maxwell and Pascal.
4. Based on the Xbox Series X specs, RDNA2 seems to be to AMD what Maxwell was to Nvidia: a huge perf/watt increase and higher clocks from almost minimal extra transistors.

TL;DR:
  • Spending a bunch of transistors to get the clock speed up almost never works and is rarely a good idea. See Pentium 4, Bulldozer and Vega as examples. I don't believe NVIDIA is doing it.
  • Getting clock speeds up through a better physical implementation works far better (see Pascal, Renoir's Vega, possibly RDNA2)
That is plausible if Ampere was always planned around Samsung's 8nm process, with improvements to the layout leading to clock-speed increases. However, if the rumor of the TSMC-Nvidia deal falling through is true, then switching to Samsung and redoing the design for their node must have left less room for transistor-level optimization.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Seems about right.

I'm still not 100% convinced they'll go with the full 4x throughput increase for the tensor cores. It looks like overkill for consumer-class hardware (much as 1/2-rate FP64 would be). Unless they have some totally new features (other than DLSS) that also require tensor cores, it seems like a waste of transistors when a 2x increase would still be massive.

These cards - at least the bigger ones - are going to have to do double duty as workstation cards for deep learning/scientific work that doesn't merit the full A100, so they've got quite a strong motivation to keep the tensor cores in roughly the same format as on A100.

The question is perhaps more whether they can find a way to utilise all of them.
 
  • Like
Reactions: nnunn

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Seems about right.

I'm still not 100% convinced they'll go with the full 4x throughput increase for the tensor cores. It looks like overkill for consumer-class hardware (much as 1/2-rate FP64 would be). Unless they have some totally new features (other than DLSS) that also require tensor cores, it seems like a waste of transistors when a 2x increase would still be massive.

Turing GPUs are not only used in consumer (gaming) products but also for AI/DL/ML, with cards like the TITAN RTX and the Turing GPUs used in NVIDIA T4 enterprise servers.
If you look at NVIDIA's Q2 FY2021 results, enterprise revenue was higher than gaming revenue, and I'm sure they allocated a lot of transistors in GA102 to tensors and server performance/communications.

Also note that Turing GPUs have higher throughput per Tensor Core than Volta V100.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
If the Ampere gaming GPUs have almost all of the GEMM hardware from the A100 die in them, it actually makes sense why Nvidia is calling them Ampere, and why they have such high density and such a high transistor count with relatively few SMs.

I really cannot wait for the whitepaper on the gaming cards.
 
  • Like
Reactions: nnunn