Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much of a gain is the Samsung 7nm EUV process expected to provide?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of more than 4K60, at least 4K90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent/uncalled for, I'm just interested in the forum members' thoughts.
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,712
136
Remember when AMD increased the transistor budget for Vega in order to hit high clocks? This could be the same for Ampere if it's using Samsung's process.
1. We know the clock speeds. They are almost unchanged from Turing (yet TDP is up 100W), so it can't be that.
2. AMD ended up using way too many transistors. In the end the added clock speeds were very disappointing, hitting power/heat limits way too soon vs Pascal (see next point). Probably that was due to AMD's shoestring budgets at the time.
3. Pascal managed to increase clocks much higher (even the 14nm Samsung versions) with far fewer transistors used to achieve it (vs Maxwell). They did a lot of work on the low-level layout side of things on both Maxwell and Pascal.
4. Based on Xbox Series X specs, RDNA2 seems to be to AMD what Maxwell was to Nvidia: a huge perf/watt increase and higher clocks from almost minimal extra transistors.

TL;DR:
  • Spending a bunch of transistors to get the clock speed up almost never works, nor is it a good idea. See Pentium 4, Bulldozer and Vega as examples. I don't believe NVIDIA is doing it.
  • Getting clock speeds up through better physical implementation works far better (see Pascal, Renoir's Vega, possibly RDNA2; rough numbers sketched below).
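
A rough back-of-the-envelope on the transistors-vs-clocks point, using commonly cited transistor counts and boost clocks. These figures are my own addition, and the comparison ignores node and unit-count differences, so treat it as illustrative only:

```python
# Rough "transistors spent vs clocks gained" comparison, using commonly cited
# transistor counts (billions) and reference boost clocks (MHz). Approximate
# figures only; this ignores node and unit-count differences.
chips = {
    "GTX 980 (GM204)":     (5.2, 1216),
    "GTX 1080 (GP104)":    (7.2, 1733),
    "RX 480 (Polaris 10)": (5.7, 1266),
    "Vega 64 (Vega 10)":   (12.5, 1546),
}

def gain(old, new):
    (t0, c0), (t1, c1) = chips[old], chips[new]
    print(f"{old} -> {new}: +{(t1 / t0 - 1) * 100:.0f}% transistors, "
          f"+{(c1 / c0 - 1) * 100:.0f}% boost clock")

gain("GTX 980 (GM204)", "GTX 1080 (GP104)")      # ~+38% transistors, ~+43% clock
gain("RX 480 (Polaris 10)", "Vega 64 (Vega 10)") # ~+119% transistors, ~+22% clock
```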
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Who's gonna bet that Ampere is but a die-shrunk Turing with more CUDA cores at each tier, with improved RT performance but similar rasterization performance to Turing? It sure seems that way to me, since the rumors point to only 10-15% more performance for the RTX 3080 over the RTX 2080 Ti.

Take A100, remove two GPCs, remove the FP64 units from each SM, change the HBM memory controllers to GDDR6X, and you get GA102.
The RT and Tensor core architecture is the same as A100's. That means 4x higher throughput per Tensor Core vs Turing. Although we don't have anything on the RT cores, I will say they should have at least double the throughput of Turing's RT cores.
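
For reference, the per-core 4x works out from the publicly stated dense FP16 FMA rates (64 per clock for a Turing tensor core, 256 per clock for an A100 tensor core), although per SM it is only 2x dense because GA100 has half as many tensor cores per SM. A quick sanity check of that arithmetic:

```python
# Sanity check of the "4x per Tensor Core" claim, using the dense FP16 FMA
# rates per tensor core and the tensor-core-per-SM counts from the public
# Turing and A100 whitepapers (dense math only, ignoring A100's sparsity mode).
specs = {
    # arch:   (fp16_fma_per_tc_per_clock, tensor_cores_per_sm)
    "Turing": (64, 8),
    "A100":   (256, 4),
}

per_tc = specs["A100"][0] / specs["Turing"][0]
per_sm = (specs["A100"][0] * specs["A100"][1]) / (specs["Turing"][0] * specs["Turing"][1])
print(f"per tensor core: {per_tc:.0f}x, per SM: {per_sm:.0f}x")  # 4x and 2x
```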
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,712
136
Take A100, remove two GPCs, remove the FP64 units from each SM, change the HBM memory controllers to GDDR6X, and you get GA102.
The RT and Tensor core architecture is the same as A100's. That means 4x higher throughput per Tensor Core vs Turing. Although we don't have anything on the RT cores, I will say they should have at least double the throughput of Turing's RT cores.
Seems about right.

I'm still not 100% convinced they'll go with the full 4x throughput increase for the tensor cores. It looks like overkill for consumer-class hardware (much like 1/2-rate FP64 would be). Unless they have some totally new features (other than DLSS) that also require tensor cores, it seems like a waste of transistors, while a 2x increase would still be massive.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,643
136
1. We know the clock speeds. They are almost unchanged from Turing (yet TDP is up 100W), so it can't be that.
2. AMD ended up using way too many transistors. In the end the added clock speeds were very disappointing, hitting power/heat limits way too soon vs Pascal (see next point). Probably that was due to AMD's shoestring budgets at the time.
3. Pascal managed to increase clocks much higher (even the 14nm Samsung versions) with far fewer transistors used to achieve it (vs Maxwell). They did a lot of work on the low-level layout side of things on both Maxwell and Pascal.
4. Based on Xbox Series X specs, RDNA2 seems to be to AMD what Maxwell was to Nvidia: a huge perf/watt increase and higher clocks from almost minimal extra transistors.

TL;DR:
  • Spending a bunch of transistors to get the clock speed up almost never works, nor is it a good idea. See Pentium 4, Bulldozer and Vega as examples. I don't believe NVIDIA is doing it.
  • Getting clock speeds up through better physical implementation works far better (see Pascal, Renoir's Vega, possibly RDNA2).
That is plausible if Ampere was always planned around Samsung's 8nm process, with layout improvements leading to the clock-speed increase. However, if the rumor of the TSMC-Nvidia deal falling through is true, then choosing Samsung and redoing the design for their node must have left less room for that kind of physical optimization.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Seems about right.

I'm still not 100% convinced they'll go with the full 4x throughput increase for the tensor cores. It looks like overkill for consumer-class hardware (much like 1/2-rate FP64 would be). Unless they have some totally new features (other than DLSS) that also require tensor cores, it seems like a waste of transistors, while a 2x increase would still be massive.

These cards - at least the bigger ones - are going to have to do double duty as workstation cards for deep learning/scientific work that doesn't merit the full A100, so they've got quite a strong motivation to keep the tensor cores in about the same format as on A100.

The question is perhaps more whether they can find a way to utilise all of them.
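
As a rough illustration of the kind of workstation workload that would exercise those tensor cores, here is a minimal sketch, assuming PyTorch and a CUDA GPU with tensor cores; the matrix size and the crude single-run timing are arbitrary choices of mine, purely illustrative:

```python
# Minimal sketch (assumes PyTorch and a CUDA GPU with tensor cores): a large
# FP16 matmul is the classic workload that cuBLAS routes through tensor cores.
import torch

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize()
start.record()
c = a @ b                      # FP16 GEMM, executed on the tensor cores
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)   # milliseconds; single un-warmed run, rough only
flops = 2 * 4096 ** 3          # one multiply-add counted as two FLOPs
print(f"~{flops / (ms / 1000) / 1e12:.1f} TFLOPS achieved")
```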
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Seems about right.

I'm still not 100% convinced they'll go with the full 4x throughput increase for the tensor cores. It looks like overkill for consumer-class hardware (much like 1/2-rate FP64 would be). Unless they have some totally new features (other than DLSS) that also require tensor cores, it seems like a waste of transistors, while a 2x increase would still be massive.

Turing GPUs are not only used in consumer products (gaming) but also for AI DL/ML, with cards like the TITAN RTX and the Turing GPUs used in NVIDIA T4 enterprise servers.
If you look at NVIDIA's Q2 FY2021 results, enterprise revenue was higher than gaming revenue, and I'm sure they allocated a lot of transistors in GA102 to Tensor cores and server performance/communications.

Also note that Turing GPUs have higher throughput per Tensor Core than the Volta V100.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
If the Ampere gaming GPUs have almost all of the GEMM hardware from the A100 die in them, it actually makes sense why Nvidia is calling them Ampere, and why GA102 has such high density and such a high transistor count with relatively few SMs.

I really cannot wait for the whitepaper on the gaming cards.
 

DDH

Member
May 30, 2015
168
168
111
Doubt it.

AMD could really hit Nvidia where it hurts, and that's great. They'll take turns lowering prices, and I can get a sweet nv card for cheap. I'm going to say AMD manages to get within 85% of all of nv's upcoming lineup.
And therein lies the problem. You only care about AMD competing so you can get a cheaper NVIDIA card. If you won't buy an AMD card, then you helped enable the higher prices from NVIDIA. If AMD doesn't compete in the high end, then sucks to be you, handing NVIDIA shareholders a nice slice of your hard-earned money.

 

DDH

Member
May 30, 2015
168
168
111
1. We know the clock speeds. They are almost unchanged from Turing (yet TDP is up 100W), so it can't be that.
2. AMD ended up using way too many transistors. In the end the added clock speeds were very disappointing, hitting power/heat limits way too soon vs Pascal (see next point). Probably that was due to AMD's shoestring budgets at the time.
3. Pascal managed to increase clocks much higher (even the 14nm Samsung versions) with far fewer transistors used to achieve it (vs Maxwell). They did a lot of work on the low-level layout side of things on both Maxwell and Pascal.
4. Based on Xbox Series X specs, RDNA2 seems to be to AMD what Maxwell was to Nvidia: a huge perf/watt increase and higher clocks from almost minimal extra transistors.

TL;DR:
  • Spending a bunch of transistors to get the clock speed up almost never works, nor is it a good idea. See Pentium 4, Bulldozer and Vega as examples. I don't believe NVIDIA is doing it.
  • Getting clock speeds up through better physical implementation works far better (see Pascal, Renoir's Vega, possibly RDNA2).
I think point one is likely wrong. The clock speeds for Turing, like Pascal, boosted way higher than NVIDIA's listed numbers. I wouldn't be surprised to see Ampere clock speeds remain closer to what NVIDIA has listed.

 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
I think point one is likely wrong. The clock speeds for Turing, like Pascal, boosted way higher than NVIDIA's listed numbers. I wouldn't be surprised to see Ampere clock speeds remain closer to what NVIDIA has listed.

If 350W and more is true for the GA102 chip, there is no way in hell it stays around the rated clock speeds.

It HAS TO clock higher under lighter load conditions.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,362
2,854
106
Then why talk about margins? Can't ignore a large segment. AMD has had APUs/CPUs/GPUs in consoles for a long time. Remember? Where did that get AMD? Or do you mean the semi-custom Zen2 processor being used? Because I still fail to see how that'll translate well into the PC market. The gaming industry is slow to evolve. You'll be stuck waiting half a decade before a majority of developers begin regularly using 8 cores or more.
Please read once more the post of mine that you quoted first, but I will repeat myself here.
I was talking about a price war between AMD and Nvidia. AMD won't cut their margins on GPUs to gain market share if the available supply of 7nm wafers is limited and they can sell the limited number of GPUs at higher prices. Look at RDNA1 and its selling prices: why do you think they kept them so high? Because they could sell everything they made and customers were willing to pay the asking price.
CPUs are the largest market for AMD, not to mention the most successful, so they have the highest priority and most of the 7nm wafers will be used on them.
The reason I mentioned the SoCs was because of the GPUs! AMD doesn't need to worry about game developers not optimizing for their hardware (GPU), even with a smaller share of the PC GPU market, if AMD has its SoCs (CPU+GPU) in the 2 major consoles, which are soon to be released.
 

kurosaki

Senior member
Feb 7, 2019
258
250
86
So, double the RT cores and the image-distorting tech DLSS. DLSS is not a nice implementation if you want games to look good; it's a way to make games look ass and crank up perf. If you want more fps, just lower the res instead and apply some regular AA.
 

KompuKare

Golden Member
Jul 28, 2009
1,016
934
136
AMD could really hit Nvidia where it hurts, and that's great. They'll take turns lowering prices, and I can get a sweet nv card for cheap. I'm going to say AMD manages to get within 85% of all of nv's upcoming lineup.
No /sarcasm tag, so I presume you are serious.
Well, if that attitude is common then AMD might as well exit the GPU business.
If "I wish AMD could compete at the high end" is just code for "I want them to compete to bring the prices down so I can buy Nvidia cheaper" then that's not a viable business model.
 

psolord

Golden Member
Sep 16, 2009
1,920
1,194
136
So, double the RT cores and the image-distorting tech DLSS. DLSS is not a nice implementation if you want games to look good; it's a way to make games look ass and crank up perf. If you want more fps, just lower the res instead and apply some regular AA.

Won't you lose the screen's 1:1 pixel mapping when running at a non-native resolution, making things even worse?

Digital Foundry has said some neat things regarding DLSS 2.0, but I don't have a DLSS-capable card to test myself. It seems OK in the video.
 

kurosaki

Senior member
Feb 7, 2019
258
250
86

I don't know, man. I'm sitting on a 1440p screen myself and would neither lower the res manually nor start to upscale. It's not worth it with upscaling.
I just meant with my former reply that DLSS and manually downgrading by lowering the resolution are equally bad. We are cheating ourselves to higher framerates, but it looks good on paper though... "4K with RTX and DLSS 3.0", WOWZA! But all I hear is a fancy phrase for "1440p upscaled with a crappy image and higher FPS", yet we make it look like it's all great features and clearly superior to the competition.
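
For what it's worth, the raw pixel-count arithmetic behind that "1440p upscaled" framing is below; the render resolutions assumed here are the usual DLSS quality/performance internal targets at 4K output:

```python
# Raw pixel counts: the shading work saved when DLSS renders internally at
# 1440p (quality mode) or 1080p (performance mode) and reconstructs to 4K.
resolutions = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K":    (3840, 2160),
}
pixels = {name: w * h for name, (w, h) in resolutions.items()}

print(f"4K vs 1440p: {pixels['4K'] / pixels['1440p']:.2f}x the pixels")  # 2.25x
print(f"4K vs 1080p: {pixels['4K'] / pixels['1080p']:.2f}x the pixels")  # 4.00x
```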
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,712
136
Wow, so a ~2x performance increase with DLSS and RTX @ 4K.

Impressive for those settings, but considering it has more than 4x the tensor cores, new RT cores and 50% more bandwidth (which RT really needs), it's the best-case scenario...

It should make it abundantly clear that the pure rasterization improvement can't really be much more than ~50%, at best.
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,712
136
I don't know, man. I'm sitting on a 1440p screen myself and would neither lower the res manually nor start to upscale. It's not worth it with upscaling.
I just meant with my former reply that DLSS and manually downgrading by lowering the resolution are equally bad. We are cheating ourselves to higher framerates, but it looks good on paper though...

That video is of the old DLSS implementation, which was quite bad (sometimes even worse than simple upscaling). This is a more accurate comparison:
and this:
 

kurosaki

Senior member
Feb 7, 2019
258
250
86
That video is of the old DLSS implementation, which was quite bad (sometimes even worse than simple upscaling). This is a more accurate comparison:
and this:
But it will never look as good; the tradeoff is going to be a janky ride, from almost-nice upscaling to quite bad. We are cheating ourselves to higher framerates, but it looks good on paper though... "4K with RTX and DLSS 3.0", WOWZA! But all I hear is a fancy phrase for "1440p upscaled with a crappy image and higher FPS", yet we make it look like it's all great features and clearly superior 4K perf vs the competition.
 

n0x1ous

Platinum Member
Sep 9, 2010
2,572
248
106
I think the 4K gains will be substantial (which is what I care about), but at lower resolutions the gains will be CPU-limited, like they already are on the 2080 Ti today.
 

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
1. We know the clock speeds. They are almost unchanged from Turing (yet TDP is up 100W), so it can't be that.
2. AMD ended up using way too many transistors. In the end the added clock speeds were very disappointing, hitting power/heat limits way too soon vs Pascal (see next point). Probably that was due to AMD's shoestring budgets at the time.
3. Pascal managed to increase clocks much higher (even the 14nm Samsung versions) with far fewer transistors used to achieve it (vs Maxwell). They did a lot of work on the low-level layout side of things on both Maxwell and Pascal.
4. Based on Xbox Series X specs, RDNA2 seems to be to AMD what Maxwell was to Nvidia: a huge perf/watt increase and higher clocks from almost minimal extra transistors.

TL;DR:
  • Spending a bunch of transistors to get the clock speed up almost never works, nor is it a good idea. See Pentium 4, Bulldozer and Vega as examples. I don't believe NVIDIA is doing it.
  • Getting clock speeds up through better physical implementation works far better (see Pascal, Renoir's Vega, possibly RDNA2).

Well, the Xbox Series X 52 CU / 1800 MHz GPU can eat 130-140W.

If we compare that to the RX 5700 XT (40 CU at 1750 MHz, which can eat up to 240W), then it is very clear what AMD has with RDNA2.
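
Putting rough numbers on that comparison, using the wattage figures quoted above; note they mix a GPU-only estimate for the console with total board power for the 5700 XT, so take the ratio loosely:

```python
# Rough perf-per-watt comparison from CU count, clock and the power figures
# quoted above. FP32 throughput = CUs * 64 shaders/CU * 2 FLOPs/clock * clock.
def tflops(cus, clock_mhz):
    return cus * 64 * 2 * clock_mhz / 1e6

xsx_tf = tflops(52, 1800)   # ~12.0 TFLOPS
n10_tf = tflops(40, 1750)   # ~9.0 TFLOPS

xsx_ppw = xsx_tf / 135      # midpoint of the 130-140W GPU estimate
n10_ppw = n10_tf / 240      # quoted 5700 XT draw (whole board)

print(f"Series X GPU: {xsx_tf:.1f} TF, {xsx_ppw * 1000:.0f} GFLOPS/W")
print(f"RX 5700 XT:   {n10_tf:.1f} TF, {n10_ppw * 1000:.0f} GFLOPS/W")
print(f"Ratio:        ~{xsx_ppw / n10_ppw:.1f}x perf/W")
```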