Question 'Ampere'/Next-gen gaming uarch speculation thread

Page 54

Ottonomous

Senior member
May 15, 2014
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of more than 4K60, ideally at least 90 fps?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent/uncalled for, just interested in the forum members' thoughts.
 

Ajay

Lifer
Jan 8, 2001
The Maxwell -> Pascal performance increase came primarily from higher shader counts and a huge uptick in clocks. The uArch differences were much smaller than in other generational changes; it's just that the jump from 28nm -> 16nm was huge.

Anyway, everybody's acting like 8LPP is the end of the world, but it's really not. It's still something like a 40-50% improvement in power/perf vs TSMC N16 (give or take); it's just abysmal in density, that's all (compared to N7, that is).
Well, I guess if you want to put a fine point on it wrt core architecture, the major changes were Kepler->Maxwell and Pascal->Turing. Though I'm not sure how much impact the Turing changes had on gaming performance.
 

DXDiag

Member
Nov 12, 2017
Doesn't seem like that's the case; otherwise they would have quoted a much higher raw FP32/FP64 performance. You'd have to make use of the tensor cores.
No, NVIDIA already quoted those large gains in their general HPC benchmarks (which make no use of Tensor cores). Raw FP32 throughput is one thing; the effective performance you get out of it is another.


FaaR

Golden Member
Dec 28, 2007
Regarding that crazy reference cooler we've seen pictures of... I've been thinking a lot about WHY you would design something like that (obviously it wouldn't be just for the hell of it). While some aspects elude me, such as the essentially passive cooling fins in the midsection (honestly, that seems like a terrible design choice to me), there might actually be a rational explanation for the strange appearance of this cooler.

I'm thinking it has to do with the four very large diameter heatpipes we see in the pictures (rather than any imaginary ASICs on the rear side of the board, hah...)

The front-side fan would be positioned over what we believe to be a vapor chamber. Those are typically pretty thin, so there would be room to embed a fan into the fin stack on top of the vapor chamber, no problems there. However, the heat pipes leading off to the other fin stack seem to be entirely straight along the "depth" axis of the cooler (they widen apart somewhat in the "height" axis). Combined with how thick they are (they appear to be at least 10mm in diameter), that means there would be very little space left in the fin stack for a fan in the traditional front-facing position, as the heat pipes block most of the available room on what is considered the top side of the cooler.

So by instead sticking the fan in a reverse position under the heat pipes, some additional space can be utilized to fit everything in. The somewhat oddball thing here is the opposite airflow of the two fans: the reverse-side fan pushes air downwards through the cooler (as mounted in a traditional PC tower). It's rare to see PC chassis with any significant air exhaust at the bottom of the case, so this might lead to warm air recirculating from the reverse-side fan into the top fan.

So I'm thinking maybe it's hard to achieve sufficient static pressure with a fan sucking air through such a dense fin stack, and that's why the fan is in a push configuration.

Oh well.

Just my odd musings. TL;DR... :p

raghu78

Diamond Member
Aug 23, 2012
The Maxwell -> Pascal performance increase came primarily from higher shader counts and a huge uptick in clocks. The uArch differences were much smaller than in other generational changes; it's just that the jump from 28nm -> 16nm was huge.

Anyway, everybody's acting like 8LPP is the end of the world, but it's really not. It's still something like a 40-50% improvement in power/perf vs TSMC N16 (give or take); it's just abysmal in density, that's all (compared to N7, that is).

SS 8LPP is a slightly improved version of the SS 10LPP node and comparable to TSMC N10 in terms of transistor perf and density. TSMC N10 delivers 15% higher transistor perf at iso power vs TSMC 16FF+, or 35% lower power at iso speed. If Nvidia's Ampere-based GeForce GPUs are manufactured on SS 8LPP, it's going to put them at a significant disadvantage in terms of process against AMD RDNA2 GPUs manufactured on TSMC N7P or N7+. N7P/N7+ has around 25-30% higher transistor perf than SS 8LPP. I think the 300-350 W power numbers for GA102 could indicate it's manufactured on SS 8LPP.
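As a rough cross-check of these percentages, here is a minimal Python sketch. It assumes 8LPP performs about the same as TSMC N10 (as stated in the post) and that the quoted gains simply multiply, which is a back-of-envelope simplification, not foundry data. The function name `n7p_vs_16ff` is purely illustrative.

```python
# Back-of-envelope sketch: assumes the quoted percentage gains simply multiply.
# Figures come from the post above; this is not foundry data.

N10_VS_16FF = 1.15          # TSMC N10: +15% transistor perf at iso power vs 16FF+
N7P_VS_8LPP = (1.25, 1.30)  # claimed +25-30% for N7P/N7+ over Samsung 8LPP


def n7p_vs_16ff(gain_over_8lpp: float, lpp8_vs_16ff: float = N10_VS_16FF) -> float:
    """Relative transistor perf of N7P/N7+ vs 16FF+, treating 8LPP as roughly equal to N10."""
    return lpp8_vs_16ff * gain_over_8lpp


for gain in N7P_VS_8LPP:
    print(f"N7P/N7+ vs 16FF+: {n7p_vs_16ff(gain):.2f}x")
```

Under those assumptions N7P/N7+ would sit at roughly 1.44-1.50x the transistor perf of 16FF+, versus about 1.15x for 8LPP, which is the size of the gap being argued about.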
 

Konan

Senior member
Jul 28, 2017
SS 8LPP is a slightly improved version of the SS 10LPP node and comparable to TSMC N10 in terms of transistor perf and density. TSMC N10 delivers 15% higher transistor perf at iso power vs TSMC 16FF+, or 35% lower power at iso speed. If Nvidia's Ampere-based GeForce GPUs are manufactured on SS 8LPP, it's going to put them at a significant disadvantage in terms of process against AMD RDNA2 GPUs manufactured on TSMC N7P or N7+. N7P/N7+ has around 25-30% higher transistor perf than SS 8LPP. I think the 300-350 W power numbers for GA102 could indicate it's manufactured on SS 8LPP.

Nvidia moving from Turing on TSMC 12/16nm to Samsung 8nm with Ampere will give a 45% better node density. If this high-end Ampere is on SS 8nm, then it is not comparable to TSMC 10nm.

Node densities:
TSMC 12/16nm = 33.8 MTr/mm2
Samsung 8nm (UHD) = 61.2 MTr/mm2

TSMC 10nm = 52.51 MTr/mm2
TSMC N7 = 96.49 MTr/mm2
TSMC N7P = 96.49 MTr/mm2 (no density difference)
TSMC N7+ EUV = 115.8 MTr/mm2
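The "45% better node density" figure lines up with the area view of these numbers rather than the density ratio. A small sketch using only the densities listed above; the helper `density_gain` is hypothetical and just divides one headline number by another.

```python
# Headline library densities (MTr/mm2) as listed in the post above.
DENSITY = {
    "TSMC 12/16nm": 33.8,
    "Samsung 8nm (UHD)": 61.2,
    "TSMC 10nm": 52.51,
    "TSMC N7": 96.49,
    "TSMC N7+ EUV": 115.8,
}


def density_gain(old: str, new: str):
    """Return (density ratio new/old, fraction of die area saved at the same transistor count)."""
    ratio = DENSITY[new] / DENSITY[old]
    area_saved = 1.0 - DENSITY[old] / DENSITY[new]
    return ratio, area_saved


ratio, saved = density_gain("TSMC 12/16nm", "Samsung 8nm (UHD)")
print(f"{ratio:.2f}x density, {saved:.0%} less area")
```

This prints `1.81x density, 45% less area`: at the same transistor count a die would be about 45% smaller, which corresponds to roughly 1.8x the density. These are headline library figures; shipping GPU dies land well below them.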

In recent years AMD has typically been the first to a new node. At every product launch Nvidia is behind on process node, yet every single year Nvidia has the performance crown.
RDNA2 is probably not on the + node, as was said a few pages back and in its own thread.
Besides, IMO there is a clear winner in terms of which company has the better encoders, the better drivers and overall better software. At the end of the day, for performance, I don’t think the majority of buyers would have an issue buying something that draws ~50 W more, if it turns out like that, and I reckon Nvidia knows that...

itsmydamnation

Platinum Member
Feb 6, 2011
RDNA2 is probably not on the + node, as was said a few pages back and in its own thread.
Besides, IMO there is a clear winner in terms of which company has the better encoders, the better drivers and overall better software. At the end of the day, for performance, I don’t think the majority of buyers would have an issue buying something that draws ~50 W more, if it turns out like that, and I reckon Nvidia knows that...
Assuming an insane power budget on high-end Ampere, that then leaves power budget for an unplanned, extra-binned XT PE part. If I had to bet on which process fundamentally scales better with clocks, I would bet on TSMC.
 

Konan

Senior member
Jul 28, 2017
Assuming an insane power budget on high-end Ampere, that then leaves power budget for an unplanned, extra-binned XT PE part. If I had to bet on which process fundamentally scales better with clocks, I would bet on TSMC.
Totally agree with you. It would be even more interesting to me if these apparently three high-end SKUs are actually on TSMC 7nm and are baked as much as the rumours say. Then again, apparently going to 5nm 18 months or so later is like two node jumps in little time.
 

raghu78

Diamond Member
Aug 23, 2012
Nvidia moving from Turing on TSMC 12/16nm to Samsung 8nm with Ampere will give a 45% better node density. If this high-end Ampere is on SS 8nm, then it is not comparable to TSMC 10nm.

Node densities:
TSMC 12/16nm = 33.8 MTr/mm2
Samsung 8nm (UHD) = 61.2 MTr/mm2

TSMC 10nm = 52.51 MTr/mm2
TSMC N7 = 96.49 MTr/mm2
TSMC N7P = 96.49 MTr/mm2 (no density difference)
TSMC N7+ EUV = 115.8 MTr/mm2

In recent years AMD has typically been the first to a new node. At every product launch Nvidia is behind on process node, yet every single year Nvidia has the performance crown.
RDNA2 is probably not on the + node, as was said a few pages back and in its own thread.
Besides, IMO there is a clear winner in terms of which company has the better encoders, the better drivers and overall better software. At the end of the day, for performance, I don’t think the majority of buyers would have an issue buying something that draws ~50 W more, if it turns out like that, and I reckon Nvidia knows that...

SS 8nm UHD is 61 MTr/sq mm, but Ampere GPUs, which need to clock at 2+ GHz, are not using UHD cells. The UHD cell has a cell height of 376 nm, while the HD cell is 420 nm.


So with HD cells the transistor density is around 55 MTr/sq mm. Now, you can say Nvidia has better drivers/software, and I would say that is true to some extent, but Nvidia having the perf crown in the past does not mean they will have it with Ampere. Meanwhile, from my analysis of the Xbox Series X SoC specs (die size, CU count, clocks, and power supply), my estimate is that Navi 21 is likely to deliver 24 TF at roughly 270 W. We will wait and see where fully enabled GA102 and Navi 21 specs/perf land, but I am quite confident in saying it's going to be the closest contest for the GPU crown in a decade.
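The ~55 MTr/mm2 HD-cell figure follows from scaling the UHD density by the ratio of cell heights. A first-order sketch, assuming density is inversely proportional to standard-cell height with everything else unchanged (a simplification: real layouts also depend on routing, cell widths, and the SRAM/logic mix). The helper name is illustrative only.

```python
UHD_DENSITY = 61.2     # MTr/mm2, Samsung 8nm UHD library (quoted upthread)
UHD_CELL_HEIGHT = 376  # nm
HD_CELL_HEIGHT = 420   # nm


def scale_density_by_cell_height(density: float, h_from: float, h_to: float) -> float:
    """Scale a headline density to a taller/shorter cell library.

    Assumes density is inversely proportional to cell height, i.e. the same
    cell widths and the same utilization: a first-order approximation only.
    """
    return density * h_from / h_to


hd = scale_density_by_cell_height(UHD_DENSITY, UHD_CELL_HEIGHT, HD_CELL_HEIGHT)
print(f"HD-cell estimate: {hd:.1f} MTr/mm2")
```

This gives about 54.8 MTr/mm2, i.e. the "around 55" in the post.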
 

Konan

Senior member
Jul 28, 2017
SS 8nm UHD is 61 MTr/sq mm, but Ampere GPUs, which need to clock at 2+ GHz, are not using UHD cells.

Where does it say Nvidia is not using SS UHD? (Curious; not that it needs to be debated really.)
Why does Ampere need to clock at 2+ GHz?
(Nvidia will do what it has to, to keep the top crown)

I think we will possibly see two of the three high-end Nvidia cards released at launch. NV will hold one back to see what AMD has and tweak it if needed, depending on confidence levels.

Honestly, comparing chip densities across devices with different architectures and design trade-offs isn’t the best approach. I/O and cache are big components, and not all circuit elements have the same density. Less cache, less heat.

The main point of the node density conversation for me was that SS 8nm (if that is what it is) is quite superior to what Turing is on now. With that in mind, and considering the Ampere arch, things bode well. If the competition can get 30 to 40% better than the 2080 Ti, so can Nvidia.
Agreed, things sound close. If it really is, it’ll be great for the consumer and overall pricing segmentation.

Saylick

Diamond Member
Sep 10, 2012
Meanwhile, from my analysis of the Xbox Series X SoC specs (die size, CU count, clocks, and power supply), my estimate is that Navi 21 is likely to deliver 24 TF at roughly 270 W. We will wait and see where fully enabled GA102 and Navi 21 specs/perf land, but I am quite confident in saying it's going to be the closest contest for the GPU crown in a decade.
You mind sharing your math on that estimate? I personally have a 21-22 TF @ 300W expectation for Big Navi / N21 and am curious as to how you arrived at your estimate, but I totally agree that this will be the closest contest at the upper end in a long time.
 

Konan

Senior member
Jul 28, 2017
Nvidia moving from Turing on TSMC 12/16nm to Samsung 8nm with Ampere will give a 45% better node density.

Node densities:
TSMC 12/16nm = 33.8 MTr/mm2
Samsung 8nm (UHD) = 61.2 MTr/mm2

TSMC 10nm = 52.51 MTr/mm2
TSMC N7 = 96.49 MTr/mm2
TSMC N7P = 96.49 MTr/mm2 (no density difference)
TSMC N7+ EUV = 115.8 MTr/mm2

Just to note: according to TechPowerUp, it looks like I was wrong about the standard N7 density, since Navi 10 is at 41 MTr/mm2. That kind of makes sense now given the performance/watt statement.
And for reference:
Vega 20 = 40 MTr/mm2
TU104 = 24.9 MTr/mm2
GP102 = 25 MTr/mm2
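To put these real-die numbers next to the headline node densities quoted earlier in the thread, a quick sketch dividing one by the other. The node assignment for each die (Navi 10 and Vega 20 on TSMC N7, TU104 and GP102 on TSMC 12/16nm) is an added assumption; the densities themselves are the ones quoted in the thread.

```python
# (die, achieved MTr/mm2 from the post, headline MTr/mm2 of the assumed node)
CHIPS = [
    ("Navi 10", 41.0, 96.49),  # assumed TSMC N7
    ("Vega 20", 40.0, 96.49),  # assumed TSMC N7
    ("TU104", 24.9, 33.8),     # assumed TSMC 12/16nm
    ("GP102", 25.0, 33.8),     # assumed TSMC 12/16nm
]


def utilization(achieved: float, headline: float) -> float:
    """Fraction of the headline library density that a shipping die actually reaches."""
    return achieved / headline


for name, achieved, headline in CHIPS:
    print(f"{name}: {utilization(achieved, headline):.0%} of headline density")
```

On these numbers the N7 GPUs reach only about 41-42% of the N7 headline figure, while the 12/16nm GPUs reach about 74% of theirs, so headline node densities say little on their own about what a high-clocked GPU die will achieve.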

Looking forward to seeing something with a better indication than Time Spy graphics.
Looking at the 3DMark average results for the 2080, it looks to me like Rogame’s increases in his charts compared to what he had before are a little too much; somewhere in between would have been better.

raghu78

Diamond Member
Aug 23, 2012
You mind sharing your math on that estimate? I personally have a 21-22 TF @ 300W expectation for Big Navi / N21 and am curious as to how you arrived at your estimate, but I totally agree that this will be the closest contest at the upper end in a long time.

Sent you a direct message. I agree that, based on the newer info on RDNA2 relating to the dual-pipe graphics command processor on Sienna Cichlid (which could affect my area scaling calculations), the CU count could be 80 and performance could be a bit lower. But I am confident of 21+ TF.
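For anyone following the TF estimates: RDNA FP32 throughput is usually computed as CUs x 64 shaders x 2 FLOPs per clock (one FMA) x clock. A minimal sketch of that formula and its inverse. The 52 CU / 1.825 GHz Xbox Series X figures are Microsoft's public specs rather than anything from this thread, and the 80 CU count for Navi 21 is only the speculation above.

```python
SHADERS_PER_CU = 64       # RDNA: 64 stream processors per CU
FLOPS_PER_SHADER_CLK = 2  # one fused multiply-add = 2 FP32 FLOPs per clock


def fp32_tflops(cus: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS for a given CU count and clock (GHz)."""
    return cus * SHADERS_PER_CU * FLOPS_PER_SHADER_CLK * clock_ghz / 1000.0


def clock_for_tflops(cus: int, tflops: float) -> float:
    """Clock (GHz) needed to hit a peak FP32 TFLOPS target with a given CU count."""
    return tflops * 1000.0 / (cus * SHADERS_PER_CU * FLOPS_PER_SHADER_CLK)


# Sanity check against the Xbox Series X public spec (52 CUs at 1.825 GHz).
print(f"Series X: {fp32_tflops(52, 1.825):.2f} TF")

# Hypothetical 80 CU Navi 21: clocks needed for the TF figures discussed above.
for target in (21.0, 24.0):
    print(f"80 CUs need {clock_for_tflops(80, target):.2f} GHz for {target} TF")
```

The Series X check lands at about 12.15 TF. At 80 CUs, 21 TF needs about 2.05 GHz and 24 TF about 2.34 GHz, which is why the clock question keeps coming up in the thread.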
 

beginner99

Diamond Member
Jun 2, 2009
Navi 2x will be clocked over 2GHz, based on the PS5/XBSX. It's unlikely nVidia will be able to beat it with a sub-2GHz clock.

The numbers for the PS5 are max clocks (turbo), which you will most likely never reach in an actual game. The rumor is that Sony was surprised by MS's GPU performance (number of CUs) and hence, by not revealing a "game clock", is trying to hide their very big GPU deficit while putting the focus on their SSD tech.
 

Leadbox

Senior member
Oct 25, 2010
The numbers for the PS5 are max clocks (turbo), which you will most likely never reach in an actual game. The rumor is that Sony was surprised by MS's GPU performance (number of CUs) and hence, by not revealing a "game clock", is trying to hide their very big GPU deficit while putting the focus on their SSD tech.
Given the size of the thing, I suspect it will hold clocks for the most part. It's that big for a reason.
 

Tabalan

Member
Feb 23, 2020
How certain is it (based on reliable leaks) that Nvidia will be using SS 8LPP? Samsung has a further improved 8nm process designed for high clocks/high power, 8LPU, which seems like a better solution for big, power-hungry, highly clocked GPUs.
 

uzzi38

Platinum Member
Oct 16, 2019
How certain is it (based on reliable leaks) that Nvidia will be using SS 8LPP? Samsung has a further improved 8nm process designed for high clocks/high power, 8LPU, which seems like a better solution for big, power-hungry, highly clocked GPUs.
8LPP has never been mentioned by name; it could also be 8LPU. I've just been shorthanding everything to 8LPP because afaik 7LPP was the original node. The only thing rumoured is some form of Samsung's 10nm, be it one of the 10nm nodes itself or one of the 8nm nodes.

Or so I was first told. I'm not sure that was even the case at this point. Someone I've been talking to recently is sure that EUV was never planned for Ampere. So at this point, who the hell knows.

jpiniero

Lifer
Oct 1, 2010
No, NVIDIA already quoted those large gains in their general HPC benchmarks (which make no use of Tensor cores). Raw FP32 throughput is one thing; the effective performance you get out of it is another.

I guess we will have to wait for actual reviews then. The A100 PCIe has its TDP dropped to 250 W, yet nVidia didn't lower the quoted specs. Maybe it can boost well above the advertised frequencies and the PCIe version just can't hold it as long, hence why nVidia says it's 10% slower.

Glo.

Diamond Member
Apr 25, 2015
Not unless nVidia throws more cores at the problem rather than raising clocks.
Samsung's 10 nm process will give you worse transistor density than even the botched job that Navi 10 is on TSMC 7 nm.

TSMC's N7 is capable of packing 60 mln xTors/mm2 at high clock speeds. Navi 10 has 40 mln xTors/mm2.

I would expect at best 32-36 mln xTors/mm2 on Samsung's 10 nm node, which is called 8 nm. It's simply not that much denser than 12 nm FFN.

RetroZombie

Senior member
Nov 5, 2019
In recent years AMD has typically been the first to a new node. At every product launch Nvidia is behind on process node, yet every single year Nvidia has the performance crown.
And that's a very good strategy by Nvidia.
I think it's the correct thing to do with GPUs. In the past, ATI did the same with the Radeon 9700, and it worked very well for them.

Besides, IMO there is a clear winner in terms of which company has the better encoders, the better drivers and overall better software.
Well, not really. Or would you say that Nvidia's drivers are bad when they underdeliver?

So Mark Cerny explicitly lied when he said the PS5 will hold that clock the vast majority of the time, bar extreme cases?
Inside PlayStation 5
«"When that worst case game arrives, it will run at a lower clock speed. But not too much lower, to reduce power by 10 per cent it only takes a couple of percent reduction in frequency, so I'd expect any downclocking to be pretty minor," he explains. "All things considered, the change to a variable frequency approach will show significant gains for PlayStation gamers." »