Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if imprudent/uncalled for, just interested in the forum members' thoughts.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Nope, the GTX950 is exactly what the RTX2060 is today. The GTX960, which uses a full GM206, is what the RTX2070 is today

The 2060 Super exists. The 2060 is a third-tier part off that die.

If RDNA2 is more than 40 mln xTors/mm2, then you can throw your calculations into the toilet.

Math doesn't change because you are happy with something, and what does RDNA2 have to do with how the 3050 will compare to the 2060?

Transistor density is not defined by what is possible on this node

Yes, it is, by definition.

If AMD builds one chip with a huge amount of cache and another with very little, the chip with the larger cache will have greater density. More transistors in and of themselves aren't going to help: put 1GB of L1 cache on RDNA and the density would jump sharply, while performance would barely budge.

When comparing like products, such as two graphics chips from the same company, a relative balance of what each chip consists of can be assumed. Some outliers like the A100 are going to look different because of the balance of the design.

How big the actual transistors are is determined by the process node; that is what a process node means.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
It should be a lot better than that. TU102 is 24.7 mln xTors/mm2; V100 is 25.8. It should be more like 45-50. A100 is 66, btw.
A100 is N7, not 8LPP.

Remember, 8LPP is a slightly tweaked 10 nm process node, which is one node behind N7. 45-50 mln xTors/mm2 is the high end of what is possible on this node, just as 60-80 mln xTors/mm2 is the highest achieved on N7.

I personally expect anywhere between 36-38 mln xTors/mm2 of transistor density for next-gen Nvidia products on this node.
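As a quick sanity check on the density figures being thrown around here, a rough back-of-the-envelope calculation; the transistor counts and die areas below are the commonly cited figures for each chip, not official confirmations:

```python
def density(xtors_bn: float, area_mm2: float) -> float:
    """Transistor density in mln xTors/mm^2, from billions of transistors and die area."""
    return xtors_bn * 1000 / area_mm2

# Commonly cited (approximate) figures: transistors in billions, die area in mm^2.
print(f"TU102: {density(18.6, 754):.1f}")  # ~24.7
print(f"V100:  {density(21.1, 815):.1f}")  # ~25.9
print(f"A100:  {density(54.2, 826):.1f}")  # ~65.6
```

The small differences from the numbers quoted above come down to which die-area estimate you use.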
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Yes, it is, by definition.

If AMD builds one chip with a huge amount of cache and another with very little, the chip with the larger cache will have greater density. More transistors in and of themselves aren't going to help: put 1GB of L1 cache on RDNA and the density would jump sharply, while performance would barely budge.

When comparing like products, such as two graphics chips from the same company, a relative balance of what each chip consists of can be assumed. Some outliers like the A100 are going to look different because of the balance of the design.

How big the actual transistors are is determined by the process node; that is what a process node means.
What I mean: on the N7 node it is possible to achieve 80 mln xTors/mm2, as Apple did with their designs.

AMD comparably achieved only 40 mln, and Nvidia achieved 60 mln. This is what I meant by saying that transistor density is not defined by the process node itself, but by how good a job the physical design team has done.

N7 is two nodes away from 14 nm GloFo/16 nm TSMC, with 10 nm in between.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
A100 is N7, not 8LPP.

Remember, 8LPP is a slightly tweaked 10 nm process node, which is one node behind N7. 45-50 mln xTors/mm2 is the high end of what is possible on this node, just as 60-80 mln xTors/mm2 is the highest achieved on N7.

I personally expect anywhere between 36-38 mln xTors/mm2 of transistor density for next-gen Nvidia products on this node.

Snapdragon S845 has 55 mln xTors/mm2 on 10nm. 8nm is 10% denser.
On 14nm S825 had 30 mln xTors/mm2 and GP107 had 25 mln xTors/mm2.

So don't make stuff up.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Snapdragon S845 has 55 mln xTors/mm2 on 10nm. 8nm is 10% denser.
On 14nm S825 had 30 mln xTors/mm2 and GP107 had 25 mln xTors/mm2.

So don't make stuff up.
Even on TSMC's 16 and 12 nm processes it was possible to pack 35 mln xTors/mm2. So why did neither Nvidia nor AMD achieve that transistor density, with the best they could come up with being 25 mln xTors/mm2?

I think you are smart enough to understand the difference between the high-density libraries used in ultra-mobile chips and the high-performance libraries used in high-performance chips.

You do realize that if Nvidia uses high-density libraries, they will sacrifice the clock speeds and performance of their products?

Tone your expectations down.
 

Lodix

Senior member
Jun 24, 2016
340
116
116
Samsung's 8LPP is a full node shrink from 14nm (which has a slight density advantage over TSMC 16nm).

Samsung 8LPP has a theoretical density of ~62 MTr/mm^2, compared to ~29 MTr/mm^2 for TSMC's 16nm.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Samsung's 8LPP is a full node shrink from 14nm (which has a slight density advantage over TSMC 16nm).

Samsung 8LPP has a theoretical density of ~62 MTr/mm^2, compared to ~29 MTr/mm^2 for TSMC's 16nm.

nVidia isn't moving from 14, they are moving from TSMC 12.

And as mentioned above, non-mobile GPUs never use max density; it kills performance. It's far better for performance and cooling to be less dense.
 

Lodix

Senior member
Jun 24, 2016
340
116
116
nVidia isn't moving from 14, they are moving from TSMC 12.

And as mentioned above, non-mobile GPUs never use max density; it kills performance. It's far better for performance and cooling to be less dense.
The "12nm" that Nvidia is using is just a more efficient 16nm from TSMC, with no improvement in density. So that is irrelevant.

The numbers that I posted are the maximum theoretical densities of the nodes. Depending on the libraries you choose, you trade performance for density. But that is a given for any process node and again irrelevant. It doesn't change the fact that 8LPP would bring a full node shrink (~2x density) from the previous generation of products.
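Taking the two theoretical peak figures quoted in this thread at face value, the "~2x" claim is easy to verify:

```python
# Theoretical peak densities (MTr/mm^2) as quoted in this thread -- not vendor-official.
tsmc_16nm = 29
samsung_8lpp = 62

shrink = samsung_8lpp / tsmc_16nm
print(f"8LPP vs 16nm: {shrink:.2f}x")  # ~2.14x, i.e. roughly a full node shrink
```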
 

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
You do realize that if Nvidia will use high-density libraries, they will sacrifice the clock speeds and performance of their products?

Tone your expectations down.
Apple's A13 SoC clocks up to 2.66GHz, it seems, and it is denser than almost anything else out there. So there seems to be potential for decent clocks even with a high-density layout.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Even on TSMC's 16 and 12 nm processes it was possible to pack 35 mln xTors/mm2. So why did neither Nvidia nor AMD achieve that transistor density, with the best they could come up with being 25 mln xTors/mm2?

I think you are smart enough to understand the difference between the high-density libraries used in ultra-mobile chips and the high-performance libraries used in high-performance chips.

You do realize that if Nvidia uses high-density libraries, they will sacrifice the clock speeds and performance of their products?

Tone your expectations down.

Apple's A10 had 26.4 mln xTors/mm2 on TSMC's 16nmFF: https://en.wikipedia.org/wiki/Apple_A10
GP104 had 22.9, which is about 14% less dense.

So I guess nVidia will lose ~14% again (compare the A100 with Apple's SoCs on TSMC's 7nm DUV process).
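For what it's worth, running those exact numbers gives a gap just over 13%, so the "14%" is a slightly generous rounding:

```python
a10_density = 26.4    # Apple A10 on TSMC 16nmFF, per the Wikipedia figure linked above
gp104_density = 22.9  # GP104 on the same node

gap = 1 - gp104_density / a10_density
print(f"GP104 is {gap:.1%} less dense than the A10")  # 13.3%
```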
 

Hitman928

Diamond Member
Apr 15, 2012
6,186
10,693
136
The "12nm" that Nvidia is using is just a more efficient 16nm from TSMC, with no improvement in density. So that is irrelevant.

The numbers that I posted are the maximum theoretical densities of the nodes. Depending on the libraries you choose, you trade performance for density. But that is a given for any process node and again irrelevant. It doesn't change the fact that 8LPP would bring a full node shrink (~2x density) from the previous generation of products.

According to TSMC, 12 nm has a tighter metal pitch which leads to a small density improvement. They don't give any hard numbers though so I can't say if it is of any real significance or not.
 

Lodix

Senior member
Jun 24, 2016
340
116
116
According to TSMC, 12 nm has a tighter metal pitch which leads to a small density improvement. They don't give any hard numbers though so I can't say if it is of any real significance or not.
Yes, there was a 12nm "Compact" version dedicated to low-end and cheaper products, as a low-cost version of FinFET. But as far as I know, Nvidia is not using this.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I personally expect anywhere between 36-38 mln xTors/mm2

If we assume you are correct, the 2060 is actively using about 8B transistors, which at that density level would make it fractionally larger than the 1650 is right now. Honestly I'm expecting a bit better density than that, even on the low-end parts on SS 8nm, but even assuming you are right, that still puts a 2060-level 3050 as the logical landing spot. I'm thinking it should be at least a bit faster, more than a bit with RT.
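A rough sketch of that die-size estimate, assuming the ~8B active-transistor figure and the 36-38 mln xTors/mm2 range from earlier in the thread (the TU117/GTX 1650 die area is the commonly cited ~200 mm^2):

```python
active_xtors_bn = 8.0  # estimated active transistors in the RTX 2060 (billions)
tu117_area_mm2 = 200   # GTX 1650 die (TU117), commonly cited at ~200 mm^2

for d in (36, 38):  # expected density range, mln xTors/mm^2
    area = active_xtors_bn * 1000 / d
    print(f"at {d}: {area:.0f} mm^2 (TU117 is ~{tu117_area_mm2} mm^2)")
```

That works out to roughly 210-222 mm^2, i.e. indeed only fractionally larger than TU117.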
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
If we assume you are correct, the 2060 is actively using about 8B transistors, which at that density level would make it fractionally larger than the 1650 is right now. Honestly I'm expecting a bit better density than that, even on the low-end parts on SS 8nm, but even assuming you are right, that still puts a 2060-level 3050 as the logical landing spot. I'm thinking it should be at least a bit faster, more than a bit with RT.
Concerning the part you are responding to: who wrote that, exactly? I think I speak for most in saying that we don't have the time to search for snippets of posts when the solution is readily available AND easy to use.

I know your stated reason for this. BUT. Is it possible to be less self-centered and just use the reply function? We will all be able to follow threads more easily.

Of course, I'm assuming that you don't want to stuff things up. Then again, who knows, maybe you're leading a one-man revolution to change our habits and protocols.
 

nurturedhate

Golden Member
Aug 27, 2011
1,767
773
136
Concerning the part you are responding to: who wrote that, exactly? I think I speak for most in saying that we don't have the time to search for snippets of posts when the solution is readily available AND easy to use.

I know your stated reason for this. BUT. Is it possible to be less self-centered and just use the reply function? We will all be able to follow threads more easily.

Of course, I'm assuming that you don't want to stuff things up. Then again, who knows, maybe you're leading a one-man revolution to change our habits and protocols.
They stated they are doing it on purpose. The end result is obfuscation and removal of context. We can all make of that what we will.
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
I know your stated reason for this. BUT. Is it possible to be less self-centered and just use the reply function? We will all be able to follow threads more easily.

Click > highlight > choose quote is all it takes. Most likely less effort than typing it in. Maybe his fingers need the exercise, and that's why he takes the long way around?
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Concerning the part you are responding to: who wrote that, exactly? I think I speak for most in saying that we don't have the time to search for snippets of posts when the solution is readily available AND easy to use.

I know your stated reason for this. BUT. Is it possible to be less self-centered and just use the reply function? We will all be able to follow threads more easily.

Of course, I'm assuming that you don't want to stuff things up. Then again, who knows, maybe you're leading a one-man revolution to change our habits and protocols.

I actually wish it was against the rules. Not only does it remove any context and make it harder to follow, but the person being quoted is not notified that they have been quoted, so they are unlikely to give a rebuttal.
 

Bouowmx

Golden Member
Nov 13, 2016
1,147
551
146
The A100 has been tested in MLPerf 0.7. See the 8xA100 results 0.7-18, 19, and 20, and compare with the 8xV100 results 39, 40, and 41. In general, about half the time, or twice the speed.

Of course, this doesn't imply raster or standard compute (CUDA cores only) performance, but I do wish for ~1.5-1.7x performance in the same tier (GeForce RTX 2070 -> 3070).
 

nurturedhate

Golden Member
Aug 27, 2011
1,767
773
136
It is real simple: if you cannot properly cite your work, you don't get credit.
Read the thread.

Anyone who is too lazy to read the thread they are posting in, please put me on ignore. If I wanted to partake in sound-bite stupidity, I'd go the Twitter route.

Reading a thread and having a decent conversation isn't done by focusing on the person talking, but on the merit of the idea, the perspective it brings, and the counter-perspectives others offer.

You want a personal discussion? There are forums for that; this is a tech forum.
You refuse to properly cite your work. You purposely remove context from posts. You obfuscate.