NVIDIA GeForce 20 Series (Volta) to be released later this year - GV100 announced


ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
GV100 has an uncannily similar transistor density to GP100 for what Nvidia advertises as '12nm' ...

That's only about 2.7% denser than 16nm?
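For anyone who wants to check the density math, here is a rough sketch. The transistor counts and die areas are not stated in this thread; they are assumed from the publicly quoted figures (15.3 billion / 610 mm² for GP100, 21.1 billion / 815 mm² for GV100), so treat the result as approximate.

```python
# Assumed inputs (not from this thread): publicly quoted transistor counts and die areas.
def density_mtr_per_mm2(transistors_billion, die_area_mm2):
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / die_area_mm2

gp100 = density_mtr_per_mm2(15.3, 610.0)  # GP100, 16nm (assumed figures)
gv100 = density_mtr_per_mm2(21.1, 815.0)  # GV100, "12nm" (assumed figures)
gain_pct = (gv100 / gp100 - 1.0) * 100.0  # relative density gain in percent

print(f"GP100: {gp100:.2f} MTr/mm^2, GV100: {gv100:.2f} MTr/mm^2, gain: {gain_pct:.1f}%")
```

With these inputs the gain comes out at roughly 3%. Rounding the GV100 count down to 21 billion gives about 2.7%, which matches the figure above.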
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Because it is very likely that it's a high power version of 12nm FFC.

So you're telling me that the high powered version of 12nm matches the high powered version of 16nm in terms of density ...

Or did Nvidia use the dense version of 16nm to create their entire Pascal line up ?
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
I wonder if they'll go all-out with die sizes for the desktop parts too? Or perhaps they'll just be Maxwell-like (big but not unprecedented).
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
So you're telling me that the high powered version of 12nm matches the high powered version of 16nm in terms of density ...

Or did Nvidia use the dense version of 16nm to create their entire Pascal line up ?
Pascal was 16FF+. Folks over at Beyond3D are saying the N in FFN stands for Nvidia: a custom node exclusively for their use.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
NVIDIA disagrees with you.
At a high level, the layout of the architecture will be the same if they use it in the consumer space. Take out the Tensor cores and FP64 cores, and you basically get what will be in the GV102 chip.

However, the SM will also have the same 64 cores as GP100. The difference is that the L2 cache is bigger and the Register File size is MASSIVELY bigger, and that will affect performance the most.

Let me give you an example. The GP100 architecture ported to consumer will be 30-40% faster, core for core and clock for clock, than GP102 (which is basically Maxwell on a 16 nm process; the high-level layout of the architecture is the same). GV100 will be another 40% faster than that.

So imagine this: a 1280 CUDA core chip that ends up 2x faster than the GTX 1060 at the same clock.

This is why Nvidia claims that the architecture is 50% more efficient.
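To show how the two claimed per-generation gains stack up to the "2x" figure, here is a small sketch. It only compounds the percentages claimed in this post; it does not model any real hardware, and `compound_speedup` is a made-up helper name.

```python
def compound_speedup(*gains):
    """Multiply successive fractional gains, e.g. 0.30 then 0.40 -> 1.3 * 1.4."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total

low = compound_speedup(0.30, 0.40)   # GP102 -> GP100-style SM at +30%, then +40% for GV100
high = compound_speedup(0.40, 0.40)  # same, with the upper +40% estimate for the first step

print(f"claimed combined speedup: {low:.2f}x to {high:.2f}x")
```

So the claim works out to roughly 1.8x to 2x per core per clock, if both estimates hold.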
 
Mar 10, 2006
11,715
2,012
126
Oh well, whatever.

Wonder how the clocks will be on the consumer parts, with the new scheduling techniques it might even decrease, who knows.

Doubt it.

Architected to deliver higher performance, the Volta SM has lower instruction and cache latencies than past SM designs and includes new features to accelerate deep learning applications.

Major Features include:

  • New mixed-precision FP16/FP32 Tensor Cores purpose-built for deep learning matrix arithmetic;
  • Enhanced L1 data cache for higher performance and lower latency;
  • Streamlined instruction set for simpler decoding and reduced instruction latencies;
  • Higher clocks and higher power efficiency.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Let me give you an example. The GP100 architecture ported to consumer will be 30-40% faster, core for core and clock for clock, than GP102
How can you port all that fancy hardware for FP64 and INT32 ops to a consumer GPU?
 

Det0x

Golden Member
Sep 11, 2014
1,481
5,059
136
Dedicated Tensor cores apparently do 2*FP16 MUL + FP32 ADD at a very high rate (exclusively for 4x4 matrix processing?), hence the 120 mixed TFLOPs.
900GB/s HBM2 means it's using chips running at 1.8Gbps, up from the ~1.4Gbps in P100.
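A quick sanity check of both numbers, as a sketch. Everything except the 900 GB/s and 120 TFLOPs figures is assumed from public spec sheets rather than taken from this thread: a 4096-bit HBM2 bus, 720 GB/s for P100, 640 Tensor cores, 64 fused multiply-adds per Tensor core per clock (one 4x4x4 matrix op), and a boost clock of about 1.455 GHz.

```python
def pin_rate_gbps(bandwidth_gb_s, bus_width_bits=4096):
    """Per-pin data rate (Gbps) implied by a total bandwidth in GB/s.

    bus_width_bits=4096 is an assumption (four 1024-bit HBM2 stacks).
    """
    return bandwidth_gb_s * 8.0 / bus_width_bits

def tensor_tflops(tensor_cores, fma_per_clock, clock_ghz):
    """Peak mixed-precision TFLOPs, counting each FMA as 2 floating-point ops."""
    return tensor_cores * fma_per_clock * 2 * clock_ghz / 1000.0

v100_rate = pin_rate_gbps(900.0)             # V100 figure from the post above
p100_rate = pin_rate_gbps(720.0)             # P100 bandwidth is an assumed figure
v100_tensor = tensor_tflops(640, 64, 1.455)  # all three inputs are assumed figures

print(f"V100 pins: {v100_rate:.2f} Gbps, P100 pins: {p100_rate:.2f} Gbps")
print(f"V100 tensor peak: {v100_tensor:.1f} TFLOPs")
```

Under these assumptions the pin rates land near 1.76 Gbps and 1.41 Gbps, and the Tensor peak lands just under 120 TFLOPs, so the numbers in the post are consistent.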
 

Magic Hate Ball

Senior member
Feb 2, 2017
290
250
96
800+ mm² is ridiculously impressive. Too bad that level of GPU has fully left the price bracket of the mere mortal; it would be sweet to get hands on something like that.

With AMD hopefully perfecting the art of integrating multiple dies into one large logical processor with Naples, I wonder how they would be able to compete if they could achieve the same feat with Vega and Navi?
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
You mean Quadro GP100? That's not a consumer card.
What I mean is that the GP100 chip would be around 30% faster, core for core and clock for clock, than the GP102 chip if it were put in a consumer GPU (without the non-important features).

GV100 should be faster than GP100 by about the same margin.
 

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
That is specific to the SM layout of the full SM with all the FP32,FP64,INT32 and Tensor bits I think, not what you'd get in GV104.
Right, so what is the takeaway for actual gamers? With GP100 we at least saw that, while Pascal was basically a shrunk Maxwell, it had a few marvels of craftsmanship that let it clock a lot higher on the 16nm process. That is where all the performance gains came from, aside from packing more shaders in per mm², also thanks to the process.

With this being basically the same clocks and density, and with all the announced Volta changes being deep learning specific, are we just expecting GV104 and GV102 to have slightly more shaders at about the same clocks? Maybe AMD was on to something when they teased "Poor Volta".
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
What I mean is that the GP100 chip would be around 30% faster, core for core and clock for clock, than the GP102 chip if it were put in a consumer GPU (without the non-important features).

GV100 should be faster than GP100 by about the same margin.
But without the non-important features it is already 3840 cores in the P6000 and Titan Xp. 3584->3840 is not 30%.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
Right, so what is the takeaway for actual gamers? With GP100 we at least saw that, while Pascal was basically a shrunk Maxwell, it had a few marvels of craftsmanship that let it clock a lot higher on the 16nm process. That is where all the performance gains came from, aside from packing more shaders in per mm², also thanks to the process.

With this being basically the same clocks and density, and with all the announced Volta changes being deep learning specific, are we just expecting GV104 and GV102 to have slightly more shaders at about the same clocks? Maybe AMD was on to something when they teased "Poor Volta".
More SPs:
GTX 1080: 2560 SP -> GTX 2080: 3584 SP
Titan Xp: 3840 SP -> Volta Titan: 5376 SP

If they manage a 10% IPC gain, it will be 50% faster than the Pascal cards. Also, they will have GDDR6.
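The "50% faster" estimate above can be reproduced with a couple of lines. This is only a sketch of the arithmetic: it assumes equal clocks and perfect scaling with shader count, and the Volta SP counts are the speculative ones from this post, not confirmed specs.

```python
def projected_speedup(old_sp, new_sp, ipc_gain=0.10):
    """Speedup from more shaders times an IPC gain, assuming equal clocks and perfect scaling."""
    return (new_sp / old_sp) * (1.0 + ipc_gain)

x80 = projected_speedup(2560, 3584)    # GTX 1080 -> speculated GTX 2080
titan = projected_speedup(3840, 5376)  # Titan Xp -> speculated Volta Titan

print(f"x80: {x80:.2f}x, Titan: {titan:.2f}x")
```

Both pairs are a 1.4x jump in shader count, so with a 10% IPC gain the projection is about 1.54x in each case.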
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
But without the non-important features it is already 3840 cores in the P6000 and Titan Xp. 3584->3840 is not 30%.
But the thing that affects the performance of Nvidia GPUs the most has increased, by a factor of 2.

Register File size. Titan Xp has 3840 CUDA cores, and each SM in it has 128 cores. Each SM is fed by a Register File of a particular size. The same Register File size is available to 64 cores in the GP100 chip, so they are less starved for resources. This is the reason why Maxwell was such a huge jump in efficiency over Kepler. It was not because of Tile Based Rasterization, which does not increase performance but efficiency (it saves the power required to move data). It was because fewer cores in the Maxwell architecture shared a particular pool of resources.

Funniest part: GP102 and GP104 have the same SM/Register File size layout as Maxwell, which is why there was no difference in performance clock for clock, core for core.

Maxwell's 128 cores had 90% of the performance of Kepler's 192 cores for this very reason.

So you should get the picture now.

Either way, even if Nvidia reuses the GP100 architecture in the consumer market, you will get an improvement in performance. However, using the GV100 architecture without the HPC stuff will increase performance even further.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Right, so what is the takeaway for actual gamers? With GP100 we at least saw that, while Pascal was basically a shrunk Maxwell, it had a few marvels of craftsmanship that let it clock a lot higher on the 16nm process. That is where all the performance gains came from, aside from packing more shaders in per mm², also thanks to the process.

With this being basically the same clocks and density, and with all the announced Volta changes being deep learning specific, are we just expecting GV104 and GV102 to have slightly more shaders at about the same clocks? Maybe AMD was on to something when they teased "Poor Volta".
There are two possibilities: a similar die size with slightly faster clocks (unlikely, as architectural changes alone won't be enough for the generational improvement from x80Ti->x80 we have seen before), or a bigger 420-450 mm² GV104 with more cores that should easily beat the GTX 1080Ti in the form of a GTX 2080.

EDIT: I believe the gains would be along the lines of the GTX 780Ti to GTX 980.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
But the thing that affects the performance of Nvidia GPUs the most has increased, by a factor of 2.

Register File size. Titan Xp has 3840 CUDA cores, and each SM in it has 128 cores. Each SM is fed by a Register File of a particular size. The same Register File size is available to 64 cores in the GP100 chip, so they are less starved for resources. This is the reason why Maxwell was such a huge jump in efficiency over Kepler. It was not because of Tile Based Rasterization, which does not increase performance but efficiency (it saves the power required to move data). It was because fewer cores in the Maxwell architecture shared a particular pool of resources.

Funniest part: GP102 and GP104 have the same SM/Register File size layout as Maxwell, which is why there was no difference in performance clock for clock, core for core.

Maxwell's 128 cores had 90% of the performance of Kepler's 192 cores for this very reason.

So you should get the picture now.

Either way, even if Nvidia reuses the GP100 architecture in the consumer market, you will get an improvement in performance. However, using the GV100 architecture without the HPC stuff will increase performance even further.
Wrong. Register file size per SM is the same as in Pascal GP100 or Maxwell.
[Attached image: 2017-05-10vfs2p.jpg]
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Wrong. Register file size per SM is the same as in Pascal GP100 or Maxwell.
[Attached image: 2017-05-10vfs2p.jpg]
Yes, it is. But the number of cores that have access to this "pool" of data is lower in the GP100 chip than in any Maxwell or consumer Pascal GPU. The situation is similar with Kepler vs Maxwell.
Kepler - 192 cores/256 KB RF size.
Maxwell - 128 cores/256 KB RF size.
GP10X - 128 cores/256 KB RF size.
GP100 - 64 cores/256 KB RF size.
GV100 - 64 cores/256 KB RF size.

That is why you get an increase in performance in Nvidia GPUs. The cores are "less starved" for resources with each generation.
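The list above can be turned into register file per core with a short sketch. It uses only the cores-per-SM and 256 KB figures from this post, and says nothing about how much of that actually shows up as performance.

```python
RF_KB_PER_SM = 256  # register file size per SM, as listed above

# Cores per SM for each architecture, as listed above.
cores_per_sm = {
    "Kepler": 192,
    "Maxwell": 128,
    "GP10X": 128,
    "GP100": 64,
    "GV100": 64,
}

# Register file capacity available per core, in KB.
rf_kb_per_core = {arch: RF_KB_PER_SM / cores for arch, cores in cores_per_sm.items()}

for arch, kb in rf_kb_per_core.items():
    print(f"{arch}: {kb:.2f} KB of register file per core")
```

That works out to about 1.33 KB per core on Kepler, 2 KB on Maxwell and GP10X, and 4 KB on both GP100 and GV100.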

I have to say, right now I am a bit staggered. I looked at the wrong part of the diagram after all.

There may be no difference in FP32 performance between GV100 and the GP100 chip, clock for clock, core for core. It has the same 256 KB available to the same 64 cores as GP100.


It will actually be interesting to observe the performance of the GV100 chip. It appears there was a reason why Nvidia demoed only DL today and nothing else. No word on gaming, FP32 improvement, or anything else. It appears that GV100's only architectural improvements may be in DL and in scheduling, not in the overall throughput of the GPU.
 
May 11, 2008
22,924
1,505
126
Wrong. Register file size per SM is the same as in Pascal GP100 or Maxwell.
[Attached image: 2017-05-10vfs2p.jpg]

Aha, just as I thought. GV100 runs at slightly lower clocks but has way more CUDA cores. Yet, if you compare the theoretical FP32 throughput of both, GV100 is actually a bit slower than GP100 at the same clock speed.