NVIDIA Volta Rumor Thread


PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
SK Hynix has confirmed it is entering volume production of GDDR6 RAM for a high-end graphics card in early 2018. 12 nm FinFET is scheduled to hit mass production before the year is over.
Both of those together suggest a Q1 2018 launch for Volta unless Nvidia sits on it. It could be that TSMC's "new 12nm node" is just a rehash of their 16nm node and not necessarily a die shrink. Then again, it could very well be that 12nm is taped out in early Q1 2018, with decent supplies starting in Q2 2018.

12 nm FFN is what was used for the V100 die. It is mainly an enhancement of 16nm without much of a density increase. The N in FFN is for NVidia. It's a process customized for NVidia.

To me that implies a lot of work (money) from both parties tweaking that process specifically for NVidia's needs.

It only makes sense that after an investment of time and money in the process, NVidia would use it in mainstream products. That they already have a niche product built on it shows it is working, and as a tweak on 16nm, it is likely lower risk as well.

So I think NVidia will go the safe route and build the next generation on 12nm FFN. They will have to make the dies a bit larger to boost performance, but costs are likely not that different between building larger on a mature process and building smaller on a bleeding-edge one.

I think tapeout is in 2017, with Volta starting to roll out in Q1 2018. 12nm FFN is a safe process bet that should pose no risk for this timeline.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
973
146
https://www.servethehome.com/nvidia-v100-volta-update-hot-chips-2017/

Nvidia's talk about Volta at Hot Chips:

Highlights
GV100 SM
  • Twice the schedulers

  • Large, fast L1 cache

  • Improved SIMT model

  • Tensor acceleration

  • +50% energy efficiency vs GP100 SM
SM Microarchitecture
  • Shared L1 cache

  • 4 independently scheduled sub-cores

  • Shared MIO
Sub-Core
  • Warp Scheduler - 1 Warp instruction/clock, L0 cache, branch unit

  • Math Dispatch Unit - Keeps 2+ datapaths busy

  • MIO Instruction queue

  • Two 4x4x4 Tensor Cores
L1 and Shared Memory
  • Streaming L1 cache - 4x bandwidth vs GP100, 4x capacity vs GP100

  • Shared Memory - Unified Storage with L1 cache, Configurable up to 96KB
Tensor Core
  • Mixed Precision Matrix Math 4x4 matrices

  • With improved scheduling, GV100 can do 16x16 matrix math

  • V100 (CUDA 9 + Tensor Cores) = 9.3x faster for cuBLAS Mixed Precision vs P100 (CUDA 8)
NVLINK Updates
  • New GV100 NVLINK offers 1.9x more bandwidth vs GP100

  • 6 NVLINK connections
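
As a rough illustration of the Tensor Core bullets above, here is a minimal Python sketch of how a 16x16 matrix multiply can be broken into 4x4x4 multiply-accumulate tile operations. The function names (`mma_4x4x4`, `tiled_matmul`) are made up for this example, the FP16 inputs with a wider accumulator are an assumption about what "mixed precision" means here, FP16 rounding is emulated with the standard `struct` module, and a Python float stands in for the accumulator. It says nothing about how GV100 actually schedules these operations in hardware.

```python
import struct

TILE = 4  # tile edge length, matching the 4x4 matrices in the slide summary


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def mma_4x4x4(a, b, c):
    """One tile operation: D = A*B + C on 4x4 tiles.

    Inputs A and B are rounded to FP16; the sum is kept in a wider type
    (a Python float here, standing in for a higher-precision accumulator).
    """
    d = [row[:] for row in c]
    for i in range(TILE):
        for j in range(TILE):
            for k in range(TILE):
                d[i][j] += to_fp16(a[i][k]) * to_fp16(b[k][j])
    return d


def tile(m, r, c):
    """Extract the 4x4 tile of m whose top-left corner is (r, c)."""
    return [row[c:c + TILE] for row in m[r:r + TILE]]


def tiled_matmul(a, b, n=16):
    """Multiply two n x n matrices using only 4x4x4 tile operations.

    Returns the product and the number of tile operations issued.
    """
    out = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(0, n, TILE):
        for j in range(0, n, TILE):
            acc = [[0.0] * TILE for _ in range(TILE)]
            for k in range(0, n, TILE):
                acc = mma_4x4x4(tile(a, i, k), tile(b, k, j), acc)
                ops += 1
            for r in range(TILE):
                out[i + r][j:j + TILE] = acc[r]
    return out, ops
```

In this sketch a 16x16 product takes (16/4)^3 = 64 tile operations, which is the kind of work the "improved scheduling" bullet would have to keep fed.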
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
So with the Vega flop I'm going to go G-Sync and Volta for my next GPU upgrade.

What kind of performance uplift can we expect over the 10xx series? 20-30%?
 

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
We can only go by history. They have tended to manage +50% at each tier of the product stack, e.g. 1080 to 2080. 1080 Ti to 2080 will be much less. The claimed 50% efficiency improvement helps make that possible if they target the same TDP.

But it could be less, who knows.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Volta is shaping up to be one of the truly great GPU architectures after Fermi and Maxwell. Nvidia has been executing flawlessly for the past few generations. It's unfortunate the opposite is the case with the competition.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
They got ~50% from V100 vs P100, so something of that magnitude is likely for the 'normal' cards.

But the die size grew dramatically, something you won't expect from normal cards unless all the price points shift up again.
 

Puffnstuff

Lifer
Mar 9, 2005
16,011
4,781
136

Glo.

Diamond Member
Apr 25, 2015
5,642
4,379
136
They got ~50% from V100 vs P100, so something of that magnitude is likely for the 'normal' cards.
They got 50% more TFLOPs. I'm wondering whether 1024 CUDA cores are possible on GV107. With this improvement: 1024 CUDA cores at 1.5 GHz = 3 TFLOPs, 50% more than the GTX 1050 Ti.

On paper it makes sense.
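
For anyone who wants to check the TFLOPs arithmetic, here is a small sketch of the usual peak-FP32 estimate (cores x clock x 2, counting a fused multiply-add as two operations). The helper name `fp32_tflops` is invented for this example, the GV107 figures are the hypothetical ones from the post, and the GTX 1050 Ti core count and clocks are reference specs assumed here (768 cores, 1.290 GHz base, 1.392 GHz boost).

```python
def fp32_tflops(cuda_cores, clock_ghz, flops_per_core_per_clock=2):
    """Peak FP32 throughput in TFLOPs: cores x clock x FLOPs per core per clock."""
    return cuda_cores * flops_per_core_per_clock * clock_ghz / 1000.0


# Hypothetical GV107 from the post: 1024 CUDA cores at 1.5 GHz.
gv107 = fp32_tflops(1024, 1.5)

# GTX 1050 Ti at assumed reference clocks (base and boost).
gtx_1050_ti_base = fp32_tflops(768, 1.290)
gtx_1050_ti_boost = fp32_tflops(768, 1.392)

print(f"GV107 (hypothetical): {gv107:.2f} TFLOPs")
print(f"Uplift vs 1050 Ti base clock:  {gv107 / gtx_1050_ti_base - 1:.0%}")
print(f"Uplift vs 1050 Ti boost clock: {gv107 / gtx_1050_ti_boost - 1:.0%}")
```

With those assumed clocks the uplift lands at roughly 55% against the base clock and roughly 44% against the boost clock, so "50% more" sits between the two. This is peak paper throughput only, not game performance.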
 

Glo.

Diamond Member
Apr 25, 2015
5,642
4,379
136
I have looked at the latest benchmarks and the differences between the GTX 1060 and GTX 980 Ti. The difference between those two GPUs is so small that I genuinely think the GV107 chip can be extremely close to the GTX 980 Ti in performance, if the rumored performance increase of Volta happens and translates everywhere.

This is getting more interesting with every single day.
 

Tup3x

Senior member
Dec 31, 2016
940
922
136
GTX 1060 is close to GTX 980. Step-down Volta is probably close to GTX 980 as well, but not the Ti. GTX 2070 should be really interesting... if it can do what the GTX 1070 did.
 

Glo.

Diamond Member
Apr 25, 2015
5,642
4,379
136
GTX 1060 is close to GTX 980. Step-down Volta is probably close to GTX 980 as well, but not the Ti. GTX 2070 should be really interesting... if it can do what the GTX 1070 did.
GTX 980 Ti is on average 20% faster than GTX 1060, in new games. Sometimes the gap between both of them is smaller, around 7-10%. It depends on game and settings.

For example in BF1 GTX 980 Ti is 15% faster than GTX 1060. Let GTX 2050 Ti be 10-15% faster than GTX 1060, maintaining the same performance gap between GTX 960 and GTX 1050 Ti, and you are looking at that performance level.
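
To make the comparison above concrete, here is a quick sketch that normalizes everything to the GTX 1060 and asks how close a card that is 10-15% faster would come to the GTX 980 Ti. The percentages are the ones quoted in the post; the "GTX 2050 Ti" is hypothetical and the helper name `relative_to` is invented for this example.

```python
def relative_to(card_vs_base, reference_vs_base):
    """Performance of a card as a fraction of a reference card,
    when both are expressed relative to the same baseline (GTX 1060 = 1.0)."""
    return card_vs_base / reference_vs_base


# GTX 980 Ti relative to GTX 1060, per the post.
gap_average = 1.20  # "on average 20% faster"
gap_bf1 = 1.15      # "in BF1 ... 15% faster"

# Hypothetical GTX 2050 Ti at 10% and 15% over the GTX 1060.
for uplift in (1.10, 1.15):
    avg = relative_to(uplift, gap_average)
    bf1 = relative_to(uplift, gap_bf1)
    print(f"+{uplift - 1:.0%} over 1060: {avg:.1%} of 980 Ti on average, {bf1:.1%} in BF1")
```

Under these numbers the hypothetical card lands at roughly 92-96% of a GTX 980 Ti on average, and between about 96% and parity in the BF1 case.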
 

Tup3x

Senior member
Dec 31, 2016
940
922
136
GTX 980 Ti is on average 20% faster than GTX 1060, in new games. Sometimes the gap between both of them is smaller, around 7-10%. It depends on game and settings.

For example in BF1 GTX 980 Ti is 15% faster than GTX 1060. Let GTX 2050 Ti be 10-15% faster than GTX 1060, maintaining the same performance gap between GTX 960 and GTX 1050 Ti, and you are looking at that performance level.
Different situation. That was architectural changes plus a massive node shrink. At best they can deliver an improvement similar to what they did with Maxwell, but I somewhat doubt they can pull off an improvement like that again.
 

Glo.

Diamond Member
Apr 25, 2015
5,642
4,379
136
Different situation. That was architectural changes plus a massive node shrink. At best they can deliver an improvement similar to what they did with Maxwell, but I somewhat doubt they can pull off an improvement like that again.
Maxwell increased the throughput of cores in the CUDA architecture, thanks to the shift from 192 cores per 256 KB register file to 128 cores per 256 KB register file. Consumer Pascal maintained this layout, hence no difference in core-for-core, clock-for-clock performance.

Pascal GP100 and Volta GV100 have 64 cores per 256 KB register file. So if Nvidia uses this layout, we will again see the same performance increase as we saw with Maxwell versus Kepler.

However, Volta is a much more advanced layout, and will have much better utilization and IPC than even Pascal using the same core/RFS layout as Volta, because Pascal is not as advanced.

From what Sweeper has touted, GV104 will score somewhere between 16000 and 17000 pts in 3DMark Fire Strike Extreme. This means GV104 will be 65% faster than GP104.

I don't think it's possible to pull off this type of improvement without changing the layout of the architecture on a similarly sized process node.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
If they do manage to get 50% that's amazing. And I'll be one of the first in line for one!

Yeah. I don't think anyone should expect that. That is how you end up with Vega level disappointment. People read about new Vega features and start assuming IPC increases of 30%, and then it gets delivered and no IPC gains materialize...

New features these days are more likely to show improvements in some games, but not big across the board increases.

The real across the board increases will likely depend a lot on Core count increases (or clock speed increases), and if using 12nm FFN, there isn't much density increase, so core count increases would depend on making chips bigger. Making them 50% bigger doesn't seem to be in the cards.

So don't expect 50% increases across the board. If that turns up it would be a pleasant surprise, but I certainly wouldn't expect it.
 

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
Nvidia have consistently delivered 50% or greater from Kepler-Maxwell-Pascal. I see no reason they cannot do so again. Maxwell chips were also larger than Kepler in addition to higher IPC in order to get that 50%, so they can do the same with Volta.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,321
4,903
136
Yeah. I don't think anyone should expect that. That is how you end up with Vega level disappointment. People read about new Vega features and start assuming IPC increases of 30%, and then it gets delivered and no IPC gains materialize...

New features these days are more likely to show improvements in some games, but not big across the board increases.

The real across the board increases will likely depend a lot on Core count increases (or clock speed increases), and if using 12nm FFN, there isn't much density increase, so core count increases would depend on making chips bigger. Making them 50% bigger doesn't seem to be in the cards.

So don't expect 50% increases across the board. If that turns up it would be a pleasant surprise, but I certainly wouldn't expect it.

This is nVidia we are talking about, not RTG. Past performance may not necessarily predict the future... but given their R&D budget and the capability to make a gaming-focused design in addition to their HPC/datacenter designs, Volta is not going to be another Vega.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Nvidia have consistently delivered 50% or greater from Kepler-Maxwell-Pascal. I see no reason they cannot do so again. Maxwell chips were also larger than Kepler in addition to higher IPC in order to get that 50%, so they can do the same with Volta.

Just because they did it twice, doesn't mean they can do it at will, forever.

This is nVidia we are talking about, not RTG. Past performance may not necessarily predict the future... but given their R&D budget and the capability to make a gaming-focused design in addition to their HPC/datacenter designs, Volta is not going to be another Vega.

I am not saying it will be a fiasco like Vega. But it could very well be a 35-40% improvement at each tier, which would still be great. If you go in expecting 50%, though, we get a bunch of wailing because it wasn't 50%. Ultimately we have to wait and see.
 

Tup3x

Senior member
Dec 31, 2016
940
922
136
Maxwell increased the throughput of cores in the CUDA architecture, thanks to the shift from 192 cores per 256 KB register file to 128 cores per 256 KB register file. Consumer Pascal maintained this layout, hence no difference in core-for-core, clock-for-clock performance.

Pascal GP100 and Volta GV100 have 64 cores per 256 KB register file. So if Nvidia uses this layout, we will again see the same performance increase as we saw with Maxwell versus Kepler.

However, Volta is a much more advanced layout, and will have much better utilization and IPC than even Pascal using the same core/RFS layout as Volta, because Pascal is not as advanced.

From what Sweeper has touted, GV104 will score somewhere between 16000 and 17000 pts in 3DMark Fire Strike Extreme. This means GV104 will be 65% faster than GP104.

I don't think it's possible to pull off this type of improvement without changing the layout of the architecture on a similarly sized process node.
Here's the thing... I'm not sure if the current Volta whitepaper applies to the consumer version. For example, NVIDIA will remove tensor cores for sure. Who knows what other things they will cut. For that reason I am a bit conservative with my expectations.
 

Glo.

Diamond Member
Apr 25, 2015
5,642
4,379
136
Here's the thing... I'm not sure if the current Volta whitepaper applies to the consumer version. For example, NVIDIA will remove tensor cores for sure. Who knows what other things they will cut. For that reason I am a bit conservative with my expectations.
Even reusing the GP100 chip architecture will bring a 30-40% performance increase at the same clock speeds and core counts. With Volta the possibilities are bigger, because of the improved cache, separate FP and INT cores, massively improved scheduling, and the partitioning of each SM into 4 portions, which improves load balancing and scheduling even further.

Considering there are rumors of a bigger die size for GV104 compared to GP104, there is a chance we will see the actual Volta architecture.

One more thing that could confirm this is the rumor that consumer GPUs have to have Tensor cores for compatibility reasons (if Nvidia removed tensor cores completely, writing and testing software would have to be done on GV100 chips).

Here is what I think we can see from Nvidia as far as GDDR VRAM in GPUs goes:
GV104 - GDDR6, 256 Bit, 16 GB
GV106 - GDDR5X, 256 Bit, 8 GB
GV107 - GDDR5X, 192 Bit, 6 GB
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Even reusing the GP100 chip architecture will bring a 30-40% performance increase at the same clock speeds and core counts. With Volta the possibilities are bigger, because of the improved cache, separate FP and INT cores, massively improved scheduling, and the partitioning of each SM into 4 portions, which improves load balancing and scheduling even further.

Considering there are rumors of a bigger die size for GV104 compared to GP104, there is a chance we will see the actual Volta architecture.

One more thing that could confirm this is the rumor that consumer GPUs have to have Tensor cores for compatibility reasons (if Nvidia removed tensor cores completely, writing and testing software would have to be done on GV100 chips).

Here is what I think we can see from Nvidia as far as GDDR VRAM in GPUs goes:
GV104 - GDDR6, 256 Bit, 16 GB
GV106 - GDDR5X, 256 Bit, 8 GB
GV107 - GDDR5X, 192 Bit, 6 GB

I am quite sure that Volta architecture will power the next gen Geforce stack. I think this is how it could turn out

GV102 - 384 bit GDDR6 at 14-16 Gbps - 672 - 768 GB/s
GV104 - 256 bit GDDR6 at 14-16 Gbps - 448 - 512 GB/s
GV106 - 192 bit GDDR5X at 11 Gbps - 264 GB/s
GV107 - 128 bit GDDR5X at 11 Gbps - 176 GB/s

Nvidia would want to keep the memory bus at the same sizes as Pascal, as that would allow them to keep memory I/O power and board costs under control. That's the reason I do not see Nvidia increasing memory bus width for GV106/GV107. Volta is shaping up to be a true powerhouse and could go down as one of the most successful and forward-looking GPU architectures ever, after the legendary G80.
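
The bandwidth figures in the list above follow from bus width times per-pin data rate, divided by 8 bits per byte. Here is a minimal sketch that reproduces them; the helper name `bandwidth_gb_s` is invented for this example, and the chip names and memory configurations are the speculative ones from the post, not confirmed specs.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Speculated configurations from the post: (bus width in bits, per-pin rates in Gbps).
configs = {
    "GV102": (384, (14, 16)),
    "GV104": (256, (14, 16)),
    "GV106": (192, (11,)),
    "GV107": (128, (11,)),
}

for chip, (width, rates) in configs.items():
    figures = " - ".join(f"{bandwidth_gb_s(width, r):.0f}" for r in rates)
    print(f"{chip}: {width} bit at {'-'.join(str(r) for r in rates)} Gbps = {figures} GB/s")
```

The same helper makes it easy to see what a wider bus would buy, e.g. `bandwidth_gb_s(192, 11)` versus `bandwidth_gb_s(128, 11)` for GV107.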
 

Glo.

Diamond Member
Apr 25, 2015
5,642
4,379
136
I am quite sure that Volta architecture will power the next gen Geforce stack. I think this is how it could turn out

GV102 - 384 bit GDDR6 at 14-16 Gbps - 672 - 768 GB/s
GV104 - 256 bit GDDR6 at 14-16 Gbps - 448 - 512 GB/s
GV106 - 192 bit GDDR5X at 11 Gbps - 264 GB/s
GV107 - 128 bit GDDR5X at 11 Gbps - 176 GB/s

Nvidia would want to keep the memory bus at the same sizes as Pascal, as that would allow them to keep memory I/O power and board costs under control. That's the reason I do not see Nvidia increasing memory bus width for GV106/GV107. Volta is shaping up to be a true powerhouse and could go down as one of the most successful and forward-looking GPU architectures ever, after the legendary G80.
I assumed that an increase in bus width is what is required to properly feed the GV architecture. Each generation Nvidia has increased the amount of VRAM available in specific price tiers, and has also increased the memory bus width.

For example, if they decide to use GV107 with 192 Bit GDDR5X, they can offer something like this:
GTX 2050 Ti - 6 GB GDDR5X
GTX 2050 - 3 GB GDDR5
In both cases it would be an improvement over the GPUs they replaced. The GDDR5X memory controller is backwards compatible with GDDR5.