NVIDIA Volta Rumor Thread

stahlhart · Aug 22, 2017

biostud said:
Any volta rumors in this thread or just business talk?

Agree -- thread cleaned; get this discussion back on topic.
-- stahlhart

PeterScott · Aug 22, 2017

Konan said:
SK Hynix has confirmed it is entering volume production of GDDR6 RAM for a high-end graphics card in early 2018. 12 nm FinFET is scheduled to hit mass production before the year is over.
Together, those suggest a Q1 2018 launch for Volta unless Nvidia sits on it. It could be that TSMC's "new 12nm node" is just a rehash of their 16nm node, not necessarily a die shrink. Then again, it could very well be that 12nm is taped out in early Q1 2018, with decent supplies starting in Q2 2018.

12 nm FFN is what was used for the V100 die. It is mainly an enhancement of 16nm without much of a density increase. The N in FFN stands for NVidia; it's a process customized for NVidia.

To me that implies a lot of work (money) from both parties tweaking that process specifically for NVidia's needs.

It only makes sense that after an investment of time and money in the process, NVidia would use it in mainstream products. That they already have a niche product built on it shows it is working, and as a tweak on 16nm, it is likely less risky as well.

So I think NVidia will go the safe route and build the next generation on 12nm FFN. They will have to make the dies a bit larger to boost performance, but costs are likely not that different between building larger on a mature process and building smaller on a bleeding-edge one.

I think tapeout is in 2017, with Volta starting to roll out in Q1 2018. 12nm FFN is a safe process bet that should pose no risk for this timeline.

Dayman1225 · Aug 22, 2017

https://www.servethehome.com/nvidia-v100-volta-update-hot-chips-2017/

Nvidia talks about Volta at Hot Chips:

Highlights
GV100 SM

Twice the schedulers
Large, fast L1 cache
Improved SIMT model
Tensor acceleration
+50% energy efficiency vs GP100 SM

SM Microarchitecture

Shared L1 cache
4 independently scheduled sub-cores
Shared MIO

Sub-Core

Warp Scheduler - 1 Warp instruction/clock, L0 cache, branch unit
Math Dispatch Unit - Keeps 2+ datapaths busy
MIO Instruction queue
Two 4x4x4 Tensor Cores

L1 and Shared Memory

Streaming L1 cache - 4x bandwidth vs GP100, 4x capacity vs GP100
Shared Memory - Unified Storage with L1 cache, Configurable up to 96KB

Tensor Core

Mixed Precision Matrix Math 4x4 matrices
With improved scheduling, GV100 can do 16x16 matrix math
V100 (CUDA 9 + Tensor Cores) = 9.3x faster for cuBLAS Mixed Precision vs P100 (CUDA 8)
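
The Tensor Core bullets above describe a 4x4 mixed-precision multiply-accumulate that gets composed into 16x16 matrix math. A minimal Python sketch of that idea follows. It assumes the operation is D = A*B + C with FP16 inputs and a wider accumulator, and it tiles a 16x16 product into 4x4x4 steps purely for illustration; the function names are hypothetical and the loop order says nothing about how the hardware actually schedules the work. Python's double-precision floats stand in for the accumulator here.

```python
import struct

def to_fp16(x):
    """Round a value to the nearest IEEE half-precision number."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]

def tile_fma(a, b, c):
    """One 4x4x4 step: D = A*B + C, with A and B rounded to FP16 first."""
    return [[c[i][j] + sum(to_fp16(a[i][k]) * to_fp16(b[k][j]) for k in range(4))
             for j in range(4)]
            for i in range(4)]

def tile(m, r, c):
    """Extract the 4x4 tile at tile-row r, tile-column c of a 16x16 matrix."""
    return [row[4 * c:4 * c + 4] for row in m[4 * r:4 * r + 4]]

def matmul16(a, b):
    """Multiply two 16x16 matrices using only 4x4x4 tile steps."""
    out = [[0.0] * 16 for _ in range(16)]
    tile_ops = 0
    for ti in range(4):
        for tj in range(4):
            acc = [[0.0] * 4 for _ in range(4)]
            for tk in range(4):
                acc = tile_fma(tile(a, ti, tk), tile(b, tk, tj), acc)
                tile_ops += 1
            for i in range(4):
                for j in range(4):
                    out[4 * ti + i][4 * tj + j] = acc[i][j]
    return out, tile_ops

if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
    b = [[float(i * 16 + j) for j in range(16)] for i in range(16)]
    result, ops = matmul16(identity, b)
    print("4x4x4 tile steps used:", ops)
    print("identity * B == B:", result == b)
```

In this decomposition a 16x16 product takes 4 x 4 x 4 tile steps, one per combination of output tile and inner tile index.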

NVLINK Updates

New GV100 NVLINK offers 1.9x the bandwidth of GP100
6 NVLINK connections

Rifter · Aug 22, 2017

So with the Vega flop, I'm going to go G-Sync and Volta for my next GPU upgrade.

What kind of performance uplift can we expect over the 10xx series? 20-30%?

crisium · Aug 22, 2017

We can only go by history. They have tended to manage +50% at each tier of the product stack, e.g. 1080 to 2080. 1080 Ti to 2080 will be much less. With the claimed 50% efficiency improvement, that is achievable if they target the same TDP.

But it could be less, who knows.

raghu78 · Aug 22, 2017

Volta is shaping up to be one of the truly great GPU architectures after Fermi and Maxwell. Nvidia has been executing flawlessly for the past few generations. It's unfortunate the opposite is the case with the competition.

Qwertilot · Aug 22, 2017

They got ~50% from V100 vs P100, so something of that magnitude is likely for the 'normal' cards.

PeterScott · Aug 22, 2017

Qwertilot said:
They got ~50% from V100 vs P100, so something of that magnitude is likely for the 'normal' cards.

But the die size grew dramatically, something you won't expect from normal cards unless all the price points shift up again.

Puffnstuff · Aug 22, 2017

PeterScott said:
So does Anand, and it's more detailed:
http://www.anandtech.com/show/11367...v100-gpu-and-tesla-v100-accelerator-announced

Consumer Volta could be taped out on 12 nm FFN by now, and they are working toward an early 2018 release.

Always good to have multiple sources of information to correlate the data.

http://www.tomshardware.com/news/nvidia-tesla-v100-volta-gpu,34379.html

Glo. · Aug 22, 2017

Qwertilot said:
They got ~50% from V100 vs P100, so something of that magnitude is likely for the 'normal' cards.

They got 50% more TFLOPs. I'm wondering whether 1024 CUDA cores on GV107 is possible. With this improvement: 1024 CUDA cores at 1.5 GHz = ~3 TFLOPs, about 50% more than the GTX 1050 Ti.

On paper it makes sense.
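
The figure above follows from the usual peak-FP32 estimate: CUDA cores x 2 FLOPs per clock (one fused multiply-add) x clock speed. A minimal Python sketch of that arithmetic is below. The GTX 1050 Ti reference numbers (768 cores, 1290 MHz base, 1392 MHz boost) are assumptions taken from its published specs, not from this thread, and the GV107 configuration is only the hypothetical one proposed in the post.

```python
def peak_fp32_tflops(cuda_cores, clock_ghz):
    """Theoretical peak FP32 rate: one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_ghz / 1000.0

# Hypothetical GV107 from the post above.
gv107 = peak_fp32_tflops(1024, 1.5)

# GTX 1050 Ti at its reference base and boost clocks (assumed specs).
gtx1050ti_base = peak_fp32_tflops(768, 1.290)
gtx1050ti_boost = peak_fp32_tflops(768, 1.392)

if __name__ == "__main__":
    print(f"GV107 (hypothetical): {gv107:.2f} TFLOPs")
    print(f"Uplift vs 1050 Ti at base clock:  {gv107 / gtx1050ti_base - 1:.0%}")
    print(f"Uplift vs 1050 Ti at boost clock: {gv107 / gtx1050ti_boost - 1:.0%}")
```

Depending on which 1050 Ti clock is used as the baseline, the uplift lands a little above or a little below 50%, so "about 50%" is a fair paper estimate. Peak TFLOPs are not game performance, of course.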

Rifter · Aug 22, 2017

If they do manage to get 50%, that's amazing. And I'll be one of the first in line for one!

Glo. · Aug 23, 2017

I have looked at the latest benchmarks and at the differences between GTX 1060 and GTX 980 Ti. The difference between those two GPUs is so small that I genuinely think a GV107 chip can be extremely close to GTX 980 Ti in performance, if the rumored performance increase of Volta happens and translates everywhere.

This is getting more interesting with every single day.

Tup3x · Aug 24, 2017

GTX 1060 is close to GTX 980. A step-down Volta is probably close to GTX 980 as well, but not the Ti. GTX 2070 should be really interesting... if it can do the same as what GTX 1070 did.

Glo. · Aug 24, 2017

Tup3x said:
GTX 1060 is close to GTX 980. A step-down Volta is probably close to GTX 980 as well, but not the Ti. GTX 2070 should be really interesting... if it can do the same as what GTX 1070 did.

GTX 980 Ti is on average 20% faster than GTX 1060, in new games. Sometimes the gap between both of them is smaller, around 7-10%. It depends on game and settings.

For example in BF1 GTX 980 Ti is 15% faster than GTX 1060. Let GTX 2050 Ti be 10-15% faster than GTX 1060, maintaining the same performance gap between GTX 960 and GTX 1050 Ti, and you are looking at that performance level.

Tup3x · Aug 24, 2017

Glo. said:
GTX 980 Ti is on average 20% faster than GTX 1060, in new games. Sometimes the gap between both of them is smaller, around 7-10%. It depends on game and settings.

For example in BF1 GTX 980 Ti is 15% faster than GTX 1060. Let GTX 2050 Ti be 10-15% faster than GTX 1060, maintaining the same performance gap between GTX 960 and GTX 1050 Ti, and you are looking at that performance level.

Different situation. That was architectural changes plus a massive node shrink. At best they can do an improvement similar to what they did with Maxwell, but I somewhat doubt that they can pull off an improvement like that again.

Glo. · Aug 24, 2017

Tup3x said:
Different situation. That was architectural changes plus a massive node shrink. At best they can do an improvement similar to what they did with Maxwell, but I somewhat doubt that they can pull off an improvement like that again.

Maxwell increased the throughput of cores in the CUDA architecture thanks to the shift from 192 cores per 256 KB register file to 128 cores per 256 KB register file. Consumer Pascal maintained this layout, hence no difference in core-for-core, clock-for-clock performance.

Pascal GP100 and Volta GV100 have 64 cores per 256 KB register file. So if Nvidia uses this layout, we will again see the same kind of performance increase as we saw with Maxwell versus Kepler.

However, Volta is a much more advanced design and will have much better utilization and IPC than even a Pascal using the same core/register-file layout, because Pascal is not as advanced.

From what Sweeper has touted, GV104 will score somewhere between 16000 and 17000 pts in 3DMark Fire Strike Extreme. That would make GV104 about 65% faster than GP104.

I don't think it's possible to pull off this type of improvement on a similarly sized process node without changing the layout of the architecture.
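
The layout argument above comes down to how much register file each core gets. A small Python sketch of that division, using only the cores-per-256-KB figures quoted in the post (the dictionary labels and function name are just illustrative):

```python
REGISTER_FILE_KB = 256  # register file size per group of cores, as quoted in the post

# Cores sharing one 256 KB register file, per the post.
CORES_PER_REGISTER_FILE = {
    "Kepler": 192,
    "Maxwell / consumer Pascal": 128,
    "GP100 / GV100": 64,
}

def register_kb_per_core(cores, register_file_kb=REGISTER_FILE_KB):
    """Register file capacity available to each core, in KB."""
    return register_file_kb / cores

if __name__ == "__main__":
    for name, cores in CORES_PER_REGISTER_FILE.items():
        print(f"{name}: {register_kb_per_core(cores):.2f} KB per core")
```

Each step doubles or nearly doubles the registers available per core. That is only one proxy for how well the cores can be kept fed; it does not by itself predict game performance.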

PeterScott · Aug 24, 2017

Rifter said:
If they do manage to get 50%, that's amazing. And I'll be one of the first in line for one!

Yeah. I don't think anyone should expect that. That is how you end up with Vega level disappointment. People read about new Vega features and start assuming IPC increases of 30%, and then it gets delivered and no IPC gains materialize...

New features these days are more likely to show improvements in some games, but not big across the board increases.

The real across the board increases will likely depend a lot on Core count increases (or clock speed increases), and if using 12nm FFN, there isn't much density increase, so core count increases would depend on making chips bigger. Making them 50% bigger doesn't seem to be in the cards.

So don't expect 50% increases across the board. If that turns up it would be a pleasant surprise, but I certainly wouldn't expect it.

crisium · Aug 24, 2017

Nvidia have consistently delivered 50% or greater from Kepler-Maxwell-Pascal. I see no reason they cannot do so again. Maxwell chips were also larger than Kepler in addition to higher IPC in order to get that 50%, so they can do the same with Volta.

IEC · Aug 24, 2017

PeterScott said:
Yeah. I don't think anyone should expect that. That is how you end up with Vega level disappointment. People read about new Vega features and start assuming IPC increases of 30%, and then it gets delivered and no IPC gains materialize...

New features these days are more likely to show improvements in some games, but not big across the board increases.

The real across the board increases will likely depend a lot on Core count increases (or clock speed increases), and if using 12nm FFN, there isn't much density increase, so core count increases would depend on making chips bigger. Making them 50% bigger doesn't seem to be in the cards.

So don't expect 50% increases across the board. If that turns up it would be a pleasant surprise, but I certainly wouldn't expect it.

This is nVidia we are talking about, not RTG. Past performance may not necessarily predict the future... but given their R&D budget and the capability to make a gaming-focused design in addition to their HPC/datacenter designs, Volta is not going to be another Vega.

PeterScott · Aug 24, 2017

crisium said:
Nvidia have consistently delivered 50% or greater from Kepler-Maxwell-Pascal. I see no reason they cannot do so again. Maxwell chips were also larger than Kepler in addition to higher IPC in order to get that 50%, so they can do the same with Volta.

Just because they did it twice, doesn't mean they can do it at will, forever.

IEC said:
This is nVidia we are talking about, not RTG. Past performance may not necessarily predict the future... but given their R&D budget and the capability to make a gaming-focused design in addition to their HPC/datacenter designs, Volta is not going to be another Vega.

I am not saying it will be a fiasco, like Vega. But it could very well be a 35-40% improvement at each tier, which would still be great. If you go in expecting 50%, though, we get a bunch of wailing because it wasn't 50%. Ultimately we have to wait and see.

Tup3x · Aug 24, 2017

Glo. said:
Maxwell increased the throughput of cores in the CUDA architecture thanks to the shift from 192 cores per 256 KB register file to 128 cores per 256 KB register file. Consumer Pascal maintained this layout, hence no difference in core-for-core, clock-for-clock performance.

Pascal GP100 and Volta GV100 have 64 cores per 256 KB register file. So if Nvidia uses this layout, we will again see the same kind of performance increase as we saw with Maxwell versus Kepler.

However, Volta is a much more advanced design and will have much better utilization and IPC than even a Pascal using the same core/register-file layout, because Pascal is not as advanced.

From what Sweeper has touted, GV104 will score somewhere between 16000 and 17000 pts in 3DMark Fire Strike Extreme. That would make GV104 about 65% faster than GP104.

I don't think it's possible to pull off this type of improvement on a similarly sized process node without changing the layout of the architecture.

Here's the thing... I'm not sure if the current Volta whitepaper applies to consumer version. For example NVIDIA will remove tensor cores for sure. Who knows what other things they will cut. For that reason I am a bit conservative with my expectations.

Glo. · Aug 24, 2017

Tup3x said:
Here's the thing... I'm not sure if the current Volta whitepaper applies to consumer version. For example NVIDIA will remove tensor cores for sure. Who knows what other things they will cut. For that reason I am a bit conservative with my expectations.

Even reusing the GP100 chip architecture would bring a 30-40% performance increase at the same clock speeds and core counts. With Volta the possibilities are bigger, because of the improved cache, separate FP and INT cores, massively improved scheduling, and the partitioning of each SM into 4 portions, which improves load balancing and scheduling even further.

Considering there are rumors about a bigger die size for GV104 compared to GP104, there is a chance we will see the actual Volta architecture.

One more thing that could confirm this is the rumor that consumer GPUs have to have Tensor cores for compatibility reasons (if Nvidia removed tensor cores completely, writing and testing software would have to be done on GV100 chips).

Here is what I think we can see from Nvidia for GDDR VRAM in GPUs:
GV104 - GDDR6, 256 Bit, 16 GB
GV106 - GDDR5X, 256 Bit, 8 GB
GV107 - GDDR5X, 192 Bit, 6 GB

raghu78 · Aug 24, 2017

Glo. said:
Even reusing the GP100 chip architecture would bring a 30-40% performance increase at the same clock speeds and core counts. With Volta the possibilities are bigger, because of the improved cache, separate FP and INT cores, massively improved scheduling, and the partitioning of each SM into 4 portions, which improves load balancing and scheduling even further.

Considering there are rumors about a bigger die size for GV104 compared to GP104, there is a chance we will see the actual Volta architecture.

One more thing that could confirm this is the rumor that consumer GPUs have to have Tensor cores for compatibility reasons (if Nvidia removed tensor cores completely, writing and testing software would have to be done on GV100 chips).

Here is what I think we can see from Nvidia for GDDR VRAM in GPUs:
GV104 - GDDR6, 256 Bit, 16 GB
GV106 - GDDR5X, 256 Bit, 8 GB
GV107 - GDDR5X, 192 Bit, 6 GB

I am quite sure that the Volta architecture will power the next-gen GeForce stack. I think this is how it could turn out:

GV102 - 384 bit GDDR6 at 14-16 Gbps - 672 - 768 GB/s
GV104 - 256 bit GDDR6 at 14-16 Gbps - 448 - 512 GB/s
GV106 - 192 bit GDDR5X at 11 Gbps - 264 GB/s
GV107 - 128 bit GDDR5X at 11 Gbps - 176 GB/s

Nvidia would want to keep the memory bus at the same widths as Pascal, as that would allow them to keep memory I/O power and board costs under control. That's the reason I do not see Nvidia increasing memory bus width for GV106/GV107. Volta is shaping up to be a true powerhouse and could go down as one of the most successful and forward-looking GPU architectures ever, after the legendary G80.
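
The bandwidth figures in the list above come from the standard formula: bus width in bits x per-pin data rate in Gbps / 8 bits per byte. A short Python sketch that reproduces them is below. The chip names and configurations are the speculative ones from this post, not confirmed specs, and the function name is just illustrative.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bits per transfer times rate, divided by 8."""
    return bus_width_bits * data_rate_gbps / 8

# (chip, bus width in bits, per-pin data rates in Gbps) as speculated in the post.
SPECULATED = [
    ("GV102", 384, (14, 16)),
    ("GV104", 256, (14, 16)),
    ("GV106", 192, (11,)),
    ("GV107", 128, (11,)),
]

if __name__ == "__main__":
    for chip, width, rates in SPECULATED:
        figures = " - ".join(f"{bandwidth_gb_s(width, r):.0f}" for r in rates)
        print(f"{chip}: {width} bit at {'-'.join(map(str, rates))} Gbps = {figures} GB/s")
```

The same function can be used to check any other bus width and data rate combination floated in the thread.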

Glo. · Aug 24, 2017

raghu78 said:
I am quite sure that the Volta architecture will power the next-gen GeForce stack. I think this is how it could turn out:

GV102 - 384 bit GDDR6 at 14-16 Gbps - 672 - 768 GB/s
GV104 - 256 bit GDDR6 at 14-16 Gbps - 448 - 512 GB/s
GV106 - 192 bit GDDR5X at 11 Gbps - 264 GB/s
GV107 - 128 bit GDDR5X at 11 Gbps - 176 GB/s

Nvidia would want to keep the memory bus at the same widths as Pascal, as that would allow them to keep memory I/O power and board costs under control. That's the reason I do not see Nvidia increasing memory bus width for GV106/GV107. Volta is shaping up to be a true powerhouse and could go down as one of the most successful and forward-looking GPU architectures ever, after the legendary G80.

I assumed that an increase in bus width is what is required to properly feed the GV architecture. Each generation Nvidia has increased the amount of VRAM available in specific price tiers, and has also increased the memory bus width.

For example, if they decide to use GV107 with 192 Bit GDDR5X, they could offer something like this:
GTX 2050 Ti - 6 GB GDDR5X
GTX 2050 - 3 GB GDDR5
In both cases it would be an improvement over the GPUs they replace. A GDDR5X memory controller is backwards compatible with GDDR5.

nvgpu · Aug 24, 2017

http://www.anandtech.com/show/11398...am-gddr6-added-to-catalogue-gddr5-gets-faster

Nvidia could use GDDR6 with a 192-bit memory bus: 12 GT/s GDDR6 would give 288 GB/s of memory bandwidth on a 192-bit bus. And since Samsung, SK Hynix and Micron will all make GDDR6 memory chips, there will be competition and better pricing.
