NVIDIA GeForce 20 Series (Volta) to be released later this year - GV100 announced


crisium

Platinum Member
Aug 19, 2001
2,643
615
136
I expect Volta will have better delta color compression, and the 2070 won't need GDDR6.

2080
3584SP
256bit 12GB GDDR6 14-16Ghz
10-20% faster than 1080TI

2070
2560sp
256bit 12GB GDDR5x 11-12Ghz
The 2070 will ship running near its limit, with little OC headroom, and will be about 10% slower than the 1080 Ti. I don't expect the 2070 to match the 1080 Ti because the gap between them is huge now: the 1080 Ti is 65-75% faster than the 1070, and they only managed a 50% performance gain from the 970 to the 1070. So the 2070 would need a 70% performance gain over the 1070, and that's not going to happen.

Edit:
2060
1792SP
192bit 9Ghz GDDR5 or 10Ghz GDDR5x
10% faster than 1070

970 and 1070 each were able to barely equal or beat out the old Titan (not Titan Black though). 1070 was cut even more because Nv could afford to do so and still beat the Titan X.

I find it hard to believe that Nvidia would abandon this tradition. If a 2560cc GV104 is 10% slower than a 1080 Ti as you are guessing, then I would expect the 2070 to be at least 2816cc.

Nvidia has twice been able to proclaim $1000 performance for only $330/$380 with their x70. They will not suddenly settle for 90% of $700 performance for only ~$400 (or wherever they target), imo.

Fun to speculate. I have no idea the core count, but am betting that it will at least come within 1-2% of 1080 Ti if not the TXp. You could very well be right though.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
I don't think GDDR5X would be used any longer. By early 2018 Hynix, Micron and Samsung should have GDDR6 available in plentiful supply.
Yes, GDDR5 is plenty good enough for the lower end cards, so GDDR5X will probably fade out and we will have GDDR5 and GDDR6 and also HBM2.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
If they want to match the 1080 Ti, they will need to increase performance over the 1070 by 70%. They would have to cut it down very little versus the GTX 2080 to manage that. Nv usually targets a 50-55% perf increase.
 

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
970-1070 61% at 1070 launch
https://tpucdn.com/reviews/NVIDIA/GeForce_GTX_1070/images/perfrel_2560_1440.png

1070-1080 Ti 56% at 1080 Ti launch
https://tpucdn.com/reviews/NVIDIA/GeForce_GTX_1080_Ti/images/perfrel_2560_1440.png

Or using the latest TPU review (using reference models):
970-1070 = 55% and 1070-1080Ti = 61%
https://tpucdn.com/reviews/Zotac/GeForce_GTX_1080_Ti_Amp_Extreme/images/perfrel_2560_1440.png

And that's using 1440p. At 1080p (still surprisingly popular) it's a much closer lead. I wouldn't use 4K for a card like this. But they look about the same.
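
A rough sketch of the arithmetic behind these percentages, since the gaps compound rather than add. It assumes "10% slower than the 1080 Ti" means 90% of its performance; the function name `uplift_needed` is made up for illustration, and only the 56% and 61% gaps come from the charts linked above.

```python
def uplift_needed(target_vs_base, shortfall=0.0):
    """Fractional gain over the base card (1070) needed to land
    `shortfall` below a target card (1080 Ti) that is
    `target_vs_base` faster than the base card."""
    return (1.0 + target_vs_base) * (1.0 - shortfall) - 1.0


# 1070 -> 1080 Ti gaps quoted above: launch review and latest review.
for gap in (0.56, 0.61):
    match = uplift_needed(gap)
    ten_below = uplift_needed(gap, shortfall=0.10)
    print(f"gap {gap:.0%}: matching needs {match:.0%}, "
          f"10% slower needs {ten_below:.0%}")
```

With a 61% gap, landing 10% below the 1080 Ti works out to roughly a 45% gain over the 1070, which is within the 50-55% range of past x70 jumps.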

It's up to how they want to compete with themselves though. A strong x70 relative to older Tis and Titans helps sell the x70. A strong x70 relative to the new x80 hurts the higher margin x80. But the x70 is the big mover.

With Pascal they were able to beat the Titan X and cut the 1070 down more than before. They may have to choose here, and I'm betting on beefier 2070 in that case.

But who knows Volta's true performance. Fun to speculate. ;-)
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
If the GTX 2080 has 3584 SP, that's 7 TPCs per GPC, or 896 SP per GPC, since all x04 cards have 4 GPCs.
If they cut a whole GPC out, like with the 1070, it is 2688 SP. So probably 2688 SP minimum.

Btw, with 2688 SP they would need about 1980 MHz to match stock 1080 Ti FLOPS performance (at 1480 MHz).
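
The numbers above can be checked with a short sketch. It assumes the Maxwell/consumer-Pascal layout of 128 SP per TPC carries over to a speculative GV104, and it counts 2 FLOPs per SP per clock (one fused multiply-add). The constant and function names are made up for illustration.

```python
SP_PER_TPC = 128   # assumption: same as Maxwell / consumer Pascal
TPC_PER_GPC = 7    # speculative, implied by the 3584 SP guess
GPC_COUNT = 4      # x04 chips so far have had 4 GPCs


def fp32_tflops(sp, mhz):
    """Peak FP32 throughput, counting one FMA (2 FLOPs) per SP per clock."""
    return 2 * sp * mhz * 1e6 / 1e12


sp_per_gpc = SP_PER_TPC * TPC_PER_GPC      # 896
full_chip_sp = sp_per_gpc * GPC_COUNT      # 3584
cut_chip_sp = full_chip_sp - sp_per_gpc    # 2688, one whole GPC removed

# Clock at which the cut chip equals a 3584 SP 1080 Ti at 1480 MHz.
clock_needed_mhz = 3584 * 1480 / cut_chip_sp

print(f"full chip: {full_chip_sp} SP, cut chip: {cut_chip_sp} SP")
print(f"1080 Ti at 1480 MHz: {fp32_tflops(3584, 1480):.2f} TFLOPS")
print(f"clock needed with {cut_chip_sp} SP: {clock_needed_mhz:.0f} MHz")
```

This only compares peak FLOPS at the quoted clocks; it says nothing about real boost clocks or per-clock efficiency.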
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
A 2560 CUDA core Volta chip will be much faster than the GP104 chip with 2560 CUDA cores, even at a lower core clock.

Your calculations and estimations are way off target.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
2560 CUDA core Volta chip will be much faster than GP104 chip, with 2560 CUDA cores, even with lower core clock.

Your calculations, and estimations are so far way off target.
Same as you. How do you know Volta is much faster per clock? :)
 
Mar 10, 2006
11,715
2,012
126
same as you.How do you know volta is much faster per clock?:)

From NVIDIA's blog:

  • New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning: Volta features a major new redesign of the SM processor architecture that is at the center of the GPU. The new Volta SM is 50% more energy efficient than the previous generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 12x higher peak TFLOPs for training. With independent, parallel integer and floating point datapaths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations. Volta’s new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads. Finally, a new combined L1 Data Cache and Shared Memory subsystem significantly improves performance while also simplifying programming.

And...

Overall shared memory across the entire GV100 GPU is increased due to the increased SM count and potential for up to 96 KB of Shared Memory per SM, compared to 64 KB in GP100.

Unlike Pascal GPUs, which could not execute FP32 and INT32 instructions simultaneously, the Volta GV100 SM includes separate FP32 and INT32 cores, allowing simultaneous execution of FP32 and INT32 operations at full throughput, while also increasing instruction issue throughput. Dependent instruction issue latency is also reduced for core FMA math operations, requiring only four clock cycles on Volta, compared to six cycles on Pascal.
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
same as you.How do you know volta is much faster per clock?:)
I have already posted the answer for this question in this thread on previous pages.

The most significant thing for FP32 performance: the register file size to core count ratio has been doubled.

Kepler - 192 cores/256 KB register file size
Maxwell - 128 cores/256 KB register file size
Consumer Pascal - 128 cores/256 KB register file size
GP100 chip - 64 cores/256 KB register file size
Volta - 64 cores/256 KB register file size

Thanks to this switch, Nvidia was able to maintain 90% of the performance with 128 Maxwell cores compared to 192 Kepler cores.

Expect a similar switch with Volta.
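
To make the ratio concrete, here is a small sketch that divides the 256 KB register file by the cores per SM listed above. The figures are taken from this post as given; the variable names are only for illustration.

```python
REGISTER_FILE_KB = 256  # per SM, as listed above for every architecture

# Cores per SM, as listed in the post.
cores_per_sm = {
    "Kepler": 192,
    "Maxwell": 128,
    "Consumer Pascal": 128,
    "GP100": 64,
    "Volta (GV100)": 64,
}

# Register file available per core, in KB.
kb_per_core = {name: REGISTER_FILE_KB / cores
               for name, cores in cores_per_sm.items()}

for name, kb in kb_per_core.items():
    print(f"{name:16s} {kb:.2f} KB of registers per core")
```

Kepler comes out at about 1.33 KB per core, Maxwell and consumer Pascal at 2 KB, and GP100 and Volta at 4 KB. Whether the extra register space turns into gaming performance is the open question in this thread.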
 

Despoiler

Golden Member
Nov 10, 2007
1,967
772
136
I thought both camps are already close to extracting the most they can from delta color compression. There is only so much you can compress after all.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
I have. Search for it. FP32 is what you are looking at. Or you can just look at my edited post, you quoted.
Wrong again. How do you know Volta for gaming has 64 SP per TPC? Pascal for deep learning also has 64 SP per TPC, but for gaming it's still 128 SP/TPC like Maxwell.
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
Again wrong how do you know volta for gaming have 64SP per TPC?Pascal for deep learning have also 64SP per TPC but for gaming its still 128SP/TPC like maxwell.
Yes, you are correct that Nvidia can still offer a rebranded Maxwell architecture as Volta, like they did with Pascal. But that would be just ridiculous.

If they want progress to come from anywhere else, they have to use the Volta, or at least the GP100, chip architecture.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
Yes, you are correct that Nvidia still can offer rebranded Maxwell Architecture, as Volta, like they did with Pascal. But that will be just ridiculous.

If they want progress that comes from anywhere else, they have to use Volta or at least GP100 chip architecture.
They really don't need to do anything with that amount of SP. Faster GDDR6 and new delta color compression to compensate for the 256-bit bus. That's all.
With the same clock speed as Pascal, the performance will be there.
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
They really dont need do anything with that amount of SP.Faster GDDR6 and new delta color compresion to compensate 256bit.
Nope.

The cores are not smaller. There will be no technological advantage if they stay with the Maxwell/Pascal architecture layout.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
Nope.

The cores are not smaller. There will be no technological advantage if they will stay with Maxwell/Pascal architecture layout.
They don't need it; they have a monopoly lol.
3584 SP at 1900 MHz with 16 GHz memory will beat the 1080 Ti by 10-20% even with zero IPC gain.
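
A back-of-the-envelope sketch of that comparison. It assumes the "16 GHz" figure is the effective per-pin data rate in Gbps, uses the 1080 Ti's 352-bit bus with 11 Gbps GDDR5X, and reuses the 1480 MHz stock clock quoted earlier in the thread. The helper name `bandwidth_gbs` is made up for illustration.

```python
def bandwidth_gbs(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin rate."""
    return bus_bits / 8 * gbps_per_pin


gtx_1080_ti_bw = bandwidth_gbs(352, 11)   # 352-bit GDDR5X at 11 Gbps
guessed_2080_bw = bandwidth_gbs(256, 16)  # speculative 256-bit GDDR6 at 16 Gbps

# Same SP count (3584) on both, so the peak FLOPS ratio is the clock ratio.
flops_ratio = 1900 / 1480

print(f"1080 Ti bandwidth:      {gtx_1080_ti_bw:.0f} GB/s")
print(f"guessed 2080 bandwidth: {guessed_2080_bw:.0f} GB/s")
print(f"peak FLOPS ratio at the quoted clocks: {flops_ratio:.2f}x")
```

The 1080 Ti usually boosts well above 1480 MHz in games, so the real clock advantage would likely be smaller than this ratio suggests.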
 

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
How do you know that going to 64 cores gives a significant increase in gaming FP32 performance? A P100 must have been benchmarked for gaming performance somewhere, right? How does it compare to GP102?
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
They dont need it they have monopoly lol.
3584SP at 1900Mhz with 16Ghz memory will beat 1080TI even with zero IPC gain by 10-20%.
You are underestimating AMD. That's where the problem lies.
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
How do you know that to 64 cores has significant increase in gaming FP32 performance? A P100 must have been benchmark for gaming performance somewhere right? How does it compare to GP102?
Because Maxwell was a huge leap in gaming performance versus core count, compared to Kepler. Wasn't it?
Pascal, clock for clock and core for core, was no different in gaming performance than Maxwell. I think it is obvious right now why.

Tile-based rasterization does not bring a performance increase unless you get an improved culling mechanism, and even then it is only a few % increase. It brings massive efficiency savings, though, because it saves quite a lot of the power required to move the data.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
Because Maxwell was huge leap in gaming performance versus core count, compared to Kepler. Wasn't it?
Pascal clock for clock, core for core was no different in gaming performance than Maxwell. I think it is obvious right now, why.

Tile Based Rasterization does not bring performance increase, unless you get improved culling mechanism, but it is still few % in performance increase. It brings massive efficiency savings, tho, because it saves quite a lot of power consumed, and required to move the data.
Maxwell doubled the ROPs, has a new PolyMorph engine and a 4x bigger L2 cache vs Kepler, plus tile rendering, and the L1 cache was 50% bigger.
https://www.computerbase.de/2014-09/geforce-gtx-980-970-test-sli-nvidia/
 

Glo.

Diamond Member
Apr 25, 2015
5,845
4,855
136
Maxwell doubled Rops, have new polymorph engine and 4x L2cache vs kepler + tile rendering and L1 cache was 50% bigger.
https://www.computerbase.de/2014-09/geforce-gtx-980-970-test-sli-nvidia/
Do you know the difference between Rendering and Rasterization?

Secondly, there is no Tiling going on in Maxwell, which has been actually confirmed by Nvidia:
http://www.realworldtech.com/forum/?threadid=159876&curpostid=168154
http://www.hardware.fr/news/15027/gdc-nvidia-parle-tile-caching-maxwell-pascal.html

Thirdly, nothing you quoted increased the throughput of the cores themselves. The cores became less starved for resources; that's why they had increased throughput.

This change increased not only gaming performance but also raw compute performance. The 4 TFLOPs GTX 980 was faster in compute than the GTX 780 Ti, which had more cores and more theoretical compute performance.

Check the benchmarks: http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/20