NVIDIA GeForce 20 Series (Volta) to be released later this year - GV100 announced


tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
I hope it's not delayed/late. I love it when new, faster tech comes out!


But what shape will GV104 take? If its IPC gain is not a Kepler-to-Maxwell-style jump, then they are basically just refining GP102 and slotting its die size down into the GV104 market for this year. If they have truly redone the architecture and there are large IPC gains, then yeah, it's going to be extremely fast.

It just sorta seems to me that this incarnation of Volta is a 'tick' for 16nm and won't really change their per-mm^2 performance.
More like Volta is the 'tock' while Pascal was the 'tick'. This time around there are quite significant changes in the arrangement of cores within each SM, even though overall parameters are quite similar.

Pascal never existed on the original roadmap, and if consumer cards are only a few months away, say the middle of H2 2017, then Pascal looks likely to be the shortest-lived architecture yet, replaced in just over a year.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
I have edited my post.
But transistor density is not an important metric for us end-users. Performance is.
Transistor density is only useful for comparing different nodes.

I would argue that area is the more important criterion for end-users trying to figure out where the price will end up. It looks like GV102 might end up at 600mm^2 as an upper limit, and coupled with the fact that this is an exclusive process, TSMC is undoubtedly charging more for wafers.

In short, if you're expecting your GTX 2080Ti at $700, prepare to be disappointed.
 

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
Volta offers a MASSIVE gain in efficiency!
It means that for the same TDP, it offers much more performance. We can expect around 40% more theoretical performance for the same die size...

This is a very bold claim. To do this on the same process would be akin to the Maxwell jump.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
I don't think so.
Compared to GP100, GV100 has a 33% bigger die for 42% more FP32 FLOPS at the same 300W TDP. And I'm not even counting the addition of the new scheduler and the Tensor units.
Volta offers a MASSIVE gain in efficiency!
It means that for the same TDP, it offers much more performance. If GV100 is any indication of consumer Volta, then we can expect around 40% more theoretical performance for the same die size...

What percentage of the die is just the normal CUDA cores (subtracting out the Tensor Cores)? If it is around 600mm^2, then yeah, that's a huge jump, and it bodes extremely well for GV104 (which obviously won't have Tensor Cores). Hmmm :smirk:
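
For reference, a quick back-of-the-envelope check of the ratios in that quote, using the announced spec-sheet figures (approximate boost clocks and die sizes from the GTC announcements; treat this as a rough sketch):

```python
# Rough sanity check of the GV100 vs GP100 ratios quoted above.
# Spec-sheet figures from the GTC announcements (approximate).
gp100 = {"fp32_cores": 3584, "boost_ghz": 1.480, "die_mm2": 610}  # Tesla P100
gv100 = {"fp32_cores": 5120, "boost_ghz": 1.455, "die_mm2": 815}  # Tesla V100

def fp32_tflops(chip):
    # 2 FLOPs per core per clock (one fused multiply-add)
    return chip["fp32_cores"] * 2 * chip["boost_ghz"] / 1000

p, v = fp32_tflops(gp100), fp32_tflops(gv100)
print(f"GP100: {p:.1f} TFLOPS, GV100: {v:.1f} TFLOPS")                          # ~10.6 vs ~14.9
print(f"Die size:   +{(gv100['die_mm2'] / gp100['die_mm2'] - 1) * 100:.0f}%")   # ~34%
print(f"FP32 FLOPS: +{(v / p - 1) * 100:.0f}%")                                 # ~40%
```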
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Uh, why isn't this under the Nvidia sub-forum like the Pascal thread?
Would seem appropriate to 'sandbox' it a bit.
 

xpea

Senior member
Feb 14, 2014
458
156
116
This is a very bold claim. To do this on the same process would be akin to the Maxwell jump.
From https://devblogs.nvidia.com/parallelforall/inside-volta/
The new Volta SM is 50% more energy efficient than the previous generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 12x higher peak TFLOPs for training. With independent, parallel integer and floating point datapaths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations. Volta’s new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads. Finally, a new combined L1 Data Cache and Shared Memory subsystem significantly improves performance while also simplifying programming.
It's not a simple die shrink. To reach these claims, they did some serious rework on the Pascal SMs...
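
For a sense of what a Tensor Core actually computes: per that same blog post, each one performs a 4x4 matrix multiply-accumulate, D = A*B + C, with FP16 multiply inputs and FP32 accumulation. A minimal numpy sketch of the math (illustrative only, this is not how the hardware is programmed):

```python
import numpy as np

# One Tensor Core operation: D = A @ B + C on 4x4 tiles, with
# FP16 multiply inputs and FP32 accumulation (per Nvidia's Volta blog post).
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.random.rand(4, 4).astype(np.float32)

# Products are formed from the FP16 inputs but summed in FP32,
# which is what "mixed precision" means here.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D)
```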
 
  • Like
Reactions: Arachnotronic

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
What percentage of the die is just the normal CUDA cores (subtracting out the Tensor Cores)? If it is around 600mm^2, then yeah, that's a huge jump, and it bodes extremely well for GV104 (which obviously won't have Tensor Cores). Hmmm :smirk:
600mm^2 would likely be GV102, 450mm^2 for GV104. Expect huge jumps in prices as well, unless this 12nm FFN is only for GV100 and they can still make consumer Volta on 16nm FF+.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
You can see what changes they made to the GV100 SM in Volta.

GP100:

[Image: GP100 SM block diagram]

GV100:

[Image: GV100 SM block diagram]


They removed/condensed the Special Function Units, thereby reducing power consumption, and rearranged the Load/Store units to make room for more cores within each cluster.

That's pretty clever, even though the various parameters aren't much different.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
600mm^2 would likely be GV102, 450mm^2 for GV104. Expect huge jumps in prices as well, unless this 12nm FFN is only for GV100 and they can still make consumer Volta on 16nm FF+.

Well, who knows what size GV104 will be. I was just mentioning ~600mm^2 relative to the idea that there is 40% more performance for a given die size (in this case GV100 vs GP100).
In terms of differences, clearly the GV100 is much, much faster for many machine learning use cases (where demand for compute performance is insatiable).
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
This is not true. The block diagram (which you can see on this page) for GP104 clearly indicates that each SM has 64 CUDA cores, just like on GP100. The only difference with GP100 SMs is that they also have extra dedicated FP64 CUDA cores (which are needed for HPC but serve no purpose during gaming).

Reducing CUDA cores per SM from 128 to 64 had minimal impact on performance; as you note, on a per-TFlop basis, there was little difference between Maxwell and Pascal. (The performance boosts that Nvidia obtained with Pascal came largely from higher clock speeds, plus packing in a couple more shaders thanks to the die shrink.) If going from 128 to 64 CUDA cores per SM didn't do much, then it seems unlikely that the massive gains in perf/TFlop (about 33%) from Kepler to Maxwell were due to going from 192 to 128 cores per SM.

The addition of tiled rendering remains the most likely explanation for Maxwell's substantial boost in utilization (DX11 perf/TFlop). Whether any other such breakthroughs are on the horizon remains to be seen.

It's safe to say that Nvidia will try to position GV104 (when it arrives) above GP102, since they have historically tried to have each new card beat the previous generation's card from one tier up. That could be accomplished with 3584 CUDA cores at about 2 GHz, if they were fed with adequate memory bandwidth (probably via GDDR6, and a 256-bit bus as always for 4-series chips). It would not necessarily require improved perf/TFlop, and I'm not sure we will see additional meaningful gains on that front. Maybe 5%-10%.

I'm assuming a ~400mm^2 chip, similar in size to GM204, with 4/6 as many shaders as the big chip. Presumably the new "12FFN" process offers better clock speeds, since calculating the transistor density increase of GV100 over GP100 shows only about a 4% improvement on that front.
I suppose it was answered by another member, but I will respond anyway.

No. The SMs have 128 cores, just like the Maxwell architecture. Tile-based rasterization, if you ask anyone who understands how GPUs work, does not increase performance. It increases efficiency by saving the power required to move data and by culling data that is not used, so power is not wasted on it.

The performance increase in Maxwell came from lowering the number of cores sharing the 256 KB register file, as I mentioned earlier.

My take on the Volta architecture:

Clock for clock, core for core, expect gaming GPUs based on Volta to be at least 30% faster than Pascal. However, if Nvidia goes for higher core counts, power consumption may rise unless it is offset by lower core clocks. The overall effect is that you will get a bigger performance boost from increasing the core count in Volta than from raising core clocks, because the architecture appears to have higher IPC.

How does it fare against Vega? I do not know yet, but it appears that the "Poor Volta" marketing gimmick from AMD may have had a point in some cases. Volta is not as groundbreaking or revolutionary as I initially thought.
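
(As an aside, the ~4% transistor density figure from the post quoted above is easy to check against the published transistor counts; a quick sketch:)

```python
# Transistor density check for the "~4%" figure in the quoted post.
# Published figures: GP100 = 15.3B transistors in 610 mm^2,
#                    GV100 = 21.1B transistors in 815 mm^2.
gp100_mtr = 15.3e3 / 610   # ~25.1 million transistors per mm^2
gv100_mtr = 21.1e3 / 815   # ~25.9 million transistors per mm^2
print(f"GP100: {gp100_mtr:.1f} MTr/mm^2")
print(f"GV100: {gv100_mtr:.1f} MTr/mm^2")
print(f"Density gain: {(gv100_mtr / gp100_mtr - 1) * 100:.1f}%")  # ~3%
```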
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
If you say IPC is ~30% higher, isn't that groundbreaking/revolutionary?
Versus consumer Pascal: yes.
Versus HPC Pascal? No. There is no difference, clock for clock, core for core, in FP32 performance versus the HPC Pascal GPU.

P.S. The IPC is at least 30% higher versus consumer Pascal GPUs ;).
 

beginner99

Diamond Member
Jun 2, 2009
5,315
1,762
136
Not everyone cares about value for money. And if we compare the earnings of AMD vs Nvidia, I think the latter is right...

For the average human being it is obvious that performance/dollar usually means performance/dollar at a specific performance point. Bleeding edge will always cost more than low end.

BUT:

Everyone cares about value for money. The question is what each user considers as value, and that's why NV is winning: through marketing, by making you feel you got good value for the price. Value doesn't mean just performance. It can include performance, but also power use, brand value, vendor-specific features (G-Sync/FreeSync, CUDA), being on the bleeding edge, and so forth. For example, for scientific computing a lot of software is available in CUDA only, so you are basically forced to buy NV or rewrite the software (which costs much more than the higher hardware price). So NV has higher value regardless of the higher cost.
 

xpea

Senior member
Feb 14, 2014
458
156
116
Versus consumer Pascal: yes.
Versus HPC Pascal? No. There is no difference, clock for clock, core for core, in FP32 performance versus the HPC Pascal GPU.
Of course, if you remove the Tensor Cores, the biggest change between Pascal and Volta, something that provides a 6 to 12 times speedup in today's most important workload... :rolleyes:
 
  • Like
Reactions: Arachnotronic

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Of course, if you remove the Tensor Cores, the biggest change between Pascal and Volta, something that provides a 6 to 12 times speedup in today's most important workload... :rolleyes:
You are buying too much into the Nvidia marketing.

The Tensor Core is for Deep Learning. Gamers will not benefit from it at all. It also does not affect FP32 performance, which is still 90% of the market. DL is still a small, but rapidly growing, market.
 
  • Like
Reactions: Bacon1

xpea

Senior member
Feb 14, 2014
458
156
116
You are buying too much into the Nvidia marketing.

The Tensor Core is for Deep Learning. Gamers will not benefit from it at all. It also does not affect FP32 performance, which is still 90% of the market. DL is still a small, but rapidly growing, market.
Are we talking about the only Volta GPU that we know of, or not? If yes, then my point stands.
If you want to talk about an unannounced product, then feel free to extrapolate, but without me. The only thing we know is that Volta is massively more power efficient than Pascal on basically the same (optimized) node: at least 40% if we compare GP100 vs GV100, and even 50% at the SM level if we believe Nvidia's own documentation.
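
A rough check of that 40% number, assuming the announced ratings of ~10.6 FP32 TFLOPS for GP100 and ~14.9 for GV100, both at 300 W (so at equal TDP the efficiency gain is just the FLOPS ratio):

```python
# Perf-per-watt sketch: both parts are rated at 300 W, so the FP32
# efficiency gain equals the FLOPS ratio. Announced figures, approximate.
gp100_tflops, gv100_tflops, tdp_w = 10.6, 14.9, 300.0
print(f"GP100: {gp100_tflops * 1000 / tdp_w:.1f} GFLOPS/W")                # ~35
print(f"GV100: {gv100_tflops * 1000 / tdp_w:.1f} GFLOPS/W")                # ~50
print(f"Efficiency gain: {(gv100_tflops / gp100_tflops - 1) * 100:.0f}%")  # ~41%
```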
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Are we talking about the only Volta GPU that we know of, or not? If yes, then my point stands.
If you want to talk about an unannounced product, then feel free to extrapolate, but without me. The only thing we know is that Volta is massively more power efficient than Pascal on basically the same (optimized) node: at least 40% if we compare GP100 vs GV100, and even 50% at the SM level if we believe Nvidia's own documentation.
Yes, I am talking about the GV100 chip. You are buying too much into the Nvidia marketing.

FP32 performance is the same, clock for clock, core for core, as the GP100 chip. It has higher compute output, but that is achieved through a massively increased die size and core count. There is no revolution here. The Tensor Core is for Deep Learning. Gamers and the FP32 workload market will not benefit from it.

Consumer versions will most likely not have Tensor Cores, to save die size and manufacturing costs, and the same goes for the FP64 cores.
Of course we can't swallow everything any marketing dept says, but in terms of deep learning, Nvidia has been pretty accurate. No later than yesterday, Google published some deep learning benchmarks:
https://developers.googleblog.com/2017/05/tensorflow-benchmarks-and-new-high.html

[Image: TensorFlow benchmark charts]

On DGX-1!!!
OK, and how does this affect FP32 performance? How will gamers benefit from this?

We are here to talk about game performance and GPU architectures, not brand cheerleading. You are trying to move the goalposts away from the merit of my posts to prove that GV100 is revolutionary. In FP32, it isn't. In DL, it is a huge leap forward.

But 90% of the market is still FP32 workloads. Gaming is still FP32, and only now is it starting, in small steps, to use FP16.
 
  • Like
Reactions: Bacon1

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Of course we can't swallow everything any marketing dept says, but in terms of deep learning, Nvidia has been pretty accurate. No later than yesterday, Google published some deep learning benchmarks on DGX-1, and the scaling is exactly what they claim:
https://developers.googleblog.com/2017/05/tensorflow-benchmarks-and-new-high.html

[Image: TensorFlow benchmark charts]
You're trying to put a spin on things as if the people who plopped down $1500 on a GTX 1080Ti and a 1440p 144Hz G-Sync monitor suddenly care about deep learning.

That 50 percent figure is specifically about GV100 vs GP100.
 
  • Like
Reactions: Bacon1

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
The surprising takeaway is that NV is going head-on to try to monopolize the DL market. A $3B investment is a gigantic amount.
It's a risk, but IMO it shows where the future is and how many resources will be poured into that market going forward.
Interesting. I think they are betting right here. It's a sign that the DL market will explode. It is the future. And they are building on the strongest platform on the market, bar none: CUDA. This is a driver for CUDA.
People still don't get the memo and what's happening, and are stuck comparing this to whatever AMD has for gaming GPUs. Wake up.
It looks to me like they are using capital and competences from the gaming market to take on a future market that will probably explode in revenue, value and profit.
If anything, this is an attack on Intel, not AMD.