Do you think Volta will be a beast?


  • Total voters
    36

FORTHEWIND

Member
Jul 23, 2015
25
1
11
I call dibs: Nvidia will unveil its own async compute implementation called AsyncWorks with Volta and bury AMD/RTG with the new and improved GameFinallyWorks program. /s

Joking aside, what are your thoughts on Nvidia's upcoming Volta architecture? It will have a substantial performance boost compared to Pascal, but that's all Nvidia wrote in its roadmap (heck, Pascal stole all of Volta's features in the recent roadmap). So what do you think Nvidia will change in Volta?

I'll start with mine :)

When DX11 came, Nvidia designed Fermi for it. Fermi brought a lot to the table (in a hot and noisy way): tessellation and DirectCompute (compute shaders) became standard features. Fermi was Nvidia's first (wood-screws mock-up and all) step in designing a DX11-capable GPU. Fermi brought the GPC into the mix, kicking out GT200's TPC as the biggest execution block on the GPU. Fermi was a beast.

Then came Kepler. Kepler was another beast. It fused SMs into a much bigger unit, creating the SMX. It tried, and succeeded at, fixing Fermi's hot and noisy introduction. It also threw out Fermi's hardware scheduler, which made its scheduling less dynamic for a while (a tradeoff for energy efficiency). It really looks like a refined Fermi, since not a lot changed; things were mostly fixed and enlarged. It also still targeted DX11.

Maxwell was a fundamental shift from Nvidia's previous designs. It was Nvidia's first architecture using their new "mobile first" design strategy, and with it came a whole bunch of energy-efficiency enhancements. The Kepler SMX was partitioned into four smaller partitions, creating the SMM. The PolyMorph engine got an update with geometry/rasterization-specific features like raster order views and conservative rasterization. And it was Nvidia's first GPU to partially support DX12, since the spec wasn't finalized yet. It was an efficient beast, furthering Nvidia's DX11 performance, but it fell short on async compute because it was already too efficient to need it, and it wasn't intelligent enough at scheduling for it either.

Pascal seems like a stopgap before Volta and a testbed for 16nm FinFET. It got all of Volta's planned features plus another update to the PolyMorph engine. It also implements some features to alleviate the async compute deficit by being more intelligent at scheduling and switching workloads. And with 16nm FinFET it was supercharged to very high frequencies. It's a beast, but more of an HPC beast. A GCN-CU look-alike SM exists on GP100, but that's about it; a lot of it benefits HPC more than gaming. It also reintroduced the TPC from GT200.

So what about Volta?

I think Volta will be multi-engine, with a unit akin to GCN's ACEs in functionality (a mini GigaThread Engine per GPC pops into my head). Volta will have a new, purely DX12-focused design. And I think it will reintroduce hardware scheduling to better accommodate async compute execution. The DX11 execution model (Fermi, Kepler, Maxwell, Pascal) is over. Time for a new concurrent model.

It's time for a new Fermi moment.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Volta may well be the next G80, or an evolution of their Pascal design. If you look at nVIDIA's GPU history, they normally milk 3~4 generations out of a single GPU architecture (with process shrinks, optimizations and additions along the way), e.g. NV30 -> NV40 -> G70/NV47 -> G71, or G80 -> G92 -> GT200. Compare those examples to GF100 -> GK110 -> GM200 -> GP102. So it's about time they introduced something radically different, which coincidentally fits with a time when DX12/Vulkan will become the primary standard (2017 onwards).

Also, I have to mention that the graphics pipeline is one complex beast in today's modern GPUs. It is not as serial as some users are led to believe (unless you're talking decades ago), nor do the marketing slides with their "xxx engine" labels mean anything useful at a technical level. There are many, many things happening in parallel in the graphics pipe alone to speed up all kinds of workloads. Think of it as cooking a recipe (a 3D workload). You have the food preparation (resource management at all levels for different tasks; there are schedulers on the GPU other than the one we all like to talk about, for instance), the cooking tools (e.g. fixed-function units), and the cooking process (rasterisation, culling, pixel shading etc.), and all of these, depending on the workload, can run in parallel, get bottlenecked at some tasks, and so on.

There aren't any public tools that show just what is bottlenecked in a GPU architecture, but I'm sure both IHVs have them. I expect Volta to further improve on this aspect, because you fundamentally want to focus on making your graphics "pipeline" crunch graphics workloads faster. One good silent example of this is nVIDIA's rasterisation technique; one well-known/advertised example is their memory compression, optimized over each generation.

Async compute, on the other hand, is when you try to speed up the work by utilizing compute shaders to get some of the graphics work done. It's important to note that using it isn't always beneficial; it can also hinder performance. It's just one of many tools to improve performance where possible. When you look at the GCN architecture, or in particular Polaris P10, it features roughly twice as many ALUs as its competing GPU, the GP106! Not only that, these ALUs are configured to crunch both 3D and compute workloads if fed properly (at finer granularity than nVIDIA, for sure). If optimised for, the P10 may have the performance advantage. However, if left under-utilised, it will consume more energy for less performance vs the competition.
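The overlap idea above can be sketched with a toy timing model. All the numbers here are illustrative placeholders, not measurements of any real GPU; the point is just that concurrent submission only helps to the extent that the graphics pass actually leaves ALUs idle:

```python
# Toy model of async compute: a frame's graphics work leaves the ALUs
# idle during some phases (e.g. shadow-map passes that are mostly
# fixed-function/ROP bound). Independent compute work can fill those
# gaps; whatever doesn't fit still runs serially afterwards.

graphics_ms = 14.0   # graphics-queue work per frame
compute_ms = 4.0     # independent compute work (e.g. post-processing)
idle_alu_ms = 5.0    # portion of the graphics pass where ALUs sit idle

# Serial submission: compute runs only after graphics finishes.
serial_frame = graphics_ms + compute_ms

# Concurrent submission: compute soaks up the idle-ALU time first.
overlapped = min(compute_ms, idle_alu_ms)
async_frame = graphics_ms + (compute_ms - overlapped)

print(f"serial: {serial_frame:.1f} ms, async: {async_frame:.1f} ms")
# serial: 18.0 ms, async: 14.0 ms
```

Set `idle_alu_ms` to zero (a GPU already fully utilised by graphics) and the benefit disappears, which is the "isn't always beneficial" case above.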

So it will be interesting to see what Volta brings to the table. It's quite evident that although the low-level APIs are extending the life of the GCN architecture (by being able to utilise a lot more of the idling hardware, because DX11/drivers etc. weren't enough to fully utilise the immense ALU power these GPUs possess), the trade-off is power consumption, i.e. heat/noise etc. I think nVIDIA will definitely try to make their next-gen GPU crunch 3D and compute workloads in parallel a lot more efficiently than they do now, but the key challenge there is energy efficiency. Or they'll focus further on making the graphics pipeline go even faster. Architecture-wise, I think Vega and Volta will be more interesting than Pascal (Maxwell on steroids!) for sure.

Just my 2 cents.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
So it will be interesting to see what Volta brings to the table. It's quite evident that although the low-level APIs are extending the life of the GCN architecture (by being able to utilise a lot more of the idling hardware, because DX11/drivers etc. weren't enough to fully utilise the immense ALU power these GPUs possess), the trade-off is power consumption, i.e. heat/noise etc. I think nVIDIA will definitely try to make their next-gen GPU crunch 3D and compute workloads in parallel a lot more efficiently than they do now, but the key challenge there is energy efficiency. Or they'll focus further on making the graphics pipeline go even faster. Architecture-wise, I think Vega and Volta will be more interesting than Pascal (Maxwell on steroids!) for sure.

Just my 2 cents.
Is this to say that Vega will (not?) be much different from Polaris? As for Volta, I'd be surprised if it's a major deviation from Pascal: high clocks & low compute.
If it is, then I expect the clock speeds to go down (for some reason), but the most interesting aspect for me would be whether they gimp (DP) compute on gaming Volta, and how that affects DX12/Vulkan games three years from now, when the Polaris effect will be in full swing.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Is this to say that Vega will (not?) be much different from Polaris? As for Volta, I'd be surprised if it's a major deviation from Pascal: high clocks & low compute.
If it is, then I expect the clock speeds to go down (for some reason), but the most interesting aspect for me would be whether they gimp (DP) compute on gaming Volta, and how that affects DX12/Vulkan games three years from now, when the Polaris effect will be in full swing.

Only because of HBM2.0; from my knowledge, Vega will have an updated GCN architecture vs Polaris. If it's just a beefed-up Polaris with HBM2.0, that would be disappointing in terms of architectural changes, but if they can bring the performance to the table with good energy efficiency, it would be great regardless.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
Volta is going to be a slightly smaller upgrade over Pascal than Maxwell was over Kepler. Instead of GV104 being 50% faster at launch than GP104 (like GM204 was over GK104), it'll be 40%. The upside is that the Titan X proves GP104 is currently bandwidth-constrained, so there is still room for GP104 to gain speed in future product iterations.
 

Piroko

Senior member
Jan 10, 2013
905
79
91
I'm starting to think Nvidia will need to implement functionality that lets them reduce CPU load. The GTX 1080 has seen some games not scale very well due to system/CPU bottlenecks, and the Titan XP will only struggle more. Volta might be the first Nvidia architecture that sees most of its performance increase only in resolutions higher than 1080p.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Well, the top end will be so frighteningly fast at 1080p that I doubt that'll bother them too much. The big thing they'll have to do is push efficiency more; presumably somehow wider designs with slower clocks than Pascal has.
 

nathanddrews

Graphics Cards, CPU Moderator
Aug 9, 2016
965
534
136
www.youtube.com
Volta might be the first Nvidia architecture that sees most of its performance increase only in resolutions higher than 1080p.
I'm hoping that the release of more Vulkan/DX12 games, as well as 120Hz 4K displays, will make that a reality. Just as comparing 720p performance today is largely a waste of time, I'd like to see 1080p become that trivial in the next gen.
 

FORTHEWIND

Member
Jul 23, 2015
25
1
11
Volta may well be the next G80, or an evolution of their Pascal design. If you look at nVIDIA's GPU history, they normally milk 3~4 generations out of a single GPU architecture (with process shrinks, optimizations and additions along the way), e.g. NV30 -> NV40 -> G70/NV47 -> G71, or G80 -> G92 -> GT200. Compare those examples to GF100 -> GK110 -> GM200 -> GP102. So it's about time they introduced something radically different, which coincidentally fits with a time when DX12/Vulkan will become the primary standard (2017 onwards).

Somebody sees what I'm seeing :D. I feel Nvidia likes to design an architecture around an API or a major feature and enhance it until the next big design shift comes (akin to your examples). NV30 through NV47 was the fixed shader pipeline DX7-8-9 era. G80 through GT200 was unified shaders and DX10. Fermi and Kepler were DX11. Maxwell and Pascal are partial DX12; they can't execute it efficiently since they're already efficient enough without it, and they're still based on Fermi's DX11 design and Kepler's execution model. It's time for a change (for Nvidia's sake).

Also, I have to mention that the graphics pipeline is one complex beast in today's modern GPUs. It is not as serial as some users are led to believe (unless you're talking decades ago), nor do the marketing slides with their "xxx engine" labels mean anything useful at a technical level. There are many, many things happening in parallel in the graphics pipe alone to speed up all kinds of workloads. Think of it as cooking a recipe (a 3D workload). You have the food preparation (resource management at all levels for different tasks; there are schedulers on the GPU other than the one we all like to talk about, for instance), the cooking tools (e.g. fixed-function units), and the cooking process (rasterisation, culling, pixel shading etc.), and all of these, depending on the workload, can run in parallel, get bottlenecked at some tasks, and so on.

There aren't any public tools that show just what is bottlenecked in a GPU architecture, but I'm sure both IHVs have them. I expect Volta to further improve on this aspect, because you fundamentally want to focus on making your graphics "pipeline" crunch graphics workloads faster. One good silent example of this is nVIDIA's rasterisation technique; one well-known/advertised example is their memory compression, optimized over each generation.

Async compute, on the other hand, is when you try to speed up the work by utilizing compute shaders to get some of the graphics work done. It's important to note that using it isn't always beneficial; it can also hinder performance. It's just one of many tools to improve performance where possible. When you look at the GCN architecture, or in particular Polaris P10, it features roughly twice as many ALUs as its competing GPU, the GP106! Not only that, these ALUs are configured to crunch both 3D and compute workloads if fed properly (at finer granularity than nVIDIA, for sure). If optimised for, the P10 may have the performance advantage. However, if left under-utilised, it will consume more energy for less performance vs the competition.

So it will be interesting to see what Volta brings to the table. It's quite evident that although the low-level APIs are extending the life of the GCN architecture (by being able to utilise a lot more of the idling hardware, because DX11/drivers etc. weren't enough to fully utilise the immense ALU power these GPUs possess), the trade-off is power consumption, i.e. heat/noise etc. I think nVIDIA will definitely try to make their next-gen GPU crunch 3D and compute workloads in parallel a lot more efficiently than they do now, but the key challenge there is energy efficiency. Or they'll focus further on making the graphics pipeline go even faster. Architecture-wise, I think Vega and Volta will be more interesting than Pascal (Maxwell on steroids!) for sure.

Just my 2 cents.

No argument there; you said it better than I could. Volta will focus on making the graphics pipeline faster (with async compute helping to fill rendering bubbles), and I think it will try to be smarter about the power it uses.


Volta is going to be a slightly smaller upgrade over Pascal than Maxwell was over Kepler. Instead of GV104 being 50% faster at launch than GP104 (like GM204 was over GK104), it'll be 40%. The upside is that the Titan X proves GP104 is currently bandwidth-constrained, so there is still room for GP104 to gain speed in future product iterations.

It depends on whether Nvidia just enhances the Pascal design and calls it a day, or whether it's a design shift like G80 or Fermi. DX12/Vulkan games are coming; Nvidia has to make a DX12-optimized GPU to stay relevant. Otherwise, more people will choose GCN for better performance on those APIs. And yeah, GP104 is still not fully unleashed yet. Expect the GeForce 11 series to still use GP104 for the 1170, with more bandwidth for it (I'm thinking higher G5X frequencies).

I'm starting to think that Nvidia will need to implement functionality that allows them to reduce CPU load. The GTX1080 has seen some games not scale very well due to system/CPU bottlenecks and the Titan XP will only struggle more. Volta might be the first Nvidia architecture that sees most of its performance increase only in resolutions higher than 1080p.

You're right there. GCN has features that make it semi-independent of the CPU. Fermi had some too, for DX11 (command lists for Civ V, if I remember right, but that's just one of many). I hope Nvidia makes Volta much less CPU-bottlenecked than Pascal. Resolution-wise, maybe, but only if the CPU bottleneck is alleviated. Maybe not yet.
 

imported_bman

Senior member
Jul 29, 2007
262
54
101
Seeing that 16nm will be more mature, I think Nvidia will make the die sizes more in line with the 900 series (~600mm^2 for the 1180 Ti and Titan XV, ~390mm^2 for the 1180, ~230mm^2 for the 1160). GDDR5X and GDDR5 will still be used for all of these parts, but GDDR5X will be more mature (14-16Gbps per pin). I am curious how much more Nvidia can squeeze out of memory compression. Also, I think Nvidia will support the AV1 codec, since they are part of the Alliance for Open Media.
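For a sense of what those 14-16Gbps pin rates would buy, the bandwidth arithmetic is simple: bus width in bits divided by 8, times the per-pin rate. The bus widths below are my own illustrative picks (256-bit matches the GTX 1080's shipping config; the 384-bit case is hypothetical):

```python
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bits / 8) * per-pin rate."""
    return bus_bits / 8 * gbps_per_pin

print(bandwidth_gbs(256, 10))  # GTX 1080's 10 Gbps G5X today: 320.0 GB/s
print(bandwidth_gbs(256, 14))  # same 256-bit bus at a mature 14 Gbps: 448.0 GB/s
print(bandwidth_gbs(384, 12))  # hypothetical 384-bit bus at 12 Gbps: 576.0 GB/s
```

So even without touching bus width, mature GDDR5X would lift a 1080-class card's bandwidth by 40-60%, which matters if GP104-class parts really are bandwidth-constrained.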
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
Well the top end will be so frighteningly fast at 1080p that I doubt if that'll bother them too much. The big thing they'll have to do is push efficiency more. Presumably somehow wider/slow clocks than Pascal has.

Agreed. Their drivers are already quite efficient. That is very apparent when we look at APIs like Vulkan and DX12, where driver inefficiencies are less of a factor, and we see how much more those APIs benefit AMD. It would be like trying overly hard to get even better performance at 720p today. It's just not necessary.
 

poohbear

Platinum Member
Mar 11, 2003
2,284
5
81
OP, you said it's time for another Fermi, but the Fermi release was a disaster for Nvidia. It got panned by reviewers and ran way too hot, with a leaf blower for a fan. From all the post-Fermi releases, it looks like they learned their lesson, and it certainly won't be another Fermi release.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I think Volta will be multi-engine, with a unit akin to GCN's ACEs in functionality (a mini GigaThread Engine per GPC pops into my head). Volta will have a new, purely DX12-focused design. And I think it will reintroduce hardware scheduling to better accommodate async compute execution. The DX11 execution model (Fermi, Kepler, Maxwell, Pascal) is over. Time for a new concurrent model.

It's time for a new Fermi moment.

If anything, NVidia has shown that hardware schedulers are unnecessary. NVidia has beaten AMD on the performance-per-watt metric ever since they abandoned hardware scheduling starting with Kepler. And yes, I know that Kepler aged horribly, but that's not due to a lack of hardware scheduling. It's because Kepler's compute performance was comparatively much weaker than AMD's GCN parts, and also because the usage of compute shaders has become more and more pervasive over time. A vast oversimplification, I know, but that's the gist of it, I think.

Maxwell had much stronger compute performance than Kepler, but NVidia favored power usage over performance and was very conservative with the final clock speeds. That's why comparisons with GCN look more favorable to AMD when reference clock speeds are used, rather than aftermarket Maxwell parts, which are clocked significantly higher.

Regarding Volta, I think NVidia is going to continue the trend, which is:

1) Boost IPC and power efficiency
2) Focus on maximizing clock speeds
3) Moderate increase in ALU count and width
4) Improve efficiency and intelligence of their compiler
5) Improve dynamic scheduling and fine-grained preemption for concurrent tasks

While AMD fans like to hype asynchronous compute, it has shown itself to have a relatively minor impact on performance for discrete GPUs compared to consoles. This shouldn't really be surprising, because consoles, being a fixed platform, allow developers much greater control over the workflow than on PC. Also, the performance impact of asynchronous compute is very dependent on the kind of architecture. Since AMD favors very wide designs with high ALU counts, asynchronous compute will likely always be a bigger deal for them.

So I definitely don't expect Volta to have anything remotely resembling fixed asynchronous shaders similar to what AMD has.
 

FORTHEWIND

Member
Jul 23, 2015
25
1
11
OP, you said it's time for another Fermi, but the Fermi release was a disaster for Nvidia. It got panned by reviewers and ran way too hot, with a leaf blower for a fan. From all the post-Fermi releases, it looks like they learned their lesson, and it certainly won't be another Fermi release.

My bad :) I should've chosen my words more carefully. What I meant (the corrected version) is that it's time for a new Nvidia GPU that is radically different, like Fermi, but without the disaster. Maybe a Fermi/Maxwell.

If anything, NVidia has shown that hardware schedulers are unnecessary. NVidia has beaten AMD on the performance-per-watt metric ever since they abandoned hardware scheduling starting with Kepler. And yes, I know that Kepler aged horribly, but that's not due to a lack of hardware scheduling. It's because Kepler's compute performance was comparatively much weaker than AMD's GCN parts, and also because the usage of compute shaders has become more and more pervasive over time. A vast oversimplification, I know, but that's the gist of it, I think.

You're correct :). Nvidia did blow AMD away on both performance and energy efficiency, but a lot of those wins are only in DX11. With more DX12 games coming, if Nvidia can still do it with what they have, without a big change to their GPU, then that's fine by me :). I also think Kepler aging horribly isn't because of low compute performance; it's because it relied too much on optimized shader code.

Maxwell had much stronger compute performance than Kepler, but NVidia favored power usage over performance and was very conservative with the final clock speeds. That's why comparisons with GCN look more favorable to AMD when reference clock speeds are used, rather than aftermarket Maxwell parts, which are clocked significantly higher.

Yup, agreed.

Regarding Volta, I think NVidia is going to continue the trend, which is:

1) Boost IPC and power efficiency
2) Focus on maximizing clock speeds
3) Moderate increase in ALU count and width
4) Improve efficiency and intelligence of their compiler
5) Improve dynamic scheduling and fine-grained preemption for concurrent tasks

While AMD fans like to hype asynchronous compute, it has shown itself to have a relatively minor impact on performance for discrete GPUs compared to consoles. This shouldn't really be surprising, because consoles, being a fixed platform, allow developers much greater control over the workflow than on PC. Also, the performance impact of asynchronous compute is very dependent on the kind of architecture. Since AMD favors very wide designs with high ALU counts, asynchronous compute will likely always be a bigger deal for them.

So I definitely don't expect Volta to have anything remotely resembling fixed asynchronous shaders similar to what AMD has.

For the most part, I think your prediction on Volta may be just what Nvidia will do :(. But remember, Nvidia likes to change their design every four architectures or so, per @Cookie Monster's explanation and mine. Like I said, though, if Nvidia can make do with the current design, then that's fine by me. About async: it does have an impact (depending on the game and workload) like you said, but remember that with DX12, developers will start to use it more in the future. Nvidia needs a GPU that can handle it efficiently (not to say Pascal is bad, but it could be better).
 

Head1985

Golden Member
Jul 8, 2014
1,863
685
136
I am 100% sure Volta will be:
Much better in DX12 than Pascal
Much faster than Pascal
Even more overpriced than Pascal

My guess:
$800 GTX 1180
$500 GTX 1170
 
  • Like
Reactions: DamZe

alcoholbob

Diamond Member
May 24, 2005
6,271
323
126
Volta will probably be less of a performance gain than Pascal was.

Pascal simultaneously benefited from the new process node improving performance per mm^2, as well as a 35-40% increase in clock speed per core.

Volta, on the other hand, will be a purely architectural improvement, so we might see a 30-35% improvement in performance per mm^2 of die size, assuming clock speeds haven't dropped. It'll actually be a smaller improvement than Pascal because of those two factors. I don't believe yields will improve enough for significantly larger dies to offset the node-shrink advantage Pascal had. I'm guessing GV104 will be about 350mm^2 and GV102 around 500-520mm^2.
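The comparison is just compound-gain arithmetic: independent speedup factors multiply. A quick sketch with placeholder factors (none of these are measured numbers, just round figures in the ranges discussed above):

```python
def compound(*gains: float) -> float:
    """Multiply independent speedup factors together."""
    total = 1.0
    for g in gains:
        total *= g
    return total

# Pascal-style jump: node perf/mm^2 gain stacked on a ~35-40% clock bump.
pascal_jump = compound(1.35, 1.38)
# Volta-style jump on the same node: architecture-only gain, flat clocks.
volta_jump = compound(1.32)
print(f"{pascal_jump:.2f}x vs {volta_jump:.2f}x")
```

With those placeholders, stacking two ~35% factors lands near 1.86x, while a single architectural factor stays near 1.32x, which is why a same-node generation looks smaller even with a solid architecture.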
 
  • Like
Reactions: MangoX

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
I doubt they will release until yields have improved enough to let them go a bit bigger; they've got a very easy way to do a Spring 2017 refresh using Pascal if they have to (GP102 into the normal GPU stack at current 1080-ish pricing, etc.).
 

CFP

Senior member
Apr 26, 2006
544
6
81
I haven't been paying attention to the timeframe slides, but when can we reasonably expect the Volta equivalent of the 1080?
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
Volta will probably be less of a performance gain than Pascal was.

Pascal simultaneously benefited from the new process node improving performance per mm^2, as well as a 35-40% increase in clock speed per core.

Volta, on the other hand, will be a purely architectural improvement, so we might see a 30-35% improvement in performance per mm^2 of die size, assuming clock speeds haven't dropped. It'll actually be a smaller improvement than Pascal because of those two factors. I don't believe yields will improve enough for significantly larger dies to offset the node-shrink advantage Pascal had. I'm guessing GV104 will be about 350mm^2 and GV102 around 500-520mm^2.

I think the final product will be capable of being quite a bit faster than Pascal. Pascal tweaked Maxwell and shrank the die. The die size is conservative, the clocks are conservative, and it is using an older memory technology. I think the gloves come off with Volta. We will see a new arch with efficiency gains, new memory (HBM2), and large dies that consume 250+ watts. Even the new Titan is at 205 watts, and that thing screams compared to the 1080. Think Titan, but with more efficiency and another 50 watts of consumption to throw at performance.
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
I am pretty optimistic about Volta. NV appears to be treating 16nm similarly to how Intel approaches new process nodes: shrink an existing arch with some optimizations first. With Pascal going very smoothly (and profitably), Volta will likely be tailored for 16nm and hopefully be a real compute beast, built from the ground up for DX12.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
I haven't been paying attention to the timeframe slides, but when can we reasonably expect the Volta equivalent of the 1080?

Very hard to say. Mostly watch what they do with Pascal (how soon we get a 1080 Ti-style thing), and especially what happens in Spring 2017.

If Volta is a Spring 2018 product, then they'll drop some GP102 cards onto the market then and reposition the 1070/1080 a bit too. If they launch a 1080 Ti in autumn sometime, then we might see Volta a bit sooner than that. They are definitely rolling Pascal out very fast, which does make you wonder slightly. Maybe just because they can, though :)