Do you think Volta will be a beast?


  • Total voters
    36

FORTHEWIND

Member
Jul 23, 2015
25
1
11
I call dibs: Nvidia will unveil its own async compute implementation called AsyncWorks with Volta and bury AMD/RTG with the new and improved GameFinallyWorks program. /s

Joking aside, what are your thoughts on Nvidia's upcoming Volta architecture? It will have a substantial performance boost compared to Pascal, but that's all Nvidia wrote in its roadmap (heck, Pascal stole all of Volta's features in the recent roadmap). So what do you think Nvidia will change in Volta?

I'll start with mine :)

When DX11 came, Nvidia designed Fermi for it. Fermi brought a lot to the table (in a hot and noisy way): tessellation and DirectCompute (compute shaders) became standard features. Fermi was Nvidia's first (wood-screws mock-up and all) step in designing a DX11-capable GPU. Fermi brought the GPC into the mix, kicking out GT200's TPC as the biggest execution block on the GPU. Fermi was a beast.

Then came Kepler. Kepler was another beast. It fused SMs into a much bigger unit, creating the SMX. It tried, and succeeded at, fixing Fermi's hot and noisy introduction. It also threw out Fermi's hardware scheduler, which made its scheduling less dynamic for a while (a tradeoff for energy efficiency). It really looks like a refined Fermi, since not a lot changed; things were mostly fixed and enlarged. It also still targeted DX11.

Maxwell was a fundamental shift from Nvidia's previous designs. It was Nvidia's first architecture using their new "mobile first" design strategy, and with it came a whole bunch of energy-efficiency enhancements. The Kepler SMX was partitioned into four smaller partitions, creating the SMM. The PolyMorph engine got an update with geometry/rasterization-specific features like raster order views and conservative rasterization. And it was Nvidia's first GPU to partially support DX12, since the spec wasn't finalized yet. It was an efficient beast, furthering Nvidia's DX11 performance, but it fell short on async compute because it was already too efficient to need it, and it wasn't intelligent enough at scheduling for it either.

Pascal seems like a stopgap before Volta and a testbed for 16nm FinFET. It got all of Volta's planned features plus another update to the PolyMorph engine. It also implements some features to alleviate the async compute deficit by being more intelligent at scheduling and switching workloads. And with 16nm FinFET it was supercharged to very high frequencies. It's a beast, but more of an HPC beast. A GCN-CU look-alike SM exists on GP100, but that's about it; a lot of it benefits HPC more than gaming. It also reintroduced the TPC from GT200.

So what about Volta?

I think Volta will be multi-engine, with a unit akin to GCN's ACEs in functionality (a mini GigaThread Engine per GPC pops into my head). Volta will have a new, purely DX12-focused design. And I think it will reintroduce hardware scheduling to better accommodate async compute execution. The DX11 execution model (Fermi, Kepler, Maxwell, Pascal) is over. Time for a new concurrent model.

It's time for a new Fermi moment.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Volta may well be the next G80, or an evolution of their Pascal design. If you look at nVIDIA's GPU history, they normally milk 3~4 generations out of a single GPU architecture (with process shrinks, optimizations and additions along the way), e.g. NV30 -> NV40 -> G70/NV47 -> G71, or G80 -> G92 -> GT200. Compare those examples to GF100 -> GK110 -> GM200 -> GP102. So it's about time they introduced something radically different, which coincidentally fits with a time when DX12/Vulkan will become the primary standard (2017 onwards).

Also, I have to mention that the graphics pipeline is one complex beast in today's modern GPUs. It is not as serial as some users are led to believe (unless you're talking decades ago), nor do the marketing slides with their "xxx engine" labels mean anything useful at a technical level. There are many, many things happening in parallel in the graphics pipe alone to speed up all kinds of workloads. Think of it as cooking a recipe (a 3D workload). You have the food preparation (resource management at all levels for different tasks; there are schedulers on the GPU other than the one we all like to talk about, for instance), the cooking tools (e.g. fixed-function units), and the cooking process (rasterisation, culling, pixel shading etc.), and all of these, depending on the workload, can run in parallel, get bottlenecked at some tasks, and so on.

There aren't any public tools that show just what is bottlenecked in a GPU architecture, but I'm sure both IHVs have them. I expect Volta to further improve on this aspect, because you fundamentally want to focus on making your graphics "pipeline" crunch graphics workloads faster. One good silent example of this is nVIDIA's rasterisation technique; one well-known/advertised example is their memory compression, optimized over each generation.

Async compute, on the other hand, is when you try to speed up the work by utilizing compute shaders to get some of the graphics work done. It's important to note that using it isn't always beneficial; it can also hinder performance. It's just one of many tools to improve performance where possible. When you look at the GCN architecture, or in particular Polaris P10, it features roughly twice as many ALUs as its competing GPU, the GP106! Not only that, these ALUs are configured to crunch both 3D and compute workloads if fed properly (at finer granularity than nVIDIA, for sure). If optimised for, the P10 may have the performance advantage. However, if left under-utilised, it will consume more energy for less performance vs the competition.
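The overlap idea above can be sketched with a toy timing model. All the numbers here are illustrative placeholders, not measurements of any real GPU; the point is just that concurrent submission only helps to the extent that the graphics pass actually leaves ALUs idle:

```python
# Toy model of async compute: a frame's graphics work leaves the ALUs
# idle during some phases (e.g. shadow-map passes that are mostly
# fixed-function/ROP bound). Independent compute work can fill those
# gaps; whatever doesn't fit still runs serially afterwards.

graphics_ms = 14.0   # graphics-queue work per frame
compute_ms = 4.0     # independent compute work (e.g. post-processing)
idle_alu_ms = 5.0    # portion of the graphics pass where ALUs sit idle

# Serial submission: compute runs only after graphics finishes.
serial_frame = graphics_ms + compute_ms

# Concurrent submission: compute soaks up the idle-ALU time first.
overlapped = min(compute_ms, idle_alu_ms)
async_frame = graphics_ms + (compute_ms - overlapped)

print(f"serial: {serial_frame:.1f} ms, async: {async_frame:.1f} ms")
# serial: 18.0 ms, async: 14.0 ms
```

Set `idle_alu_ms` to zero (a GPU already fully utilised by graphics) and the benefit disappears, which is the "isn't always beneficial" case above.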

So it will be interesting to see what Volta brings to the table. It's quite evident that although the low-level APIs are extending the life of the GCN architecture (by being able to utilise a lot more of the idling hardware, because DX11/drivers etc. weren't enough to fully utilise the immense ALU power these GPUs possess), the trade-off is power consumption, i.e. heat/noise etc. I think nVIDIA will definitely try to make their next-gen GPU crunch 3D and compute workloads in parallel a lot more efficiently than they do now, but the key challenge there is energy efficiency. Or they'll focus further on making the graphics pipeline go even faster. Architecture-wise, I think Vega and Volta will be more interesting than Pascal (Maxwell on steroids!) for sure.

Just my 2 cents.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
So it will be interesting to see what Volta brings to the table. It's quite evident that although the low-level APIs are extending the life of the GCN architecture (by being able to utilise a lot more of the idling hardware, because DX11/drivers etc. weren't enough to fully utilise the immense ALU power these GPUs possess), the trade-off is power consumption, i.e. heat/noise etc. I think nVIDIA will definitely try to make their next-gen GPU crunch 3D and compute workloads in parallel a lot more efficiently than they do now, but the key challenge there is energy efficiency. Or they'll focus further on making the graphics pipeline go even faster. Architecture-wise, I think Vega and Volta will be more interesting than Pascal (Maxwell on steroids!) for sure.

Just my 2 cents.
Is this to say that Vega will (not?) be much different from Polaris? As for Volta, I'd be surprised if it's a major deviation from Pascal: high clocks & low compute.
If it is, then I expect the clock speeds to go down (for some reason), but the most interesting aspect for me would be whether they gimp (DP) compute on gaming Volta, and how that affects DX12/Vulkan games three years from now, when the Polaris effect will be in full swing.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Is this to say that Vega will (not?) be much different from Polaris? As for Volta, I'd be surprised if it's a major deviation from Pascal: high clocks & low compute.
If it is, then I expect the clock speeds to go down (for some reason), but the most interesting aspect for me would be whether they gimp (DP) compute on gaming Volta, and how that affects DX12/Vulkan games three years from now, when the Polaris effect will be in full swing.

Only because of HBM2.0; from my knowledge, Vega will have an updated GCN architecture vs Polaris. If it's just a beefed-up Polaris with HBM2.0, that would be disappointing in terms of architectural changes, but if they can bring the performance to the table with good energy efficiency, it would be great regardless.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
Volta is going to be a slightly smaller upgrade over Pascal than Maxwell was over Kepler. Instead of GV104 being 50% faster at launch than GP104 (like GM204 was over GK104), it'll be 40%. The upside is that the Titan X proves GP104 is currently bandwidth-constrained, so there is still room for GP104 to gain speed in future product iterations.
 

Piroko

Senior member
Jan 10, 2013
905
79
91
I'm starting to think Nvidia will need to implement functionality that lets them reduce CPU load. The GTX 1080 has seen some games not scale very well due to system/CPU bottlenecks, and the Titan XP will only struggle more. Volta might be the first Nvidia architecture that sees most of its performance increase only in resolutions higher than 1080p.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Well, the top end will be so frighteningly fast at 1080p that I doubt that'll bother them too much. The big thing they'll have to do is push efficiency more; presumably somehow wider designs with slower clocks than Pascal has.
 

nathanddrews

Graphics Cards, CPU Moderator
Aug 9, 2016
965
534
136
www.youtube.com
Volta might be the first Nvidia architecture that sees most of its performance increase only in resolutions higher than 1080p.
I'm hoping that the release of more Vulkan/DX12 games, as well as 120Hz 4K displays, will make that a reality. Just as comparing 720p performance today is largely a waste of time, I'd like to see 1080p become that trivial in the next gen.
 

FORTHEWIND

Member
Jul 23, 2015
25
1
11
Volta may well be the next G80, or an evolution of their Pascal design. If you look at nVIDIA's GPU history, they normally milk 3~4 generations out of a single GPU architecture (with process shrinks, optimizations and additions along the way), e.g. NV30 -> NV40 -> G70/NV47 -> G71, or G80 -> G92 -> GT200. Compare those examples to GF100 -> GK110 -> GM200 -> GP102. So it's about time they introduced something radically different, which coincidentally fits with a time when DX12/Vulkan will become the primary standard (2017 onwards).

Somebody sees what I'm seeing :D. I feel Nvidia likes to design an architecture around an API or a major feature and enhance it until the next big design shift comes (akin to your examples). NV30 through NV47 was the fixed shader pipeline DX7-8-9 era. G80 through GT200 was unified shaders and DX10. Fermi and Kepler were DX11. Maxwell and Pascal are partial DX12; they can't execute it efficiently since they're already efficient enough without it, and they're still based on Fermi's DX11 design and Kepler's execution model. It's time for a change (for Nvidia's sake).

Also, I have to mention that the graphics pipeline is one complex beast in today's modern GPUs. It is not as serial as some users are led to believe (unless you're talking decades ago), nor do the marketing slides with their "xxx engine" labels mean anything useful at a technical level. There are many, many things happening in parallel in the graphics pipe alone to speed up all kinds of workloads. Think of it as cooking a recipe (a 3D workload). You have the food preparation (resource management at all levels for different tasks; there are schedulers on the GPU other than the one we all like to talk about, for instance), the cooking tools (e.g. fixed-function units), and the cooking process (rasterisation, culling, pixel shading etc.), and all of these, depending on the workload, can run in parallel, get bottlenecked at some tasks, and so on.

There aren't any public tools that show just what is bottlenecked in a GPU architecture, but I'm sure both IHVs have them. I expect Volta to further improve on this aspect, because you fundamentally want to focus on making your graphics "pipeline" crunch graphics workloads faster. One good silent example of this is nVIDIA's rasterisation technique; one well-known/advertised example is their memory compression, optimized over each generation.

Async compute, on the other hand, is when you try to speed up the work by utilizing compute shaders to get some of the graphics work done. It's important to note that using it isn't always beneficial; it can also hinder performance. It's just one of many tools to improve performance where possible. When you look at the GCN architecture, or in particular Polaris P10, it features roughly twice as many ALUs as its competing GPU, the GP106! Not only that, these ALUs are configured to crunch both 3D and compute workloads if fed properly (at finer granularity than nVIDIA, for sure). If optimised for, the P10 may have the performance advantage. However, if left under-utilised, it will consume more energy for less performance vs the competition.

So it will be interesting to see what Volta brings to the table. It's quite evident that although the low-level APIs are extending the life of the GCN architecture (by being able to utilise a lot more of the idling hardware, because DX11/drivers etc. weren't enough to fully utilise the immense ALU power these GPUs possess), the trade-off is power consumption, i.e. heat/noise etc. I think nVIDIA will definitely try to make their next-gen GPU crunch 3D and compute workloads in parallel a lot more efficiently than they do now, but the key challenge there is energy efficiency. Or they'll focus further on making the graphics pipeline go even faster. Architecture-wise, I think Vega and Volta will be more interesting than Pascal (Maxwell on steroids!) for sure.

Just my 2 cents.

No argument there; you said it better than I could. Volta will focus on making the graphics pipeline faster (with async compute helping to fill rendering bubbles), and I think it will try to be smarter about the power it uses.


Volta is going to be a slightly smaller upgrade over Pascal than Maxwell was over Kepler. Instead of GV104 being 50% faster at launch than GP104 (like GM204 was over GK104), it'll be 40%. The upside is that the Titan X proves GP104 is currently bandwidth-constrained, so there is still room for GP104 to gain speed in future product iterations.

It depends on whether Nvidia just enhances the Pascal design and calls it a day, or whether it's a design shift like G80 or Fermi. DX12/Vulkan games are coming; Nvidia has to make a DX12-optimized GPU to stay relevant. Otherwise, more people will choose GCN for better performance on those APIs. And yeah, GP104 is still not fully unleashed yet. Expect the GeForce 11 series to still use GP104 for the 1170, with more bandwidth for it (I'm thinking higher G5X frequencies).

I'm starting to think that Nvidia will need to implement functionality that allows them to reduce CPU load. The GTX1080 has seen some games not scale very well due to system/CPU bottlenecks and the Titan XP will only struggle more. Volta might be the first Nvidia architecture that sees most of its performance increase only in resolutions higher than 1080p.

You're right there. GCN has features that make it semi-independent of the CPU. Fermi had some too, for DX11 (command lists for Civ V, if I remember right, but that's just one of many). I hope Nvidia makes Volta much less CPU-bottlenecked than Pascal. Resolution-wise, maybe, but only if the CPU bottleneck is alleviated. Maybe not yet.
 

imported_bman

Senior member
Jul 29, 2007
262
54
101
Seeing that 16nm will be more mature, I think Nvidia will make the die sizes more in line with the 900 series (~600mm^2 for the 1180 Ti and Titan XV, ~390mm^2 for the 1180, ~230mm^2 for the 1160). GDDR5X and GDDR5 will still be used for all of these parts, but GDDR5X will be more mature (14-16Gbps per pin). I am curious how much more Nvidia can squeeze out of memory compression. Also, I think Nvidia will support the AV1 codec, since they are part of the Alliance for Open Media.
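For a sense of what those 14-16Gbps pin rates would buy, the bandwidth arithmetic is simple: bus width in bits divided by 8, times the per-pin rate. The bus widths below are my own illustrative picks (256-bit matches the GTX 1080's shipping config; the 384-bit case is hypothetical):

```python
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bits / 8) * per-pin rate."""
    return bus_bits / 8 * gbps_per_pin

print(bandwidth_gbs(256, 10))  # GTX 1080's 10 Gbps G5X today: 320.0 GB/s
print(bandwidth_gbs(256, 14))  # same 256-bit bus at a mature 14 Gbps: 448.0 GB/s
print(bandwidth_gbs(384, 12))  # hypothetical 384-bit bus at 12 Gbps: 576.0 GB/s
```

So even without touching bus width, mature GDDR5X would lift a 1080-class card's bandwidth by 40-60%, which matters if GP104-class parts really are bandwidth-constrained.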
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
Well the top end will be so frighteningly fast at 1080p that I doubt if that'll bother them too much. The big thing they'll have to do is push efficiency more. Presumably somehow wider/slow clocks than Pascal has.

Agreed. Their drivers are already quite efficient. That is very apparent when we look at APIs like Vulkan and DX12, where driver inefficiencies are less of a factor, and we see how much more those APIs benefit AMD. It would be like trying overly hard to get even better performance at 720p today. It's just not necessary.
 

poohbear

Platinum Member
Mar 11, 2003
2,284
5
81
OP, you said it's time for another Fermi, but the Fermi release was a disaster for Nvidia. It got panned by reviewers and ran way too hot, with a leaf blower for a fan. From all the post-Fermi releases, it looks like they learned their lesson, and it certainly won't be another Fermi release.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I think Volta will be multi-engine, with a unit akin to GCN's ACEs in functionality (a mini GigaThread Engine per GPC pops into my head). Volta will have a new, purely DX12-focused design. And I think it will reintroduce hardware scheduling to better accommodate async compute execution. The DX11 execution model (Fermi, Kepler, Maxwell, Pascal) is over. Time for a new concurrent model.

It's time for a new Fermi moment.

If anything, NVidia has shown that hardware schedulers are unnecessary. NVidia has beaten AMD on the performance-per-watt metric ever since they abandoned hardware scheduling starting with Kepler. And yes, I know that Kepler aged horribly, but that's not due to a lack of hardware scheduling. It's because Kepler's compute performance was comparatively much weaker than AMD's GCN parts, and also because the usage of compute shaders has become more and more pervasive over time. A vast oversimplification, I know, but that's the gist of it, I think.

Maxwell had much stronger compute performance than Kepler, but NVidia favored power usage over performance and was very conservative with the final clock speeds. That's why comparisons with GCN look more favorable to AMD when reference clock speeds are used, rather than aftermarket Maxwell parts, which are clocked significantly higher.

Regarding Volta, I think NVidia is going to continue the trend, which is:

1) Boost IPC and power efficiency
2) Focus on maximizing clock speeds
3) Moderate increase in ALU count and width
4) Improve efficiency and intelligence of their compiler
5) Improve dynamic scheduling and fine-grained preemption for concurrent tasks

While AMD fans like to hype asynchronous compute, it has shown itself to have a relatively minor impact on performance for discrete GPUs compared to consoles. This shouldn't really be surprising, because consoles, being a fixed platform, allow developers much greater control over the workflow than on PC. Also, the performance impact of asynchronous compute is very dependent on the kind of architecture. Since AMD favors very wide designs with high ALU counts, asynchronous compute will likely always be a bigger deal for them.

So I definitely don't expect Volta to have anything remotely resembling fixed asynchronous shaders similar to what AMD has.
 

FORTHEWIND

Member
Jul 23, 2015
25
1
11
OP, you said it's time for another Fermi, but the Fermi release was a disaster for Nvidia. It got panned by reviewers and ran way too hot, with a leaf blower for a fan. From all the post-Fermi releases, it looks like they learned their lesson, and it certainly won't be another Fermi release.

My bad :) I should've chosen my words more carefully. What I meant (the corrected version) is that it's time for a new Nvidia GPU that is radically different, like Fermi, but without the disaster. Maybe a Fermi/Maxwell.

If anything, NVidia has shown that hardware schedulers are unnecessary. NVidia has beaten AMD on the performance-per-watt metric ever since they abandoned hardware scheduling starting with Kepler. And yes, I know that Kepler aged horribly, but that's not due to a lack of hardware scheduling. It's because Kepler's compute performance was comparatively much weaker than AMD's GCN parts, and also because the usage of compute shaders has become more and more pervasive over time. A vast oversimplification, I know, but that's the gist of it, I think.

You're correct :). Nvidia did blow AMD away on both performance and energy efficiency, but a lot of those wins are only in DX11. With more DX12 games coming, if Nvidia can still do it with what they have, without a big change to their GPU, then that's fine by me :). I also think Kepler aging horribly isn't because of low compute performance; it's because it relied too much on optimized shader code.

Maxwell had much stronger compute performance than Kepler, but NVidia favored power usage over performance and was very conservative with the final clock speeds. That's why comparisons with GCN look more favorable to AMD when reference clock speeds are used, rather than aftermarket Maxwell parts, which are clocked significantly higher.

Yup, agreed.

Regarding Volta, I think NVidia is going to continue the trend, which is:

1) Boost IPC and power efficiency
2) Focus on maximizing clock speeds
3) Moderate increase in ALU count and width
4) Improve efficiency and intelligence of their compiler
5) Improve dynamic scheduling and fine-grained preemption for concurrent tasks

While AMD fans like to hype asynchronous compute, it has shown itself to have a relatively minor impact on performance for discrete GPUs compared to consoles. This shouldn't really be surprising, because consoles, being a fixed platform, allow developers much greater control over the workflow than on PC. Also, the performance impact of asynchronous compute is very dependent on the kind of architecture. Since AMD favors very wide designs with high ALU counts, asynchronous compute will likely always be a bigger deal for them.

So I definitely don't expect Volta to have anything remotely resembling fixed asynchronous shaders similar to what AMD has.

For the most part, I think your prediction on Volta may be just what Nvidia will do :(. But remember, Nvidia likes to change their design every four architectures or so, per @Cookie Monster's explanation and mine. Like I said, though, if Nvidia can make do with the current design, then that's fine by me. About async: it does have an impact (depending on the game and workload) like you said, but remember that with DX12, developers will start to use it more in the future. Nvidia needs a GPU that can handle it efficiently (not to say Pascal is bad, but it could be better).
 

Head1985

Golden Member
Jul 8, 2014
1,863
685
136
I am 100% sure Volta will be:
Much better in DX12 than Pascal
Much faster than Pascal
Even more overpriced than Pascal

My guess:
$800 GTX 1180
$500 GTX 1170
 
  • Like
Reactions: DamZe

alcoholbob

Diamond Member
May 24, 2005
6,271
323
126
Volta will probably be less of a performance gain than Pascal was.

Pascal simultaneously benefited from the new process node improving performance per mm^2, as well as a 35-40% increase in clock speed per core.

Volta, on the other hand, will be a purely architectural improvement, so we might see a 30-35% improvement in performance per mm^2 of die size, assuming clock speeds haven't dropped. It'll actually be a smaller improvement than Pascal because of those two factors. I don't believe yields will improve enough for significantly larger dies to offset the node-shrink advantage Pascal had. I'm guessing GV104 will be about 350mm^2 and GV102 around 500-520mm^2.
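The comparison is just compound-gain arithmetic: independent speedup factors multiply. A quick sketch with placeholder factors (none of these are measured numbers, just round figures in the ranges discussed above):

```python
def compound(*gains: float) -> float:
    """Multiply independent speedup factors together."""
    total = 1.0
    for g in gains:
        total *= g
    return total

# Pascal-style jump: node perf/mm^2 gain stacked on a ~35-40% clock bump.
pascal_jump = compound(1.35, 1.38)
# Volta-style jump on the same node: architecture-only gain, flat clocks.
volta_jump = compound(1.32)
print(f"{pascal_jump:.2f}x vs {volta_jump:.2f}x")
```

With those placeholders, stacking two ~35% factors lands near 1.86x, while a single architectural factor stays near 1.32x, which is why a same-node generation looks smaller even with a solid architecture.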
 
  • Like
Reactions: MangoX

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
I doubt they will release until yields have improved enough to let them go a bit bigger; they've got a very easy way to do a Spring 2017 refresh using Pascal if they have to (GP102 into the normal GPU stack at current 1080-ish pricing, etc.).
 

CFP

Senior member
Apr 26, 2006
544
6
81
I haven't been paying attention to the timeframe slides, but when can we reasonably expect the Volta equivalent of the 1080?
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
Volta will probably be less of a performance gain than Pascal was.

Pascal simultaneously benefited from the new process node improving performance per mm^2, as well as a 35-40% increase in clock speed per core.

Volta, on the other hand, will be a purely architectural improvement, so we might see a 30-35% improvement in performance per mm^2 of die size, assuming clock speeds haven't dropped. It'll actually be a smaller improvement than Pascal because of those two factors. I don't believe yields will improve enough for significantly larger dies to offset the node-shrink advantage Pascal had. I'm guessing GV104 will be about 350mm^2 and GV102 around 500-520mm^2.

I think the final product will be capable of being quite a bit faster than Pascal. Pascal tweaked Maxwell and shrank the die. The die size is conservative, the clocks are conservative, and it is using an older memory technology. I think the gloves come off with Volta. We will see a new arch with efficiency gains, new memory (HBM2), and large dies that consume 250+ watts. Even the new Titan is at 205 watts, and that thing screams compared to the 1080. Think Titan, but with more efficiency and another 50 watts of consumption to throw at performance.
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
I am pretty optimistic about Volta. NV appears to be treating 16nm similarly to how Intel approaches new process nodes: shrink an existing arch with some optimizations first. With Pascal going very smoothly (and profitably), Volta will likely be tailored for 16nm and hopefully be a real compute beast, built from the ground up for DX12.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
I haven't been paying attention to the timeframe slides, but when can we reasonably expect the Volta equivalent of the 1080?

Very hard to say. Mostly watch what they do with Pascal (how soon we get a 1080 Ti-style thing), and especially what happens in Spring 2017.

If Volta is a Spring 2018 product, then they'll drop some GP102 cards onto the market then and reposition the 1070/1080 a bit too. If they launch a 1080 Ti in autumn sometime, then we might see Volta a bit sooner than that. They are definitely rolling Pascal out very fast, which does make you wonder slightly. Maybe just because they can, though :)