Not sure if I am comforted or disappointed that you see the same issues on the EA forum, where a bumpgate article devolves into an AMD vs. NV rant. :/
Anyway, I don't think it matters much for the consumer Pascal products as it is HBM related. Charlie fishing for hits...
Well, for now, nah, it doesn't matter...
but in 8 months we will get Volta.
Au contraire, if they don't make enough revenue & profit from their HPC cards, where do you think their next target would be? If the GP100 isn't a big hit, the Pascal consumer cards will be milked way harder than Maxwell & we'll see evidence of that soon enough D:
Volta is planned for 2018, not 2017, so a fair bit more than 8 months.
When did they change the roadmap again? Or did they add Volta after they added Pascal?
Not necessarily, the new buzzword being mindshare. There are enough products in the AMD stack that are way better VFM than anything Nvidia has to offer, & yet "970gate" & disabling OC on mobile Maxwell, without serious repercussions, show us the power of brand Nvidia. So, no, I don't think AMD has too much to do with the success Nvidia has had in the last two years, which is even more baffling after GCN's DX12 exploits!

If it was possible for Nvidia to milk consumer cards any more than they are doing now, they would already have done so; otherwise they wouldn't be running their business properly (with regard to profit maximization).
If consumer Pascal is more "milkable" than Maxwell, then it won't be because of GP100 failing; it will be because of AMD failing to provide a competitive product.
You answered that yourself 😀
Volta didn't exist on any roadmap till early 2015; Pascal was in the place of Volta (2018 didn't even exist back then), it was 2016/17.

I'm pretty sure Volta has always been 2018 on Nvidia's roadmaps (except for the ones where it didn't have any date at all).
Volta HPC is scheduled to ship in 2017 for Summit and Sierra supercomputers:
http://www.anandtech.com/show/8727/nvidia-ibm-supercomputers
GeForce gaming Volta (GV104) should follow very quickly, maybe on the same timeline as Pascal, with availability around Computex 2017.
That article is from 2014 - I wouldn't put much faith in those dates.

Still, you can already find GV100 (big Volta) references in the latest CUDA library...
It seems that people are still confusing the terms "async compute", "async shaders" and "compute queue". Marketing and the press don't seem to understand the terms properly and spread the confusion 🙂
Hardware:
AMD - Each compute unit (CUs) on GCN can run multiple shaders concurrently. Each CU can run both compute (CS) and graphics (PS/VS/GS/HS/DS) tasks concurrently. The 64 KB LDS (local data store) inside a CU is dynamically split between currently running shaders. Graphics shaders also use it for intermediate storage. AMD calls this feature "Async shaders".
Intel / Nvidia: These GPUs do not support running graphics + compute concurrently on a single compute unit. One possible reason is the LDS / cache configuration (GPU on chip memory is configured differently when running graphics - CUDA even allows direct control for it). There most likely are other reasons as well. According to Intel documentation it seems that they are running the whole GPU either in compute mode or graphics mode. Nvidia is not as clear about this. Maxwell likely can run compute and graphics simultaneously, but not both in the same "shader multiprocessor" (SM).
Async compute = running shaders in the compute queue. The compute queue is like another "CPU thread". It doesn't have any ties to the main queue. You can use fences to synchronize between queues, but this is a very heavy operation and likely causes stalls. You don't want to do more than a few fences (preferably one) per frame. Just like "CPU threads", the compute queue doesn't guarantee any concurrent execution. The driver can time-slice queues (just like the OS does for CPU threads when you have more threads than CPU cores). This can still be beneficial if you have big stalls (GPU waiting for the CPU, for instance). AMD's hardware works a bit like hyperthreading. It can feed multiple queues concurrently to all the compute units. If a compute unit stalls (even small stalls can be exploited), the CU will immediately switch to another shader (also graphics<->compute). This results in higher GPU utilization.
You don't need to use the compute queue in order to execute multiple shaders concurrently. DirectX 12 and Vulkan by default run all commands concurrently, even from a single queue (at the level of concurrency supported by the hardware). The developer needs to manually insert barriers in the queue to mark synchronization points for each resource (to prevent read<->write hazards). All modern GPUs are able to execute multiple shaders concurrently. However, on Intel and Nvidia the GPU runs either graphics or compute at a time (but can run multiple compute shaders or multiple graphics shaders concurrently). So in order to maximize performance, you'd want to submit large batches of either graphics or compute to the queue at once (not alternating between both rapidly). You get a GPU stall ("wait until idle") on each graphics<->compute switch (unless you are on AMD, of course).
If you assume that a single Pascal SM cannot run mixed graphics + compute then splitting the MPs should improve the granularity. Compute and graphics might also share some higher level (more global) resources as well. Nvidia has quite sophisticated load balancing in their geometry processing. Distributed geometry data needs to be stored somewhere (SM L1 at least is partially pinned for graphics work, see this presentation: http://on-demand.gputechconf.com/gtc/2016/video/S6138.html). Also, Nvidia doesn't have separate ROP caches (AMD still does). Some portion of their L2 needs to serve ROPs when rendering graphics. This might be transparent (just another client of the cache) or might be statically pinned based on the GPU state. I don't know 🙂
Very informative post by sebbbi @ Beyond3D.
https://forum.beyond3d.com/threads/...eculation-rumors-and-discussion.56719/page-72
AMD has had plenty of mistakes to affect their "mind share"
Hawaii was too hot and loud
Fiji was too slow and memory limited
Crimson killed cards
Terrible directx 11 driver overhead
Terrible Linux drivers
Etc.
It's a very good post.
Now the question is: in Maxwell, when running parallel graphics + compute tasks, does the entire GPU need to flip between graphics <-> compute modes?
It will be interesting to see whether Pascal can do this at the TPC or GPC level, which would undoubtedly be better than the entire GPU having to change modes. I doubt they can do it at the SM level (similar to GCN, where it works at the CU level).
When did this occur??
Recently. Probably got swept under the rug here.
http://m.hardocp.com/news/2015/11/29/users_report_amd_crimson_driver_heating_killing_gpus/
Anyway, on topic: the 1080 is going to be significantly faster due to GDDR5X - my guess is 20-25%. This means Nvidia can get more users to pick the 1080 over the 1070, because overclocking the 1070 might not be enough to catch the 1080.