[bitsandchips]: Pascal to not have improved Async Compute over Maxwell


flopper

Senior member
Dec 16, 2005
739
19
76
Polaris will support ROV and CR. That's what the "Primitive Discard Acceleration" is all about.

ROV is a performance reducing feature btw.

What is clear to me is that Maxwell 3.0 (Pascal) needs HBM2 to shine. This is why AMD GCN3 (Fiji) is able to keep up with a GTX 980 Ti (GM200) at 4K despite having a lower ROP count and clock speed.


In conclusion,

We have Maxwell 3.0 facing off against a refined GCN architecture. Micron just announced that mass production of GDDR5X is set for this summer, so both AMD and NVIDIA are likely to use GDDR5X. It will be quite interesting to see the end result.

So does lacking Asynchronous compute + graphics matter? Absolutely.

seems good to me.
 

Det0x

Golden Member
Sep 11, 2014
1,455
4,948
136
Polaris will support ROV and CR. That's what the "Primitive Discard Acceleration" is all about.

ROV is a performance reducing feature btw.

What is clear to me is that Maxwell 3.0 (Pascal) needs HBM2 to shine. This is why AMD GCN3 (Fiji) is able to keep up with a GTX 980 Ti (GM200) at 4K despite having a lower ROP count and clock speed.

NVIDIA went from an 8:1 ratio of ROPs to memory controllers in Kepler (GK110) to a 16:1 ratio with GM20x. In other words, Kepler (GK104/GK110) had 8 ROPs per 64-bit memory controller.

So a 256-bit memory interface would give us 4 x 8 or 32 ROPs, and a 384-bit memory interface would give us 6 x 8 or 48 ROPs.

What NVIDIA did was boost that to 16 ROPs per 64-bit memory controller with GM20x. So a 256-bit memory interface now powers 64 ROPs and a 384-bit memory interface now powers 96 ROPs.
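To sanity-check the ROP math, here's a quick back-of-the-envelope sketch in Python (my own toy calculation, nothing official):

# ROP count = (bus width / 64 bits per controller) * ROPs per controller
def rop_count(bus_width_bits, rops_per_controller):
    return (bus_width_bits // 64) * rops_per_controller

print(rop_count(256, 8))   # Kepler-style GK104: 32 ROPs
print(rop_count(384, 8))   # Kepler-style GK110: 48 ROPs
print(rop_count(256, 16))  # GM204 / GTX 980: 64 ROPs
print(rop_count(384, 16))  # GM200 / GTX 980 Ti: 96 ROPs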

NVIDIA added delta color compression, which works on correlated (colored) pixels and texels but not on random ones, in order to make up for the lack of memory bandwidth. It helped a bit but still couldn't keep up with GCN2's 64 ROPs under random scenarios, or with GCN3's ROPs under both random and colored scenarios.
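To illustrate why that kind of compression only helps on correlated ("colored") data, here's a toy delta encoder (purely illustrative, nothing like NVIDIA's actual hardware scheme):

# Toy delta encoding: store the first value, then only the differences.
# Correlated neighbours give tiny deltas (cheap to store); random ones don't.
def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

smooth = [100, 101, 103, 104, 106]
noisy = [17, 240, 3, 198, 55]
print(delta_encode(smooth))  # [100, 1, 2, 1, 2]  -> compresses well
print(delta_encode(noisy))   # [17, 223, -237, 195, -143] -> no win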


What we're looking at, then, is NVIDIA's initial Pascal offerings being somewhat nice but not delivering the performance people seem to think they will. GP100, paired with HBM2, will be able to deliver the bandwidth needed for NVIDIA's bandwidth-starved 96 ROPs (Z testing, pixel blending, anti-aliasing, etc. devour immense amounts of bandwidth). Therefore I don't think we're going to see more than 96 ROPs in GP100. What we're instead likely to see are properly fed ROPs.

If the "GTX 1080" comes with 10 Gbps GDDR5X memory on a 256-bit memory interface, then we'd be looking at the same 64 ROPs that the GTX 980 sports and the same 16:1 ratio (16 ROPs per 64-bit memory controller), but with 320 GB/s of memory bandwidth as opposed to 224 GB/s on the GTX 980. So the GTX 1080 (320 GB/s) should deliver similar performance per clock to a GTX 980 Ti (336 GB/s) at 4K despite sporting 64 ROPs to the GTX 980 Ti's 96.
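Those bandwidth numbers are just bus width times per-pin data rate. A quick sketch of that arithmetic (assuming the usual 7 Gbps effective rate for the existing GDDR5 cards):

# GB/s = (bus width in bits / 8 bits per byte) * effective data rate in Gb/s per pin
def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    return bus_width_bits / 8 * gbps_per_pin

print(bandwidth_gb_s(256, 10))  # hypothetical GDDR5X "GTX 1080": 320 GB/s
print(bandwidth_gb_s(256, 7))   # GTX 980 (7 Gbps GDDR5): 224 GB/s
print(bandwidth_gb_s(384, 7))   # GTX 980 Ti: 336 GB/s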

NVIDIA will likely set the reference clocks on the GTX 1080 higher in order to obtain faster performance than a reference-clocked GTX 980 Ti. So the performance increase of a GTX 1080 over a GTX 980 Ti at 4K will likely come down to those higher reference clocks.

I also think that the GTX 1080 will sport the same, or around the same, number of CUDA cores as a GTX 980 Ti (2,816). I could be entirely off, but that's what I think.

As for FP64, NVLink and FP16 support, those are nice for a data centre but mean absolutely nothing for gamers... Sad, I know.

So what we're looking at from NVIDIA, initially, is GTX 980 Ti performance (or slightly higher performance) at a lower price point with GP104. The real fun will start with GP100 by end of 2016/beginning of 2017.


On the RTG/AMD front..

RTG replaced the geometry engines with new geometry processors. One notable new feature is primitive discard acceleration, which is something GCN1/2/3 lacked. This allows future GCN4 (Polaris/Vega) to prevent certain primitives from being rasterized. Unseen tessellated meshes are "culled" (removed from the rasterizer's workload). Primitive Discard Acceleration also means that GCN4 will support Conservative Rasterization.

Basically, RTG have removed one of their weaknesses in GCN.

As for the hardware scheduling, GCN still uses an Ultra Threaded Dispatcher which is fed by the Graphics Command Processor and ACEs.


AMD replaced the Graphics Command Processor and increased the size of the command buffer (the section of the frame buffer/system memory dedicated to holding many to-be-executed commands). The two changes, when coupled together, allow for a boost in performance under single-threaded scenarios.

How? My opinion is that if the CPU is busy handling a complex simulation or other CPU-heavy work under DX11, you generally get a stall on the GPU side, where the GPU idles, waiting for the CPU to finish what it's doing so that it can get back to feeding the GPU.

By increasing the size of the command buffer, more commands can be placed in-waiting so that while the CPU is busy with other work, the Graphics Command Processor still has a lot of buffered commands to execute. This averts a stall.
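A crude way to picture it, with a toy Python model I made up (not how the actual hardware scheduler works):

from collections import deque

def gpu_idle_ticks(buffer_depth, total_ticks=100):
    queue = deque()
    idle = 0
    for tick in range(total_ticks):
        if tick % 10 < 5:                      # toy model: CPU is free half the time
            while len(queue) < buffer_depth:
                queue.append("cmd")            # CPU tops up the command buffer
        if queue:
            queue.popleft()                    # GPU retires one command per tick
        else:
            idle += 1                          # empty buffer while CPU is busy = stall
    return idle

print(gpu_idle_ticks(buffer_depth=2))   # shallow buffer: 40 idle ticks out of 100
print(gpu_idle_ticks(buffer_depth=16))  # deeper buffer: 0 idle ticks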

So 720p/900p/1080p/1440p performance should be great under DX11 on Polaris/Vega.


Another nifty new feature is instruction prefetching. Instruction prefetching is a technique used in central processing units to speed up the execution of a program by reducing wait states (GPU idle time, in this case).

Prefetching occurs when a processor requests an instruction or data block from main memory before it is actually needed. Once the block comes back from memory, it is placed in a cache (and GCN4 has increased its Cache sizes as well). When the instruction/data block is actually needed, it can be accessed much more quickly from the cache than if it had to make a request from memory. Thus, prefetching hides memory access latency.

In the case of a GPU, the prefetch can take advantage of the spatial coherence usually found in the texture mapping process. In this case, the prefetched data are not instructions, but texture elements (texels) that are candidates to be mapped on a polygon.
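Here's a little toy model of the latency-hiding idea (again, my own illustration, not GCN4's actual prefetcher):

# Toy model: fetching the next texel while shading the current one hides memory latency.
MEM_LATENCY = 4   # ticks from request to data ready
SHADE_TIME = 4    # ticks of work per texel

def render_ticks(n_texels, prefetch):
    t = 0
    pending = None                       # (texel index, tick when its data is ready)
    for i in range(n_texels):
        if pending and pending[0] == i:
            t = max(t, pending[1])       # data was prefetched; may already be in cache
        else:
            t += MEM_LATENCY             # demand fetch: pay the full latency
        if prefetch and i + 1 < n_texels:
            pending = (i + 1, t + MEM_LATENCY)   # request the next texel ahead of time
        t += SHADE_TIME                  # do the shading work for this texel
    return t

print(render_ticks(8, prefetch=False))  # 8 * (4 + 4) = 64 ticks
print(render_ticks(8, prefetch=True))   # 4 + 8 * 4 = 36 ticks, latency mostly hidden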

This could mean that GCN4 (Polaris/Vega) will be boosting texturing performance without needing to rely on more texturing units. This makes sense when you consider that Polaris will be a relatively small die containing far fewer CUs (Compute Units) than Fiji and that Texture Mapping Units are found in the CUs. By reducing the texel fetch wait times, you can get a more efficient use out of the Texture Mapping Units on an individual basis. Kind of like higher TMU IPC.

On top of all this we have the new L2 cache, improved CUs for better shader efficiency, new memory controllers, etc.


So what we're looking at from AMD is Fury X performance (or slightly more) at a reduced price point for DX12, and higher than Fury X performance for DX11. Just like with NVIDIA, the real fun starts with Vega by end of 2016/beginning of 2017.


In conclusion,

We have Maxwell 3.0 facing off against a refined GCN architecture. Micron just announced that mass production of GDDR5X is set for this summer, so both AMD and NVIDIA are likely to use GDDR5X. It will be quite interesting to see the end result.

So does lacking Asynchronous compute + graphics matter? Absolutely.

Thanks for great post :thumbsup:
 

Adored

Senior member
Mar 24, 2016
256
1
16
This right here proves you know little to nothing about GameWorks, nor about AMD's ProjectCARS woes.

Educate yourself.

You should have educated yourself by reading all of that article and then following up with the final link to [H].

This is my favourite part.

Slightly Mad Studio's Ian Bell: Looking through company mails the last I can see they (AMD) talked to us was October of last year.
Same day...

Slightly Mad Studio's Ian Bell: I've now conducted my mini investigation and have seen lots of correspondence between AMD and ourselves as late as March and again yesterday.
So from October 2014 to March 2015, a gap of five months, and only two months prior to his comment in May 2015. Let's not even bother with the fact that they had talked just the day before. Do you expect us to trust anything this guy says?

Need I continue? Try reading your own links.
 
Feb 19, 2009
10,457
10
76
@Adored
These Project Cars clowns who try to distort history fail because you can't get rid of facts on the internet; the trail is still there.

Their lead engineer posted on their own forums saying PhysX is the problem on AMD because it ticks 600 times per second on the CPU, causing major CPU bottlenecks.

A few weeks later, their PR tried to re-word it, saying PhysX isn't 600 Hz, that only some parts of their physics simulation are 600 Hz (their only physics engine is PhysX-based -_-), and that none of NV's tech is the problem...

They also talked about releasing a DX12 patch, saying they saw huge performance gains for AMD, but it never happened. All of this is on the public record. So people here who side with these liars at Slightly Mad Studios either do so because they don't know the real story, or do so deliberately to troll.

This relates to the future: whether Pascal has DX12 async compute or not isn't really relevant, because NV has the money to throw around to cripple AMD performance in the PC ports of major games. They do exactly what they accused AMD of thinking about doing (paranoia!) years ago.

http://www.anandtech.com/show/2549/7

NVIDIA insists that if it reveals its true feature set, AMD will buy off a bunch of developers with its vast hoards of cash to enable support for DX10.1 code NVIDIA can't run. Oh wait, I'm sorry, NVIDIA is worth twice as much as AMD, who is billions in debt and struggling to keep up with its competitors on the CPU and GPU side. So we ask: who do you think is more likely to start buying off developers to the detriment of the industry?

Anand & Derek had the guts back then to call it straight!

TL;DR: NV will even out any hardware performance deficit via software.
 

thesmokingman

Platinum Member
May 6, 2010
2,302
231
106
"We support Multisample readback, which is about the only dx10.1 feature (some) developers are interested in. If we say what we can't do, ATI will try to have developers do it, which can only harm pc gaming and frustrate gamers."

Oh the irony!!!!!!!
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
@Adored
These Project Cars clowns who try to distort history fail because you can't get rid of facts on the internet; the trail is still there.

Their lead engineer posted on their own forums saying PhysX is the problem on AMD because it ticks 600 times per second on the CPU, causing major CPU bottlenecks.

A few weeks later, their PR tried to re-word it, saying PhysX isn't 600 Hz, that only some parts of their physics simulation are 600 Hz (their only physics engine is PhysX-based -_-), and that none of NV's tech is the problem...

They also talked about releasing a DX12 patch, saying they saw huge performance gains for AMD, but it never happened. All of this is on the public record. So people here who side with these liars at Slightly Mad Studios either do so because they don't know the real story, or do so deliberately to troll.

This relates to the future: whether Pascal has DX12 async compute or not isn't really relevant, because NV has the money to throw around to cripple AMD performance in the PC ports of major games. They do exactly what they accused AMD of thinking about doing (paranoia!) years ago.

http://www.anandtech.com/show/2549/7



Anand & Derek had the guts back then to call it straight!

TL;DR: NV will even out any hardware performance deficit via software.

It's embarrassing how your bias is just clouding your vision right now. I normally don't wade into these petty arguments but I thought I might point this out.

[Screenshot: Project CARS with the PhysX indicator showing it running on the CPU]


I just fired up Project Cars on my 780 with the default PhysX settings (meaning it runs on the GPU if it can). So yes, the devs aren't liars: PhysX really does run on the CPU regardless, meaning that if PhysX were the problem, it would also be a problem on NVIDIA cards.

I see the same people on these forums all the time, always saying the same stuff (NVIDIA does this and that, boo!), but they praise AMD no matter what. So when NVIDIA cards run like crap in the new Hitman it's "Oh look, GCN ages better", but when Project Cars runs like crap on AMD cards it's always "Boo, NVIDIA gimped AMD!" Ridiculous and hypocritical. Before you say I'm some NVIDIA apologist, I've recommended AMD cards to people for years, and just recently I convinced my friend to buy a 390 because it's an amazing card. Cut the FUD.
 

nurturedhate

Golden Member
Aug 27, 2011
1,767
773
136
It's embarrassing how your bias is just clouding your vision right now. I normally don't wade into these petty arguments but I thought I might point this out.

[Screenshot: Project CARS with the PhysX indicator showing it running on the CPU]


I just fired up Project Cars on my 780 with the default PhysX settings (meaning it runs on the GPU if it can). So yes, the devs aren't liars: PhysX really does run on the CPU regardless, meaning that if PhysX were the problem, it would also be a problem on NVIDIA cards.


Reread what you just typed....
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
I see the same people on these forums all the time, always saying the same stuff (NVIDIA does this and that, boo!), but they praise AMD no matter what. So when NVIDIA cards run like crap in the new Hitman it's "Oh look, GCN ages better", but when Project Cars runs like crap on AMD cards it's always "Boo, NVIDIA gimped AMD!" Ridiculous and hypocritical. Before you say I'm some NVIDIA apologist, I've recommended AMD cards to people for years, and just recently I convinced my friend to buy a 390 because it's an amazing card. Cut the FUD.

Agreed. It was quite interesting when the Fury X was released as a 4GB card. The same folks trashing the 970/980 for having 4GB didn't see it as much of an issue with the Fury X.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
Reread what you just typed....

It's you that's confused.

Did you look at his screenshot?

It runs on the GPU if it can
The screenshot clearly shows it's running on the CPU, meaning the way Project CARS has incorporated PhysX, it runs on the CPU, not the GPU, even with an NVIDIA card.
 

Adored

Senior member
Mar 24, 2016
256
1
16
Feb 19, 2009
10,457
10
76
It's embarrassing how your bias is just clouding your vision right now. I normally don't wade into these petty arguments but I thought I might point this out.

Take a look at yourself first, bud. Note what I said: it's from their lead engineer, and they say PhysX running on the CPU at 600 Hz is the problem for AMD.

Nowhere did they or I say it runs on the GPU.

Do you realize that on the consoles they would never be able to poll PhysX at that rate, 20x the 30 frames per second? It's just wasteful, and it hurts AMD on the PC given nobody runs the game anywhere close to 600 FPS.

Do you know why this only hurts AMD? Think multi-threaded DX11 drivers for NV; it's not a problem for them if PhysX chokes the main game engine thread.
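Just to put numbers on that (simple arithmetic, assuming a fixed 600 Hz tick as claimed):

# A fixed 600 Hz physics tick means the CPU runs this many substeps per rendered frame:
PHYSICS_HZ = 600
for fps in (30, 60, 144):
    print(f"{fps} fps -> {PHYSICS_HZ / fps:.0f} physics substeps per frame")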
 

nurturedhate

Golden Member
Aug 27, 2011
1,767
773
136
It's you that's confused.

Did you look at his screenshot?

It runs on the GPU if it can
The screenshot clearly shows it's running on the CPU, meaning the way Project CARS has incorporated PhysX, it runs on the CPU, not the GPU, even with an NVIDIA card.

I see a screenshot where a portion of the PhysX work was run on the CPU, with no performance data of any kind....

Try again.
 

Adored

Senior member
Mar 24, 2016
256
1
16
It's running on the CPU. If NVIDIA cards were to have an advantage it would need to run on the GPU.

No, it's actually much better for NVIDIA when it runs on the CPU for both - so long as there are CPU cycles to spare. AMD's DX11 driver overhead is real, but to get it to this level you have to try really hard to gimp it on purpose.
 
Feb 19, 2009
10,457
10
76
Agreed. It was quite interesting when the Fury X was released as a 4GB card. The same folks trashing the 970/980 for having 4GB didn't see it as much of an issue with the Fury X.

Can you find many folks trashing the 970 and 980 for having 4GB? Because I don't recall many even mentioning that as an issue. Or are you just making up FUD?
 

C@mM!

Member
Mar 30, 2016
54
0
36
Agreed. It was quite interesting when the Fury X was released as a 4GB card. The same folks trashing the 970/980 for having 4GB didn't see it as much of an issue with the Fury X.

Namely because most people understood that with HBM1 it wasn't possible to cram more on, and the bandwidth gain minimised the impact of 4GB at 1440p+.

And people bitched about the 970 because 512MB of that 4GB buffer sits on a much slower memory path due to the way NVIDIA sliced the die.
 
Feb 19, 2009
10,457
10
76
Really, just google "Project Cars PhysX 600Hz" and you'll find a ton of sites covering their PR back-pedaling after their lead engineer posted about it on their official forums.

I'll take the word of the lead developer responsible for the engine over PR FUD any day.

The problem of worse AMD performance also appeared when weather effects were added; backers complained on their forum about a sudden loss of half their performance.

Guess where that comes from?

Turbulence, GameWorks' weather feature.

See, these developers say one thing at conferences, but their marketing PR tries to re-spin it later, saying there's no NV tech or GameWorks, no NV sponsorship (yeah right, all those NVIDIA logos everywhere on their tracks!)...

[Image: NVIDIA marketing material featuring Project CARS]


And their idea of working with AMD? They GAVE them "20 Steam Keys" on release to optimize... wow!

Really, I couldn't even make this crap up; this is what their exec, Ian Bell, said: http://steamcommunity.com/app/234630/discussions/0/613957600528900678/

He also lied about AMD not contacting them for 6 months, when AMD came out with email records showing they had tried to work with these clowns for a long time but to no avail. Ian Bell had to back-track, saying he found the emails and that they had been in communication up until the recent time period. -_-

Please, next time, think first before you compare lying devs like these guys to Oxide or IO, who gave NV full source code access in alpha and beta, well before release, so that NV could release "Game Ready" drivers for alphas and betas.
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
Take a look at yourself first, bud. Note what I said: it's from their lead engineer, and they say PhysX running on the CPU at 600 Hz is the problem for AMD.

Nowhere did they or I say it runs on the GPU.

Do you realize that on the consoles they would never be able to poll PhysX at that rate, 20x the 30 frames per second? It's just wasteful, and it hurts AMD on the PC given nobody runs the game anywhere close to 600 FPS.

Do you know why this only hurts AMD? Think multi-threaded DX11 drivers for NV; it's not a problem for them if PhysX chokes the main game engine thread.

So now you want to selectively decide what is true and what is a lie from the devs? Because according to you, they also said PhysX doesn't impact rendering at all because it's on a completely separate thread and only makes up about 10% of what the whole physics engine does.

Let's say it's true that NVIDIA's multi-threaded drivers are the reason for the boost (highly doubtful, considering the PhysX thread is separate from the rendering thread). So when NVIDIA supports a feature that AMD doesn't, NVIDIA gets flak, but when AMD supports a feature NVIDIA doesn't (or doesn't handle well, like async), NVIDIA still gets flak. That's called a double standard.
 

kondziowy

Senior member
Feb 19, 2016
212
188
116
The fanboy wars are real. But while we have multiple documented cases of NVIDIA gimping performance, there is no such thing on AMD's side. And NVIDIA even gimps their own cards, like the GTX 970 (and then calls it good design), so it's insane to see anyone defending them; I just can't comprehend it.
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
No, it's actually much better for NVIDIA when it runs on the CPU for both - so long as there are CPU cycles to spare. AMD's DX11 driver overhead is real, but to get it to this level you have to try really hard to gimp it on purpose.

Again, how is this NVIDIA's fault?
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
I see a screenshot where a portion of the PhysX work was run on the CPU, with no performance data of any kind....

Try again.

When the PhysX indicator says CPU, it means all PhysX is run on the CPU. I don't see how performance data is needed to get my point across.
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
The fanboy wars are real. But while we have multiple documented cases of NVIDIA gimping performance, there is no such thing on AMD's side.

Source? Both companies sponsor games and have deals with devs; it's no secret that these sponsored games usually perform better on the sponsor's hardware. Saying only one company does these things is simply not true. I for one don't even care if they both do it; as long as I don't have to read these petty forum wars, I'm fine.
 
Feb 19, 2009
10,457
10
76
So now you want to selectively decide what is true and what is a lie from the devs?

Because one is a statement from the guy responsible for coding the engine.

The other is a back-track from their marketing PR, who tried to distance themselves from NVIDIA with absurd claims that they didn't use NV tech and that NV didn't sponsor them...

When there are NV logos all over their tracks, NV's marketing features the game, and the guys coding the game presented evidence to the contrary at conferences.



It doesn't take a genius to figure out who the lying scumbags are when their exec has blatantly lied about communication with AMD and was forced to back-track publicly.

Let's not rehash this issue again any time soon; I already beat this horse the last time someone tried to go revisionist.