
[bitsandchips]: Pascal to not have improved Async Compute over Maxwell

Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3..waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.
 
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

Or you can continue making ill-informed comments that clearly demonstrate your lack of understanding of the subject matter overall while still thinking you are correct.
 
Or you can continue making ill-informed comments that clearly demonstrate your lack of understanding of the subject matter overall while still thinking you are correct.

I don't really believe that he thinks he's correct. He's putting too much effort into discrediting it. If there was true confusion, I don't think he would be bringing AMD into it.
 
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3..waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.

Async is part of how they're addressing their utilization problem though - it's hardly a bad idea and will probably be necessary for both as GPUs keep getting wider and used for more general tasks. NV have done a better job with GPU utilization to date, but they're going to hit the same issues eventually.
 
It's tailored toward async compute: either your hardware supports it and there's some gain, or it doesn't support it and at best there's no loss...

I guess you didn't get that async compute is about gaining something...
God, for the thousandth time, how can you gain performance if your resources are already fully utilized? (i.e. NVIDIA.) You can't go to 110%!
You can only gain performance with DX12 async if you have idling ALUs under DX11. That's the whole story of GCN's poor utilization under DX11, and the reason why AMD's much higher FLOPS never materialize into a performance lead (under DX11).

What is so difficult to understand? 😵
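
Putting this post's "idle ALUs" argument into numbers (illustrative figures, not measurements, and counting only ALU idle time; it says nothing about overlapping rasterizer- or copy-bound work):

```latex
% Illustrative only: u is the fraction of the frame during which the shader ALUs
% are busy. Filling the idle (1 - u) with async compute can at most give:
\text{speedup} \;\le\; \frac{1}{u}, \qquad
u = 0.70 \;\Rightarrow\; \le 1.43\times, \qquad
u = 0.95 \;\Rightarrow\; \le 1.05\times
```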
 
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3..waiting for the informed replies of those that still don't understand how async compute works/what it does and doesn't do.

FTFY ...

It could be a number of things. We need performance counters to truly tell how well the Nvidia microarchitecture is being utilized ...

You cannot say whether it has async compute or not by pointing to near-maximum utilization, nor could you claim high utilization in the first place, without measuring hardware throughput ...
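
For what it's worth, one way to get at that kind of measurement without vendor performance counters is to bracket the work on each queue with D3D12 timestamp queries and compare the intervals afterwards; overlapping intervals at least show the queues really ran concurrently. A rough sketch, with resource creation, command-list recording and error handling omitted, and all names made up:

```cpp
// Sketch: bracket the work submitted to one queue with GPU timestamps.
// queryHeap is a D3D12_QUERY_HEAP_TYPE_TIMESTAMP heap and readbackBuffer a
// readback-heap resource, both created elsewhere; names are placeholders.
#include <d3d12.h>

void BracketWithTimestamps(ID3D12GraphicsCommandList* cmdList,
                           ID3D12QueryHeap* queryHeap,
                           ID3D12Resource* readbackBuffer)
{
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);  // interval start
    // ... record the draws or dispatches being measured here ...
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);  // interval end
    cmdList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP,
                              0, 2, readbackBuffer, 0);
}
// After the GPU is done, map readbackBuffer, read the two UINT64 ticks and divide
// by the frequency from ID3D12CommandQueue::GetTimestampFrequency() to get seconds.
```

Doing the same bracketing on the compute queue and comparing the two intervals shows whether they actually overlapped.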

It would be nice to see some async compute experiments with several 16K resolution shadow maps and a depth pre-pass for deferred lighting while a very heavy compute shader is running ...

I know how well Nvidia's competitor's hardware would react, but as for their own latest Pascal architecture it's very ambiguous, and I don't find that reassuring one bit ... 🙂

The killer app of async compute is being able to overlap compute work with rasterizer- and texture-sampler-bound work ...

FWIW, I agree with you that hardware should be ideal for the software, yet the same is true the other way around as well ...
 
God, for the thousandth time, how can you gain performance if your resources are already fully utilized? (i.e. NVIDIA.) You can't go to 110%!
You can only gain performance with DX12 async if you have idling ALUs under DX11. That's the whole story of GCN's poor utilization under DX11, and the reason why AMD's much higher FLOPS never materialize into a performance lead (under DX11).

What is so difficult to understand? 😵
That is why AMD spent so much effort laying down the software groundwork for PC gaming, as well as capturing the console chip deals: to sow the seeds for the architecture they designed to bear fruit.
 
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3..waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.

Oh boy. When you ended up proving your opponent's point, but you didn't even realize it. So wait, Nvidia implemented a feature because they already have perfect or near-perfect utilization? If you have perfect utilization, you would just say async compute is BS: here is our already perfect utilization, because our hardware and software rock. They didn't, though. They tasked their driver team to implement dynamic load balancing and then demoed it. Therefore they acknowledged they have gaps in their utilization, or they are straight-up lying for marketing purposes. Not only that, but scene to scene never takes the exact same resources. To claim otherwise is just plain silly. I think we can safely assume Nvidia is attempting to tackle an issue that is very much real.
 
Oh boy. When you ended up proving your opponent's point, but you didn't even realize it. So wait, Nvidia implemented a feature because they already have perfect or near-perfect utilization? If you have perfect utilization, you would just say async compute is BS: here is our already perfect utilization, because our hardware and software rock. They didn't, though. They tasked their driver team to implement dynamic load balancing and then demoed it. Therefore they acknowledged they have gaps in their utilization, or they are straight-up lying for marketing purposes. Not only that, but scene to scene never takes the exact same resources. To claim otherwise is just plain silly. I think we can safely assume Nvidia is attempting to tackle an issue that is very much real.

No, better utilization does not mean perfect utilization; I think you're deliberately misunderstanding what people are saying.
 
Of course he's deliberately misunderstanding. Not only did I never talk about perfect utilization, I was also clearly referring to that specific application. Not to mention I wrote many times before that async compute is a *great feature*. Unfortunately it's also the most opportunistically misunderstood feature.
 
Thing I don't understand is this:

If concurrent graphics+compute didn't matter then why would Sony have specifically requested for an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?

Why have so many prominent devs post about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?

About Pascal, I'm just glad there isn't a performance regression in dx12, going to be very good for all owners of GCN cards.
 
Thing I don't understand is this:

If concurrent graphics+compute didn't matter then why would Sony have specifically requested for an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?

Why have so many prominent devs post about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?

About Pascal, I'm just glad there isn't a performance regression in dx12, going to be very good for all owners of GCN cards.

Because some people here deliberately spread FUD about Async Compute.

In DX12/Vulkan's programming guide, it's simply referred to as Multi-Engine. Look up the documents.

This Multi-Engine API allows different queues to run concurrently IF the hardware is capable.

Graphics, Compute, Copy.

Sony specifically wanted this Multi-Engine feature, and their lead architect made an example of it: when you're rendering shadow maps, you are only using the rasterizer (ROPs, which also handle other types of workloads) while the shaders are idling. It is in this situation that you can run compute queues separately, so that the ROPs and the shader clusters / CUs are both performing work concurrently.

Add to this, copy queues can work directly on the DMA engines (GCN has 2 active; Kepler/Maxwell have 2, but 1 is disabled in DX and both are only accessible via CUDA for some reason), to get transfers going concurrently.

Without hardware support for this feature, no matter how great your shader utilization is, you cannot use the rasterizer and the DMA engines concurrently while the shaders are running. That is a FLAW in prior APIs & GPUs which lack Multi-Engine / "Async Compute" hardware.
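
For anyone who hasn't read the MSDN page linked below: the Multi-Engine model really is just separate command queues of different types, one per engine. A minimal sketch (device setup, command lists and synchronization omitted; whether the queues actually execute concurrently is up to the hardware and driver):

```cpp
// Sketch only: one command queue per DX12 engine type. Whether the GPU actually
// executes them concurrently depends on the hardware and the driver.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateEngineQueues(ID3D12Device* device,
                        ComPtr<ID3D12CommandQueue>& gfxQueue,
                        ComPtr<ID3D12CommandQueue>& computeQueue,
                        ComPtr<ID3D12CommandQueue>& copyQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics engine (can also run compute/copy)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // the "async compute" queue
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;     // copy / DMA engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));
}
```

Shadow-map passes would then be submitted on the direct queue while compute dispatches go to the compute queue, with a fence only where one queue's output feeds the other.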

Required reading for some people, before they keep on spreading more FUD!

https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php

Cerny expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.

"If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn't even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware -- so graphics aren't using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, 'Okay, all that compute you wanted to do, turn it up to 11 now.'"

http://ext3h.makegames.de/DX12_Compute.html

P.S. If anyone makes a remark about how async compute is only useful for GPUs with lower shader utilization, or how it's ineffective for GPUs that have 100% shader utilization, they don't know what they are talking about and are just regurgitating the same FUD.
 
Thing I don't understand is this:

If concurrent graphics+compute didn't matter then why would Sony have specifically requested for an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?

Why have so many prominent devs post about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?

About Pascal, I'm just glad there isn't a performance regression in dx12, going to be very good for all owners of GCN cards.

Who said async compute doesn't matter?
 
Having the ability to do proper Async Compute doesn't say anything about performance improvements.

Ashes is tailored towards AMD. So why would anyone expect the same or any gain on nVidia hardware?

You've repeated this falsehood too many times. Everyone knows that the source code is shared and that the dev has worked with nVidia specifically requesting an update to fix it. The hardware simply hasn't been capable.
 
Add to this, copy queues can work directly on the DMA engines (GCN has 2 active; Kepler/Maxwell have 2, but 1 is disabled in DX and both are only accessible via CUDA for some reason), to get transfers going concurrently.

Only some cards have both DMA engines enabled. Last I checked it was Quadros and Titans that had both, with the second engine disabled on GeForce, though that could have changed. It also used to be the case that one engine was dedicated to uploads and one to downloads (when 2 are enabled), though that could also have changed.
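
As a toy illustration of what a copy queue buys you when a DMA engine is exposed (all names are placeholders, object creation and error handling omitted): the upload runs on the copy engine, and the graphics queue only stalls at the point where it actually consumes the data.

```cpp
// Sketch: an upload submitted on a dedicated copy queue so the DMA engine works
// while the graphics queue keeps rendering. All objects are created elsewhere;
// names are placeholders and error handling is omitted.
#include <d3d12.h>
#include <cstdint>

void AsyncUpload(ID3D12CommandQueue* copyQueue,
                 ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandList*  uploadCmdList,  // a recorded COPY-type command list
                 ID3D12Fence*        fence,
                 uint64_t            fenceValue)
{
    // Kick off the transfer; on hardware with an exposed DMA engine this can
    // proceed alongside graphics work already in flight.
    ID3D12CommandList* lists[] = { uploadCmdList };
    copyQueue->ExecuteCommandLists(1, lists);
    copyQueue->Signal(fence, fenceValue);

    // The graphics queue stalls only at the point where it actually needs the
    // uploaded data; everything it was given before this can overlap the copy.
    gfxQueue->Wait(fence, fenceValue);
}
```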
 
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3..waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.

You seem to not understand that without async compute certain tasks have to wait until another task is completed before they can run. That creates a stall. Async allows these tasks to run simultaneously. Without it, they simply can't do that. nVidia can't possibly do without it what AMD, or any hardware, does with it. nVidia is the one who has something to fix. Not AMD.
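
A minimal sketch of the difference being described, with placeholder names and command-list recording omitted: on a single queue the dispatches sit behind the earlier pass, while a second (compute) queue lets them start as soon as their own inputs are ready.

```cpp
// Sketch contrasting serialized vs. overlapped submission; placeholder names,
// command lists assumed to be recorded already.
#include <d3d12.h>

// Without a separate compute queue the dispatches are recorded behind the shadow
// pass on the same direct queue, so they cannot start until it drains.
void Serialized(ID3D12CommandQueue* gfxQueue, ID3D12CommandList* shadowThenCompute)
{
    ID3D12CommandList* lists[] = { shadowThenCompute };
    gfxQueue->ExecuteCommandLists(1, lists);
}

// With a compute queue the same dispatches (on a COMPUTE-type list) are submitted
// independently and may fill whatever units the shadow pass leaves idle; a fence
// is only needed where one pass actually consumes the other's output.
void Overlapped(ID3D12CommandQueue* gfxQueue, ID3D12CommandQueue* computeQueue,
                ID3D12CommandList* shadowPass, ID3D12CommandList* computePass)
{
    ID3D12CommandList* g[] = { shadowPass };
    ID3D12CommandList* c[] = { computePass };
    gfxQueue->ExecuteCommandLists(1, g);
    computeQueue->ExecuteCommandLists(1, c);
}
```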
 
You seem to not understand that without async compute certain tasks have to wait until another task is completed before they can run. That creates a stall. Async allows these tasks to run simultaneously. Without it, they simply can't do that. nVidia can't possibly do without it what AMD, or any hardware, does with it. nVidia is the one who has something to fix. Not AMD.


This has absolutely nothing to do with what I said. If you have something relevant to say please go ahead, otherwise ignore me.
 
Your rate of flip flopping is higher than a quantum qubit D:


A quantum bit doesn't flip flop. That's the very nature of quantum superposition. Quantum mechanics is clearly not your cup of tea.

Also I never said async compute is useless, unnecessary, or anything of that sort, quite the contrary. But who am I to try showing people that they still don't understand (or don't want to understand) async compute?!
 