ComputerBase Ashes of the Singularity Beta 1 DirectX 12 Benchmarks


dogen1

Senior member
Oct 14, 2014
739
40
91
I wasn't aware you could run GPUView on a DX11 game and have it split up the queues in the DX12 fashion.

The question, though, is: if they can run in parallel, why does GPU PhysX hurt graphics performance so much? That goes against the point of async compute.

Well, PhysX can be pretty demanding, and async compute is really only a benefit if the tasks you want to run concurrently use different parts of the GPU.
 
Feb 19, 2009
10,457
10
76
Possibly because nVidia doesn't have dedicated compute engines and it uses CU/CC resources that would normally be doing render work.

And this is the key point here.

They have just a single engine; it is not capable of running graphics + compute queues together. Hyper-Q is an extension for the compute queue, expanding it to 32 queues.

In pure CUDA mode, Maxwell can push through a lot of compute. But with graphics in the mix, we see compute tasks such as GPU PhysX drop graphics performance.

The proof that Maxwell is capable of graphics + compute asynchronously would be a GPU PhysX game that performs like so in comparison to Kepler:

Kepler:

GPU PhysX off = ~90 fps
GPU PhysX on = ~60 fps (this is the norm for many GPU PhysX titles that use lots of effects).

Maxwell:

GPU PhysX off = ~100 fps (No extra effects)
GPU PhysX on = ~95 fps (Lots of extra effects, a very tiny performance drop related to draw calls of the new effects which are accelerated by CUDA parallel compute).

This kind of result would be very conclusive evidence that Maxwell is capable of parallel compute execution that does not delay graphics rendering, in comparison to Kepler, which, according to NV, is incapable of it even with CUDA.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
And this is the key point here.

They have just a single engine; it is not capable of running graphics + compute queues together. Hyper-Q is an extension for the compute queue, expanding it to 32 queues.

In pure CUDA mode, Maxwell can push through a lot of compute. But with graphics in the mix, we see compute tasks such as GPU PhysX drop graphics performance.

The proof that Maxwell is capable of graphics + compute asynchronously would be a GPU PhysX game that performs like so in comparison to Kepler:

Kepler:

GPU PhysX off = ~90 fps
GPU PhysX on = ~60 fps (this is the norm for many GPU PhysX titles that use lots of effects).

Maxwell:

GPU PhysX off = ~100 fps (No extra effects)
GPU PhysX on = ~95 fps (Lots of extra effects, a very tiny performance drop related to draw calls of the new effects which are accelerated by CUDA parallel compute).

This kind of result would be very conclusive evidence that Maxwell is capable of parallel compute execution that does not delay graphics rendering, in comparison to Kepler, which, according to NV, is incapable of it even with CUDA.
I think I see what you're saying. I also think I figured out something.

It could be that, in GPUView, what we're seeing are both compute and graphics tasks in the graphics queue (3D queue), as you would expect under DX11 or lower. In this shot:
[GPUView screenshot]


And when compute tasks (those using CUDA) are executed in the compute queue, they are executed concurrently with the compute tasks in the graphics queue (3D queue).

This would give us the illusion of Asynchronous Compute + graphics working when in reality we're only seeing concurrent executions of compute tasks.

This would, in effect, hurt graphics performance by delaying the execution of graphics tasks, and would lead to the results you're mentioning.

If that's the case, then Nvidia's lack of asynchronous compute compliance under DX12 is not simply a trivial matter of non-compliance with barrier wait times.

Non-compliance would be due to Maxwell v2 lacking the necessary hardware. This would make it very unlikely for Pascal to rectify the problem.

It would also tie in with Nvidia's lack of finer-grained pre-emption, which is also due to a lack of dedicated Asynchronous Compute Engines. Of course, the flexibility of the ACEs also plays a role in AMD's pre-emption strengths.

:/
 
Last edited:

dogen1

Senior member
Oct 14, 2014
739
40
91
This kind of result would be very conclusive evidence that Maxwell is capable of parallel compute execution that does not delay graphics rendering, in comparison to Kepler, which, according to NV, is incapable of it even with CUDA.

If the PhysX calculations for a frame take longer than the amount of time the compute units (or CUDA cores) are idle, then your frame is gonna take longer to render. Async compute doesn't guarantee anything for free.
 
Feb 19, 2009
10,457
10
76
If the PhysX calculations for a frame take longer than the amount of time the compute units (or CUDA cores) are idle, then your frame is gonna take longer to render. Async compute doesn't guarantee anything for free.

Yep, but it would still be a heck of a lot faster than running compute + graphics in serial mode where the graphics can't even start until the compute is done.

Async Compute as a feature doesn't guarantee free performance for effects that use compute, but it will generally speed it up, and with the right application can lead to straight "free" performance, such as doing the copy queue in parallel to prepare the next frame, or shadow maps that don't use graphics shaders.

This explains it quite well: https://youtu.be/H1L4iLIU9xU?t=14m48s

Note, not all compute run via Async Compute will be "free" performance, just the tasks that tend not to use the same portion of the shaders. If the compute task uses the same shaders as the graphics work, running it in async mode still reduces "idle" time instead.
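For anyone curious what this looks like in code, here is a minimal D3D12 sketch (hypothetical, assuming an already-created ID3D12Device named `device`) of setting up a dedicated compute queue next to the direct graphics queue, so compute command lists can be submitted independently of the graphics work:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Assumes 'device' is an already-created ID3D12Device (hypothetical setup,
// error handling omitted for brevity).
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // Direct queue: accepts graphics, compute and copy work (the "3D queue" in GPUView).
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    // Compute queue: compute/copy only. Work submitted here *may* overlap with the
    // direct queue if the hardware and driver can run them concurrently; otherwise
    // the driver is free to serialize it, which is why async compute is never "free".
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));
}
```

Whether the two queues actually overlap on the GPU is up to the hardware and driver; the API only states that the work is independent.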
 
Last edited:

Shivansps

Diamond Member
Sep 11, 2013
3,917
1,570
136
Yep, but it would still be a heck of a lot faster than running compute + graphics in serial mode where the graphics can't even start until the compute is done.

Async Compute as a feature doesn't guarantee free performance for effects that use compute, but it will generally speed it up, and with the right application can lead to straight "free" performance, such as doing the copy queue in parallel to prepare the next frame, or shadow maps that don't use graphics shaders.

This explains it quite well: https://youtu.be/H1L4iLIU9xU?t=14m48s

Note, not all compute run via Async Compute will be "free" performance, just the tasks that tend not to use the same portion of the shaders. If the compute task uses the same shaders as the graphics work, running it in async mode still reduces "idle" time instead.

Do you realise what you are asking, right? Devs don't even use proper CPU async methods to make use of idle CPU time on unused cores. MT today is mostly creating multiple "main" threads for different tasks, like AI, physics, etc., and that's not the best way to do it.

So it will take a while before devs even consider using it, and if it only works on AMD, I can fully guarantee you that there is no chance until Nvidia has had it as well for a while.

It's not about Nvidia paying anything; it's how things work.
 
Last edited:

Mahigan

Senior member
Aug 22, 2015
573
0
0
AMD Fury DX12 Beyond3D test:
[GPUView screenshot]



Nvidia GTX 970 DX11 + PhysX test:
[GPUView screenshot]


I honestly can't tell if Maxwell is totally incapable of Asynchronous compute + graphics or simply incapable under DX12.

It's not possible to tell which tasks are compute and which are graphics in GPUView's 3D queue.
 
Feb 19, 2009
10,457
10
76
Do you realise what you are asking, right? Devs don't even use proper CPU async methods to make use of idle CPU time on unused cores. MT today is mostly creating multiple "main" threads for different tasks, like AI, physics, etc., and that's not the best way to do it.

So it will take a while before devs even consider using it, and if it only works on AMD, I can fully guarantee you that there is no chance until Nvidia has had it as well for a while.

It's not about Nvidia paying anything; it's how things work.

This is why I have said this several times before.

IF AMD wants async compute to be used in games, they need to be actively involved to help developers do it. Basically they need to step up and sponsor developers.

GPUOpen and DX12 features sound neat, but devs aren't going to go the extra mile out of sheer goodwill. It takes effort to implement features properly.

This is why, in the Hitman announcement, AMD specifically mentions helping the devs.

AMD is once again partnering with IO Interactive to bring an incredible Hitman gaming experience to the PC. As the newest member to the AMD Gaming Evolved program, Hitman will feature top-flight effects and performance optimizations for PC gamers.

Hitman will leverage unique DX12 hardware found in only AMD Radeon GPUs—called asynchronous compute engines—to handle heavier workloads and better image quality without compromising performance. PC gamers may have heard of asynchronous compute already, and Hitman demonstrates the best implementation of this exciting technology yet. By unlocking performance in GPUs and processors that couldn’t be touched in DirectX 11, gamers can get new performance out of the hardware they already own.
AMD is also looking to provide an exceptional experience to PC gamers with high-end PCs, collaborating with IO Interactive to implement AMD Eyefinity and ultrawide support, plus super-sample anti-aliasing for the best possible AA quality.

This partnership is a journey three years in the making, which started with Hitman: Absolution in 2012, a top seller in Europe and widely critically acclaimed. PC technical reviewers lauded all the knobs and dials that pushed GPUs of the time to their limit. That was no accident. With on-staff game developers, source code and effects, the AMD Gaming Evolved program helps developers to bring the best out of a GPU. And now in 2016, Hitman gets the same PC-focused treatment with AMD and IO Interactive to ensure that the series’ newest title represents another great showcase for PC gaming!

They basically need to do this, else I don't see developers pushing GPUOpen or Async Compute for the PC port.
 
Last edited:

Shivansps

Diamond Member
Sep 11, 2013
3,917
1,570
136
This is why I have said this several times before.

IF AMD wants async compute to be used in games, they need to be actively involved to help developers do it. Basically they need to step up and sponsor developers.

GPUOpen and DX12 features sound neat, but devs aren't going to go the extra mile out of sheer goodwill. It takes effort to implement features properly.

This is why, in the Hitman announcement, AMD specifically mentions helping the devs.



They basically need to do this, else I don't see developers pushing GPUOpen or Async Compute for the PC port.

There is no way AMD could do anything about something that is part of the core of the game; AMD is not gonna pay half of the development and be involved for years on every project like they are doing right now with Ashes of the Singularity.
It's not like making devs implement GameWorks, which is almost trivial to implement by comparison and has nothing to do with the core of the game; it's too costly and time-consuming. Same reason why the FX-8350 is not the best gaming CPU ever, even in AMD titles.
 
Last edited:
Feb 19, 2009
10,457
10
76
There is no way AMD could do anything about something that is part of the core of the game; AMD is not gonna pay half of the development and be involved for years on every project.
It's not like making devs implement GameWorks, which is almost trivial to implement by comparison and has nothing to do with the core of the game.

Not understanding your point.

Are you saying AMD has to get developers to program DX12 correctly?

That's the developer's job, they follow the API standard.

If hardware can run Async Compute properly, it may see performance gains relative to how much compute there is. On hardware that cannot, it will run in the normal serial mode and there are no added performance gains.

It's not an incompatibility issue.
 

Shivansps

Diamond Member
Sep 11, 2013
3,917
1,570
136
Not understanding your point.

Are you saying AMD has to get developers to program DX12 correctly?

That's the developer's job, they follow the API standard.

If hardware can run Async Compute properly, it may see performance gains relative to how much compute there is. On hardware that cannot, it will run in the normal serial mode and there are no added performance gains.

It's not an incompatibility issue.

There is no "program DX12 correctly", it does not work like that:

1) Moneyyyyyyyyyyyyy
Time is money; it's the same reason idle CPU time is not used properly. It takes time to do something new. It's not "hey, let's run it in parallel instead of serial" (it's not that easy, it gets very complicated, fast, and the perf gains aren't linear either).

2) Market
Software is made to work on most hardware; anything extra costs extra money, and something that only works on 30% of the market is extra.

There is no "follow the API standard" either; otherwise all DX11 games would support context lists.

There is NOTHING AMD can do about that. It's not about trying to get devs to implement X feature, which is mostly just implementing a 3rd-party library, or helping them to optimise performance.

See what AMD is doing right now with AoS, and for how long now? Do you think they can do that with other games?
Same reason why Mantle was only supported in the titles AMD spent a lot of time and money on.
 
Last edited:

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Doubt AMD has to do all that. The problem with games like Tomb Raider not using it on PC is the API. Development was probably going on with DX11 for the PC version. We'll see if they bring over async in a DX12 patch. Though I don't see why they would bother with DX12 by themselves. Nvidia might pay them to release one that favors them.

Since the game is out and it's not a multiplayer game expecting people to play it for months, I doubt they will by themselves.

Possibly enhanced edition

Other games though that are more naturally dx12 should be able to without AMD stepping in.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
There is no way AMD could do anything about something that is part of the core of the game; AMD is not gonna pay half of the development and be involved for years on every project like they are doing right now with Ashes of the Singularity.

You keep stating that AMD helped Oxide implement Async Compute, but they've said multiple times that they weren't asked by AMD to implement it; they did it because it was a feature in DX12 they wanted to try out.

I would qualify those statements.

Saying we heavily rely on async compute is a pretty big stretch. We spent a grand total of maybe 5 days on Async Shader support. It essentially entailed moving some ( a grand total of 4, IIRC) compute jobs from the graphics queue to the compute queue and setting up the dependencies. Async compute wasn't available when we began architecting (is that a word?) the engine, so it just wasn't an option to build around even if we wanted to. I'm not sure where this myth is coming from that we architected around Async compute. Not to say you couldn't do such a thing, and it might be a really interesting design, but it's not OUR current design.

Saying that Multi-Engine (aka Async Compute) is the root of performance increases on Ashes from DX11 to DX12 on AMD is definitely not true. Most of the performance gains in AMD's case are due to CPU driver overhead reductions. Async is a modest perf increase relative to that. Weirdly, though there is a marketing deal on Ashes with AMD, they never did ask us to use async compute. Since it was part of D3D12, we just decided to give it a whirl.

BTW: Just to clarify, the AoS benchmark is just a script running our game. Thus, whatever the benchmark does, the game does. There is nothing particularly special about the benchmark except a really nice UI that our intern wrote around it. It's really our own internal tool we use to optimize and test. We just decided to make it public with a polished UI.

http://www.overclock.net/t/1575638/...able-legends-dx12-benchmark/110#post_24475280
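To make the "setting up the dependencies" part concrete, here is a rough sketch (not Oxide's code; names like computeQueue, computeList and fenceValue are illustrative) of how a compute job submitted to a D3D12 compute queue is typically fenced so the graphics queue waits for its results:

```cpp
#include <d3d12.h>

// Hypothetical sketch: run a compute job on the compute queue and make the
// graphics queue wait for its results before using them.
void SubmitWithDependency(ID3D12CommandQueue* computeQueue,
                          ID3D12CommandQueue* graphicsQueue,
                          ID3D12CommandList* computeList,
                          ID3D12CommandList* graphicsList,
                          ID3D12Fence* fence,
                          UINT64& fenceValue)
{
    // Kick off the compute work on its own queue...
    ID3D12CommandList* compLists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, compLists);

    // ...and signal the fence when that work completes on the GPU.
    ++fenceValue;
    computeQueue->Signal(fence, fenceValue);

    // The graphics queue waits (on the GPU, not the CPU) until the fence value is
    // reached, then runs the command lists that read the compute output.
    graphicsQueue->Wait(fence, fenceValue);
    ID3D12CommandList* gfxLists[] = { graphicsList };
    graphicsQueue->ExecuteCommandLists(1, gfxLists);
}
```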


Also, the current released version of Ashes does not use async compute on Nvidia hardware, but it looks like it has been enabled in the latest nvidia drivers:

Async compute is currently forcibly disabled on public builds of Ashes for NV hardware. Whatever performance changes you are seeing driver to driver doesn't have anything to do with async compute.

I can confirm that the latest shipping DX12 drivers from NV do support async compute. You'd have to ask NV how specifically it is implemented.


http://www.overclock.net/t/1590939/...-async-compute-yet-says-amd/370#post_24898074

Looks like some new effects are coming up so the differences discussed a few pages back are now moot, as they've been changed internally and will be released soon:

Hmm... well, I suppose we could investigate it but that effect has already been completely changed, just not out yet in public. So it's probably a waste of time for us at this point to investigate. I haven't seen any noticeable differences between AMD and NVidia myself.

I'm very curious about what the Hitman folks are doing. At any rate, we've got some big D3D12 updates upcoming very shortly. I'd say more, but I'm probably already in trouble.

http://www.overclock.net/t/1590939/...-async-compute-yet-says-amd/350#post_24893376

and

http://www.overclock.net/t/1590939/...-async-compute-yet-says-amd/340#post_24893300
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,045
3,835
136
Do you realise what you are asking, right? Devs don't even use proper CPU async methods to make use of idle CPU time on unused cores. MT today is mostly creating multiple "main" threads for different tasks, like AI, physics, etc., and that's not the best way to do it.
Don't know about PCs, but on the consoles they don't; most game engines have been job-based systems for ages (forced by the PS3/360's unforgiving CPUs). There are even comments from a Trials dev saying their latest engine is job-based, with no thread dedicated to dispatching jobs; each worker thread figures out for itself the next job to work on.


You only dedicate job functions to cores/threads if your memory management is "poor".
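For illustration, a toy sketch of that kind of job-based setup in generic C++ (not any particular engine's code): workers pull their own next job from a shared queue, with no dedicated dispatcher thread:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Toy job system: every worker pulls its own next job; nothing is pinned to a core.
class JobSystem {
public:
    explicit JobSystem(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { WorkerLoop(); });
    }
    ~JobSystem() {
        { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void Submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // e.g. an animation, physics or AI work item
        }
    }
    std::vector<std::thread> threads_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};
```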
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Don't know about PCs, but on the consoles they don't; most game engines have been job-based systems for ages (forced by the PS3/360's unforgiving CPUs). There are even comments from a Trials dev saying their latest engine is job-based, with no thread dedicated to dispatching jobs; each worker thread figures out for itself the next job to work on.


You only dedicate job functions to cores/threads if your memory management is "poor".

You should realize that any feature that AMD uses that nVidia doesn't automatically goes into the too-hard basket, never to be used. If it is used, it's because the dev is a paid AMD shill. A game running, looking, or playing better is never motivation on its own.
 
Feb 19, 2009
10,457
10
76
Doubt AMD has to do all that. The problem with games like Tomb Raider not using it on PC is the API. Development was probably going on with DX11 for the PC version. We'll see if they bring over async in a DX12 patch. Though I don't see why they would bother with DX12 by themselves. Nvidia might pay them to release one that favors them.

Other games though that are more naturally dx12 should be able to without AMD stepping in.

Good point.

Games developed with consoles in mind would already be using some of the DX12 (Xbone) or GCN-specific (for PS4) features, such as Async Compute. In fact, the PS4 devs are the ones often showcasing that feature.

When it comes to the PC port, if DX12 allows these features to remain, then it's just a matter of porting it over (unless the dev was given incentives NOT to do that).

It's a case of: if they are already using async compute for consoles, then the DX12 port will have no reason to remove it, unless perhaps NV doesn't want it in PC ports that they sponsor.

Tomb Raider is the perfect example of this where Crystal Dynamics said they run Async Compute on the Xbone (& specifically referred to DX12!) but removed it for the DX11 PC port.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
2) Market
Software is made to work on most hardware; anything extra costs extra money, and something that only works on 30% of the market is extra.

100% of consoles (XBone and PS4) can do Async Compute. At the end of 2015 those two were at ~55 million units. Now add all the GCN dGPUs present in the market AND all the GCN APUs (both desktop and laptop) from 2014 onward and it's a huge market.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
So what you are saying is:
the devs should not use the untapped power of async compute, which could essentially lead to the games being better at every level, just because one company has known about DX12 since 2010 and chose not to do anything and just milk the DX11 cow?

How long do you think Nvidia will actually be able to "block" DX12? It's not like we are talking about AMD only here:
MS is pushing it
AMD is pushing it
some of the big dev houses are pushing it
Intel is pushing it

When in September people called on Nvidia not to go and make the same mistake 3dfx did, they said "no, we won't" and bla bla,
and here we are now watching history repeating itself (not surprising, since Nvidia is 3dfx lol).
Eventually, throwing money at devs to push for DX11 will prove inefficient at the very least... they will save an insane amount of money by porting to DX12 and Vulkan rather than getting paid to bring it to DX11.
 
Last edited:

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
I'll be testing out the new nvidia driver when I get into work today

I really hope nvidia is able to add async shaders via driver so that it does get mass adoption.
 
Last edited:

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
So what you are saying is:
the devs should not use the untapped power of async compute, which could essentially lead to the games being better at every level, just because one company has known about DX12 since 2010 and chose not to do anything and just milk the DX11 cow?
They should, and they do, on consoles, where every system is the same and you do it once and it works for everybody.
On the PC it's very different: they will have to do a special version for every architecture at least, and it's still not guaranteed that it will perform better than the normal version, because async compute is mainly useful for very tiny cores that are so small that they can't drive a video card with only one core.
Every change in drivers, Windows, GPU hardware or the game could completely mess up DX12.
How long do you think Nvidia will actually be able to "block" DX12? It's not like we are talking about AMD only here:
MS is pushing it
AMD is pushing it
some of the big dev houses are pushing it
Intel is pushing it
Read between the lines: everybody is pushing it for consoles, tablets, smartphones and the like, because lower CPU overhead = longer battery life = better sales.

On the PC they are just using it as a gimmick to make sales. It will provide some benefits and it will be nice, but the gains will be very far away from the ideals of the benchmarks that we have seen until now.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
No one knows that, actually... and I highly doubt that it can provide huge benefits only on one system and mild or nothing on another.
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
They should, and they do, on consoles, where every system is the same and you do it once and it works for everybody.
On the PC it's very different: they will have to do a special version for every architecture at least, and it's still not guaranteed that it will perform better than the normal version, because async compute is mainly useful for very tiny cores that are so small that they can't drive a video card with only one core.
Every change in drivers, Windows, GPU hardware or the game could completely mess up DX12.

Read between the lines: everybody is pushing it for consoles, tablets, smartphones and the like, because lower CPU overhead = longer battery life = better sales.

On the PC they are just using it as a gimmick to make sales. It will provide some benefits and it will be nice, but the gains will be very far away from the ideals of the benchmarks that we have seen until now.

I am not sure what you think will suddenly "completely mess up DX12." Have you ever looked at the code for DX12? Do you have any understanding of how it works and how things are set up in it?

I don't even know what you are trying to say with "very tiny cores that are so small that they can't drive a video card with only one core." How do those words even go together?
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
They should, and they do, on consoles, where every system is the same and you do it once and it works for everybody.
On the PC it's very different: they will have to do a special version for every architecture at least, and it's still not guaranteed that it will perform better than the normal version, because async compute is mainly useful for very tiny cores that are so small that they can't drive a video card with only one core.
Every change in drivers, Windows, GPU hardware or the game could completely mess up DX12.

Read between the lines: everybody is pushing it for consoles, tablets, smartphones and the like, because lower CPU overhead = longer battery life = better sales.

On the PC they are just using it as a gimmick to make sales. It will provide some benefits and it will be nice, but the gains will be very far away from the ideals of the benchmarks that we have seen until now.

The point of an API, I think, is that it makes it so you don't have to code to GPU archs. The cards are compatible with the API, so they code in accordance with the API and the drivers/GPU figure it out.



Is there something like this for DX12?

Interesting, Nvidia has just one queue, as mentioned.
 
Last edited:

Good_fella

Member
Feb 12, 2015
113
0
0
So Nvidia can do graphics, compute, transfer and sparse_binding in a single queue while AMD has 3 queues for 3 different kinds of task. Why is AMD better?
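You can inspect that queue layout yourself; here is a small Vulkan sketch in C++ (assuming an already-selected VkPhysicalDevice) that prints each queue family and its capability flags, which is where Nvidia's single graphics+compute+transfer+sparse_binding family versus AMD's extra compute-only and transfer-only families shows up:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// Assumes 'gpu' is an already-selected VkPhysicalDevice (hypothetical setup).
void PrintQueueFamilies(VkPhysicalDevice gpu)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

    // Print the capability flags of each queue family the driver exposes.
    for (uint32_t i = 0; i < count; ++i) {
        const VkQueueFamilyProperties& f = families[i];
        std::printf("family %u: %u queue(s) -%s%s%s%s\n",
                    i, f.queueCount,
                    (f.queueFlags & VK_QUEUE_GRAPHICS_BIT)       ? " graphics" : "",
                    (f.queueFlags & VK_QUEUE_COMPUTE_BIT)        ? " compute"  : "",
                    (f.queueFlags & VK_QUEUE_TRANSFER_BIT)       ? " transfer" : "",
                    (f.queueFlags & VK_QUEUE_SPARSE_BINDING_BIT) ? " sparse_binding" : "");
    }
}
```

Having the extra families only matters if the GPU can actually fill them concurrently, which is the whole async compute argument in this thread.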
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
The point of an API, I think, is that it makes it so you don't have to code to GPU archs. The cards are compatible with the API, so they code in accordance with the API and the drivers/GPU figure it out.

We are talking here about a "low level" API. What you describe is something like OpenGL or DX11 where most of the work is done within the driver.
 
Last edited: