computerbase: Ashes of the Singularity Beta 1 DirectX 12 Benchmarks

Page 20

coercitiv

Diamond Member
Jan 24, 2014
7,225
16,982
136
I am saying that they should test async on athlon 5350 cpus and see if nvidia gains any speed there...
Athlon 5350 + DX12 = Equivalent CPU + DX11

What do you reckon is the Equivalent CPU needed to feed the GPU in the "equation" above? Same high end GPU, same FPS.
 

Bryf50

Golden Member
Nov 11, 2006
1,429
51
91
I'm no engineer, but the purpose of asynchronous compute is to make better use of GPU hardware. A single task, be it graphics or compute, is unlikely to be using 100% of the hardware resources on the chip for its entire lifetime. It's already standard practice in GPU compute to launch multiple kernels simultaneously when possible. Async compute allows graphics tasks to be included in this as well.
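That point can be sketched with a toy scheduling model. This is a minimal sketch with made-up numbers; real GPU scheduling depends on register pressure, bandwidth and occupancy, not a single "fraction of the chip":

```python
# Toy utilization model (illustrative numbers, not real measurements):
# each task occupies a fraction of the GPU's execution resources for
# some duration. Serial submission leaves the unused fraction idle;
# overlapping the tasks fills that gap.

def serial_time(tasks):
    # One task at a time: durations simply add up.
    return sum(duration for _, duration in tasks)

def concurrent_time(tasks):
    # Idealized async compute: tasks fully overlap as long as their
    # combined resource use fits on the chip.
    assert sum(frac for frac, _ in tasks) <= 1.0
    return max(duration for _, duration in tasks)

# (fraction of GPU resources used, duration in ms) -- made-up workloads
graphics = (0.7, 16.0)
compute = (0.3, 9.0)

print(serial_time([graphics, compute]))      # 25.0
print(concurrent_time([graphics, compute]))  # 16.0
```

Under this idealized model the frame takes only as long as the longest task, instead of the sum of both.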
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
I'm no engineer, but the purpose of asynchronous compute is to make better use of GPU hardware. A single task, be it graphics or compute, is unlikely to be using 100% of the hardware resources on the chip for its entire lifetime. It's already standard practice in GPU compute to launch multiple kernels simultaneously when possible. Async compute allows graphics tasks to be included in this as well.
This is the key fact.

I suppose that it might be possible to design a shader unit that would be fully utilized with a certain set of instructions, but this is impossible for general purpose use.

The people criticizing asynchronous shaders by AMD are blindly ignoring this. You will always have less than 100% utilization over time. There will always be some free hardware resources available. Low overhead asynchronous operations allow the efficient use of such resources.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
What you (they) show in the picture is the basic DX12/Vulkan/low-level async stuff that almost any card can do, even Intel iGPUs.
Async compute, using graphics + compute, is just a feature where only GCN cards get a big boost.
But then again, gaining speed from doing graphics + compute in parallel only means that you cannot get all of your GPU's power when doing only one of them...
The image he showed:
e7bff7116a6508b3e94598e565acfb95.jpg


This is what AMD can do. Notice that you have three lines of commands running concurrently with one another. Those three lines represent the three queues (Graphics/3D, Copy and Compute).

When NVIDIA Kepler/Maxwell execute the same code, they execute it like you see in DirectX 11 below, with some exceptions. When the graphics queue is executing a compute command, the compute queue can concurrently execute another compute command, and copy commands can also be executed concurrently on NVIDIA hardware. What NVIDIA don't support is mixing compute with graphics, so NVIDIA's current hardware does not support asynchronous Compute + Graphics (which can significantly boost performance). AMD behave like the DirectX 12 portion of this image and support all forms of concurrent execution:
506a9688ad9a70873daf8b13363e34cd.jpg


Where you see a divergence is on the CPU side. Intel, AMD and NVIDIA all split the command-buffer listing (the DirectX runtime, or red bar) across various cores under DirectX 12, as such:
a6365b642fcfb5751d53ffe584bc0bc8.jpg

f63e113d576b004b77c81a8ce681bbf8.jpg


Of course there's another difference here: whereas AMD GCN executes the DirectX runtime under DX11 like the DirectX 11 shot above, NVIDIA do not. NVIDIA support DirectX 11 multi-threaded command lists while AMD do not.

DirectX 11 multi-threaded command lists basically work like this: batches of commands are pre-recorded on multiple CPU cores, and the primary CPU thread simply plays the pre-arranged, pre-computed command lists back to the NVIDIA driver. The NVIDIA driver's compiler orders them into grids and schedules them for execution.

Basically, NVIDIA are able to split that red bar in the DirectX 11 shot above across many CPU threads under DX11; AMD do not. NVIDIA also already employ a multi-threaded DirectX 11 driver (pale blue bar).
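The pre-record/playback pattern described here can be sketched with ordinary CPU threads standing in for DX11 deferred contexts. Names like `record_batch` are illustrative stand-ins, not the actual D3D11 API:

```python
import threading, queue

# Stand-in for recording on a deferred context: each worker thread
# pre-records an immutable batch of commands.
def record_batch(batch_id, commands, out):
    cmd_list = [f"batch{batch_id}:{c}" for c in commands]
    out.put((batch_id, cmd_list))

recorded = queue.Queue()
workers = [threading.Thread(target=record_batch,
                            args=(i, ["draw", "draw", "set_state"], recorded))
           for i in range(4)]
for w in workers: w.start()
for w in workers: w.join()

# The primary thread plays the pre-recorded lists back to the driver
# in order, analogous to executing command lists on the immediate context.
playback = [cmd_list for _, cmd_list in sorted(recorded.queue)]
total = sum(len(cmd_list) for cmd_list in playback)
print(total)  # 12 commands submitted from a single thread
```

The expensive recording work is parallelized across cores; only the cheap playback stays on the primary thread, which is the whole point of the technique.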

This is why NVIDIA don't gain much from DX12 over DX11 performance-wise while AMD gain a lot: AMD GCN hammers the primary CPU thread under DirectX 11, leading to a CPU bottleneck. Vulkan will eventually highlight the same behaviour as the API matures: for AMD, more performance than under DX11; for NVIDIA, similar performance give or take (NVIDIA's new driver will improve things a bit by allowing concurrent execution of compute commands).

The end result is that AMD GCN gains performance from the get go by running DirectX12 over DirectX11.

If you throw Asynchronous Compute + Graphics into the mix, AMD gain even more performance. How? Asynchronous Compute + Graphics significantly lowers frame times (frame latency), thus boosting performance. It also raises GPU utilization, thus minimizing idle resources.

The thing with GCN is that resources are almost always idling. The architecture is highly parallel.

So yeah, AMD's DX11 implementation is inferior to NVIDIA's. There's no denying that. Vulkan and DX12, however, are based on ideas spawned by the Mantle API and thus, like Mantle, are really tailored to AMD hardware as it pertains to performance.

Want to see this multiple queue execution in action? GPUView allows you to do just that.

Here's the Fable Legends fly-by demo running on a Titan X. (Note: there are very few asynchronous workloads in the Fable Legends fly-by test, but the released version will include spell effects and more):
1167abb31436e803d21527543087acb4.jpg


Notice the Compute queue is pretty much empty?

Now the same test on a Fury:
59d788ae092b8fb251925aa46204ebc9.jpg


Get the idea?

NVIDIA will be able to run asynchronous compute in the sense of compute commands in the compute queue executing concurrently with compute commands in the graphics (3D) queue, but not compute commands executing concurrently with graphics commands (you can see that in the Fable Legends screenshot above). This is what NVIDIA call "Asynchronous Compute". Kollock, an Oxide developer, mentioned that support for this was recently added to NVIDIA's driver but requires an NVIDIA-specific implementation. However, the real performance gains come from concurrently executing compute and graphics tasks, which is something current NVIDIA architectures are incapable of doing.

Conclusion:
- What AMD mean by Asynchronous Compute is not what NVIDIA mean.
- NVIDIA do not support concurrent execution of Compute + Graphics commands.
- GCN has idle resources from being a very wide architecture; exploiting those resources through asynchronous compute can lead to significant performance improvements.
- NVIDIA have little to gain performance-wise on their current architectures under DX12/Vulkan.
- AMD's DirectX 11 implementation is inferior and hammers the primary CPU thread, leading to a CPU bottleneck (Rise of the Tomb Raider highlights this).

And that's that.
 
Last edited:

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
Athlon 5350 + DX12 = Equivalent CPU + DX11

What do you reckon is the Equivalent CPU needed to feed the GPU in the "equation" above? Same high end GPU, same FPS.

Probably (most certainly) impossible, but it's the best bet for showing DX11 choking an NVIDIA card while DX12 async grants an improvement.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
The thing with GCN is that resources are almost always idling. The architecture is highly parallel.
...
...
Conclusion:
- What AMD mean by Asynchronous Compute is not what NVIDIA mean.
- NVIDIA do not support concurrent execution of Compute + Graphics commands.
- GCN has idle resources from being a very wide architecture; exploiting those resources through asynchronous compute can lead to significant performance improvements.
- NVIDIA have little to gain performance-wise on their current architectures under DX12/Vulkan.
- AMD's DirectX 11 implementation is inferior and hammers the primary CPU thread, leading to a CPU bottleneck (Rise of the Tomb Raider highlights this).
That's exactly what I said, and what the AMD white paper says as well.
GCN can't use all the available shaders with DX11/one queue while NVIDIA can, and that's why GCN gains a lot from async while NVIDIA doesn't.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
The image he showed:
...
And that's that.

Excellent post.

So NVIDIA do have an interest in delaying DX12 until they release hardware that will be able to compete with what AMD currently has on shelves. That explains why DX12 patches for games (ARK, for example) get delayed.

So, all GCN GPUs can run async compute. Do previous AMD architectures have the same perk?
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Excellent post.

So NVIDIA do have an interest in delaying DX12 until they release hardware that will be able to compete with what AMD currently has on shelves. That explains why DX12 patches for games (ARK, for example) get delayed.

So, all GCN GPUs can run async compute. Do previous AMD architectures have the same perk?
Yes, one way to look at it is that it would serve NVIDIA's interests if DX12 games were delayed.

Asynchronous Compute is only available on GCN (Radeon HD 7xx0 and above).
 

pj-

Senior member
May 5, 2015
501
278
136
How does async compute affect heat/power consumption?

Supposing current NVIDIA hardware did support running compute and graphics commands simultaneously, would it even matter?
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Of course there's another difference here, where as AMD GCN executes the DirectX runtime under DX11 like the DirectX 11 shot above, NVIDIA do not. NVIDIA support DirectX 11 Multi-threaded command listing while AMD do not.

DirectX 11 Multi-threaded command listing basically works like this, Batches of Commands are pre-recorded on multiple CPU cores. And the primary CPU thread simply plays back the pre-arranged and pre-computed command lists to the NVIDIA driver. The NVIDIA driver compiler orders them into grids and schedules them for execution.

Just an FYI: if you download the multithreaded DX11 test from Microsoft, you can see that it does work on AMD hardware.

https://code.msdn.microsoft.com/windowsdesktop/Direct3D-Multithreaded-d02193c0

Edit: Hmm that link isn't loading, try this one:

It's also listed here, which links to the above one:

http://blogs.msdn.com/b/chuckw/archive/2013/09/20/directx-sdk-samples-catalog.aspx

2-3x performance increase, and CPU usage goes up a lot as well.

Now, running the 3DMark API test, the multithreaded draw-call result is always slightly lower than the single-threaded one, and both top out around 1M draw calls vs 15M+ for DX12/Mantle.
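For a sense of scale, those throughput figures translate into per-frame budgets roughly like this (illustrative arithmetic, assuming a 60 fps target and taking the ~1M and 15M+ numbers at face value):

```python
# Rough per-frame draw-call budgets implied by the API-test numbers.
fps = 60
dx11_calls_per_sec = 1_000_000    # ~1M, the DX11 result quoted above
dx12_calls_per_sec = 15_000_000   # 15M+, the DX12/Mantle result

print(dx11_calls_per_sec // fps)  # 16666 draw calls per frame under DX11
print(dx12_calls_per_sec // fps)  # 250000 draw calls per frame under DX12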
 
Last edited:

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
What you (they) show in the picture is the basic DX12/Vulkan/low-level async stuff that almost any card can do, even Intel iGPUs.
Async compute, using graphics + compute, is just a feature where only GCN cards get a big boost.
But then again, gaining speed from doing graphics + compute in parallel only means that you cannot get all of your GPU's power when doing only one of them...

Ridiculous statement. GCN offers fixed function hardware to do tasks that nVidia can't. If the GPU is not called upon to do those tasks it's not a shortcoming of the hardware.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
We don't talk about "code to metal". DX12 and Vulkan are much more low-level than DX11/OpenGL without extensions. The amount of work a developer has to do to get the same result is huge. The driver is doing less work, and most of the work happens on the application side:

Despite the fact that we've been told by devs that it's not really that hard, you continue to spread all kinds of FUD attacking superior, higher-performing APIs.

We should call these new APIs explicit and not low-level.

You can call them whatever you want. Until nVidia's support equals AMD the same people will be against them.

If the game runs better on AMD, then the game sucks. If AMD offers better API support, then the API sucks.

A seven-year-old API that the entire industry is kicking to the curb, and that requires driver gymnastics to perform decently, is the better option because NVIDIA is currently behind in support of the modern API.
 
Last edited:

Mahigan

Senior member
Aug 22, 2015
573
0
0
How does async compute affect heat/power consumption?

Supposing current NVIDIA hardware did support running compute and graphics commands simultaneously, would it even matter?
Since you're using more resources, heat and power consumption will rise. Instead of having some hot spots on the GPU, most of the GPU will be running hot. AMD pointed this out during a dev talk.

As for your second question, there would be cases where it would benefit NVIDIAs architecture. Such as when batching short running shaders:
https://forum.beyond3d.com/posts/1869234/

Since graphics alone took 16.4 ms and compute alone took around 9.2 ms, running them concurrently would take 16.4 ms instead of 25.6 ms. That's a missed potential performance boost.
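Plugging those numbers in, assuming the compute pass fully hides behind the graphics pass (the best case):

```python
# Frame-time arithmetic for the Beyond3D example quoted above.
graphics_ms = 16.4   # graphics-only pass
compute_ms = 9.2     # compute-only pass

serial = graphics_ms + compute_ms          # back-to-back: ~25.6 ms
overlapped = max(graphics_ms, compute_ms)  # fully concurrent: 16.4 ms

print(f"speedup upper bound: {serial / overlapped:.2f}x")
```

So even on hardware that only overlaps compute with compute, leaving this on the table forgoes up to a ~1.56x gain for this particular workload mix.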
 

pj-

Senior member
May 5, 2015
501
278
136
Since you're using more resources, heat and power consumption will rise. Instead of having some hot spots on the GPU, most of the GPU will be running hot. AMD pointed this out during a dev talk.

As for your second question, there would be cases where it would benefit NVIDIAs architecture. Such as when batching short running shaders:
https://forum.beyond3d.com/posts/1869234/

Since graphics alone took 16.4 ms and compute alone took around 9.2 ms, running them concurrently would take 16.4 ms instead of 25.6 ms. That's a missed potential performance boost.

This year's graphics cards are going to be very interesting
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
It just shows that the whole "DX12 has been in the making since 2009" story is a complete lie. If NVIDIA knew about this, why didn't they just put it in their arch... we could have been way ahead now.
So what you are saying is NVIDIA didn't know about DX12, despite NVIDIA saying they had been working with MS for 4 years (2010-2014)? :sneaky:
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
I'm sure NV knew about DX12 and Async Compute, but they played the DX11 card for maximum profits, as I'm sure they knew DX12 was not going to be released before 2015-16.

This is the best way to gain higher profits, but it doesn't make you a technology leader and innovator. And no matter what people believe, NVIDIA for the last 3-4 years have not been a technology leader. They have stuck to DX11, they created GameWorks in response to Mantle, they will only just be using HBM2, and they will still be second to a 14/16 nm process.

I will give them the profit award any time, but the technology leader and innovator award goes to AMD this round.
 

Pinstripe

Member
Jun 17, 2014
197
12
81
Nvidia is the market share leader, and that's why game developers will keep throwing more resources and time at Nvidia than AMD, regardless of AMD's "superior" hardware.
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
Nvidia is the market share leader, and that's why game developers will keep throwing more resources and time at Nvidia than AMD, regardless of AMD's "superior" hardware.

Unless they are also running on console and want to reuse the async shaders that they will use on both AMD PC hardware and console.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
Nvidia is the market share leader, and that's why game developers will keep throwing more resources and time at Nvidia than AMD, regardless of AMD's "superior" hardware.
Actually, we have DX12, whose core is on Xbox One, and the PS4 version is similar; it's way more efficient to port to DX12 than to actually code a DX11 path.
Eventually it will become ineffective for NVIDIA to keep paying devs to change the code.
 

Pinstripe

Member
Jun 17, 2014
197
12
81
I keep hearing this for years now, and yet next-gen-only titles like The Witcher 3 and Rise of the Tomb Raider still run better on NVIDIA hardware.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
Ridiculous statement. GCN offers fixed function hardware to do tasks that nVidia can't. If the GPU is not called upon to do those tasks it's not a shortcoming of the hardware.

Ohhhhh, so AMD was charging you guys for "fixed function hardware" that was doing exactly nothing for the customer for years now...
What is this "fixed function hardware" called? Any site that explains it?
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Ohhhhh, so AMD was charging you guys for "fixed function hardware" that was doing exactly nothing for the customer for years now...
What is this "fixed function hardware" called? Any site that explains it?

It is called: Environment.