computerbase: Ashes of the Singularity Beta 1 DirectX 12 Benchmarks

coercitiv · Feb 17, 2016

TheELF said:
I am saying that they should test async on Athlon 5350 CPUs and see if NVIDIA gains any speed there...

Athlon 5350 + DX12 = Equivalent CPU + DX11

What do you reckon is the Equivalent CPU needed to feed the GPU in the "equation" above? Same high end GPU, same FPS.
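coercitiv's "equation" can be read through a simple bottleneck model. The Python sketch below assumes CPU submission and GPU rendering are pipelined, so the slower of the two sets the frame rate, and that CPU cost scales inversely with CPU speed. The per-frame costs are invented numbers, not measurements from the benchmark. The question then becomes how much faster a CPU would need to be under DX11 to match the slow CPU's per-frame cost under DX12.

```python
def frame_rate(cpu_ms: float, gpu_ms: float) -> float:
    """FPS when CPU submission and GPU rendering overlap: the slower side sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def required_cpu_speedup(dx11_cpu_ms: float, dx12_cpu_ms: float) -> float:
    """How many times faster a CPU must be under DX11 to match this CPU's DX12 per-frame cost.

    Assumes (hypothetically) that per-frame CPU cost scales inversely with CPU speed.
    """
    return dx11_cpu_ms / dx12_cpu_ms


# Hypothetical per-frame costs for a slow CPU feeding a fast GPU.
GPU_MS = 10.0
DX11_CPU_MS = 40.0  # runtime/driver work concentrated on one thread
DX12_CPU_MS = 12.0  # same CPU, thinner and more parallel submission

print(f"DX11: {frame_rate(DX11_CPU_MS, GPU_MS):.1f} fps")
print(f"DX12: {frame_rate(DX12_CPU_MS, GPU_MS):.1f} fps")
print(f"Equivalent DX11 CPU must be {required_cpu_speedup(DX11_CPU_MS, DX12_CPU_MS):.2f}x faster")
```

In this toy model, once the CPU cost drops below the GPU cost the game is GPU-bound, and extra CPU speed (or a thinner API) stops helping.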

Bryf50 · Feb 17, 2016

I'm no engineer, but the purpose of asynchronous compute is to make better use of GPU hardware. A single task, be it graphics or compute, is unlikely to use 100% of the chip's hardware resources for its entire lifetime. It's already standard practice in GPU compute to launch multiple kernels simultaneously when possible. Async compute allows graphics tasks to be included in this as well.

maddie · Feb 17, 2016

Bryf50 said:
I'm no engineer, but the purpose of asynchronous compute is to make better use of GPU hardware. A single task, be it graphics or compute, is unlikely to use 100% of the chip's hardware resources for its entire lifetime. It's already standard practice in GPU compute to launch multiple kernels simultaneously when possible. Async compute allows graphics tasks to be included in this as well.

This is the key fact.

I suppose it might be possible to design a shader unit that would be fully utilized by a certain set of instructions, but this is impossible for general-purpose use.

The people criticizing AMD's asynchronous shaders are blindly ignoring this. You will always have less than 100% utilization over time. There will always be some free hardware resources available. Low-overhead asynchronous operations allow the efficient use of such resources.
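The utilization argument above can be made concrete with a toy model. This Python sketch is purely illustrative: "units", "time steps" and all numbers are hypothetical, and real GPUs schedule wavefronts far more dynamically. Graphics work occupies a varying number of units in each time step. In the serial case, compute runs afterwards on the whole chip; in the overlapped case, it soaks up whatever units graphics leaves idle.

```python
import math


def serial_steps(graphics_units: list[int], compute_work: int, total_units: int) -> int:
    """Graphics runs alone, then compute runs alone using every unit."""
    return len(graphics_units) + math.ceil(compute_work / total_units)


def overlapped_steps(graphics_units: list[int], compute_work: int, total_units: int) -> int:
    """Compute fills whatever units graphics leaves idle; any leftover runs afterwards."""
    remaining = compute_work
    for used in graphics_units:
        remaining -= min(remaining, total_units - used)
    return len(graphics_units) + math.ceil(remaining / total_units)


def utilization(graphics_units: list[int], total_units: int) -> float:
    """Average fraction of units busy while graphics runs alone."""
    return sum(graphics_units) / (len(graphics_units) * total_units)


TOTAL_UNITS = 64
# Hypothetical per-step occupancy of a graphics workload: rarely the whole chip.
GRAPHICS = [64, 40, 48, 32, 56, 64, 24, 40]
COMPUTE_WORK = 100  # unit-steps of independent compute work

print(f"graphics-only utilization: {utilization(GRAPHICS, TOTAL_UNITS):.1%}")
print("serial steps:", serial_steps(GRAPHICS, COMPUTE_WORK, TOTAL_UNITS))
print("overlapped steps:", overlapped_steps(GRAPHICS, COMPUTE_WORK, TOTAL_UNITS))
```

If the graphics workload already kept every unit busy in every step, the two functions would return the same count; the gain comes entirely from the idle capacity.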

Mahigan · Feb 17, 2016

TheELF said:
What you (they) show in the picture is the basic DX12/Vulkan/low-level async stuff that almost any card can do, even Intel iGPUs.
Async compute, using graphics + compute, is just a feature where only GCN cards get a big boost.
But then again, gaining speed from doing graphics + compute in parallel only means that you cannot get all of your GPU's power when doing only one of them...

The image he showed:

Is what AMD can do. Notice that you have 3 lines of commands running concurrently to one another. Those 3 lines represent the 3 queues (Graphics/3D, Copy and Compute).
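For reference, the three queues map onto Direct3D 12's command list types: D3D12_COMMAND_LIST_TYPE_DIRECT (the graphics/3D queue), D3D12_COMMAND_LIST_TYPE_COMPUTE and D3D12_COMMAND_LIST_TYPE_COPY. Each type accepts a nested subset of work: a direct queue can take graphics, compute and copy commands, a compute queue compute and copy, and a copy queue only copies. The Python table below just encodes that rule for illustration; it is not D3D12 code, and the enum names appear only as strings.

```python
# Which kinds of work each D3D12 queue type accepts (illustrative table only).
QUEUE_ACCEPTS = {
    "D3D12_COMMAND_LIST_TYPE_DIRECT": {"graphics", "compute", "copy"},
    "D3D12_COMMAND_LIST_TYPE_COMPUTE": {"compute", "copy"},
    "D3D12_COMMAND_LIST_TYPE_COPY": {"copy"},
}


def queues_for(kind: str) -> list[str]:
    """Return the queue types that can execute a given kind of work, in table order."""
    return [queue for queue, kinds in QUEUE_ACCEPTS.items() if kind in kinds]


for kind in ("graphics", "compute", "copy"):
    print(kind, "->", queues_for(kind))
```

Whether work submitted to separate queues actually overlaps on the GPU is up to the hardware and driver, which is what the rest of the post is about.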

When NVIDIA Kepler/Maxwell GPUs execute the same code, they execute it like you see in the DirectX 11 portion below, with some exceptions. When the graphics queue is executing a compute command, the compute queue can also concurrently execute a compute command. Copy commands can also be executed concurrently on NVIDIA hardware. What NVIDIA doesn't support is mixing compute with graphics, so NVIDIA's current hardware does not support asynchronous compute + graphics (which is what can significantly boost performance). AMD behaves like the DirectX 12 portion of this image and supports all forms of concurrent execution:

Where you see a divergence is on the CPU side. Under DirectX 12, Intel, AMD and NVIDIA all split the command buffer listing (the DirectX runtime, or red bar) across various cores, as such:

Of course, there's another difference here: whereas AMD GCN executes the DirectX runtime under DX11 like the DirectX 11 shot above, NVIDIA does not. NVIDIA supports DirectX 11 multithreaded command lists, while AMD does not.

DirectX 11 multithreaded command lists basically work like this: batches of commands are pre-recorded on multiple CPU cores, and the primary CPU thread simply plays back the pre-arranged, pre-computed command lists to the NVIDIA driver. The NVIDIA driver compiler orders them into grids and schedules them for execution.

Basically, NVIDIA is able to split that red bar in the DirectX 11 shot above across many CPU threads under DX11. AMD does not. NVIDIA also already employs a multithreaded DirectX 11 driver (pale blue bar).
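In Direct3D 11 terms, the scheme described above uses deferred contexts. Worker threads record into contexts created with ID3D11Device::CreateDeferredContext and turn them into command lists with FinishCommandList, and the immediate context on the primary thread submits them with ExecuteCommandList. The Python sketch below only mimics that shape with a thread pool: the recorded "commands" are placeholder strings, the function names are hypothetical, and nothing here touches a real driver. Recording is spread across threads, but playback stays on one thread and in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(batch_id: int, objects: list[int]) -> list[str]:
    """Worker thread: record (not execute) draw commands for one batch of objects."""
    return [f"draw(batch={batch_id}, obj={obj})" for obj in objects]


def build_frame(batches: list[list[int]], workers: int = 4) -> list[str]:
    # Recording is spread across worker threads (the role deferred contexts play).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        command_lists = list(pool.map(record_command_list, range(len(batches)), batches))
    # Playback happens on the primary thread in batch order (the role of
    # ExecuteCommandList on the immediate context). pool.map preserves input order.
    submitted: list[str] = []
    for command_list in command_lists:
        submitted.extend(command_list)
    return submitted


frame = build_frame([[0, 1], [2], [3, 4, 5]])
print(len(frame), frame[0], frame[-1])
```

Because of CPython's GIL this sketch doesn't record truly in parallel; it only shows the record-then-replay structure. When a D3D11 driver doesn't report native command-list support (the DriverCommandLists field of D3D11_FEATURE_DATA_THREADING), the runtime emulates command lists in software, which may be why the Microsoft multithreaded sample mentioned later in the thread still runs on AMD hardware.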

This is why NVIDIA doesn't gain much performance from DX12 over DX11, while AMD gains a lot. AMD GCN hammers the primary CPU thread under DirectX 11, leading to a CPU bottleneck. Vulkan will eventually show the same behaviour (as the API matures): AMD will get more performance than under DX11, while NVIDIA will get similar performance, give or take (NVIDIA's new driver will improve things a bit by allowing concurrent execution of compute commands).

The end result is that AMD GCN gains performance from the get go by running DirectX12 over DirectX11.

If you throw asynchronous compute + graphics into the mix, AMD gains even more performance. How? Asynchronous compute + graphics significantly lowers frame times (frame latency), thus boosting performance. It also raises GPU utilization, minimizing idle resources.

The thing with GCN is that resources are almost always idling. The architecture is highly parallel.

So yeah, AMD's DX11 implementation is inferior to NVIDIA's. There's no denying that. Vulkan and DX12, however, are based on ideas spawned by the Mantle API and thus, like Mantle, are really tailored to AMD hardware as far as "performance" is concerned.

Want to see this multiple queue execution in action? GPUView allows you to do just that.

Here's the Fable Legends fly-by demo running on a Titan X. (Note: there are very few asynchronous workloads in the Fable Legends fly-by test, but the released version will include spell effects and more):

Notice the Compute queue is pretty much empty?

Now the same test on a Fury:

Get the idea?

NVIDIA will be able to run "asynchronous compute" in the sense of executing compute commands in the compute queue concurrently with compute commands in the graphics (3D) queue, but not compute commands concurrently with graphics commands (you can see that in the Fable Legends screenshot above). This is what NVIDIA calls "Asynchronous Compute". Kollock, an Oxide developer, mentioned that support for this was recently added to NVIDIA's driver but requires an NVIDIA-specific implementation. However, the real performance gains are to be had from concurrently executing compute and graphics tasks. This is something current NVIDIA architectures are incapable of doing.
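The distinction drawn here can be illustrated with a small two-queue simulation. This is a Python toy under stated assumptions: command durations are made up, each queue runs strictly in order, and when two heads may not overlap the graphics queue goes first; real hardware and drivers are far more complicated. Compute-with-compute overlap is always allowed, matching the post's description of NVIDIA, while the hypothetical `mixed_overlap=True` switch also lets graphics overlap compute, matching its description of GCN.

```python
def makespan(graphics_q, compute_q, mixed_overlap):
    """Total time to drain two in-order queues of (kind, ms) commands.

    The two queue heads run together if both are compute commands, or if
    mixed_overlap is True. Otherwise the graphics queue's head runs alone first.
    """
    g = [[kind, ms] for kind, ms in graphics_q]  # [kind, remaining_ms]
    c = [[kind, ms] for kind, ms in compute_q]
    t = 0.0
    while g or c:
        if g and c and (mixed_overlap or (g[0][0] == "compute" and c[0][0] == "compute")):
            step = min(g[0][1], c[0][1])  # run both heads until one finishes
            g[0][1] -= step
            c[0][1] -= step
        elif g:
            step = g[0][1]  # graphics head runs alone; compute head waits
            g[0][1] = 0.0
        else:
            step = c[0][1]  # only compute work is left
            c[0][1] = 0.0
        t += step
        if g and g[0][1] <= 0.0:
            g.pop(0)
        if c and c[0][1] <= 0.0:
            c.pop(0)
    return t


# Hypothetical frame: the 3D queue also carries one compute command.
GRAPHICS_QUEUE = [("graphics", 6.0), ("compute", 2.0), ("graphics", 4.0)]
COMPUTE_QUEUE = [("compute", 3.0), ("compute", 2.0)]

print("no overlap at all:", sum(ms for _, ms in GRAPHICS_QUEUE + COMPUTE_QUEUE))
print("compute+compute only:", makespan(GRAPHICS_QUEUE, COMPUTE_QUEUE, mixed_overlap=False))
print("compute+graphics too:", makespan(GRAPHICS_QUEUE, COMPUTE_QUEUE, mixed_overlap=True))
```

With these numbers, compute-only overlap recovers a little time versus fully serial execution, while allowing graphics + compute overlap hides the whole compute queue under the graphics work.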

Conclusion:
- What AMD means by asynchronous compute is not what NVIDIA means.
- NVIDIA does not support concurrent execution of compute + graphics commands.
- GCN has idle resources from being a very wide architecture. Exploiting those resources through asynchronous compute can lead to significant performance improvements.
- NVIDIA has little to gain performance-wise on its current architectures under DX12/Vulkan.
- AMD's DirectX 11 implementation is inferior and hammers the primary CPU thread, leading to a CPU bottleneck (Rise of the Tomb Raider highlights this).

And that's that.

TheELF · Feb 17, 2016

coercitiv said:
Athlon 5350 + DX12 = Equivalent CPU + DX11

What do you reckon is the Equivalent CPU needed to feed the GPU in the "equation" above? Same high end GPU, same FPS.

Probably (almost certainly) impossible, but it's the best bet for DX11 choking an NVIDIA card and DX12 async granting an improvement.

TheELF · Feb 17, 2016

Mahigan said:
The thing with GCN is that resources are almost always idling. The architecture is highly parallel.
...
...
Conclusion:
- What AMD means by asynchronous compute is not what NVIDIA means.
- NVIDIA does not support concurrent execution of compute + graphics commands.
- GCN has idle resources from being a very wide architecture. Exploiting those resources through asynchronous compute can lead to significant performance improvements.
- NVIDIA has little to gain performance-wise on its current architectures under DX12/Vulkan.
- AMD's DirectX 11 implementation is inferior and hammers the primary CPU thread, leading to a CPU bottleneck (Rise of the Tomb Raider highlights this).

That's exactly what I said, as does the AMD white paper:
GCN can't use all the available shaders with DX11/one queue while NVIDIA can, and that's why GCN gains a lot from async while NVIDIA doesn't.

Erenhardt · Feb 17, 2016

Mahigan said:
The image he showed:

...

And that's that.

Excellent post.

So NVIDIA does have an interest in delaying DX12 until they release hardware that can compete with what AMD currently has on shelves. That explains why DX12 patches for games (ARK, for example) get delayed.

So, all GCN GPUs can run async compute. Do previous AMD architectures have the same perk?

Mahigan · Feb 17, 2016

Erenhardt said:
Excellent post.

So NVIDIA does have an interest in delaying DX12 until they release hardware that can compete with what AMD currently has on shelves. That explains why DX12 patches for games (ARK, for example) get delayed.

So, all GCN GPUs can run async compute. Do previous AMD architectures have the same perk?

Yes, one way to look at it is that it would serve NVIDIA's interests if DX12 games were delayed.

Asynchronous Compute is only available on GCN (Radeon HD 7xx0 and above).

pj- · Feb 17, 2016

How does async compute affect heat/power consumption?

Supposing current NVIDIA hardware did support running compute and graphics commands simultaneously, would it even matter?

Bacon1 · Feb 17, 2016

Mahigan said:
Of course, there's another difference here: whereas AMD GCN executes the DirectX runtime under DX11 like the DirectX 11 shot above, NVIDIA does not. NVIDIA supports DirectX 11 multithreaded command lists, while AMD does not.

DirectX 11 multithreaded command lists basically work like this: batches of commands are pre-recorded on multiple CPU cores, and the primary CPU thread simply plays back the pre-arranged, pre-computed command lists to the NVIDIA driver. The NVIDIA driver compiler orders them into grids and schedules them for execution.

Just an FYI: if you download the multithreaded DX11 sample from Microsoft, you can see that it does work on AMD hardware.

https://code.msdn.microsoft.com/windowsdesktop/Direct3D-Multithreaded-d02193c0

Edit: Hmm that link isn't loading, try this one:

It's also listed here, which links to the above one:

http://blogs.msdn.com/b/chuckw/archive/2013/09/20/directx-sdk-samples-catalog.aspx

2-3x performance increase and CPU usage goes up a lot as well.

Now, running the 3DMark API Overhead test, the multithreaded DX11 draw-call result is always slightly lower than the single-threaded one, and both top out at ~1M draw calls per second vs 15M+ for DX12/Mantle.

3DVagabond · Feb 17, 2016

TheELF said:
What you (they) show in the picture is the basic DX12/Vulkan/low-level async stuff that almost any card can do, even Intel iGPUs.
Async compute, using graphics + compute, is just a feature where only GCN cards get a big boost.
But then again, gaining speed from doing graphics + compute in parallel only means that you cannot get all of your GPU's power when doing only one of them...

Ridiculous statement. GCN offers fixed function hardware to do tasks that nVidia can't. If the GPU is not called upon to do those tasks it's not a shortcoming of the hardware.

3DVagabond · Feb 17, 2016

sontin said:
We're not talking about "code to metal". DX12 and Vulkan are much more low-level than DX11/OpenGL without extensions. The amount of work a developer has to do to get the same result is huge. The driver is doing less work, and most of the work happens on the application side:

Despite devs telling us that it's not really that hard, you continue to spread all kinds of FUD attacking superior, higher-performing APIs.

zlatan said:
We should call these new APIs explicit and not low-level.

You can call them whatever you want. Until nVidia's support equals AMD the same people will be against them.

If the game runs better on AMD then the game sucks. If AMD offers better API support then the API sucks.

A seven-year-old API that the entire industry is kicking to the curb, and that requires driver gymnastics to perform decently, is the better option because nVidia is currently behind in support of the modern API.

Mahigan · Feb 17, 2016

pj- said:
How does async compute affect heat/power consumption?

Supposing current NVIDIA hardware did support running compute and graphics commands simultaneously, would it even matter?

Since you're using more resources, heat and power consumption will rise. Instead of having some hot spots on the GPU, most of the GPU will be running hot. AMD did point this out during a dev talk.

As for your second question, there would be cases where it would benefit NVIDIA's architecture, such as when batching short-running shaders:
https://forum.beyond3d.com/posts/1869234/

Since the graphics-only workload took 16.4 ms and the compute workload around 9.2 ms, running them concurrently would ideally take 16.4 ms instead of 25.6 ms. That's a missed potential performance boost.
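The arithmetic above is just sum versus maximum. A quick Python check, using the 16.4 ms and 9.2 ms figures quoted from the linked test and assuming the ideal case where the compute work hides entirely under the graphics work:

```python
GRAPHICS_MS = 16.4  # graphics-only time from the linked test
COMPUTE_MS = 9.2    # compute-only time from the linked test

serial_ms = GRAPHICS_MS + COMPUTE_MS          # one after the other
concurrent_ms = max(GRAPHICS_MS, COMPUTE_MS)  # ideal overlap: the longer one sets the time

print(f"serial:     {serial_ms:.1f} ms ({1000 / serial_ms:.1f} fps)")
print(f"concurrent: {concurrent_ms:.1f} ms ({1000 / concurrent_ms:.1f} fps)")
print(f"speedup:    {serial_ms / concurrent_ms:.2f}x")
```

In practice the two workloads compete for the same units, so real overlap lands somewhere between these two bounds.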

pj- · Feb 17, 2016

Mahigan said:
Since you're using more resources, heat and power consumption will rise. Instead of having some hot spots on the GPU, most of the GPU will be running hot. AMD did point this out during a dev talk.

As for your second question, there would be cases where it would benefit NVIDIA's architecture, such as when batching short-running shaders:
https://forum.beyond3d.com/posts/1869234/

Since the graphics-only workload took 16.4 ms and the compute workload around 9.2 ms, running them concurrently would ideally take 16.4 ms instead of 25.6 ms. That's a missed potential performance boost.

This year's graphics cards are going to be very interesting.

airfathaaaaa · Feb 17, 2016

caswow said:
It just shows that the whole "DX12 has been in the making since 2009" story is a complete lie. If NVIDIA knew about this, why didn't they just put it in their architecture... we could have been way ahead by now.

So what you are saying is that NVIDIA didn't know about DX12, despite NVIDIA saying they had been working with MS on it for 4 years (2010-2014)? :sneaky::sneaky:

AtenRa · Feb 17, 2016

I'm sure NV knew about DX12 and async compute, but they played the DX11 card for maximum profit, as I'm sure they knew DX12 was not going to be released before 2015-16.

This is the best way to gain higher profits, but that doesn't make you a technology leader and innovator. And no matter what people believe, NVIDIA has not been a technology leader for the last 3-4 years. They have stuck with DX11, they created GameWorks in response to Mantle, they will only adopt HBM with HBM2, and they will still be second to the 14/16nm process.

I will give them the profit award any time, but the technology leader and innovator award goes to AMD this round.

Pinstripe · Feb 17, 2016

Nvidia is the market share leader, and that's why game developers will keep throwing more resources and time at Nvidia than AMD, regardless of AMD's "superior" hardware.

Paul98 · Feb 17, 2016

Pinstripe said:
Nvidia is the market share leader, and that's why game developers will keep throwing more resources and time at Nvidia than AMD, regardless of AMD's "superior" hardware.

Unless they are also targeting consoles and want to reuse their async shaders, which would then benefit AMD on both PC and console.

airfathaaaaa · Feb 17, 2016

Pinstripe said:
Nvidia is the market share leader, and that's why game developers will keep throwing more resources and time at Nvidia than AMD, regardless of AMD's "superior" hardware.

Actually, the core of DX12 is on the Xbox One, and the PS4 has a similar API, so it's far more efficient to port to DX12 than to code a DX11 path.
Eventually it will stop being worthwhile for NVIDIA to keep paying devs to change the code.

Pinstripe · Feb 17, 2016

I've been hearing this for years now, and yet next-gen-only titles like Witcher 3 and Rise of the Tomb Raider still run better on Nvidia hardware.

TheELF · Feb 17, 2016

3DVagabond said:
Ridiculous statement. GCN offers fixed function hardware to do tasks that nVidia can't. If the GPU is not called upon to do those tasks it's not a shortcoming of the hardware.

Ohhhhh, so AMD has been charging you guys for "fixed function hardware" that has been doing exactly nothing for the customer for years now...
What is this "fixed function hardware" called? Any site that explains it?

Glo. · Feb 17, 2016

TheELF said:
Ohhhhh, so AMD has been charging you guys for "fixed function hardware" that has been doing exactly nothing for the customer for years now...
What is this "fixed function hardware" called? Any site that explains it?

It is called: Environment.

AtenRa · Feb 17, 2016

Pinstripe said:
I've been hearing this for years now, and yet next-gen-only titles like Witcher 3 and Rise of the Tomb Raider still run better on Nvidia hardware.

Not according to this:

http://www.hardocp.com/article/2016..._video_card_performance_review/1#.VsTgwoRlOUk

Pinstripe · Feb 17, 2016

AtenRa said:
not according to this

http://www.hardocp.com/article/2016..._video_card_performance_review/1#.VsTgwoRlOUk

But certainly according to this:
http://www.pcgameshardware.de/Rise-...451/Specials/Grafikkarten-Benchmarks-1184288/

poofyhairguy · Feb 17, 2016

Pinstripe said:
But certainly according to this:
http://www.pcgameshardware.de/Rise-...451/Specials/Grafikkarten-Benchmarks-1184288/

Am I reading that right or does that site have a 970 beating a Fury X at 1080p? And a 960 almost matching a 290?
