computerbase Ashes of the Singularity Beta 1 DirectX 12 Benchmarks


railven

Diamond Member
Mar 25, 2010
6,604
561
126
Or perhaps you imagine DX12 is something it isn't.

I don't imagine DX12 is anything. I'd love to see it in action. Unfortunately, the only example I could get my hands on is in a genre I don't much like, and being an NV user, I'd benefit more from using the DX11 path.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Nvidia's MegaThread engine is dependent on the software. Maxwell is an extremely efficient architecture because it has absolutely no clue, at the hardware level, what to do with an application. It lacks hardware scheduling. Nvidia got rid of that because Fermi, which had hardware scheduling, was inefficient and hot. A big cache and extremely fast cores which are able to empty the schedule very quickly - that's where the potential for overclocking, performance, etc. comes from.

The problem comes when you have low-level APIs. To deal with that, you have to program the whole application just for Nvidia hardware in order to utilize it properly. That is the real world. Low-level APIs gave control over the hardware to the application, not the drivers. The only driver you have is... the API driver. There is nothing else between the app and the hardware. Nvidia's approach is to give the application another layer of abstraction which controls both the hardware and the software (the application). That is the easiest explanation here.

AMD is the opposite: you have wide GPUs, but they lack the ability to work well in a simple environment with preemption - that's why they were behind in DX11. When you get compute, graphics, hardware scheduling and asynchronous compute with context switching in one place - that's where you will see gigantic leaps. That's why AMD benefits from DX12: all of those features are at the hardware level. It will not be impossible, when true DX12 titles come out with compute, context switching and hardware scheduling, to see the R9 390X surpass the GTX 980 Ti. You cannot run from the fact that both cards have a pretty similar amount of compute power, and that is what will make the biggest difference in future titles.
 

Leadbox

Senior member
Oct 25, 2010
744
63
91
Yes, since you claim NVidia is the only one with vendor-specific code. You can of course do so from the rationale that it's an AMD game, hence any DX12 not running on AMD must be vendor-specific.
When they found they couldn't do async compute on Nvidia, they took steps to accommodate them, i.e. vendor-specific optimization. Stop twisting it.
 

Goatsecks

Senior member
May 7, 2012
210
7
76
I can't believe people are still getting hot under the collar about a single game that is:

  1. in beta,
  2. pre-performance optimisation,
  3. on a brand new API (no DX12 title is currently available),
  4. running a new API implemented on older hardware.*
* Older hardware w.r.t. DX12. I suspect this will be controversial, but current hardware is, primarily, DX11 hardware. Furthermore, we do not know what the developers, AMD, Nvidia and Microsoft are up to. You guys need to keep in mind how minuscule the amount of available information is.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
IMO we need 3 or 4 fully released DX12 games, each from different developers (and even waiting a month for the first few patches to drop for each), before any solid conclusions can be made on DX12. Single-game tests are too dependent on the nature of the game and the skill of the engine programmers at that one company to make conclusions about the entire ecosystem.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Async compute is an easy, but tricky, thing in D3D12. It is a multi-engine API, so choosing the right queue for the right job is important. The API can support any hardware with multi-engine code, so the devs only need to care about which jobs to run asynchronously. For example, texture streaming is an easy target and should be loaded onto the copy engine. If the hardware only has one DMA unit, then D3D12 will automatically run the job on the graphics engine. This is a very easy multi-engine model, because graphics is a superset of compute, and compute is a superset of copy. It is flawless in my opinion, so kudos to Microsoft.
Still, async compute is problematic, because the DXGK specification requires a very specific compute command engine in the hardware that supports fences and barriers in the right way. The fences are mostly used to synchronize the workloads, while the barriers are used to block an operation on the GPU. In D3D12 the barriers have to support some specific conditions, which makes this API really future-proof but also imposes some limits, because GCN is the only architecture where the compute command engines are designed in the right way. Kepler and Maxwell have some independent compute command engines, but these don't support the specific barrier conditions that DXGK requires for D3D12. In this case both Kepler and Maxwell can execute standard multi-engine code, but they are not able to run async compute in the compute command engines, so the async compute jobs will be loaded onto the main command engine, and this leads to some useless context switches. The result will be a dramatic performance degradation with standard D3D12 multi-engine code. To avoid this situation the devs must use an alternate codepath that uses the compute command engines in a not very efficient way, but this option is better for NVIDIA; or it is possible to not use async compute at all.
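
As a rough, untested sketch of the multi-engine model described above (assuming a valid ID3D12Device* named device, with all error handling omitted), this is how an application might create separate graphics and copy queues and use a fence so the graphics engine only touches a streamed texture after the copy engine has finished uploading it:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Helper: create a command queue of the requested engine type.
ComPtr<ID3D12CommandQueue> MakeQueue(ID3D12Device* device, D3D12_COMMAND_LIST_TYPE type)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = type; // DIRECT (graphics), COMPUTE, or COPY
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

void StreamTextureExample(ID3D12Device* device)
{
    // Graphics is a superset of compute, which is a superset of copy,
    // so copy work can always fall back to a "bigger" engine.
    ComPtr<ID3D12CommandQueue> gfxQueue  = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
    ComPtr<ID3D12CommandQueue> copyQueue = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COPY);

    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // ... record the texture upload on a COPY command list and
    //     ExecuteCommandLists() it on copyQueue ...

    // Fence synchronization: the copy queue signals when the upload is done,
    // and the graphics queue waits on the GPU timeline before using the texture.
    copyQueue->Signal(fence.Get(), 1);
    gfxQueue->Wait(fence.Get(), 1);
}
```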
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Yes, that sounds correct; it is the developer's choice whether they enable Async Compute for hardware that doesn't support it. They need a separate check for the IHV to enable/disable features.

We should see it in Beta 2 when they focus on optimizations.

DX12 should not perform slower; basically, in the worst-case scenario, run compute serially as in DX11 and get the benefits of lower API overhead on the CPU side.

It might require a separate render path and be a lot more work than you think. Shouldn't the IHVs make hardware that is DX12 compliant if they claim it is?
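
For what the quoted "separate check for the IHV" could look like in practice, here is a purely hypothetical sketch (the vendor IDs are standard PCI IDs; the function name and the policy are illustrative, not Oxide's actual code): query the adapter description through DXGI and only take the async compute path on hardware where it is expected to help.

```cpp
#include <dxgi.h>

// Hypothetical per-IHV toggle: read the adapter's PCI vendor ID and decide
// whether the engine should submit work through its async compute path.
bool ShouldUseAsyncCompute(IDXGIAdapter1* adapter)
{
    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);

    const UINT kVendorAMD    = 0x1002;
    const UINT kVendorNVIDIA = 0x10DE;

    if (desc.VendorId == kVendorAMD)
        return true;   // GCN: use the dedicated compute queues
    if (desc.VendorId == kVendorNVIDIA)
        return false;  // fall back to the serial, graphics-queue-only path
    return false;      // unknown hardware: take the conservative path
}
```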
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
It might require a separate render path and be a lot more work than you think. Shouldn't the IHVs make hardware that is DX12 compliant if they claim it is?

You don't need to have super-improved performance to be compliant.
Just look at the AMD cards and Dx11 in this game - would you call the AMD cards not Dx11 compliant just because they are so much slower?

Also, they already made it work on Dx11; the separate render path is already there and can be used in Dx12 as well.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
You don't need to have super-improved performance to be compliant.
Just look at the AMD cards and Dx11 in this game - would you call the AMD cards not Dx11 compliant just because they are so much slower?

Also, they already made it work on Dx11; the separate render path is already there and can be used in Dx12 as well.

You are comparing driver optimization with lack of support.

Can you explain what you mean by the bold part - using the DX11 path for DX12?
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
You are comparing driver optimization with lack of support.

Wait, so Dx11 is serial-compute based and Nvidia cards' hardware is serial-compute based, while AMD's is not, so of course that is driver optimization,

but Dx12 is async-compute based (at least the part we are debating) and AMD cards' hardware is async-compute based, while Nvidia's is not, so of course that is lack of support?

It doesn't work like that; it's either the one or the other.

Can you explain what you mean by the bold part - using the DX11 path for DX12?
Well, they already have a serial-based render path that works in Dx11; how difficult is it to "copy/paste" this into Dx12?
 
Feb 19, 2009
10,457
10
76
In this case both Kepler and Maxwell can execute standard multi-engine code, but they are not able to run async compute in the compute command engines, so the async compute jobs will be loaded onto the main command engine, and this leads to some useless context switches. The result will be a dramatic performance degradation with standard D3D12 multi-engine code. To avoid this situation the devs must use an alternate codepath that uses the compute command engines in a not very efficient way, but this option is better for NVIDIA; or it is possible to not use async compute at all.

Thanks for the insightful post; I suspected something like that but lacked the in-depth knowledge to confirm it. Basically, calling Async Compute queues on NV will add a performance penalty because their command engine has to re-interpret it for the serial architecture. It's better to just straight up not use Async Compute on hardware that doesn't support it.

So when Oxide does their beta 2 optimization, we should not see NV performance degrade in DX12 vs DX11. We might even see a small perf gain due to CPU scaling.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Wait, so Dx11 is serial-compute based and Nvidia cards' hardware is serial-compute based, while AMD's is not, so of course that is driver optimization,

but Dx12 is async-compute based (at least the part we are debating) and AMD cards' hardware is async-compute based, while Nvidia's is not, so of course that is lack of support?

It doesn't work like that; it's either the one or the other.


Well, they already have a serial-based render path that works in Dx11; how difficult is it to "copy/paste" this into Dx12?

Have any evidence that AMD can't execute "serial compute"? Is that even possible? I don't believe it is, but show me if you can.

Also, what makes you think you can simply copy and paste DX11 code into the DX12 render path, and even if you can, why would you?

I really don't think anything you've said here is possible or accurate.
 

Deders

Platinum Member
Oct 14, 2012
2,401
1
91
Of course AMD can execute commands in serial. How do you think they have been managing so far in all the other DX's?
 

littleg

Senior member
Jul 9, 2015
355
38
91
Of course AMD can execute commands in serial. How do you think they have been managing so far in all the other DX's?

They obviously can, but they then suffer from a lack of utilisation due to the design of GCN (I think - utilisation may be the wrong word, I'm far from an expert).
 

Deders

Platinum Member
Oct 14, 2012
2,401
1
91
They obviously can, but they then suffer from a lack of utilisation due to the design of GCN (I think - utilisation may be the wrong word, I'm far from an expert).

Is this based on AOTS performance? The reason you see such an improvement with AMD cards and DX12 comes down to not needing as much CPU overhead. Nvidia already had this well optimised in their DX11 drivers.

Not quite the same gain as DX12 should give, but enough to make a big difference.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
In this case both Kepler and Maxwell can execute standard multi-engine code, but they are not able to run async compute in the compute command engines, so the async compute jobs will be loaded onto the main command engine, and this leads to some useless context switches. The result will be a dramatic performance degradation with standard D3D12 multi-engine code.

This doesn't make any sense. Context switches can only hurt performance if you actually switch the context. The driver is not putting every queue into one. Execution happens in a serial way. This is only a problem within the graphics queue, and using the graphics queue to execute draw and dispatch commands is not "standard DX12 multi-engine code".
There is no context switch involved when a developer uses a graphics and a compute queue, because the driver can schedule the workload serially and the hardware can immediately execute it.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Are you sure about this statement?

The hardware doesn't need to switch context between every dispatch command in a compute queue or every draw command in a graphics queue.

But if you put multiple dispatch commands in a graphics queue, the nVidia hardware pays a penalty every time before the dispatch command gets executed.
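
To make the distinction concrete, here is a minimal sketch of the alternative being described (the queue, allocator and compute PSO are assumed to already exist; root signature and descriptor binding, plus error handling, are omitted): the same Dispatch() is recorded on a COMPUTE command list and handed to a dedicated compute queue instead of being interleaved with draws on the graphics queue.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Submit compute work on its own queue instead of mixing Dispatch() calls
// into the graphics queue between draw calls.
void SubmitComputeOnComputeQueue(ID3D12Device* device,
                                 ID3D12CommandQueue* computeQueue,
                                 ID3D12CommandAllocator* computeAlloc,
                                 ID3D12PipelineState* computePso)
{
    // A command list of type COMPUTE can only hold dispatch/copy work.
    ComPtr<ID3D12GraphicsCommandList> list;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE,
                              computeAlloc, computePso, IID_PPV_ARGS(&list));

    // (SetComputeRootSignature / resource binding omitted for brevity.)
    list->Dispatch(64, 1, 1); // the compute job
    list->Close();

    ID3D12CommandList* lists[] = { list.Get() };
    // Handing the dispatches to the compute queue keeps them off the
    // graphics queue, which is where the per-dispatch penalty being
    // described would otherwise apply.
    computeQueue->ExecuteCommandLists(1, lists);
}
```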
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Thanks for the insightful post; I suspected something like that but lacked the in-depth knowledge to confirm it. Basically, calling Async Compute queues on NV will add a performance penalty because their command engine has to re-interpret it for the serial architecture. It's better to just straight up not use Async Compute on hardware that doesn't support it.

If you use async compute to offload some long-running jobs to the compute engines, which is the point of this concept, then yes.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
This doesn't make any sense. Context switches can only hurt performance if you actually switch the context. The driver is not putting every queue into one. Execution happens in a serial way. This is only a problem within the graphics queue, and using the graphics queue to execute draw and dispatch commands is not "standard DX12 multi-engine code".
There is no context switch involved when a developer uses a graphics and a compute queue, because the driver can schedule the workload serially and the hardware can immediately execute it.
D3D12 is an explicit API with a well-specified multi-engine model. The driver doesn't have the right to change the application's own queueing design; this has to be written explicitly. If an architecture isn't designed well for standard code, then the application should have an alternate codepath to support the problematic hardware correctly.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
If you use async compute to offload some long-running jobs to the compute engines, which is the point of this concept, then yes.

If you offload dispatch calls to the compute queue, nVidia has no problem. This was verified by the guy from the beyond3d.com forum who wrote the Async Compute benchmark.

D3D12 is an explicit API with a well-specified multi-engine model. The driver doesn't have the right to change the application's own queueing design; this has to be written explicitly. If an architecture isn't designed well for standard code, then the application should have an alternate codepath to support the problematic hardware correctly.

nVidia doesn't change anything. Execution of the compute queue happens in the graphics queue as a separate queue. It is a non-problem on hardware which can't execute graphics and compute queues at the same time.
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,250
136
This doesn't make any sense. Context switches can only hurt performance if you actually switch the context. The driver is not putting every queue into one. Execution happens in a serial way. This is only a problem within the graphics queue, and using the graphics queue to execute draw and dispatch commands is not "standard DX12 multi-engine code".
There is no context switch involved when a developer uses a graphics and a compute queue, because the driver can schedule the workload serially and the hardware can immediately execute it.

I'm not into the software side of things so I don't really know how it works.

Reading your statement makes me wonder what you're saying.

I deciphered your statement and came to the following conclusion... it's not fair to implement it that way if AMD GPUs benefit from it and nVidia doesn't... Correct?
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
No, it is the job of the developer to implement it for every piece of hardware in a way that works.

Oxide demanded a low-level API and more access to the GPUs. Blaming the hardware vendor when something doesn't work is just an excuse.