Async compute is easy to use in D3D12, but tricky to get right. D3D12 is a multi-engine API, so choosing the right queue for the right job matters. The API can drive any hardware with the same multi-engine code, so developers only need to decide which jobs to run asynchronously. Texture streaming, for example, is an easy target, and it should be submitted to the copy engine. If the hardware cannot serve that queue with a dedicated DMA unit, D3D12 will automatically run the job on the graphics engine instead. This is a very clean multi-engine model, because graphics is a superset of compute, and compute is a superset of copy. In my opinion it is flawless, so kudos to Microsoft.
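To make this concrete, here is a minimal C++ sketch of the queue model, assuming an already-created ID3D12Device; the CreateQueue helper and the queue names in the usage comment are illustrative, not part of any official sample:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// One queue per engine type. If the hardware lacks a dedicated engine for a
// type, the scheduler transparently maps the work onto a more capable engine
// (copy -> compute -> graphics), so the same code runs everywhere.
static ComPtr<ID3D12CommandQueue> CreateQueue(ID3D12Device* device,
                                              D3D12_COMMAND_LIST_TYPE type)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type  = type;                          // DIRECT, COMPUTE or COPY
    desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// Usage sketch: texture-streaming uploads belong on the COPY queue.
// auto graphicsQueue = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
// auto computeQueue  = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);
// auto copyQueue     = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_COPY);
```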
Still, async compute is problematic, because the DXGK specification requires a very specific compute command engine in the hardware, one that supports fences and barriers in the right way. Fences are mostly used to synchronize workloads across queues, while barriers are used to block an operation on the GPU. In D3D12 the barriers have to support some specific conditions, which makes the API very future-proof but also imposes limits, because GCN is currently the only architecture whose compute command engines are designed the right way. Kepler and Maxwell have independent compute command engines, but these don't support the specific barrier conditions that DXGK requires for D3D12. Both architectures can execute standard multi-engine code, but they cannot run async compute on their compute command engines, so the async compute jobs get scheduled onto the main command engine, which leads to wasteful context switches. The result is a dramatic performance degradation with standard D3D12 multi-engine code. To avoid this, developers must either write an alternate codepath that uses the compute command engines in a less efficient way that suits NVIDIA better, or skip async compute entirely.
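As an illustration of the fence part, here is a hedged C++ sketch of cross-queue synchronization: the SubmitWithAsyncCompute helper, its parameters, and the one-command-list-per-queue setup are assumptions made for the example, not anything from a specific engine:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: submit an async compute job, then make the graphics
// queue wait for it on the GPU timeline (no CPU stall).
void SubmitWithAsyncCompute(ID3D12Device* device,
                            ID3D12CommandQueue* computeQueue,
                            ID3D12CommandQueue* graphicsQueue,
                            ID3D12CommandList* computeList,
                            ID3D12CommandList* graphicsList)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    const UINT64 done = 1;

    // Kick off the async job on the dedicated compute queue, and signal
    // the fence from the GPU when it finishes.
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(fence.Get(), done);

    // Graphics work that consumes the compute results waits on the fence
    // before executing.
    graphicsQueue->Wait(fence.Get(), done);
    graphicsQueue->ExecuteCommandLists(1, &graphicsList);
}
```

On hardware whose compute engines meet the DXGK requirements, the two queues genuinely overlap; on Kepler and Maxwell the same code ends up serialized on the main engine, with the context-switch cost described above.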