Ashes of the Singularity User Benchmarks Thread


AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
1) So, it's gone from "nVidia has been working on DX12 with Msft for the last 5 years" to nVidia only having had DX11 access. I guess we are now supposed to believe that nVidia was waiting for their free Win10 upgrade from Msft before they had access to DX12?

I thought they had 80% of the gaming market; how come they didn't have Win 10 access before August 2015? :rolleyes:
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Well, considering that AMD offers excellent DX11 support in Mantle games, I'll bet they will in DX12 games as well.

Really, I don't see why they would need to. Every GCN card from January 2012 onward supports DX12 and Async Compute. Next year, in 2016, only a small handful of users will game on non-GCN graphics cards, and of those only a small percentage will game on a non-DX12 OS.

On the other hand, NVIDIA will have to support DX11 longer, since not all of its graphics cards will gain much from DX12. For instance, I highly doubt Fermi cards will benefit much, if at all, running in DX12 mode vs DX11.
 

tg2708

Senior member
May 23, 2013
687
20
81
If AMD's DX11 drivers are considered lackluster compared to NVIDIA's, why is their performance so close in most games? In other words, if they were to take a little more time on driver optimizations, that would put them ahead of NVIDIA outside of GameWorks titles. AMD putting in less effort yet keeping up so well is a good feat in and of itself.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
nVidia hasn't had access to the source code of the "game" for a year. They got the source code of a previous version of the engine, which runs fine on their hardware...

Source? Seeing as how the latest build has nVidia-specific code supplied by nVidia, what are you basing this on?
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
It's a very different thing to have the specifications of DX12 than to have a working environment where you can actually run something.

nVidia hardware has been used in Msft DX12 presentations since the beginning. Stop trying to misrepresent this as nVidia getting blindsided by DX12. The position is ridiculous.
 

Hitman928

Diamond Member
Apr 15, 2012
6,749
12,477
136
Results from 2nd benchmark. I didn't plot the 980 Ti as high as the single command list goes because it ends up in the thousands and you can't see the rest of the results clearly.

[Attached: four benchmark result charts]
 

Final8ty

Golden Member
Jun 13, 2007
1,172
13
81
Yeah, and an anti-NV studio who never bothered to optimize a synthetic benchmark that's a one-off showcase.

Some of you forum warriors are insulting some of the greatest programmers around, people with years of experience in CREATING the DirectX standard for the entire industry and in building game engines that push the tech boundaries: Civ 5, first for multi-threaded rendering in the DX11 era, one of the first for DirectCompute shading, etc.

If you don't want to believe it, fine, wait and see. Don't drag good people into the mud with you. If you want to hate on crooked devs, go hate on Warner Bros (Batman AK, anyone?) or Ubifail.

Seriously, SIGGRAPH 2015 was recent. Imagine Dan Baker and Tim Foley in that same conference room educating people about next-gen APIs... and here a bunch of forum warriors are attacking their credibility on the very topic they were invited to present as the best in their fields.

I recall a similar attack against DICE's Johan Andersson, accused of being AMD PR or a shill back during the Mantle announcement. Guess what? All the Frostbite games ran excellently on NV hardware, even better than on AMD. Proof right there of a high standard of ethics.

Attacking respectable messengers when they deliver a message you dislike hearing is shameless.

 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Results from 2nd benchmark. I didn't plot the 980 Ti as high as the single command list goes because it ends up in the thousands and you can't see the rest of the results clearly.

snip

That is fairly interesting.

Basically, changing the number of command lists for AMD seems to replicate the behavior seen for Nvidia, at least for the 7000 series card in there (the 7950), which when changed to 1 command list shows behavior similar to Nvidia's. This makes sense, of course, since using 1 command list would essentially be the same as forcing async off. Of course, the R9 200 series shows some weird behavior here (higher performance with 1 command list at high kernel counts), so questions still remain.

Another interesting thing is that with the changes to the compute kernel and the increase in kernel count, we are now seeing the expected stepping for AMD as well, but even here questions remain. GCN 1.1/1.2 should be able to support 64 queues and thus show steps of 64, but instead we see steps of 32, unless it's set to 1 command list, in which case we see steps of 64 as expected. GCN 1.0 should only support 16 queues (it only has 2 ACEs), but shows steps of 64 both with and without the command list count forced to 1.

Other than that, AMD still seems to suffer from some global latency issue, since their compute times are far longer than they have any reason to be. This is of course not something that appears to happen in any games making use of async compute (like Ashes), since it would result in framerates well below 30 fps.


Btw, this page is quite useful for visualizing the impact of async:
http://nubleh.github.io/async/
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
The "graphics, compute" setting is using one graphics and one compute queue. Within the compute queue the programm is launching kernels.

nVidia and AMD support 32 concurrent kernels within one queue.
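
For anyone who wants to see what that looks like on the API side, here is a minimal D3D12-style sketch of the "one graphics queue plus one compute queue" submission pattern. This is my own illustration, not the test app's code; `device` and the already-recorded command lists are assumed to exist.

Code:
// Illustrative sketch only -- not the benchmark's source. Assumes `device`
// and already-recorded command lists are passed in.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitWithComputeQueue(ID3D12Device* device,
                            ID3D12CommandList* gfxList,
                            ID3D12CommandList* const* computeLists,
                            UINT numComputeLists)
{
    // One DIRECT queue for the graphics work...
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

    // ...and one COMPUTE queue for the compute kernels (Dispatch calls).
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ComPtr<ID3D12CommandQueue> directQueue, computeQueue;
    device->CreateCommandQueue(&directDesc,  IID_PPV_ARGS(&directQueue));
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Graphics goes to the direct queue, compute to the compute queue.
    // How many kernels actually run concurrently, and whether the two
    // queues overlap at all, is up to the hardware and driver.
    directQueue->ExecuteCommandLists(1, &gfxList);
    computeQueue->ExecuteCommandLists(numComputeLists, computeLists);
}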
 
Feb 19, 2009
10,457
10
76
The B3D guys are talking about async compute emulation on Maxwell/Kepler; software emulation leading to higher CPU usage is one possible theory.

A user with a 980 Ti then confirms it with the updated app:

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-12#post-1869361

Used the new one on my 980 Ti. Took 14 minutes to run, and the Nvidia 355.82 driver crashed at async compute batch 455 (when it got near 3000ms/batch).
Also added an Afterburner log of the run if it interests anyone. From a quick glance it uses more CPU, and GPU usage switched from 100% to 0% when going from one async compute batch to another.

Same for the 970

In async mode, GPU usage drops and CPU usage spikes whenever it's doing compute.

[Attached: Afterburner GPU/CPU usage chart]


Keep following and see where it leads; this seems to be only scratching the surface.
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Don't know if this has been posted yet, but another Oxide developer commented on the situation.

He was replying to Mahigan (who is apparently an ex-ATi employee):

It didn't look like there was a hardware defect to me on Maxwell, just some unfortunate complex interaction with the software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that Async Compute be required for D3D12, but perhaps I misread the spec.

It appears that there may not even be a set specification for Asynchronous Compute in DX12, which could explain why AMD's and NVidia's implementations differ so drastically.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
So, GCN can do compute tasks while doing graphics tasks without any performance penalty...

<"tinfoil_hat"=1>
Anyone with an AMD card is mining bitcoins while they play games and they don't even know it. The AMD driver recognizes a graphics load and starts compute bitmining if there are free compute resources. That is what keeps the lights on at AMD HQ.
<"tinfoil_hat"=0>
 
Feb 19, 2009
10,457
10
76
Actually, here are the updated results:

980Ti:
https://forum.beyond3d.com/posts/1869478/

Compute only:
1. 6.79ms

Graphics only: 16.21ms

Graphics + compute:
1. 20.22ms

Graphics, compute single commandlist:
1. 20.04ms

Your results are identical to the others'. Running graphics + compute gives an additive result, close to the sum of compute-only + graphics-only.

Also, your single commandlist (forced) results show ever-rising timings, as we've seen with the others, up to the 281st kernel at 2117.00ms!

Is this what Oxide is talking about? When they tried to force async mode directly, it would mess up.

It looks as though, when it's NOT forced, it operates in normal serial mode, with the combined timing adding up to the sum of graphics & compute. Users also report signs of CPU emulation: CPU usage spikes and GPU usage drops. So the normal call, which lets the driver handle it, results in emulation.

When it's forced to do async in hardware... it goes bonkers.

So what Oxide is saying *seems* correct so far (software emulation, and issues if you try to force it). We need more info/data, and potentially other apps testing it, to be certain.

But Maxwell can't do DX12* Async Compute.

*As it currently exists, it does not handle the DX12 async compute call properly.
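
For context on what numbers like "6.79ms" represent: times from submission until the GPU finishes the work. A rough sketch of one way such a measurement could be taken with a fence in D3D12 (my own illustration, not the B3D test app's actual code; `device`, `queue` and `lists` are assumed):

Code:
// Illustrative sketch: submit, signal a fence, and measure wall-clock time
// until the GPU reaches it. Not the test app's code.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <chrono>

double TimeSubmissionMs(ID3D12Device* device, ID3D12CommandQueue* queue,
                        ID3D12CommandList* const* lists, UINT count)
{
    Microsoft::WRL::ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    HANDLE done = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    const auto t0 = std::chrono::high_resolution_clock::now();
    queue->ExecuteCommandLists(count, lists);
    queue->Signal(fence.Get(), 1);        // fence reaches 1 once the GPU is done
    fence->SetEventOnCompletion(1, done);
    WaitForSingleObject(done, INFINITE);  // block the CPU until completion
    const auto t1 = std::chrono::high_resolution_clock::now();

    CloseHandle(done);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}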
 
Last edited:
Feb 19, 2009
10,457
10
76
So, GCN can do compute tasks while doing graphics tasks without any performance penalty...

<"tinfoil_hat"=1>
Anyone with an AMD card is mining bitcoins while they play games and they don't even know it. The AMD driver recognizes a graphics load and starts compute bitmining if there are free compute resources. That is what keeps the lights on at AMD HQ.
<"tinfoil_hat"=0>

Ahaha, OMG, that cracks me up. All AMD GPUs minting coins without their owners even knowing. Only an evil genius corporation would pull that off!
 

selni

Senior member
Oct 24, 2013
249
0
41
Actually, here are the updated results:

980Ti:
https://forum.beyond3d.com/posts/1869478/



It looks as though, when it's NOT forced, it operates in normal serial mode, with the combined timing adding up to the sum of graphics & compute. Users also report signs of CPU emulation: CPU usage spikes and GPU usage drops. So the normal call, which lets the driver handle it, results in emulation.

When it's forced to do async in hardware... it goes bonkers.

So what Oxide is saying *seems* correct so far (software emulation, and issues if you try to force it). We need more info/data, and potentially other apps testing it, to be certain.

But Maxwell can't do DX12* Async Compute.

*As it currently exists, it does not handle the DX12 async compute call properly.

That's over-interpreting a bit - you can use async compute, there's just not a significant speedup from doing so. GCN has hardware dedicated to async compute - that using it is faster than not using it is not surprising.

Is the same true of Maxwell, though? If not, then that sort of result seems like it'd be expected.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
So what Oxide is saying *seems* correct so far (software emulation, and issues if you try to force it). We need more info/data, and potentially other apps testing it, to be certain.

But Maxwell can't do DX12* Async Compute.

*As it currently exists, it does not handle the DX12 async compute call properly.

You realize that nVidia is doing Asynchronous Compute in this test? :\

There are two test sets:
1. Asynchronous Compute with one graphics (direct) and one compute queue.
2. Asynchronous Compute with one graphics (direct) queue only.

The first set is the real "Asynchronous Compute" workload, because the compute work gets submitted through the compute queue and there is no need for a context switch between graphics and compute.
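
To make the contrast with the pattern I sketched earlier concrete, the second set corresponds to something like the snippet below. Again, this is my own illustration, not the test's code; `gfxAndComputeList` is a made-up name for a single command list that records both the draws and the compute Dispatch calls.

Code:
// Illustrative sketch of test set 2: everything goes through the one
// DIRECT (graphics) queue, with no separate compute queue at all.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitDirectQueueOnly(ID3D12Device* device,
                           ID3D12CommandList* gfxAndComputeList)
{
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

    ComPtr<ID3D12CommandQueue> directQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

    // With only one queue, graphics and compute are submitted in one stream;
    // any overlap (or lack of it) is entirely the driver/hardware's doing.
    ID3D12CommandList* lists[] = { gfxAndComputeList };
    directQueue->ExecuteCommandLists(1, lists);
}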
 
Feb 19, 2009
10,457
10
76
You realize that nVidia is doing Asynchronous Compute in this test? :\

There are two test sets:
1. Asynchronous Compute with one graphics (direct) and one compute queue.
2. Asynchronous Compute with one graphics (direct) queue only.

The first set is the real "Asynchronous Compute" workload, because the compute work gets submitted through the compute queue and there is no need for a context switch between graphics and compute.

Compute only:
1. 6.79ms

Graphics only: 16.21ms

Graphics + compute (Async #1):
1. 20.22ms

Graphics, compute single commandlist (Async #2):
1. 20.04ms

If it could do proper async compute, graphics + compute would take no more than ~16.21ms. Why? Because while graphics is in the pipe, compute is also in and completes in 6.79ms; when graphics finishes, both tasks are counted as done, thus ~16.21ms.

The graphics + compute times of >20ms indicate graphics goes through the pipeline and, when it's done, compute begins. It may save some time for the 1st kernel, but subsequent kernels even out.

For all the kernel data along with the times plotted, go here:

Credits to Nub: http://nubleh.github.io/async/#36

That charts out all the results so far. Move your mouse across and look at the times for graphics or compute individually, then combined async compute, and also the time saved. Notice that none of the Maxwell GPUs finish much faster in async mode than the sum of the individual tasks.
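
The "time saved" figure that page plots comes down to simple arithmetic. Here is a throwaway check using the 980 Ti first-kernel timings quoted above; this is my own sketch of the reasoning, not nubleh's actual script.

Code:
// Back-of-the-envelope version of the argument above, using the posted
// 980 Ti first-kernel timings. Pure arithmetic, no GPU involved.
#include <algorithm>
#include <cstdio>

int main()
{
    const double graphicsOnlyMs = 16.21;
    const double computeOnlyMs  = 6.79;
    const double measuredBothMs = 20.22; // "Graphics + compute" result

    const double idealOverlapMs = std::max(graphicsOnlyMs, computeOnlyMs); // ~16.21
    const double fullySerialMs  = graphicsOnlyMs + computeOnlyMs;          // ~23.00
    const double timeSavedMs    = fullySerialMs - measuredBothMs;          // ~2.78

    std::printf("ideal overlap : %.2f ms\n", idealOverlapMs);
    std::printf("fully serial  : %.2f ms\n", fullySerialMs);
    std::printf("measured      : %.2f ms (saved %.2f ms vs serial)\n",
                measuredBothMs, timeSavedMs);
    return 0;
}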
 
Last edited:

Flapdrol1337

Golden Member
May 21, 2014
1,677
93
91
Compute only:
1. 6.79ms

Graphics only: 16.21ms

Graphics + compute (Async #1):
1. 20.22ms

Graphics, compute single commandlist (Async #2):
1. 20.04ms

If it could do proper async compute, graphics + compute would take no more than ~16.21ms.

Credits to Nub: http://nubleh.github.io/async/#36

That charts out all the results so far. Move your mouse across and look at the times for graphics or compute individually, then combined async compute, and also the time saved. Notice that none of the Maxwell GPUs finish much faster in async mode than the sum of the individual tasks.
But it is a bit faster.

You can't do graphics + compute without any penalty to graphics. It'll at least use extra power, forcing the clocks down.

The only thing that matters is the total performance.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
But it is a bit faster.

You can't do graphics + compute without any penalty to graphics. It'll at least use extra power, forcing the clocks down.

The only thing that matters is the total performance.

If you go through the linked graph, there is a delta of ±4ms in async mode relative to the total graphics + compute time.
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
But it is a bit faster.

You can't do graphics + compute without any penalty to graphics. It'll at least use extra power, forcing the clocks down.

The only thing that matters is the total performance.

Your definition of faster is interesting.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
If it could do proper async compute,

You should read what Asynchronous Compute actually is:
https://msdn.microsoft.com/de-de/library/windows/desktop/dn899217%28v=vs.85%29.aspx

You can see that nVidia can run different kernels at the same time within the compute queue. So they do "proper async compute".

Claiming that nVidia can't do Async Compute would be the same as claiming that AMD can't do Tessellation because nVidia is much faster at it. :\

That charts out all the results so far. Move your mouse across and look at the times for graphics or compute individually, then combined async compute, and also the time saved. Notice that none of the Maxwell GPUs finish much faster in async mode than the sum of the individual tasks.
And it takes 5x as long to initialize the compute queue on AMD hardware as on nVidia's. So there is a penalty involved when using the compute queue on AMD hardware.
 
Last edited:

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
And it takes 5x as long to initialize the compute queue on AMD hardware as on nVidia's. So there is a penalty involved when using the compute queue on AMD hardware.

Because this software is designed to use only a fraction of GCN's 'queues', which was explained 10 posts above...

Also, do we know what the 'compute task' we are dealing with here actually is? It may very well be something that depends on the architecture/memory/cache capacity/speed/latency and has no relevance to real-world performance.