ComputerBase: Ashes of the Singularity Beta 1 DirectX 12 Benchmarks

Page 32

Shivansps

Diamond Member
Sep 11, 2013
3,916
1,570
136
It's up to 20%, and that depends on how pervasive its usage is:
[attached benchmark charts]

This is strange: why do you gain more at higher resolutions, where there is less idle time on the graphics hardware, and less at 1080p, where you are more CPU-limited?
 
Feb 19, 2009
10,457
10
76
FCAT is broken so the Guru3D results are wrong: http://www.extremetech.com/extreme/223654-instrument-error-amd-fcat-and-ashes-of-the-singularity

Basically AMD composites through the DWM (Desktop Window Manager) instead of using DirectFlip. It reduces tearing and image anomalies. This approach is new with DX12; NVIDIA is still using the older technology.

I was thinking that could be it, because FCAT was a tool designed for the DX11 API, pre-DX12.

And now that explains why all the other review sites, including the ones that record the benchmark, show smooth performance all round. Trust in the data, they said! Eyes do it best. :)

First, some basics. FCAT is a system NVIDIA pioneered that can be used to record, playback, and analyze the output that a game sends to the display. This captures a game at a different point than FRAPS does, and it offers fine-grained analysis of the entire captured session. Guru3D argues that FCAT’s results are intrinsically correct because “Where we measure with FCAT is definitive though, it’s what your eyes will see and observe.” Guru3D is wrong. FCAT records output data, but its analysis of that data is based on assumptions it makes about the output — assumptions that aren’t accurate in this case.

AMD’s driver follows Microsoft’s recommendations for DX12 and composites using the Desktop Window Manager to increase smoothness and reduce tearing. FCAT, in contrast, assumes that the GPU is using DirectFlip. According to Oxide, the problem is that FCAT assumes so-called intermediate frames make it into the data stream and depends on these frames for its data analysis. If V-Sync-off behavior differs from what FCAT expects, the FCAT tools cannot properly analyze the final output. The application’s accuracy is only as reliable as its assumptions, after all.

An Oxide representative told us that the only real negative from AMD’s switch to DWM compositing from DirectFlip “is that it throws off FCAT.”

In this case, AMD is using Microsoft’s recommended compositing method, not the method that FCAT supports, and the result is an FCAT graph that makes AMD’s performance look terrible. It isn’t. From an end-user’s perspective, compositing through DWM eliminates tearing in windowed mode and may reduce it in fullscreen mode as well when V-Sync is disabled.
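(An illustrative aside, not from the article: in DX12 the choice between DirectFlip and DWM composition isn't something the game controls. The game just creates a DXGI flip-model swap chain and calls Present(); Windows and the driver then decide how that frame reaches the screen, which is precisely the assumption FCAT bakes in. A minimal sketch of that setup, with illustrative names and parameters:)

    // Minimal DX12-style swap chain setup (sketch only, not production code).
    #include <d3d12.h>
    #include <dxgi1_4.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    ComPtr<IDXGISwapChain1> CreateFlipModelSwapChain(IDXGIFactory4* factory,
                                                     ID3D12CommandQueue* queue,
                                                     HWND hwnd, UINT width, UINT height)
    {
        DXGI_SWAP_CHAIN_DESC1 desc = {};
        desc.Width            = width;
        desc.Height           = height;
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        desc.BufferCount      = 2;                              // double buffered
        desc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_DISCARD;  // flip model; DX12 has no blt path

        ComPtr<IDXGISwapChain1> swapChain;
        // For D3D12 the first argument is the command queue, not the device.
        factory->CreateSwapChainForHwnd(queue, hwnd, &desc, nullptr, nullptr, &swapChain);
        return swapChain;
    }

    // Per frame, with V-Sync off, the game simply calls swapChain->Present(0, 0);
    // from there the OS decides whether the frame goes through DirectFlip or gets
    // composed by the DWM -- FCAT's analysis assumes the former.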
 
Feb 19, 2009
10,457
10
76
So it can't be trusted as a benchmark...

Maybe averaging over multiple runs will reduce the element of randomness.

In many ways, this is the SAME as reviewers benching their playthrough, running and fighting through the level. It's never the same. Even worse for online games.

A good review site will tell you this beforehand and tell you that they run it multiple times and take the average result across runs.

There's this debate about whether a scripted benchmark is representative of actual gameplay, and whether that opens the door for IHVs to optimize for the specific benchmark, whereas with a gameplay-based test they would have to optimize for the game itself. ;)
 

Dygaza

Member
Oct 16, 2015
176
34
101
So it can't be trusted as a benchmark...

If the average fps over the whole benchmark varies between runs by around 1 fps while hitting 80 fps, then yes, it can. Heck, you get fps variation between runs even in synthetic tests.
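(A throwaway sketch of my own, with made-up numbers, just to illustrate: take the per-run averages, look at the mean and the spread, and ~1 fps of run-to-run variation at ~80 fps works out to roughly 1%, well below the differences being argued over.)

    // Sketch: mean and run-to-run spread of average FPS across several benchmark runs.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical per-run average FPS from repeated passes of the built-in benchmark.
        std::vector<double> runs = { 79.4, 80.1, 80.6, 79.8 };

        double sum = 0.0;
        for (double fps : runs) sum += fps;
        double mean = sum / runs.size();

        double var = 0.0;
        for (double fps : runs) var += (fps - mean) * (fps - mean);
        double spread = std::sqrt(var / runs.size());

        std::printf("mean %.1f fps, run-to-run spread +/- %.1f fps (%.1f%%)\n",
                    mean, spread, 100.0 * spread / mean);
        return 0;
    }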
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Taking the frequencies into account, scaling under DX12 is roughly 95%: 33% more performance for 40% more shaders at 5% lower frequencies, while bandwidth isn't even scaled by 40%...

So what is your point...?

That it should overscale...?

A GTX980TI has 61% more compute/texture/pixel/geometry performance and over 70% more bandwidth.

The GTX980TI is not gpu bound in this setting.

BTW: Fury X is only 32% faster than the 390 in the Hitman Beta, too: http://pclab.pl/art68473-7.html

The same difference you see on the Celeron in Ashes.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Those settings are not "heavily GPU-Bound":

http://pclab.pl/art67995-15.html

With DX11 the GTX980TI is 10% faster; with DX12 it's only 27% faster than the GTX970. This card should be ~47-50% better.



There doesn't exist any reason why nVidia cards don't see an improvement on a 2-core CPU. The DX11 optimization needs at least 4 threads:

http://www.anandtech.com/show/8962/the-directx-12-performance-preview-amd-nvidia-star-swarm/4


http://www.anandtech.com/show/9112/exploring-dx12-3dmark-api-overhead-feature-test/3

Not if it's hitting a compute wall. That being said, NVIDIA will optimize their drivers further (hopefully this time they won't be removing effects).

3.5 Tflops vs 5.63 Tflops = 38% more compute. But then we have the fact that it's executing those compute jobs sequentially under DX11.

So you likely have a lower compute utilization on the GTX 980 Ti than you do the GTX 970. If only NVIDIA supported Async Compute + Graphics eh?

We have a situation similar to when AMD hadn't yet activated the async compute portion of Fiji's HWSs while Ashes was in Alpha. A 390X was matching Fiji's performance.
 
Feb 19, 2009
10,457
10
76
Not if it's hitting a compute wall. That being said, NVIDIA will optimize their drivers further (hopefully this time they won't be removing effects).

3.5 Tflops vs 5.63 Tflops = 38% more compute. But then we have the fact that it's executing those compute jobs sequentially under DX11.

So you likely have a lower compute utilization on the GTX 980 Ti than you do the GTX 970. If only NVIDIA supported Async Compute eh?

We have a situation similar to when AMD hadn't yet activated the async compute portion of Fiji's HWSs while Ashes was in Alpha. A 390X was matching Fiji's performance.

Bingo.

There are some bottlenecks there in a serial engine: if compute is pushing through and causing a traffic jam for graphics, you get less performance scaling. This is exactly what happened to Fiji pre-Beta 2 with AC off. Exactly to a T. And in Beta 2 with AC on, some of the compute tasks are now offloaded to the ACEs, graphics flows freely, and boom: Fury X performance jumps massively ahead of the 390X.
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
sontin said:
There doesn't exist any reason why nVidia cards don't see an improvement on a 2-core CPU. The DX11 optimization needs at least 4 threads:

What do you mean there's no improvement? I see the 980 Ti improving by 13%. The others are GPU-bound because the settings are fully maxed out with 4x MSAA. Obviously the game is heavily GPU-bound, because the results are the same on an OCed 6700K. The difference on those settings between an FX-4300 on DX11 and that 6700K OC is a mere 30%, and anything above an i3-6100 has no effect. There's no difference between any of the CPUs for the GTX 970.

GPUs display strange behavior across the board in this game; increasing resolution doesn't affect performance the way it normally does. This is probably due to its unconventional object-space rendering pipeline in some way. I don't pretend to be any kind of expert on that.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Not if it's hitting a compute wall. That being said, NVIDIA will optimize their drivers further (hopefully this time they won't be removing effects).

3.5 Tflops vs 5.63 Tflops = 38% more compute. But then we have the fact that it's executing those compute jobs sequentially under DX11.

So you likely have a lower compute utilization on the GTX 980 Ti than you do the GTX 970. If only NVIDIA supported Async Compute + Graphics eh?

We have a situation similar to when AMD hadn't yet activated the async compute portion of Fiji's HWSs while Ashes was in Alpha. A 390X was matching Fiji's performance.

What you just wrote doesn't make any sense. There is no "compute wall" on a graphics card. There isn't even any logical pipeline for compute operations at all. Compute workloads are highly scalable on these architectures.

BTW: 5.63 is 60% more than 3.5. :\
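(For anyone following the arithmetic, the two figures come from measuring in opposite directions; a quick check:)

    \frac{5.63}{3.5} \approx 1.61 \;\Rightarrow\; \text{the 980 Ti has } {\sim}61\% \text{ more peak FP32 throughput than the 970}

    1 - \frac{3.5}{5.63} \approx 0.38 \;\Rightarrow\; \text{the 970 has } {\sim}38\% \text{ less than the 980 Ti}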
 
Feb 19, 2009
10,457
10
76
There is no "compute wall" on a graphics card. There isn't even any logical pipeline for compute operations at all. Compute workloads are highly scalable on these architectures.

Sure it is, when it's in pure compute mode, like CUDA. As soon as graphics is in the engine: nope, no compute for you! Compute? You can wait until graphics is done first.

Hopefully Pascal fixes this.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Do you know that a GTX980TI has 60% more shader/geometry/pixel performance, 50% more rasterizing performance and 70% more bandwidth than the GTX970? :|

That's the reason why this card is, for example, ~50% faster in Hitman...
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Has anyone seen this on FCAT on Ashes?
http://www.extremetech.com/extreme/223654-instrument-error-amd-fcat-and-ashes-of-the-singularity

So it is more a problem with FCAT than with AMD hardware?

Ashes of the Singularity measures its own frame variance in a manner similar to FRAPS; we extracted that information for both the GTX 980 Ti and the R9 Fury X. The graph above shows two video cards that perform identically — AMD’s frame times are slightly lower because AMD’s frame rate is slightly higher. There are no other significant differences. That’s what the benchmark “feels” like when viewed in person. The FCAT graph above suggests incredible levels of microstutter that simply don’t exist when playing the game or viewing the benchmark.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
What you just wrote doesn't make any sense. There is no "compute wall" on a graphics card. There isn't even any logical pipeline for compute operations at all. Compute workloads are highly scalable on these architectures.

BTW: 5.63 is 60% more than 3.5. :\

Theoretically yes, but not in practice. The more compute cores, the more resource sharing, and the more latency is introduced. A GTX 980's individual SIMDs are faster than those of a GTX 980 Ti, and a 980 Ti's are faster than a Titan X's. So yeah, a GTX 970 has faster individual SIMDs than a GTX 980 Ti. So if you're executing large batches of compute work, that 60% theoretical advantage drops. Let me show you:
[SIMD latency comparison chart]


That's SIMD latency. A GTX 980 Ti's SIMDs are 14.5% slower than a GTX 980's. You see, NVIDIA's Maxwell has more cores sharing fewer resources than AMD's GCN. Maxwell has less hardware redundancy than GCN (meaning fewer non-shared units). That's why Maxwell consumes less power: it has less hardware on tap.

Ashes of the Singularity has a lot of compute work. Lots of lights, smoke, and post-processing effects going on. So a 60% theoretical TFLOPS advantage will translate into less under real-world conditions.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91

Although Benchmark 2 of Ashes of the Singularity is equipped with its own FCAT overlay, only the log files generated by the game itself are used for the frame-time diagrams, because the FCAT overlay produces incorrect results and Nvidia has not adapted FCAT for the DirectX 12 API.

Pardon the rough translation, but they are saying they didn't use Nvidia's FCAT because it doesn't work, so they used the log files from the benchmark itself, which are accurate.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Theoretically yes, but not in practice. The more compute cores, the more resource sharing, and the more latency is introduced. A GTX 980's individual SIMDs are faster than those of a GTX 980 Ti, and a 980 Ti's are faster than a Titan X's. So yeah, a GTX 970 has faster individual SIMDs than a GTX 980 Ti. So if you're executing large batches of compute work, that 60% theoretical advantage drops. Let me show you:

That's SIMD latency. A GTX 980 Ti's SIMDs are 14.5% slower than a GTX 980's. You see, NVIDIA's Maxwell has more cores sharing fewer resources than AMD's GCN. Maxwell has less hardware redundancy than GCN (meaning fewer non-shared units). That's why Maxwell consumes less power: it has less hardware on tap.

GPUs need thousands of threads to hide latency. Using fewer threads to fill the GPU is just an unoptimized workload.

Ashes of the Singularity has a lot of compute work. Lots of lights, smoke, and post-processing effects going on. So a 60% theoretical TFLOPS advantage will translate into less under real-world conditions.

Seriously, what have you just written? More work will result in less of an advantage? That doesn't make any sense. A 60% advantage will result in 60% more performance. That is the nature of a compute architecture.
 
Feb 19, 2009
10,457
10
76
Seriously, what have you just written? More work will result in less of an advantage? That doesn't make any sense. A 60% advantage will result in 60% more performance. That is the nature of a compute architecture.

That's because you have chosen to not understand.

More compute work will stall graphics rendering on NV GPUs because they run serially; compute is bottlenecking graphics. It's such a simple concept, and many people have tried to help you understand it many times already. -_-

Now, if compute was running alone, like in CUDA, it's great.

Maybe one day, NV will make Async Compute work with Kepler/Maxwell in their drivers, and you will see better scaling.
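(My own illustrative sketch, not anything from Oxide or NVIDIA: at the API level, "async compute" in D3D12 just means the game submits graphics work on a DIRECT queue and compute work on a separate COMPUTE queue, with a fence for the dependency, and lets them potentially overlap. On GCN the ACEs can actually run the two concurrently; on Kepler/Maxwell as it stands, the compute work effectively waits behind graphics. Names and structure below are illustrative only.)

    // Illustrative D3D12 frame submission with a separate compute queue (sketch only).
    #include <d3d12.h>

    void SubmitFrame(ID3D12CommandQueue*        gfxQueue,       // created with D3D12_COMMAND_LIST_TYPE_DIRECT
                     ID3D12CommandQueue*        computeQueue,   // created with D3D12_COMMAND_LIST_TYPE_COMPUTE
                     ID3D12GraphicsCommandList* gfxList,        // main graphics pass
                     ID3D12GraphicsCommandList* computeList,    // e.g. lighting / post-process compute
                     ID3D12GraphicsCommandList* gfxConsumeList, // later pass that reads the compute results
                     ID3D12Fence*               fence,
                     UINT64&                    fenceValue)
    {
        // Kick the compute work off on its own queue and signal a fence when it completes.
        ID3D12CommandList* c[] = { computeList };
        computeQueue->ExecuteCommandLists(1, c);
        computeQueue->Signal(fence, ++fenceValue);

        // Independent graphics work is submitted at the same time; whether it actually
        // overlaps with the compute queue is up to the hardware and driver.
        ID3D12CommandList* g[] = { gfxList };
        gfxQueue->ExecuteCommandLists(1, g);

        // GPU-side wait (no CPU stall): the graphics queue waits for the compute results
        // before running the pass that consumes them.
        gfxQueue->Wait(fence, fenceValue);
        ID3D12CommandList* g2[] = { gfxConsumeList };
        gfxQueue->ExecuteCommandLists(1, g2);
    }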
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
That's because you have chosen to not understand.

More compute work will stall graphics rendering on NV GPUs because they run serially; compute is bottlenecking graphics. It's such a simple concept, and many people have tried to help you understand it many times already. -_-

Now, if compute was running alone, like in CUDA, it's great.

Maybe one day, NV will make Async Compute work with Kepler/Maxwell in their drivers, and you will see better scaling.

They will not. To make it truly asynchronous you need at least TWO asynchronous engines. What Nvidia can do to deal with this problem is put at least 2 ACEs and a hardware scheduler into their hardware.

Will they do it? Time will tell.
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,991
626
126
Why would the chief editor/CEO of a tech site be so outright biased and hostile toward a tech company? On the contrary, he praises NV for releasing game-ready drivers; NV has actually released game-ready drivers for Ashes ever since ALPHA (when it was unavailable for the public to play).

It's rare to see a tech journalist so thoroughly put their foot in their mouth. And one of his responses was "cool story bro". I don't even...

It's ok, he also called Rise of the Tomb Raider an AMD title...

I hope that was a momentary lapse and he actually knows who sponsored said title. Anyway, this game is going gold quite soon; very much looking forward to it. :thumbsup:
 
Feb 19, 2009
10,457
10
76
They will not. To make it truly asynchronous you need at least TWO asynchronous engines. What Nvidia can do to deal with this problem is put at least 2 ACEs and a hardware scheduler into their hardware.

Will they do it? Time will tell.

What about the theory that they can enable async compute via drivers by getting the other (idle) CPU threads to process the compute?

My opinion is that it will fail hard, due to latency and sync issues from sending compute tasks over the bus to the CPU and back again.

But potentially, with a good scheduler and pre-emption, they could in theory offload a long-running compute task as soon as the frame starts; it gets processed on the CPU while the GPU does the graphics, gets sent back and synced, and the frame finishes in time.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
That's because you have chosen to not understand.

More compute work will stall graphics rendering on NV GPUs because they run serially; compute is bottlenecking graphics. It's such a simple concept, and many people have tried to help you understand it many times already. -_-

How can the "graphics rendering" stalling the pipeline when a GTX980TI is >50% faster with graphics workload? Do you even thing about this?! o_O

And how will "more compute workload" reduce the performance advantages of the GTX980TI over the GTX970 when the execution happens after the graphics rendering? For example 0,62ms + 0,62ms is still 38% less than 1ms + 1ms...
 

PhonakV30

Senior member
Oct 26, 2009
987
378
136
@sontin
60% more compute doesn't mean 60% more perf. It has a cost, and that's latency. Heavy compute means more latency. If you can't accept that fact, then leave it alone.