ComputerBase: Ashes of the Singularity Beta 1 DirectX 12 Benchmarks

Page 32

Shivansps

Diamond Member
Sep 11, 2013
3,916
1,570
136
It's up to 20%, and that depends on how pervasive its usage is:
[attached benchmark charts]

This is strange: why do you gain more at higher resolutions, where there is less idle time on the graphics hardware, and less at 1080p, where you are more CPU-limited?
 
Feb 19, 2009
10,457
10
76
FCAT is broken so the Guru3D results are wrong: http://www.extremetech.com/extreme/223654-instrument-error-amd-fcat-and-ashes-of-the-singularity

Basically AMD composites through the DWM (Desktop Window Manager) instead of using DirectFlip. It reduces tearing and image anomalies. This approach is new with DX12; NVIDIA is still using the older technology.

I was thinking that could be it, because FCAT was a tool designed for the DX11 API, pre-DX12.

And now that explains why all the other review sites, including the ones that record the benchmark, show smooth performance all round. Trust in the data, they said! Eyes do it best. :)

First, some basics. FCAT is a system NVIDIA pioneered that can be used to record, playback, and analyze the output that a game sends to the display. This captures a game at a different point than FRAPS does, and it offers fine-grained analysis of the entire captured session. Guru3D argues that FCAT’s results are intrinsically correct because “Where we measure with FCAT is definitive though, it’s what your eyes will see and observe.” Guru3D is wrong. FCAT records output data, but its analysis of that data is based on assumptions it makes about the output — assumptions that aren’t accurate in this case.

AMD’s driver follows Microsoft’s recommendations for DX12 and composites using the Desktop Window Manager to increase smoothness and reduce tearing. FCAT, in contrast, assumes that the GPU is using DirectFlip. According to Oxide, the problem is that FCAT assumes so-called intermediate frames make it into the data stream and depends on these frames for its data analysis. If V-Sync-off behavior differs from what FCAT expects, the FCAT tools cannot properly analyze the final output. The application’s accuracy is only as reliable as its assumptions, after all.

An Oxide representative told us that the only real negative from AMD’s switch to DWM compositing from DirectFlip “is that it throws off FCAT.”

In this case, AMD is using Microsoft’s recommended compositing method, not the method that FCAT supports, and the result is an FCAT graph that makes AMD’s performance look terrible. It isn’t. From an end-user’s perspective, compositing through DWM eliminates tearing in windowed mode and may reduce it in fullscreen mode as well when V-Sync is disabled.
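(An illustrative aside, not from the article: in DX12 the choice between DirectFlip and DWM composition isn't something the game controls. The game just creates a DXGI flip-model swap chain and calls Present(); Windows and the driver then decide how that frame reaches the screen, which is precisely the assumption FCAT bakes in. A minimal sketch of that setup, with illustrative names and parameters:)

    // Minimal DX12-style swap chain setup (sketch only, not production code).
    #include <d3d12.h>
    #include <dxgi1_4.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    ComPtr<IDXGISwapChain1> CreateFlipModelSwapChain(IDXGIFactory4* factory,
                                                     ID3D12CommandQueue* queue,
                                                     HWND hwnd, UINT width, UINT height)
    {
        DXGI_SWAP_CHAIN_DESC1 desc = {};
        desc.Width            = width;
        desc.Height           = height;
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        desc.BufferCount      = 2;                              // double buffered
        desc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_DISCARD;  // flip model; DX12 has no blt path

        ComPtr<IDXGISwapChain1> swapChain;
        // For D3D12 the first argument is the command queue, not the device.
        factory->CreateSwapChainForHwnd(queue, hwnd, &desc, nullptr, nullptr, &swapChain);
        return swapChain;
    }

    // Per frame, with V-Sync off, the game simply calls swapChain->Present(0, 0);
    // from there the OS decides whether the frame goes through DirectFlip or gets
    // composed by the DWM -- FCAT's analysis assumes the former.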
 
Feb 19, 2009
10,457
10
76
So it can't be trusted as a benchmark...

Maybe averaging over multiple runs will reduce the element of randomness.

In many ways, this is the SAME as reviewers benching their playthrough, running and fighting through the level. It's never the same. Even worse for online games.

A good review site will tell you this beforehand and tell you that they run it multiple times and take the average result across runs.

There's this debate about whether a scripted benchmark is representative of actual gameplay, and whether that opens the door for IHVs to optimize for the specific benchmark, whereas with a gameplay-based test they would have to optimize for the game itself. ;)
 

Dygaza

Member
Oct 16, 2015
176
34
101
So it can't be trusted as a benchmark...

If the average fps over the whole benchmark varies between runs by around 1 fps while hitting 80 fps, then yes, it can. Heck, you get fps variation between runs even in synthetic tests.
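(A throwaway sketch of my own, with made-up numbers, just to illustrate: take the per-run averages, look at the mean and the spread, and ~1 fps of run-to-run variation at ~80 fps works out to roughly 1%, well below the differences being argued over.)

    // Sketch: mean and run-to-run spread of average FPS across several benchmark runs.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical per-run average FPS from repeated passes of the built-in benchmark.
        std::vector<double> runs = { 79.4, 80.1, 80.6, 79.8 };

        double sum = 0.0;
        for (double fps : runs) sum += fps;
        double mean = sum / runs.size();

        double var = 0.0;
        for (double fps : runs) var += (fps - mean) * (fps - mean);
        double spread = std::sqrt(var / runs.size());

        std::printf("mean %.1f fps, run-to-run spread +/- %.1f fps (%.1f%%)\n",
                    mean, spread, 100.0 * spread / mean);
        return 0;
    }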
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Taking the frequencies into account, scaling under DX12 is roughly 95%: 33% more performance for 40% more shaders at 5% lower frequencies, while bandwidth isn't even scaled by 40%...

So what is your point...?

That it should overscale...?

A GTX980TI has 61% more compute/texture/pixel/geometry performance and over 70% more bandwidth.

The GTX980TI is not gpu bound in this setting.

BTW: Fury X is only 32% faster than the 390 in the Hitman Beta, too: http://pclab.pl/art68473-7.html

The same difference you see on the Celeron in Ashes.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Those settings are not "heavily GPU-Bound":

http://pclab.pl/art67995-15.html

With DX11 the GTX980TI is 10% faster; with DX12 it's only 27% faster than the GTX970. This card should be ~47-50% better.



There doesn't exist any reason why nVidia cards don't see an improvement on a 2-core CPU. The DX11 optimization needs at least 4 threads:

http://www.anandtech.com/show/8962/the-directx-12-performance-preview-amd-nvidia-star-swarm/4


http://www.anandtech.com/show/9112/exploring-dx12-3dmark-api-overhead-feature-test/3

Not if it's hitting a compute wall. That being said, NVIDIA will optimize their drivers further (hopefully this time they won't be removing effects).

3.5 Tflops vs 5.63 Tflops = 38% more compute. But then we have the fact that it's executing those compute jobs sequentially under DX11.

So you likely have a lower compute utilization on the GTX 980 Ti than you do the GTX 970. If only NVIDIA supported Async Compute + Graphics eh?

We have a situation similar to when AMD hadn't yet activated the async compute portion of Fiji's HWSs while Ashes was in Alpha. A 390X was matching Fiji's performance.
 
Feb 19, 2009
10,457
10
76
Not if it's hitting a compute wall. That being said, NVIDIA will optimize their drivers further (hopefully this time they won't be removing effects).

3.5 Tflops vs 5.63 Tflops = 38% more compute. But then we have the fact that it's executing those compute jobs sequentially under DX11.

So you likely have a lower compute utilization on the GTX 980 Ti than you do the GTX 970. If only NVIDIA supported Async Compute eh?

We have a situation similar to when AMD hadn't yet activated the async compute portion of Fiji's HWSs while Ashes was in Alpha. A 390X was matching Fiji's performance.

Bingo.

There are some bottlenecks there in a serial engine: if compute is pushing through and causing a traffic jam for graphics, you get less performance scaling. This is exactly what happened to Fiji pre-Beta 2 with AC off. Exactly to a T. And in Beta 2 with AC on, some of the compute tasks are now offloaded to the ACEs, graphics flows freely, and boom: Fury X performance jumps massively ahead of the 390X.
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
sontin said:
There doesn't exist any reason why nVidia cards don't see an improvement on a 2-core CPU. The DX11 optimization needs at least 4 threads:

What do you mean there's no improvement? I see the 980 Ti improving by 13%. The others are GPU-bound because the settings are fully maxed out with 4x MSAA. Obviously the game is heavily GPU-bound, because the results are the same on an OCed 6700K. The difference on those settings between an FX-4300 on DX11 and that 6700K OC is a mere 30%, and anything above an i3-6100 has no effect. There's no difference between any of the CPUs for the GTX 970.

GPUs display strange behavior across the board in this game; increasing resolution doesn't affect performance the way it normally does. This is probably due to its unconventional object-space rendering pipeline in some way. I don't pretend to be any kind of expert on that.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Not if it's hitting a compute wall. That being said, NVIDIA will optimize their drivers further (hopefully this time they won't be removing effects).

3.5 Tflops vs 5.63 Tflops = 38% more compute. But then we have the fact that it's executing those compute jobs sequentially under DX11.

So you likely have a lower compute utilization on the GTX 980 Ti than you do the GTX 970. If only NVIDIA supported Async Compute + Graphics eh?

We have a situation similar to when AMD hadn't yet activated the async compute portion of Fiji's HWSs while Ashes was in Alpha. A 390X was matching Fiji's performance.

What you just wrote doesn't make any sense. There is no "compute wall" on a graphics card. There isn't even any logical pipeline for compute operations at all. Compute workloads are highly scalable on these architectures.

BTW: 5.63 is 60% more than 3.5. :\
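(For anyone following the arithmetic, the two figures come from measuring in opposite directions; a quick check:)

    \frac{5.63}{3.5} \approx 1.61 \;\Rightarrow\; \text{the 980 Ti has } {\sim}61\% \text{ more peak FP32 throughput than the 970}

    1 - \frac{3.5}{5.63} \approx 0.38 \;\Rightarrow\; \text{the 970 has } {\sim}38\% \text{ less than the 980 Ti}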
 
Feb 19, 2009
10,457
10
76
There is no "compute wall" on a graphics card. There isn't even any logical pipeline for compute operations at all. Compute workloads are highly scalable on these architectures.

Sure it is, when it's in pure compute mode, like CUDA. As soon as graphics is in the engine: nope, no compute for you! Compute? You can wait until graphics is done first.

Hopefully Pascal fixes this.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Do you know that a GTX980TI has 60% more shader/geometry/pixel performance, 50% more rasterizing performance and 70% more bandwidth than the GTX970? :|

That's the reason why this card is, for example, ~50% faster in Hitman...
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Has anyone seen this on FCAT on Ashes?
http://www.extremetech.com/extreme/223654-instrument-error-amd-fcat-and-ashes-of-the-singularity

So it is more a problem with FCAT than with AMD hardware?

Ashes of the Singularity measures its own frame variance in a manner similar to FRAPS; we extracted that information for both the GTX 980 Ti and the R9 Fury X. The graph above shows two video cards that perform identically — AMD’s frame times are slightly lower because AMD’s frame rate is slightly higher. There are no other significant differences. That’s what the benchmark “feels” like when viewed in person. The FCAT graph above suggests incredible levels of microstutter that simply don’t exist when playing the game or viewing the benchmark.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
What you just wrote doesn't make any sense. There is no "compute wall" on a graphics card. There isn't even any logical pipeline for compute operations at all. Compute workloads are highly scalable on these architectures.

BTW: 5.63 is 60% more than 3.5. :\

Theoretically yes, but not in practice. The more compute cores, the more resource sharing, and the more latency is introduced. A GTX 980's individual SIMDs are faster than those of a GTX 980 Ti, and a 980 Ti's are faster than a Titan X's. So yeah, a GTX 970 has faster individual SIMDs than a GTX 980 Ti. So if you're executing large batches of compute work, that 60% theoretical advantage drops. Let me show you:
[SIMD latency comparison chart]


That's SIMD latency. A GTX 980 Ti's SIMDs are 14.5% slower than a GTX 980's. You see, NVIDIA's Maxwell has more cores sharing fewer resources than AMD's GCN. Maxwell has less hardware redundancy than GCN (meaning fewer non-shared units). That's why Maxwell consumes less power: it has less hardware on tap.

Ashes of the Singularity has a lot of compute work. Lots of lights, smoke, and post-processing effects going on. So a 60% theoretical TFLOPS advantage will translate into less under real-world conditions.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91

Although Benchmark 2 of Ashes of the Singularity is equipped with its own FCAT overlay, only the log files generated by the game itself are used for the frame-time diagrams, because the FCAT overlay produces incorrect results and Nvidia has not adapted FCAT for the DirectX 12 API.

Pardon the rough translation, but they are saying they didn't use Nvidia's FCAT because it doesn't work, so they used the log files from the benchmark itself, which are accurate.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Theoretically yes, but not in practice. The more compute cores, the more resource sharing, and the more latency is introduced. A GTX 980's individual SIMDs are faster than those of a GTX 980 Ti, and a 980 Ti's are faster than a Titan X's. So yeah, a GTX 970 has faster individual SIMDs than a GTX 980 Ti. So if you're executing large batches of compute work, that 60% theoretical advantage drops. Let me show you:

That's SIMD latency. A GTX 980 Ti's SIMDs are 14.5% slower than a GTX 980's. You see, NVIDIA's Maxwell has more cores sharing fewer resources than AMD's GCN. Maxwell has less hardware redundancy than GCN (meaning fewer non-shared units). That's why Maxwell consumes less power: it has less hardware on tap.

GPUs need thousands of threads to hide latency. Using fewer threads to fill the GPU is just an unoptimized workload.

Ashes of the Singularity has a lot of compute work. Lots of lights, smoke, and post-processing effects going on. So a 60% theoretical TFLOPS advantage will translate into less under real-world conditions.

Seriously, what have you just written? More work will result in less of an advantage? That doesn't make any sense. A 60% advantage will result in 60% more performance. That is the nature of a compute architecture.
 
Feb 19, 2009
10,457
10
76
Seriously, what have you just written? More work will result in less of an advantage? That doesn't make any sense. A 60% advantage will result in 60% more performance. That is the nature of a compute architecture.

That's because you have chosen to not understand.

More compute work will stall graphics rendering on NV GPUs because they run serially; compute is bottlenecking graphics. It's such a simple concept, and many people have tried to help you understand it many times already. -_-

Now, if compute was running alone, like in CUDA, it's great.

Maybe one day, NV will make Async Compute work with Kepler/Maxwell in their drivers, and you will see better scaling.
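(My own illustrative sketch, not anything from Oxide or NVIDIA: at the API level, "async compute" in D3D12 just means the game submits graphics work on a DIRECT queue and compute work on a separate COMPUTE queue, with a fence for the dependency, and lets them potentially overlap. On GCN the ACEs can actually run the two concurrently; on Kepler/Maxwell as it stands, the compute work effectively waits behind graphics. Names and structure below are illustrative only.)

    // Illustrative D3D12 frame submission with a separate compute queue (sketch only).
    #include <d3d12.h>

    void SubmitFrame(ID3D12CommandQueue*        gfxQueue,       // created with D3D12_COMMAND_LIST_TYPE_DIRECT
                     ID3D12CommandQueue*        computeQueue,   // created with D3D12_COMMAND_LIST_TYPE_COMPUTE
                     ID3D12GraphicsCommandList* gfxList,        // main graphics pass
                     ID3D12GraphicsCommandList* computeList,    // e.g. lighting / post-process compute
                     ID3D12GraphicsCommandList* gfxConsumeList, // later pass that reads the compute results
                     ID3D12Fence*               fence,
                     UINT64&                    fenceValue)
    {
        // Kick the compute work off on its own queue and signal a fence when it completes.
        ID3D12CommandList* c[] = { computeList };
        computeQueue->ExecuteCommandLists(1, c);
        computeQueue->Signal(fence, ++fenceValue);

        // Independent graphics work is submitted at the same time; whether it actually
        // overlaps with the compute queue is up to the hardware and driver.
        ID3D12CommandList* g[] = { gfxList };
        gfxQueue->ExecuteCommandLists(1, g);

        // GPU-side wait (no CPU stall): the graphics queue waits for the compute results
        // before running the pass that consumes them.
        gfxQueue->Wait(fence, fenceValue);
        ID3D12CommandList* g2[] = { gfxConsumeList };
        gfxQueue->ExecuteCommandLists(1, g2);
    }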
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
That's because you have chosen to not understand.

More compute work will stall graphics rendering on NV GPUs because they run serially; compute is bottlenecking graphics. It's such a simple concept, and many people have tried to help you understand it many times already. -_-

Now, if compute was running alone, like in CUDA, it's great.

Maybe one day, NV will make Async Compute work with Kepler/Maxwell in their drivers, and you will see better scaling.

They will not. To make it truly asynchronous you need at least TWO asynchronous engines. What Nvidia can do to deal with this problem is put at least 2 ACEs and a hardware scheduler into their hardware.

Will they do it? Time will tell.
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,991
626
126
Why would the chief editor/CEO of a tech site be so outright biased and hostile toward a tech company? On the contrary, he praises NV for releasing game-ready drivers; NV has actually released game-ready drivers for Ashes ever since ALPHA (when it was unavailable for the public to play).

It's rare to see a tech journalist so thoroughly put their foot in their mouth. And one of his responses was "cool story bro". I don't even...

It's ok, he also called Rise of the Tomb Raider an AMD title...

I hope that was a momentary lapse and he actually knows who sponsored said title. Anyway, this game is going gold quite soon; very much looking forward to it. :thumbsup:
 
Feb 19, 2009
10,457
10
76
They will not. To make it truly asynchronous you need at least TWO asynchronous engines. What Nvidia can do to deal with this problem is put at least 2 ACEs and a hardware scheduler into their hardware.

Will they do it? Time will tell.

What about the theory that they can enable async compute via drivers by getting the other (idle) CPU threads to process the compute?

My opinion is that it will fail hard, due to latency and sync issues from sending compute tasks over the bus to the CPU and back again.

But potentially, with a good scheduler and pre-emption, they could in theory offload a long-running compute task as soon as the frame starts; it gets processed on the CPU while the GPU does the graphics, gets sent back and synced, and the frame finishes in time.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
That's because you have chosen to not understand.

More compute work will stall graphics rendering on NV GPUs because they run serially; compute is bottlenecking graphics. It's such a simple concept, and many people have tried to help you understand it many times already. -_-

How can the "graphics rendering" stalling the pipeline when a GTX980TI is >50% faster with graphics workload? Do you even thing about this?! o_O

And how will "more compute workload" reduce the performance advantages of the GTX980TI over the GTX970 when the execution happens after the graphics rendering? For example 0,62ms + 0,62ms is still 38% less than 1ms + 1ms...
 

PhonakV30

Senior member
Oct 26, 2009
987
378
136
@sontin
60% more compute doesn't mean 60% more perf. It has a cost, and that's latency. Heavy compute means more latency. If you can't accept that fact, then leave it alone.