ComputerBase Ashes of the Singularity Beta 1 DirectX 12 Benchmarks


linkgoron

Platinum Member
Mar 9, 2005
2,588
1,234
136

Tapoer

Member
May 10, 2015
64
3
36
The 380x basically matches the 970 in those DX12 tests... I assume that something is wrong with the nVidia setup, drivers or whatever, as I can't imagine how that's possible.

It could be possible if Ashes of the Singularity is using even more asynchronous compute than before, which Nvidia doesn't support in hardware.
EDIT: The 380X has the same peak teraflop performance as the 970; async compute helps AMD achieve it.
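(Rough check, using approximate reference clocks: R9 380X = 2048 shaders × 2 FLOPs/clock × 0.97 GHz ≈ 3.97 TFLOPS; GTX 970 = 1664 shaders × 2 FLOPs/clock × ~1.18 GHz boost ≈ 3.9 TFLOPS. So the peaks really are close on paper.)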

At least Nvidia's performance in D3D12 is better than in D3D11.
 
Last edited:

Flapdrol1337

Golden Member
May 21, 2014
1,677
93
91
The 380x basically matches the 970 in those DX12 tests... I assume that something is wrong with the nVidia setup, drivers or whatever, as I can't imagine how that's possible.
A 380X has more memory bandwidth and is a bigger chip than a 970, if you count how much the 970 is cut down. It also uses more power.

Not impossible, imo.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
The 380x basically matches the 970 in those DX12 tests... I assume that something is wrong with the nVidia setup, drivers or whatever, as I can't imagine how that's possible.
This is pretty normal in DX12 with well-optimized standard multi-engine code.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Yes, but the R9 380X is also not using all of its memory controllers, so effectively it's a little smaller than 366 mm², closer to 350 mm².

I believe that if the GTX 970 were a full die, it would still be a little bigger than 350 mm².

But let's agree that they are both almost the same die size. ;)
 

linkgoron

Platinum Member
Mar 9, 2005
2,588
1,234
136
This is pretty normal in DX12 with well-optimized standard multi-engine code.

That's quite a claim, that this is "pretty normal".

Are there any other DX12 engines that exhibit this kind of behavior? Usually I see the 290X/390 getting better performance than the 970, but I would assume that the 380X basically *matching* the 970 would make a bit more buzz.

BTW, is the link down? I get a 404 now.
 

Tapoer

Member
May 10, 2015
64
3
36
That's quite a claim, that this is "pretty normal".

Are there any other DX12 engines that exhibit this kind of behavior? Usually I see the 290X/390 getting better performance than the 970, but I would assume that the 380X basically *matching* the 970 would make a bit more buzz.

BTW, is the link down? I get a 404 now.

Yes, it's giving a 404 error for me too, but you can use Google's cache to see the original article.

Also weird is that the 980 Ti is only 25% faster than the 980.

[Charts: Ashes of the Singularity multi-GPU results at 1080p and 2160p; DX11 vs. DX12 comparisons at 1080p and 2160p]
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
...

Also weird is that the 980 Ti is only 25% faster than the 980.

...

Why in the world is that weird? In general a reference 980 Ti is only 20-25% faster than a reference 980:

18% faster at 1080P:
[Chart: relative GPU performance at 1920×1080]


25% faster at 4k:
[Chart: relative GPU performance at 3840×2160]
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Edit: with the results above, we see the Fury X destroying the 980 Ti. Async is definitely in play. Definitely not just 5 fps but 10 in this case. Need more sites testing, though.

Hi there. Lead Designer here.
Beta 2 of Ashes will make heavy use of Async compute.
On GPU X: We get 30fps with Async off and 35 with it on. So it's a pretty big deal.
However, on GPU Y: We're not seeing any difference but the vendor is stating that they will have a driver soon that will show a difference.

https://www.reddit.com/r/Ashesofthe...devs_async_compute_usage_verification/d0044bg

Seems legit.

30 to 35 (a ~17% gain) is sort of a big deal, but I expected more. We'll see what Nvidia comes up with; the driver has been months in the making.
 
Last edited:

zlatan

Senior member
Mar 15, 2011
580
291
136
That's quite a claim, that this is "pretty normal".

Are there any other DX12 engines that exhibit this kind of behavior? Usually I see the 290X/390 getting better performance than the 970, but I would assume that the 380X basically *matching* the 970 would make a bit more buzz.

BTW, is the link down? I get a 404 now.

I don't know if there is another engine with well-optimized multi-engine code, but my own multi-engine sample code gives nearly the same results. The reason is simple: some architectures are better suited to the D3D12 multi-engine design, while other architectures are structured for single-engine API designs like D3D11.

Don't worry, Pascal will have better multi-engine support. Nearly as good as GCN.
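To picture what "multi-engine" means in practice: D3D12 exposes separate direct, compute and copy queues, one per hardware engine. A minimal C++ setup sketch (an illustration of the API, not the sample code mentioned above):

```cpp
// Minimal D3D12 multi-engine setup: one queue per engine type.
// Illustration only; error handling omitted, `device` assumed to be
// a valid ID3D12Device*.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateEngineQueues(ID3D12Device* device,
                        ComPtr<ID3D12CommandQueue>& graphicsQueue,
                        ComPtr<ID3D12CommandQueue>& computeQueue,
                        ComPtr<ID3D12CommandQueue>& copyQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    // 3D engine: accepts graphics, compute and copy work.
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

    // Compute engine: can run concurrently with the 3D queue on
    // hardware that supports it (the whole point of async compute).
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    // Copy engine: DMA transfers alongside rendering.
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));
}
```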
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Thanks, I had the notion that the Ti was on average 30-35% faster...

The 980 Ti does seem to overclock a fair bit better than the 980, so OC vs. OC the 30-35% span might hold true (although this is of course always hard to say with any level of certainty given the variability of overclocking).
 

linkgoron

Platinum Member
Mar 9, 2005
2,588
1,234
136
I don't know if there is another engine with well-optimized multi-engine code, but my own multi-engine sample code gives nearly the same results. The reason is simple: some architectures are better suited to the D3D12 multi-engine design, while other architectures are structured for single-engine API designs like D3D11.

Don't worry, Pascal will have better multi-engine support. Nearly as good as GCN.

I'm not worried; I have an R9 390, and the improved AMD results are quite encouraging.

However, the 380X giving similar results to a 970 is still quite surprising, IMO. I expected the 970 to perform much better. Then again, the game is still in beta and we're still in the early days of DX12, so I assume we'll have to wait and see how drivers and further development change the performance of the cards.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
No, "caring about" means that they do something for them.
They dont need a driver which is doing AS to improve performance over DX11. It is their job to optimize per hand for nVidia hardware. Otherwise a new driver wont do anything.

And without a NDA they can show the world how great AMD is - over and over and over again. This will be the third (or fourth) time that reviewers will use this "game" to showcase DX12. Even 9 months after the initial release nVidia cards dont take advantages of DX12.

I already explained it all to you in previous posts. I've been explaining this since last August.

GCN is under-utilized in DX11; Maxwell is also under-utilized.

The problem is that while GCN can achieve better utilization under DX12, Maxwell's lack of asynchronous compute + graphics support means it cannot achieve better compute utilization.

All in all, though, GCN is at least two years ahead of Maxwell. GCN is a better architecture. This isn't fanboyism talking; it's an objective truth.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Wow, this doesn't make any sense.
You don't optimize for DX12. You optimize for the hardware. That's the only reason why somebody would want to use a low-level API.

Otherwise you let the driver and a high-level API do the job. :\

If your hardware supports the API at the HARDWARE LEVEL, then there is no problem.

If your hardware is unable to execute API features at the hardware level, that's when you have to optimize for the hardware, e.g. asynchronous compute context switching on Nvidia hardware, and vendor-specific code paths in the application.

Simple as it can be.
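For illustration, the kind of vendor-specific branching being described boils down to something like this (the PCI vendor IDs are standard; the policy function is hypothetical, not any shipping engine's logic):

```cpp
// Sketch of a vendor-specific code path: read the adapter's PCI
// vendor ID and branch on it.
#include <dxgi.h>

enum class GpuVendor { Nvidia, Amd, Intel, Other };

GpuVendor DetectVendor(IDXGIAdapter1* adapter)
{
    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);
    switch (desc.VendorId) {
        case 0x10DE: return GpuVendor::Nvidia;  // NVIDIA
        case 0x1002: return GpuVendor::Amd;     // AMD/ATI
        case 0x8086: return GpuVendor::Intel;   // Intel
        default:     return GpuVendor::Other;
    }
}

// Hypothetical gate matching the discussion: use a separate compute
// queue only where concurrent execution is known to be a win.
bool UseSeparateComputeQueue(GpuVendor vendor)
{
    return vendor == GpuVendor::Amd;
}
```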
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
I already explained it all to you in previous posts. I've been explaining this since last August.

GCN is under-utilized in DX11; Maxwell is also under-utilized.

The problem is that while GCN can achieve better utilization under DX12, Maxwell's lack of asynchronous compute + graphics support means it cannot achieve better compute utilization.

Neither Vulkan nor DX12 is just async compute bolted onto the previous API. They are whole new APIs which give explicit control over the GPU (memory management, draw calls, etc.). Without optimization they are not better.

If your hardware supports the API at the HARDWARE LEVEL, then there is no problem.

If your hardware is unable to execute API features at the hardware level, that's when you have to optimize for the hardware, e.g. asynchronous compute context switching on Nvidia hardware, and vendor-specific code paths in the application.

Simple as it can be.

Right, Oxide has designed an engine which is not optimized for nVidia, and they are doing nothing but disabling certain code paths. Isn't this what I am writing? :D
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
I am sorry, but I'm done. I do not have the time, or the will, to explain everything again and again to you.

You can go back to every post that Mahigan and Zlatan wrote on this topic on this forum. That's the best way for you to understand it.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
You haven't explained anything. In fact, you wrote things which don't make sense in the context of these APIs, like saying "write something for the DX12 API".

You should go and read what Croteam has written about Vulkan. They have made it clear that the developer is responsible for nearly everything and has to know what to do.

The fact stands that this is the fourth version of the game which doesn't improve performance on nVidia hardware with DX12 over DX11. Obviously this developer doesn't care as much as they care about the paycheck from AMD supporting their development.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Neither Vulkan nor DX12 is just async compute bolted onto the previous API. They are whole new APIs which give explicit control over the GPU (memory management, draw calls, etc.). Without optimization they are not better.



Right, Oxide has designed an engine which is not optimized for nVidia, and they are doing nothing but disabling certain code paths. Isn't this what I am writing? :D
DX12 adds support for multi-threaded rendering, multi-engine, multi-adapter, etc.

NVIDIA already made good use of multi-threaded command lists and deferred rendering under DX11. NVIDIA used hidden driver threads to boost CPU performance and thus lower their API overhead, which resulted in a higher draw-call rate.

This is achievable due to a few design perks. One is that since Kepler, NVIDIA has been making use of static scheduling, meaning a large part of the scheduling is done in software; NVIDIA multi-threaded their driver, so several threads were feeding NVIDIA's hardware. Secondly, the GigaThread engine allows larger batches of commands to be fed to NVIDIA GPUs; in essence, NVIDIA GPUs have a much larger command buffer.

This is why you're under the illusion that Maxwell is superior to GCN: you never factored in the additional API overhead GCN incurs under DX11 due to its use of hardware scheduling.
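The application-visible half of that DX11 pattern is deferred contexts and command lists. A simplified sketch of how it looks (the driver's hidden worker threads sit below this level):

```cpp
// Sketch of DX11 multi-threaded rendering with deferred contexts:
// worker threads record command lists, the immediate context replays
// them. Simplified; real code needs per-thread contexts and sync.
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Runs on a worker thread: record draws into a deferred context.
ComPtr<ID3D11CommandList> RecordWork(ID3D11Device* device)
{
    ComPtr<ID3D11DeviceContext> deferred;
    device->CreateDeferredContext(0, &deferred);

    // ... set state and issue draw calls on `deferred` here ...

    ComPtr<ID3D11CommandList> commandList;
    deferred->FinishCommandList(FALSE, &commandList);
    return commandList;
}

// Runs on the render thread: replay the pre-recorded commands.
void Submit(ID3D11DeviceContext* immediate, ID3D11CommandList* list)
{
    immediate->ExecuteCommandList(list, FALSE);
}
```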

Moving on to Vulkan and DX12, this API overhead has been lifted, giving us a glimpse of the true performance of GCN, Kepler and Maxwell.

On top of this, GCN is a much more highly threaded (more parallel) architecture, so GCN has a lot of untapped resources on hand. By using asynchronous compute, you can maximize your application's usage of GCN.
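At the API level, tapping those resources with async compute looks roughly like this (a simplified sketch, reusing the queue setup from the earlier snippet; a real renderer rotates fence values per frame):

```cpp
// Sketch of async compute in D3D12: submit compute work on its own
// queue and express the dependency with a fence so the wait happens
// on the GPU timeline rather than the CPU.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void OverlapComputeWithGraphics(ID3D12Device* device,
                                ID3D12CommandQueue* computeQueue,
                                ID3D12CommandQueue* graphicsQueue,
                                ID3D12CommandList* computeWork,
                                ID3D12CommandList* graphicsWork)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick the compute work; on hardware with concurrent queues it
    // can overlap whatever the 3D engine is already doing.
    computeQueue->ExecuteCommandLists(1, &computeWork);
    computeQueue->Signal(fence.Get(), 1);

    // Graphics work that consumes the compute result waits on the
    // GPU, not the CPU, so both engines stay busy in the meantime.
    graphicsQueue->Wait(fence.Get(), 1);
    graphicsQueue->ExecuteCommandLists(1, &graphicsWork);
}
```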

When you couple both together, GCN overpowers Kepler and Maxwell. Why? Because GCN is a better architecture.


Maxwell may have the upper hand in terms of its front end; compute-wise, however... Maxwell isn't even close.
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
Results are as expected for removing the CPU overhead and using async compute. Also GCN has a good bit more compute power than Maxwell.
 

Hitman928

Diamond Member
Apr 15, 2012
6,637
12,218
136
I am sorry, but I'm done. I do not have the time, or the will, to explain everything again and again to you.

You can go back to every post that Mahigan and Zlatan wrote on this topic on this forum. That's the best way for you to understand it.

Willful ignorance, it exists.
 

Dygaza

Member
Oct 16, 2015
176
34
101
Where? This small gain at 1080p? :thumbsup:
The AMD card improves over 2x with DX12 over DX11. The nVidia hardware sits at 3.9% or so. Very realistic for a low-level API which is optimized for nVidia hardware.

There isn't a difference in 4K. AMD, on the other hand, jumps 60% in 4K.

Here are screenshots of AMD under DX11 vs. DX12 with the MSI Afterburner overlay attached.

DX11:
[Screenshot: Fury X GPU usage under DX11, Extreme preset]

DX12

[Screenshot: Fury X GPU usage under DX12, Extreme preset]


Notice, this isn't the patch that adds more async compute. Do you understand now where the majority of AMD's gain over DX11 comes from?

Nvidia doesn't get this performance uplift since they are already feeding their GPU to nearly 100% capacity under DX11.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
It is not about AMD. I'm talking about nVidia. This developer has stated that they need a low-level API to create this engine and game. So where is the performance improvement on nVidia hardware? What happened to Star Swarm, anyway?!

It is obvious that this is a marketing deal between them and AMD. Their beta versions have no NDA (unlike Fable Legends, for example) and they push them to reviewers every few months. Instead of optimizing their engine for nVidia users, they are just using their "work" to go after them. You don't even hear anything about the game; it is only about DX12 and how great AMD is.

I don't even know why anybody with nVidia hardware should support this developer.