Ashes of the Singularity User Benchmarks Thread


Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
I really just read that... didn't I...

You did read well...

I said that there are more competent people than me on GPUs, i.e. Zlatan, and that I wouldn't dare contradict them like some people did, the very same people who try to contradict me when talking about semiconductor technology even though they know they wouldn't hold a candle to me in that kind of debate. Hence my answer, which was specifically tuned for a single person...
 

Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
Thank you for clarifying this important difference, because reading the articles it is confusing when they say they can go out-of-order with the ACEs but can't do that with the CP/graphics.

Still, that's quite unique because it means GCN's compute can indeed run in parallel (leapfrog/bypass traffic blocks), no context switch required through the ACEs since they are separate pipelines. But it can only do this for compute tasks, hence the limited out-of-order and why it excels for VR. Makes sense now.

ps. Everyone should educate themselves, if they have an interest in uarchs. We are all indeed laymen.

This was posted by Enigmoid about GCN 1.0 :

The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share, global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions for execution.

So it is said that in-order execution is to be preserved; that doesn't mean the architecture is strictly in-order, but that this mode provides the best efficiency.

Yet it was used as "proof" that GCN 1.0 was totally in-order...
 
Feb 19, 2009
10,457
10
76
@Abwx

It makes sense now: GCN is in-order for the main graphics CP pipeline but out-of-order/parallel for the ACEs (that was the confusion). Since the ACEs only handle compute, they don't need a context switch, because that's their sole purpose.

This is why GCN is touted by devs as being great for VR, no latency added for async compute/shaders.

In Maxwell's case (info from Nvidia's PDF), async compute, if used in an unoptimized engine that submits large chunks of draw calls, can cause latency and performance drops. It basically means NV needs to work with devs to ensure they optimize their game engines better around NV's uarch if they use async compute/shaders.

This is the most plausible explanation for the drop in performance in DX12 in Ashes: It's not optimized well for NV GPUs.
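For readers who want to see what "separate queues for compute" looks like from the API side, here is a minimal C++ sketch of creating a graphics (DIRECT) queue and a separate COMPUTE queue in D3D12. The struct and function names are mine and the device is assumed to exist already; this only illustrates the concept being discussed, not how Ashes or any driver actually does it.

```cpp
// Minimal sketch: one DIRECT (graphics) queue plus one COMPUTE queue.
// Assumes an already-initialized ID3D12Device; error handling omitted.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

struct Queues {
    ComPtr<ID3D12CommandQueue> graphics; // accepts graphics, compute and copy work
    ComPtr<ID3D12CommandQueue> compute;  // accepts compute and copy work only
};

Queues CreateQueues(ID3D12Device* device) {
    Queues q;

    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type  = D3D12_COMMAND_LIST_TYPE_DIRECT;  // the "main" graphics pipeline
    gfxDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&q.graphics));

    D3D12_COMMAND_QUEUE_DESC cmpDesc = {};
    cmpDesc.Type  = D3D12_COMMAND_LIST_TYPE_COMPUTE; // candidate for async compute
    cmpDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
    device->CreateCommandQueue(&cmpDesc, IID_PPV_ARGS(&q.compute));

    return q;
}
```

Whether work on the compute queue actually overlaps with graphics work is up to the hardware and driver, which is exactly the GCN-versus-Maxwell difference being debated here.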
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
I mean I know it's not possible, but I feel a statement like that really should be bannable on a forum like this. Pretty much defeats the purpose. Again, obviously not possible but..... Lol.... Wow. He just made it to my ignore list at least.

I hope everyone else does the same and doesn't take the bait.

Back to my busy work schedule, but had to log in just to +1 this post.

Some people only seem to post to argue. I avoid most of the AMD CPU threads because of such toxic attitudes. Using the ignore option means I'd have to always be logged in :(
 

tential

Diamond Member
May 13, 2008
7,348
642
121
Back to my busy work schedule, but had to log in just to +1 this post.

Some people only seem to post to argue. I avoid most of the AMD CPU threads because of such toxic attitudes. Using the ignore option means I'd have to always be logged in :(

I just ignored, logged out, clicked remember me, logged in. I guess I'm permanently logged in now? Oh well.
 

Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
@Abwx

It makes sense now: GCN is in-order for the main graphics CP pipeline but out-of-order/parallel for the ACEs (that was the confusion). Since the ACEs only handle compute, they don't need a context switch, because that's their sole purpose.

This is why GCN is touted by devs as being great for VR, no latency added for async compute/shaders.

In Maxwell's case (info from Nvidia's PDF), async compute, if used in an unoptimized engine that submits large chunks of draw calls, can cause latency and performance drops. It basically means NV needs to work with devs to ensure they optimize their game engines better around NV's uarch if they use async compute/shaders.

This is the most plausible explanation for the drop in performance in DX12 in Ashes: It's not optimized well for NV GPUs.

As I said, I'm not a specialist in GPUs; for the time being I'm reading through the doc posted here:

http://forums.anandtech.com/showpost.php?p=37655445&postcount=399

I went directly to the part that relates to what was posted about GCN 1.0; these are the related chapters:

4.3 Work-Groups

4.4 Data Dependency Resolution
Edit: To know whether what you bolded is 100% right, a comparison of the data in the respective white papers is necessary, to check what one GPU can do and what it can't relative to its competitor.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
Thank you for clarifying this important difference, because reading the articles it is confusing when they say they can go out-of-order with the ACEs but can't do that with the CP/graphics.

Still, that's quite unique because it means GCN's compute can indeed run in parallel (leapfrog/bypass traffic blocks), no context switch required through the ACEs since they are separate pipelines. But it can only do this for compute tasks, hence the limited out-of-order and why it excels for VR. Makes sense now.

ps. Everyone should educate themselves, if they have an interest in uarchs. We are all indeed laymen.

No. It seems you guys just want to take pieces of what zlatan says and run with it.

But he specifically addressed this in a reply quoting your claim that GCN has a VR advantage due to out of order execution. He corrected you once before.

So, here it is again.............
First of all, VR is a different kind of workload, and LiquidVR is a software solution built around Mantle, so most of the advantages come from the upgraded Mantle API, not from the hardware. Even if another IHV has the hardware for VR, they won't be allowed to use Mantle and LiquidVR.

The in-order logic won't be a big issue in the first round. It can be an issue later, but there is a huge difference between what an API+hardware combination is capable of and how the devs use it. Most engines are not designed for D3D12, so in the first round the primary focus will be a new engine structure for the new APIs. In this respect most devs will ensure some backward compatibility with D3D11, and in that case most multi-engine solutions won't use more than one compute command queue. This is more or less a safe way to start with D3D12.
NV just uses a more limited hardware solution than AMD. Their hardware is less stateless, and this means some synchronization strategies will be non-optimal for them. In worst-case scenarios it may harm performance.

I think the multi-engine feature is one of the most useful things in D3D12, but most of the time we talk about theory and not practice. With consoles and with Mantle an async shader solution is easy, because the program will target a single piece of hardware or some very similar architectures. With D3D12 a multi-engine implementation must target a huge number of very different architectures, and it is unknown whether it will work or not. Things can get even worse with undocumented architectures, like all GeForces. Luckily D3D12 has a robust multi-engine solution where graphics is a superset of compute, and compute is a superset of the copy engine. This means the program can use the best engine for a given pipeline, but the driver can execute it differently. For example, a compute pipeline can be loaded into the compute queue to execute asynchronously with a graphics task, with the correct synchronization. It may run faster, but may not, and there is a small chance that the async scheduling will affect performance negatively. In this case the IHV can create a driver profile for the game, and the compute pipeline can be loaded into the graphics queue.
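As a concrete illustration of the "correct synchronization" zlatan mentions, a common D3D12 pattern is to signal a fence from the compute queue and have the graphics queue wait on it before consuming the results. This is only a rough sketch under my own assumptions (the queues, command lists and fence are created elsewhere), not code from any game or driver:

```cpp
// Sketch of cross-queue synchronization in D3D12 (illustrative only).
// Assumes the queues, command lists and fence were created elsewhere;
// fenceValue starts at 0 and is owned by the caller.
#include <d3d12.h>

void SubmitAsyncComputeThenGraphics(ID3D12CommandQueue* computeQueue,
                                    ID3D12CommandQueue* graphicsQueue,
                                    ID3D12CommandList*  computeList,
                                    ID3D12CommandList*  graphicsList,
                                    ID3D12Fence*        fence,
                                    UINT64&             fenceValue) {
    // Kick off the compute work on its own queue.
    ID3D12CommandList* computeLists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, computeLists);

    // Signal the fence on the compute queue when that work finishes...
    const UINT64 computeDone = ++fenceValue;
    computeQueue->Signal(fence, computeDone);

    // ...and make the graphics queue wait (on the GPU) for that signal
    // before it consumes the compute results.
    graphicsQueue->Wait(fence, computeDone);

    ID3D12CommandList* graphicsLists[] = { graphicsList };
    graphicsQueue->ExecuteCommandLists(1, graphicsLists);
}
```

The fallback zlatan describes is then simply to record the same compute work into the DIRECT (graphics) queue instead, which a driver profile could choose to do if async scheduling hurts on a given architecture.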

He may be a very talented PS4 dev with great knowledge of GCN. I also trust he knows a lot about DX12, but that is not the API used on the PS4, and neither is Maxwell. So while he may know a lot on these subjects, I wouldn't take his every word as gospel. What he supposes is not fact.

I appreciate his contribution though; I think it is priceless. But I am not sure that Nvidia will be helpless when it comes to VR, and I wouldn't underestimate their ability when it comes to SW talent. Right now, it is just too early to tell. There just isn't enough info, just hype. Honestly, I don't see VR moving fast enough for it to even matter.

I expect Nvidia will fix their performance issues in this DX12 Ashes benchmark. It only seems logical. But if they don't, that would really be interesting to me. It is still in the early stages and has already caused a lot of noise on forums. It's hard to imagine that Nvidia can't/won't do anything, but I guess we will have to see how all this plays out.
 

Magee_MC

Senior member
Jan 18, 2010
217
13
81
@Abwx

It makes sense now: GCN is in-order for the main graphics CP pipeline but out-of-order/parallel for the ACEs (that was the confusion). Since the ACEs only handle compute, they don't need a context switch, because that's their sole purpose.

This is why GCN is touted by devs as being great for VR, no latency added for async compute/shaders.

In Maxwell's case (info from Nvidia's PDF), async compute, if used in an unoptimized engine that submits large chunks of draw calls, can cause latency and performance drops. It basically means NV needs to work with devs to ensure they optimize their game engines better around NV's uarch if they use async compute/shaders.

This is the most plausible explanation for the drop in performance in DX12 in Ashes: It's not optimized well for NV GPUs.

So, as I understand it so far, the two biggest hangups with NV's cards are the need to switch contexts from graphics/compute to pure compute and a potential delay when processing large batch calls.

If AoS is causing delays with NV GPUs because of large batch draw calls, it's possible to optimize for asynchronous compute on NV by making the batches smaller. What would be the effect of smaller draw call batches? Would it slow the game for AMD because of the increased number of batches, or would it increase the speed on NV cards with no effect on AMD cards?
 
Feb 19, 2009
10,457
10
76
So, as I understand it so far, the two biggest hangups with NV's cards are the need to switch contexts from graphics/compute to pure compute and a potential delay when processing large batch calls.

If AoS is causing delays with NV GPUs because of large batch draw calls, it's possible to optimize for asynchronous compute on NV by making the batches smaller. What would be the effect of smaller draw call batches? Would it slow the game for AMD because of the increased number of batches, or would it increase the speed on NV cards with no effect on AMD cards?

Yeah, that sounds like what could be causing the problem in Ashes for NV, because it matches what they say about their uarch overall (not just in the context of VR). The fact is NV have said they cannot switch context out-of-order; it has to occur after the rendering is completed. They can get bottlenecked (stuck in traffic) if game engines don't take that into consideration, i.e. the "best usage scenarios" are not adhered to.

Oxide CAN optimize for NV with that knowledge: they need to issue more, smaller draw call batches so NV's uarch can process compute without long latency delays. How that affects GCN, no idea; probably only Oxide and other devs experienced with GCN can tell us more.

Basically, in this context and with the info available, NV can fix their Ashes DX12 performance, but it's going to require Oxide to optimize their game engine for it. It's in alpha, so by launch I expect it to look better for NV (no perf drops).
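For what "more, smaller draw call batches" could mean engine-side, here is a purely hypothetical C++ sketch of a batch-size knob; DrawCall, maxDrawsPerBatch and submitBatch are my own illustrative names, not anything from Oxide's engine, and the right value would presumably differ per vendor:

```cpp
// Hypothetical sketch: split a frame's draw calls into smaller batches before
// submission, so no single submission hogs the command processor for too long.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

struct DrawCall {
    int meshId = 0;
    int materialId = 0;
};

// Splits 'draws' into chunks of at most 'maxDrawsPerBatch' and hands each chunk
// to 'submitBatch' (which would record and execute one command list per chunk).
void SubmitInBatches(const std::vector<DrawCall>& draws,
                     std::size_t maxDrawsPerBatch,
                     const std::function<void(const DrawCall*, std::size_t)>& submitBatch) {
    for (std::size_t start = 0; start < draws.size(); start += maxDrawsPerBatch) {
        const std::size_t count = std::min(maxDrawsPerBatch, draws.size() - start);
        submitBatch(&draws[start], count);
    }
}
```

A per-vendor batch size, small where long submissions stall other work and large where the command processor prefers fewer submissions, is one conceivable way to tune this without hurting either side.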
 

Magee_MC

Senior member
Jan 18, 2010
217
13
81
Oxide CAN optimize for NV with that knowledge: they need to issue more, smaller draw call batches so NV's uarch can process compute without long latency delays. How that affects GCN, no idea; probably only Oxide and other devs experienced with GCN can tell us more.

Basically, in this context and with the info available, NV can fix their Ashes DX12 performance, but it's going to require Oxide to optimize their game engine for it. It's in alpha, so by launch I expect it to look better for NV (no perf drops).

If the large batch draw calls are part of the problem, and if (this is pure speculation) making the draw call batches smaller might have an adverse effect on AMD cards, I wonder if Oxide isn't in a bit of a catch 22.

They have said that they will implement improvements that help a vendor as long as they don't have an adverse effect on other vendors' performance. If optimizing for NV cards' architecture has an adverse effect on AMD cards' performance, then they may not have many good options, unless they could specify the draw call batch size in the engine to optimize for whatever card is being used.

It's all pure guesswork, but since there is limited data, it's all that I have.
 

Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
No. It seems you guys just want to take pieces of what zlatan says and run with it.

But he specifically addressed this in a reply quoting your claim that GCN has a VR advantage due to out of order execution. He corrected you once before.

So, here it is again.............


So you deem him competent when he corrects some people, and indeed I can only agree, but then:


He may be a very talented PS4 dev with great knowledge of GCN. I also trust he knows a lot about DX12, but that is not the API used on the PS4, and neither is Maxwell. So while he may know a lot on these subjects, I wouldn't take his every word as gospel. What he supposes is not fact.

So he is no longer competent when his statements do not suit your opinion or do not fuel your own beliefs...
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
If the large batch draw calls are part of the problem, and if (this is pure speculation) making the draw call batches smaller might have an adverse effect on AMD cards, I wonder if Oxide isn't in a bit of a catch 22.

They have said that they will implement improvements that help a vendor as long as they don't have an adverse effect on other vendors' performance. If optimizing for NV cards' architecture has an adverse effect on AMD cards' performance, then they may not have many good options, unless they could specify the draw call batch size in the engine to optimize for whatever card is being used.

It's all pure guesswork, but since there is limited data, it's all that I have.

Oxide has investigated this with Star Swarm.

http://www.anandtech.com/show/8962/the-directx-12-performance-preview-amd-nvidia-star-swarm/6

Update: Oxide Games has emailed us this evening with a bit more detail about what's going on under the hood, and why Mantle batch submission times are higher. When working with large numbers of very small batches, Star Swarm is capable of throwing enough work at the GPU such that the GPU's command processor becomes the bottleneck. For this reason the Mantle path includes an optimization routine for small batches (OptimizeSmallBatch=1), which trades GPU power for CPU power, doing a second pass on the batches in the CPU to combine some of them before submitting them to the GPU. This bypasses the command processor bottleneck, but it increases the amount of work the CPU needs to do (though note that in AMD's case, it's still several times faster than DX11).

This feature is enabled by default in our build, and by combining those small batches this is the likely reason that the Mantle path holds a slight performance edge over the DX12 path on our AMD cards. The tradeoff is that in a 2 core configuration, the extra CPU workload from the optimization pass is just enough to cause Star Swarm to start bottlenecking at the CPU again. For the time being this is a user-adjustable feature in Star Swarm, and Oxide notes that in any shipping game the small batch feature would likely be turned off by default on slower CPUs.
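As a rough illustration of the second CPU pass Oxide describes (not their actual code; the stateKey field and threshold are my own assumptions), combining adjacent small batches that share state trades extra CPU work for fewer, larger submissions to the GPU:

```cpp
// Illustrative sketch of combining small batches on the CPU before submission,
// in the spirit of the OptimizeSmallBatch=1 path described above.
#include <cstddef>
#include <vector>

struct Batch {
    int stateKey = 0;          // pipeline/material state; only matching batches merge
    std::size_t drawCount = 0; // number of draws in this batch
};

std::vector<Batch> CombineSmallBatches(const std::vector<Batch>& in,
                                       std::size_t smallThreshold) {
    std::vector<Batch> out;
    for (const Batch& b : in) {
        // Merge into the previous batch if both are small and share state.
        if (!out.empty() && out.back().stateKey == b.stateKey &&
            out.back().drawCount < smallThreshold && b.drawCount < smallThreshold) {
            out.back().drawCount += b.drawCount;
        } else {
            out.push_back(b);
        }
    }
    return out;
}
```

That extra CPU pass is exactly the trade-off visible in the 2-core results below: fewer GPU submissions, but more CPU work per frame.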

[Charts: 71461.png, 71462.png]


Hurts CPU performance but helps with the framerate, as it seems even Star Swarm was capable of causing the 290X's command processor to choke.

Comparison to DX 11 and Nvidia.

[Chart: 71451.png]
 
Feb 19, 2009
10,457
10
76
@Enigmoid
Nice find. So it seems AMD was aware of potential issues and has an optimized path to handle lots of small batches, which has a side effect: it can cause a perf drop on 2-core CPUs.

[Charts: 71458.png, 71459.png]


Note the drop in perf on 2 cores with Mantle, but not with DX12. So the DX12 path is better optimized for handling draw call batches by default?
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
This was interesting as well. I had thought Nvidia's cards were running near full utilization already, but I guess not. DX12 drives up the 980's power consumption to near 290X levels.

[Chart: 71452.png]


Just noticed they were still close in DX11. Not too significant a change. This could actually suggest under-utilization of the GPU.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
So you deem him competent when he corrects some people, and indeed I can only agree, but then:




So he is no longer competent when his statements do not suit your opinion or do not fuel your own beliefs...

I didn't think what I posted was too complicated, but I guess I should have bet on someone getting confused.

It will be okay as long as you don't twist my words.

This was interesting as well. I had thought Nvidia's cards were running near full utilization already, but I guess not. DX12 drives up the 980's power consumption to near 290X levels.

[Chart: 71452.png]


Just noticed they were still close in DX11. Not too significant a change. This could actually suggest under-utilization of the GPU.

The 980 uses 22% more power in DX12, while the 290X goes up 18%.

In DX11 the 290X uses 8% more power than the 980.
In DX12 it uses 5% more.

Not really shocking or anything.
 

VR Enthusiast

Member
Jul 5, 2015
133
1
0
Ok final set of benches updated for this build. What I would consider clean runs.

DX12 > DX11 on all benches. DX12 rips through everything that is thrown at it. Smooth all of the time. DX11 turns into a stutter fest the more it's pushed. As I've been saying, the next gen APIs give a superior user experience due to less FPS variance. It's not just about max fps anymore. Average FPS is higher because min fps is much higher.

Normal / Medium / Heavy batches, % FPS increase:
GPU bound: +7%, +21%, +58%
CPU bound: +242%, +336%, +408%

I would be really interested to know what the experience was like on Nvidia DX11. Even though the numbers look quite good I would bet on severe dips in frame rate combined with higher maximums. This is a bit like what we saw with Mantle having lower overall frame rates but a much smoother experience.
 

96Firebird

Diamond Member
Nov 8, 2010
5,738
334
126
Total system consumption... Power is expected to increase as the GPU is not as limited by the CPU in DX12 compared to DX11.
 

Despoiler

Golden Member
Nov 10, 2007
1,968
773
136
I would be really interested to know what the experience was like on Nvidia DX11. Even though the numbers look quite good I would bet on severe dips in frame rate combined with higher maximums. This is a bit like what we saw with Mantle having lower overall frame rates but a much smoother experience.

If you look at the benchmarks beyond the normal, medium, and high batch numbers, you can see the individual scenes that comprise the benchmark. Take a look at the section called "shot high vista", which is further towards the bottom. DX11 is fine with low amounts of zoom, i.e. fewer units on screen. It absolutely tanks when you zoom out to expose more units. It would go as low as 8.5 FPS on shot high vista, where DX12 is double that. It's important to note that you need to zoom way out in AoTS as part of the game. The wars and battles can get quite big, so what is being benchmarked is completely realistic in the context of this game. DX11 is extremely jarring from a user experience perspective. DX12 remains smooth throughout due to less variance in FPS and faster frametimes. I cannot say enough that the improved user experience is the unsung benefit of these next-gen APIs. It's absolutely noticeable. So much so that if dev houses don't have DX12 or Vulkan paths in their games, I probably will not be buying them.
 

Despoiler

Golden Member
Nov 10, 2007
1,968
773
136
It looks like DX12 is off to a good initial showing for 2015.

Guess who got their Fable Legends Win10 closed beta invite? This guy.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
This was posted by Enigmoid about GCN 1.0 :



So it is said that in-order execution is to be preserved; that doesn't mean the architecture is strictly in-order, but that this mode provides the best efficiency.

Yet it was used as "proof" that GCN 1.0 was totally in-order...

@zlatan has already noted that GCN and most modern GPUs are in-order, which makes perfect sense.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
I just logged in to my account to see I got a code. It does have an NDA, but I'm not sure how strict it is. I'll find out when I get home from work.

I guess when a new junior poster creates an account August 2015 with some tasty benchmarks, it won't be you. :sneaky:

Don't worry brah, we won't rat you out (unless your benchmarks give AMD a black eye, then I'd definitely be worried :p ).