Ashes of the Singularity User Benchmarks Thread

So asynchronously executing a compute workload is not "Asynchronous Compute"? Nor is using the copy and graphics engines together? Nor letting the graphics queue wait for a result from the compute queue?

Asynchronous Compute is more than doing graphics and compute at the same time.

By your definition, Kepler can do Async Compute, because it too can process 32 queues of COMPUTE with Hyper-Q.

Why did NV say Kepler cannot do Async Compute? They even told Ryan Smith that Maxwell 2 is different from Kepler: it can process graphics + compute simultaneously.

At this point, either:

1. Oxide, AMD, and the B3D Async Compute program are all lying/wrong, or
2. NV is lying.
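
For reference, all three of those cases map onto explicit queue types in D3D12. A minimal sketch of creating them (error handling omitted; the helper name is mine):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// One helper, three engine types: DIRECT queues accept graphics and compute,
// COMPUTE queues accept compute only, COPY queues accept copy work.
ComPtr<ID3D12CommandQueue> MakeQueue(ID3D12Device* device,
                                     D3D12_COMMAND_LIST_TYPE type)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = type;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// auto gfx  = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
// auto comp = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);
// auto copy = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COPY);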
 

sontin

Diamond Member
I don't get it. How many times is it necessary to explain to you that Asynchronous Compute in DX12 has nothing to do with the ability to execute a graphics and a compute queue in parallel?

You can use the copy and the graphics engine at the same time, too. This is Asynchronous Compute. Microsoft demonstrated Asynchronous Compute on nVidia hardware with a simulation using the copy and compute engines.

BTW: Maxwell v2 gets faster in the program. The Async Compute time is ~8% better.
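
The cross-queue case being argued about looks like this in D3D12; a minimal sketch, assuming the queues, fence, and command lists already exist (all names here are mine, not from the B3D test):

Code:
UINT64 fenceValue = 1;
ID3D12CommandList* compute[]  = { computeList };
ID3D12CommandList* graphics[] = { graphicsList };

computeQueue->ExecuteCommandLists(1, compute);    // kick off the compute work
computeQueue->Signal(fence, fenceValue);          // fence marks its completion

graphicsQueue->Wait(fence, fenceValue);           // GPU-side wait, no CPU stall
graphicsQueue->ExecuteCommandLists(1, graphics);  // consumes the compute result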
 

TheELF

Diamond Member
Not sure how the CPU is related at all. This is about using idle gpu resources.




Clearly something is wrong with the program though. If AMD cards had a 20ms minimum processing time for every compute task they would never exceed 50 frames per second in games.

1. Async aside: there are no idle resources if the CPU can keep the GPU fed at all times.
People saying that consoles get 30%, ergo everybody will get 30%, is just crazy; that's like saying that every CPU and every GPU is exactly the same speed.
Nope. Slow CPUs with fast GPUs will gain more, simply because fast CPUs already max out fast GPUs, or at least get them closer to the limit.

2. I don't get where you get that 20ms stuff from; I did not see that in the example with the cars.

And I am not claiming that nvidia is twice as fast, but we have no idea what's really going on, or how much of Ashes percentage-wise is compute or other stuff that gives AMD that boost; we don't know anything, really.
Devs get asked about PC and reply for consoles; there is no knowledge to be had in this whole mess.
 

iiiankiii

Senior member
I don't get it. How many times is it necessary to explain to you that Asynchronous Compute in DX12 has nothing to do with the ability to execute a graphics and a compute queue in parallel?

You can use the copy and the graphics engine at the same time, too. This is Asynchronous Compute. Microsoft demonstrated Asynchronous Compute on nVidia hardware with a simulation using the copy and compute engines.

BTW: Maxwell v2 gets faster in the program. The Async Compute time is ~8% better.

So Nvidia can't execute graphics and compute queues in parallel?
 

dogen1

Senior member
1. Async aside: there are no idle resources if the CPU can keep the GPU fed at all times.

2. I don't get where you get that 20ms stuff from; I did not see that in the example with the cars.

Not really. Something is always left idle.

Anyway. So async lets you overlap compute with graphics. If your graphics operation is, say, ROP bound (compute shaders don't use the ROPs), or at least bound by something other than what the compute shader is bound by, you should see a benefit.

Here's a more complete explanation.

https://forum.beyond3d.com/threads/...t-are-the-benefits.54891/page-14#post-1835425
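
In D3D12 terms, that overlap is just two independent submissions; a rough sketch (list and queue names are hypothetical, and whether the two actually run concurrently is up to the hardware/driver):

Code:
// ROP/raster-bound pass on the graphics queue, ALU-bound pass on the
// compute queue, no fence between them, so the GPU is free to overlap.
ID3D12CommandList* gfxLists[]  = { shadowPassList };      // ROP bound
ID3D12CommandList* compLists[] = { lightingComputeList }; // ALU bound, no ROPs

graphicsQueue->ExecuteCommandLists(1, gfxLists);
computeQueue->ExecuteCommandLists(1, compLists);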


The 20ms stuff is my reasoning as to why the test from beyond3d is not real-world. What games use a compute shader that takes 20-something ms to execute on any AMD GPU?
 

TheELF

Diamond Member
Not really. Something is always left idle.
Sure, but as you can see with nvidia in DX11, nvidia's cards are very serial, at least as far as graphics+compute is concerned (and as far as we can even tell judging from this beta only); as long as you have a fast enough CPU, you get 100% out of your card('s graphics+compute).*

Sure enough, at the same time the copy unit perhaps isn't doing anything, so you could do texture streaming at the same time and gain performance; some said that this is considered async compute as well.

But as a lot of amd fanboys stated in this thread, async has to be graphics+compute because that's what amd is good at, so argument closed on async.




*On amd you are right: no matter how fast your CPU is, it has 8 ACEs and some of them will always be idle unless you draw a crapload of stuff on the screen at all times.
 

sontin

Diamond Member
So Nvidia can't execute graphics and compute queue in parallel?

I don't know. The graphics + compute time is always slightly faster than both run alone combined.

And I checked the load on the GPU. With compute alone, the GPU runs at 90%+. So I guess there is only slight room to improve this situation with an extra graphics queue.
 

dogen1

Senior member
Sure, but as you can see with nvidia in DX11, nvidia's cards are very serial, at least as far as graphics+compute is concerned (and as far as we can even tell judging from this beta only); as long as you have a fast enough CPU, you get 100% out of your card('s graphics+compute).*

Sure enough, at the same time the copy unit perhaps isn't doing anything, so you could do texture streaming at the same time and gain performance; some said that this is considered async compute as well.

But as a lot of amd fanboys stated in this thread, async has to be graphics+compute because that's what amd is good at, so argument closed on async.




*On amd you are right: no matter how fast your CPU is, it has 8 ACEs and some of them will always be idle unless you draw a crapload of stuff on the screen at all times.

Well, it depends on the architecture, obviously. I think GCN was designed to have extra ALU (as in the ratio of ALU to fixed-function units) over what traditional (non-compute) rendering typically uses.
 

Eymar

Golden Member
But as a lot of amd fanboys stated in this thread, async has to be graphics+compute because that's what amd is good at, so argument closed on async.

You don't have to be an amd fanboy to care about async compute and Nvidia's response to it. I have a Titan X and am interested to know if this will impact VR with regard to Maxwell 2. If async compute is anything like CPU multithreading, then async compute should help VR responsiveness. If Nvidia can do async compute, or work around it to get below the minimum threshold for VR response time (20ms), then all good.
 

tential

Diamond Member
You don't have to be an amd fanboy to care about async compute and Nvidia's response to it. I have a Titan X and am interested to know if this will impact VR with regard to Maxwell 2. If async compute is anything like CPU multithreading, then async compute should help VR responsiveness. If Nvidia can do async compute, or work around it to get below the minimum threshold for VR response time (20ms), then all good.
What games are coming out with VR that you'll play in 2016?
 

ShintaiDK

Lifer
You don't have to be an amd fanboy to care about async compute and Nvidia's response to it. I have a Titan X and am interested to know if this will impact VR with regard to Maxwell 2. If async compute is anything like CPU multithreading, then async compute should help VR responsiveness. If Nvidia can do async compute, or work around it to get below the minimum threshold for VR response time (20ms), then all good.

If 20ms is the borderline, then forget async on AMD in terms of VR.

But I doubt it will be related to VR.
 

TheELF

Diamond Member
If async compute is anything like CPU multithreading

Is it, though? With hyperthreading you don't lose any speed if you only have one thread to run (provided that one thread is fully optimized to use as much of the core as possible); having 8 separate ACEs means that you have to find 8 separate things to run all at once, or you will lose speed no matter how good a thread you have.

I have no idea what's important for VR, but having 8 separate ACEs seems to make sense if you want the fastest response time, just as 8 cores in the FX make sense for servers for the same reason: it's just much more likely that some unit is not doing anything at the time, so it can react to an incoming request immediately.
 

Hitman928

Diamond Member
Didn't you see the time when async is used in this thread?

Yep, and it has already been explained multiple times that the absolute times are meaningless for this test; otherwise GCN cards would never be playable. The creator already said that there was a bug in the program causing the extra delay on AMD cards. The test is just to see if/how cards handle asynchronous compute.
 

Eymar

Golden Member
I have no idea what's important for VR, but having 8 separate ACEs seems to make sense if you want the fastest response time, just as 8 cores in the FX make sense for servers for the same reason: it's just much more likely that some unit is not doing anything at the time, so it can react to an incoming request immediately.

That's my assumption of where async shaders can help. Now, if Nvidia can work around async compute to keep latencies low for VR, then I have no real worries. I'm still waiting to see what the official word is from Nvidia, as maybe Maxwell 2 does support async shaders.
 

tential

Diamond Member
I'm big on sims (racing, space, air), but mainly racing games, so Project Cars, Dirt Rally.
Do those games support VR? I mean, what games in 2016 will have VR support? Because I keep hearing people throw out VR, but I'm not hearing of games that have support for it.
 

Skurge

Diamond Member
Do those games support VR? I mean, what games in 2016 will have VR support? Because I keep hearing people throw out VR, but I'm not hearing of games that have support for it.


Those games all support VR. Sims are not games you play for 100 hours and drop. You play them for years.
 

.vodka

Golden Member
It seems they've found out why the testing program runs weirdly on GCN:


https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-18#post-1869645

Post 348 said:
CasellasAbdala said: ↑
Now, all in all, how does this affect a GTX 980 Ti? (Objectively speaking.) Trying to decide between a Fury X and this for longevity...
We have some evidence that 980Ti can't do async compute. Why not wait until there's a better variety of tests? Games should benefit substantially from D3D12 even without async compute.

Dygaza said: ↑
There's something really wrong with GCN altogether in this test. Compute times are just horrible, and GPU usage is way too low (max 10% under compute). Well, granted, it's not a benchmark made for pure performance.
I discovered a mistake I made earlier.

In this post:

DX12 performance thread

I said the loop is 8 cycles. This is radically wrong. It's actually 40 cycles. The new version of CodeXL makes this clear (though there's a whopper of a bug) because it indicates the timings of instructions and points at something I totally forgot: a single work item runs each SIMD at 1/4 throughput over time. Whereas on NVidia a single work item should run at full throughput over time, because the SIMD width matches the work-group width.

For a loop of 1,048,576 iterations, that's 40ms. It's amusing because it means that in the earlier test AMD couldn't drop below 40ms.

In the second test the loop iterates 524,288 times. That's 20ms. So now we get to some truth about this kernel: it runs vastly slower on AMD than on NVidia. OK, there's still 6 ms that I can't explain (which is as much time as GM200 spends), but I think we've almost cracked one of the mysteries of the bizarre slowness on AMD.

Apart from that I can't help wondering if the [numthreads(1, 1, 1)] attribute of the kernel is making AMD do something additionally strange.


https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-18#post-1869657

Post 353 said:
Jawed said: ↑
I said the loop is 8 cycles. This is radically wrong. It's actually 40 cycles. The new version of CodeXL makes this clear (though there's a whopper of a bug) because it indicates the timings of instructions and points at something I totally forgot: a single work item runs each SIMD at 1/4 throughput over time. Whereas on NVidia a single work item should run at full throughput over time, because the SIMD width matches the work-group width.
I forgot about the 4x multiplier as well. The numbers didn't add up to something that felt intuitively right without that in place.

That explains the magnitude of the time increment, mostly. Overhead and maybe clock variance might explain part of the remainder.

Intra-batch timings should have more space to dispatch wavefronts within 20ms, unless dispatch has that high an overhead.
I was running with a mental model in which the batches were isolated in time, but the GPU could be cycling amongst them for fairness purposes.


It had been stated that this is a test of capability, not performance, but at least we now know why it runs so badly on GCN, apart from showing async compute capabilities. So Mahigan was more or less right again: the test wasn't coded with GCN in mind (not that it matters for this particular case).
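
As a sanity check on those numbers, here's the arithmetic as a tiny C++ sketch (the ~1.05 GHz shader clock is my assumption; real clocks vary per card):

Code:
#include <cstdio>

int main() {
    const double cyclesPerIter = 40.0;    // Jawed's corrected loop cost on GCN
    const double clockHz       = 1.05e9;  // assumed shader clock (~1.05 GHz)

    // 1,048,576 iterations (first test) and 524,288 (second test)
    std::printf("test 1: %.1f ms\n", 1048576.0 * cyclesPerIter / clockHz * 1e3); // ~40 ms
    std::printf("test 2: %.1f ms\n",  524288.0 * cyclesPerIter / clockHz * 1e3); // ~20 ms
}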
 

TheELF

Diamond Member
Do those games support VR?
Hmmm, do they even need support? I mean, as long as you have head tracking on your goggles and can bind it to your mouse, which is used for free-looking anyway, I don't see any reason why it shouldn't work.
In sim games you are sitting in a seat anyway, and there is no movement except for the vehicle.
 

TheELF

Diamond Member
So Mahigan was more or less right again: the test wasn't coded with GCN in mind (not that it matters for this particular case).

It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIA's architectures will be superior.
Yup, I guessed as much: if you can't find a way to push hundreds of units buzzing around the screen, you won't get much out of GCN.

They made the same mistake with FX and learned nothing from it.
 

dogen1

Senior member
Yup, I guessed as much: if you can't find a way to push hundreds of units buzzing around the screen, you won't get much out of GCN.

They made the same mistake with FX and learned nothing from it.

No, to all of the above. This post is nonsense lol.
 

Gikaseixas

Platinum Member
Games are bound to start using more compute power, right? I think it is too soon to call GCN a mistake.
 

Eymar

Golden Member
Those games all support VR. Sims are not games you play for 100 hours and drop. You play them for years.

Yup, this. GPU response times (motion-to-photon, a term I'm just learning about) are probably not a big deal with racing games, but something like Star Citizen could induce nausea if framerate and response are not in the ideal range. A VR version of an Ashes/RTS game would be awesome (thinking of a real-time battlefield shown on a 3D war-room table).