CasellasAbdala said:
↑Now, all in all, how does this affect a GTX 980 Ti? (Objectively speaking.) Trying to decide between Fury X and this for longevity...
We have some evidence that the 980 Ti can't do async compute. Why not wait until there's a better variety of tests? Games should benefit substantially from D3D12 even without async compute.
Dygaza said:
↑There's something really wrong with GCN altogether in this test. Compute times are just horrible, and GPU usage is way too low (max 10% under compute). Granted, it's not a benchmark made for pure performance.
I discovered a mistake I made earlier.
In this post:
DX12 performance thread
I said the loop is 8 cycles.
This is radically wrong. It's actually 40 cycles. The new version of CodeXL makes this clear (though it has a whopper of a bug), because it shows per-instruction timings and points at something I totally forgot: a single work item runs on each SIMD at 1/4 throughput over time, since a GCN SIMD is 16 lanes wide and issues one instruction across a 64-wide wavefront over 4 cycles. On NVidia a single work item should run at full throughput over time, because the SIMD width matches the work-group width.
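To make the corrected per-iteration count concrete, here's a quick sketch. The instruction count per loop iteration is my assumption (the post doesn't quote the disassembly); what GCN gives us is the 4-cycle issue cadence for a lone work item, since a 64-wide wavefront executes over a 16-lane SIMD in 4 cycles:

```python
# Back-of-the-envelope check of the corrected loop cost on GCN.
# ASSUMPTION (mine): the loop body is ~10 VALU instructions.
# FACT: a GCN SIMD is 16 lanes wide, so one wavefront instruction
# issues over 4 cycles -- a single work item pays all 4 cycles
# per instruction even though 63 of its 64 lanes are idle.
INSTRUCTIONS_PER_ITERATION = 10   # assumed, not from CodeXL output
WAVE_WIDTH = 64
SIMD_WIDTH = 16

cycles_per_issue = WAVE_WIDTH // SIMD_WIDTH          # 4 cycles
cycles_per_iteration = INSTRUCTIONS_PER_ITERATION * cycles_per_issue

print(cycles_per_iteration)  # 40 cycles, not the 8 I claimed earlier
```

Under these assumptions the old "8 cycles" figure would only hold if every instruction retired in under a cycle, which a single work item on GCN can never do.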
For a loop of 1,048,576 iterations, that's 40ms. It's amusing, because it means that in the earlier test AMD could never drop below 40ms.
In the second test the loop iterates 524,288 times. That's 20ms. So now we get to some truth about this kernel: it runs vastly slower on AMD than on NVidia. OK, there's still 6ms I can't explain (which is as much time as GM200 spends in total), but I think we've almost cracked one of the mysteries behind the bizarre slowness on AMD.
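The 40ms and 20ms floors fall out of simple arithmetic once you fix the per-iteration cost at 40 cycles. The shader clock is my assumption here (~1.05 GHz, Fury X class); the post doesn't name the exact card:

```python
# Convert serially dependent loop iterations into wall time.
# ASSUMPTION (mine): shader clock of ~1.05 GHz (Fury X class).
# From the post: 40 cycles per loop iteration for one work item.
SHADER_CLOCK_HZ = 1.05e9
CYCLES_PER_ITERATION = 40

def loop_time_ms(iterations,
                 cycles_per_iteration=CYCLES_PER_ITERATION,
                 clock_hz=SHADER_CLOCK_HZ):
    """Minimum time for a serially dependent loop on a single work item."""
    return iterations * cycles_per_iteration / clock_hz * 1e3

print(round(loop_time_ms(1_048_576), 1))  # first test:  ~39.9 ms
print(round(loop_time_ms(524_288), 1))    # second test: ~20.0 ms
```

So the first test could never finish under ~40ms on GCN no matter what else the driver did, which matches the floor observed earlier.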
Apart from that, I can't help wondering whether the [numthreads(1, 1, 1)] attribute on the kernel is making AMD do something additionally strange.