CasellasAbdala said:
↑Now, all in all, how does this affect a GTX 980 Ti? (Objectively speaking.) Trying to decide between Fury X and this for longevity...
We have some evidence that the 980 Ti can't do async compute. Why not wait until there's a better variety of tests? Games should benefit substantially from D3D12 even without async compute.
Dygaza said:
↑There's something really wrong with GCN altogether in this test. Compute times are just horrible, and GPU usage is way too low (max 10% under compute). Granted, it's not a benchmark made for pure performance.
I discovered a mistake I made earlier.
In this post:
DX12 performance thread
I said the loop is 8 cycles.
This is radically wrong. It's actually 40 cycles. The new version of CodeXL makes this clear (though it has a whopper of a bug), because it shows per-instruction timings and points at something I totally forgot: a single work item runs on each SIMD at 1/4 throughput over time, since a GCN SIMD is 16 lanes wide and issues one instruction across a 64-wide wavefront over 4 cycles. On NVidia a single work item should run at full throughput over time, because the SIMD width matches the work-group width.
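To make the corrected per-iteration count concrete, here's a quick sketch. The instruction count per loop iteration is my assumption (the post doesn't quote the disassembly); what GCN gives us is the 4-cycle issue cadence for a lone work item, since a 64-wide wavefront executes over a 16-lane SIMD in 4 cycles:

```python
# Back-of-the-envelope check of the corrected loop cost on GCN.
# ASSUMPTION (mine): the loop body is ~10 VALU instructions.
# FACT: a GCN SIMD is 16 lanes wide, so one wavefront instruction
# issues over 4 cycles -- a single work item pays all 4 cycles
# per instruction even though 63 of its 64 lanes are idle.
INSTRUCTIONS_PER_ITERATION = 10   # assumed, not from CodeXL output
WAVE_WIDTH = 64
SIMD_WIDTH = 16

cycles_per_issue = WAVE_WIDTH // SIMD_WIDTH          # 4 cycles
cycles_per_iteration = INSTRUCTIONS_PER_ITERATION * cycles_per_issue

print(cycles_per_iteration)  # 40 cycles, not the 8 I claimed earlier
```

Under these assumptions the old "8 cycles" figure would only hold if every instruction retired in under a cycle, which a single work item on GCN can never do.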
For a loop of 1,048,576 iterations, that's 40ms. It's amusing, because it means that in the earlier test AMD could never drop below 40ms.
In the second test the loop iterates 524,288 times. That's 20ms. So now we get to some truth about this kernel: it runs vastly slower on AMD than on NVidia. OK, there's still 6ms I can't explain (which is as much time as GM200 spends in total), but I think we've almost cracked one of the mysteries behind the bizarre slowness on AMD.
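The 40ms and 20ms floors fall out of simple arithmetic once you fix the per-iteration cost at 40 cycles. The shader clock is my assumption here (~1.05 GHz, Fury X class); the post doesn't name the exact card:

```python
# Convert serially dependent loop iterations into wall time.
# ASSUMPTION (mine): shader clock of ~1.05 GHz (Fury X class).
# From the post: 40 cycles per loop iteration for one work item.
SHADER_CLOCK_HZ = 1.05e9
CYCLES_PER_ITERATION = 40

def loop_time_ms(iterations,
                 cycles_per_iteration=CYCLES_PER_ITERATION,
                 clock_hz=SHADER_CLOCK_HZ):
    """Minimum time for a serially dependent loop on a single work item."""
    return iterations * cycles_per_iteration / clock_hz * 1e3

print(round(loop_time_ms(1_048_576), 1))  # first test:  ~39.9 ms
print(round(loop_time_ms(524_288), 1))    # second test: ~20.0 ms
```

So the first test could never finish under ~40ms on GCN no matter what else the driver did, which matches the floor observed earlier.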
Apart from that, I can't help wondering whether the [numthreads(1, 1, 1)] attribute on the kernel is making AMD do something additionally strange.