There is no Asynchronous Compute in Quantum Break, as I had stated...
The game appears to hammer GM20x's L2 cache leading to stuttering. There is no stuttering on AMD GPUs. Source:
http://www.tweaktown.com/guides/7655/quantum-break-pc-performance-analysis/index2.html
Judging by the massive amount of work in the render queue, the L2 cache is being hammered from two sides: by the SMs, with concurrent warps spilling into L2, and by the ROPs. Any L2 cache taken up by compute work takes away from ROP bandwidth and memory. This forces the ROPs to hit the memory controllers, which are themselves not very efficient.
This is what I think is happening:
Each GM20x SM is limited to 16 concurrent warps before overflowing into the L2 cache and causing a pretty drastic performance drop.
That means performance starts to fall past 16 concurrent warps, or 512 threads, per SM. So while GM20x has great compute performance on paper, this doesn't translate well once you push the architecture.
GCN has enough L1 and local cache on tap, per CU, to push 40 concurrent wavefronts per CU. That's 2,560 threads executing at full speed.
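For anyone who wants to check the thread counts, here is a quick sketch of the arithmetic. The warp and wavefront limits (16 and 40) are the figures from this post, not vendor-verified numbers; the group sizes (32 threads per warp, 64 per wavefront) are the standard ones. The function name is just something I made up for illustration.

```python
# Threads per scheduling group on each architecture.
WARP_SIZE = 32        # threads in an NVIDIA warp
WAVEFRONT_SIZE = 64   # threads in a GCN wavefront


def resident_threads(groups: int, group_size: int) -> int:
    """Threads in flight for a given number of warps or wavefronts."""
    return groups * group_size


# Limits as claimed in this post (assumptions, not measured values).
gm20x_threads = resident_threads(16, WARP_SIZE)       # per SM
gcn_threads = resident_threads(40, WAVEFRONT_SIZE)    # per CU

print(f"GM20x SM: {gm20x_threads} threads")
print(f"GCN CU:   {gcn_threads} threads")
print(f"Ratio:    {gcn_threads / gm20x_threads:.1f}x")
```

This is a per-SM versus per-CU comparison only; it says nothing about how many SMs or CUs a given chip has.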
In Quantum Break, the Volumetric Lighting shader used really pushes GM20x hard. This is why we see GM20x struggle to match Hawaii, let alone Fiji, in this title.
On paper, GM20x has more ROPs, but the performance of those ROPs is directly tied to L2 cache availability and bandwidth, as well as the available memory controller bandwidth:
That's GM107, but with GM20x we have 16 ROPs, up from 8, sharing 512KB of L2 cache, down from 1MB, and tied to a 64-bit memory controller.
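To put a number on that, here is a rough sketch of how much L2 each ROP gets per 64-bit memory controller. It uses only the ROP counts and cache sizes quoted in this post, taken at face value, and the helper name is hypothetical.

```python
def l2_kb_per_rop(l2_kb: int, rops: int) -> float:
    """L2 cache (in KB) available per ROP on one memory controller partition."""
    return l2_kb / rops


# Figures as quoted in this post (assumptions, not verified against specs).
gm107 = l2_kb_per_rop(l2_kb=1024, rops=8)
gm20x = l2_kb_per_rop(l2_kb=512, rops=16)

print(f"GM107: {gm107:.0f} KB of L2 per ROP")
print(f"GM20x: {gm20x:.0f} KB of L2 per ROP")
print(f"GM20x has {gm107 / gm20x:.0f}x less L2 per ROP")
```

If those figures hold, each GM20x ROP has a quarter of the L2 to work with before any compute work starts eating into it.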
Therefore, if a game is compute heavy and spills into the L2 cache, ROP throughput is also affected. That translates into more pressure on the memory controllers, which aren't very efficient to begin with.
So, as I predicted back in the summer of last year (though I received a lot of hate for it), Hawaii/Grenada will sometimes match a reference GTX 980 Ti in upcoming titles. Fiji will begin to often surpass the GM200 behemoth.
This is largely due to GCN's more highly redundant memory/cache hierarchy (and the power usage that comes with it).