Well for one thing the gains of async compute are dependent on how geometry limited you are ...
No, they aren't. Or rather: maybe on AMD hardware.
If the Xbox One was able to see moderate gains with async compute with a ratio of 768 ALU ops per 1 rasterized triangle then a 1080 should see similar gains like a 290 would since they both have a ratio of 1280 ALU ops per 1 rasterized triangle ...
It doesn't work this way. Geometry performance is limited by the compute units on nVidia hardware. Each unit works on one vertex and can output 4 pixels per clock. Each rasterizer can output 16 pixels per clock. So when you look at GP104, it could in theory output 80 pixels per clock because it has 20 compute units (20 x 4). But in the end it has only 64 ROPs and only four rasterizers (4 x 16 = 64 pixels per clock).
So, to see any gains you need to be graphics limited and not geometry limited.
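To make the GP104 arithmetic above easier to follow, here is a small sketch that just multiplies out the per-clock numbers from this post and takes the smallest stage as the limit. The function name and the "slowest stage wins" simplification are my own illustration, not anything from nVidia documentation.

```python
def pixels_per_clock(compute_units, px_per_unit, rasterizers, px_per_rasterizer, rops):
    """Return the per-stage pixel rates and the smallest one (the bottleneck).

    Simplifying assumption: throughput is capped by the slowest stage.
    """
    stages = {
        "compute_units": compute_units * px_per_unit,
        "rasterizers": rasterizers * px_per_rasterizer,
        "rops": rops,
    }
    return stages, min(stages.values())


# GP104 figures as quoted in this post: 20 units x 4 px, 4 rasterizers x 16 px, 64 ROPs
stages, limit = pixels_per_clock(20, 4, 4, 16, 64)
print(stages)  # {'compute_units': 80, 'rasterizers': 64, 'rops': 64}
print(limit)   # 64
```

So the compute units could feed 80 pixels per clock, but the rasterizers and ROPs both stop at 64.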
The fact that the 1080 is having trouble seeing definitive gains means that Nvidia still doesn't have proper hardware support for async compute ...
What is "proper hardware support"? Pascal supports it properly. You need a different workload on nVidia hardware to benefit from it in the same way.
You can't, for example, overload the GPU with compute work, because there are only so many units on the GPU. With Pascal you can hide compute work behind graphics, but at some point the gains will vanish.
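A toy model of that last point, with made-up numbers: assume the graphics work takes a fixed time and leaves some fraction of the units idle, and that compute work can only be hidden in that idle capacity. The function name, the 10 ms graphics time and the 20% idle fraction are all hypothetical, and real scheduling is far more complicated than this.

```python
def frame_times(graphics_ms, compute_ms, idle_fraction):
    """Toy model: compute hides only in the units that graphics leaves idle.

    graphics_ms   -- time the graphics work takes on its own (hypothetical)
    compute_ms    -- time the compute work takes on the whole GPU (hypothetical)
    idle_fraction -- share of units idle during graphics (hypothetical)
    """
    serial = graphics_ms + compute_ms
    hidden = min(compute_ms, graphics_ms * idle_fraction)  # what fits in idle units
    overlapped = serial - hidden
    return serial, overlapped


for compute_ms in (1.0, 2.0, 4.0, 8.0):
    serial, overlapped = frame_times(10.0, compute_ms, 0.2)
    saved = serial - overlapped
    print(f"compute {compute_ms:4.1f} ms: saved {saved:.1f} ms ({100 * saved / serial:.1f}% of frame)")
```

In this model the saved time stops growing once the idle units are full, so piling on more compute only shrinks the relative gain.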