RS you spent a lot of time analyzing Sniper Elite V2 bottlenecks yet you neglected to look at last generation cards. 6970 is faster than GTX580 so I don't think compute is a bottleneck. The most glaring advantage of 6970 over GTX580 is its texturing performance, peak GFLOPS is also appreciably higher. So I would guess that those things are very important in Sniper Elite V2. Static scheduler doesn't seem to matter.
You can only somewhat accurately compare theoreticals within the same GPU architecture (like Fermi/Kepler or across all HD7000 cards), because different GPU architectures have different efficiencies. Even within the same GPU architecture, you still have to be cognizant of specific SKU bottlenecks (HD7950 vs. HD7870 benchmarks, and HD7950 OC vs. HD7870 OC, expose an ROP bottleneck that opens up in HD7950 but HD7870 doesn't scale as well with overclocking). Theoreticals may or may not translate to real world performance, and other bottlenecks in that specific SKU or its architecture could make theoreticals worthless (HD6970 vs. HD7970 pixel performance in the real world is worlds apart, and yet their theoretical pixel fillrate is nearly identical).
Comparing AMD to NV on theoretical metrics alone can lead to all kinds of incorrect conclusions. You have to be careful before correlating theoreticals with actual gaming performance increases. It's been shown many times that you cannot do this (GTX280 vs. GTX470, HD4890 vs. HD5770, etc.). There are too many examples to list.
HD6970 vs. GTX580:
71% higher theoretical GFLOPS
71% higher theoretical texture fillrate
HD6970 is beating GTX580 in Sniper Elite V2 by only 3% at 1200P and just 8% at 1600P. Those theoretical advantages do not translate into meaningful real-world ones.
If theoretical GFLOPS and texture fillrate were so important, HD6970 would be crushing the 580. It's not.
If GTX580 were so significantly held back by theoretical texture fillrate or GFLOPS, it would have shown up as a correspondingly large increase for GTX680, because Fermi and Kepler are basically the same architecture with minor tweaks in Kepler. NV increased texture fillrate in GTX680 by 2.6x, and GTX680 has 2.05x the GFLOPS of GTX580. Yet GTX680 is only 28% faster than GTX580 at 1600P. That means the bottleneck in Fermi/Kepler must be something else (memory bandwidth and/or compute).
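To put those multipliers on the same footing as the measured gain, here is a minimal Python sketch that uses only the GTX680 vs. GTX580 figures quoted in this post. The helper names `multiplier_to_pct` and `realized_fraction` are made up for illustration.

```python
def multiplier_to_pct(multiplier):
    """Convert an 'N x' ratio into a percent increase (2.0x -> 100%)."""
    return (multiplier - 1.0) * 100.0


def realized_fraction(theoretical_pct, measured_pct):
    """Share of a theoretical advantage that shows up as a measured FPS gain."""
    return measured_pct / theoretical_pct


# Figures quoted in this post for GTX680 vs. GTX580.
texture_pct = multiplier_to_pct(2.6)    # texture fillrate, 2.6x
gflops_pct = multiplier_to_pct(2.05)    # theoretical GFLOPS, 2.05x
measured_pct = 28.0                     # measured gain at 1600P

for name, pct in [("texture fillrate", texture_pct), ("GFLOPS", gflops_pct)]:
    share = realized_fraction(pct, measured_pct)
    print(f"{name}: +{pct:.0f}% on paper, {share:.0%} of that realized")
```

Either way you slice it, well under a third of the on-paper advantage shows up in the frame rate.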
Finally, if theoretical GFLOPS and memory bandwidth mattered in this game for AMD cards, HD6970 would have beaten the 7870.
HD6970 vs. HD7870:
5% higher theoretical GFLOPS
14% higher memory bandwidth
HD6970 loses.
HD7970GE vs. HD6970:
59% higher theoretical GFLOPS
59% higher texture performance
63% higher memory bandwidth
HD7970GE is winning by 69%.
By process of elimination: HD6970 lost to HD7870 despite more memory bandwidth and higher GFLOPS; GTX680 netted only a 28% increase in performance over the 580 despite more than doubling texture fillrate and GFLOPS; and HD7970GE is faster than any of the theoretical differences between it and the 6970 can explain. An educated guess is that Compute Shader performance is the biggest factor between GCN and everyone else in Sniper Elite V2.
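The elimination above can be written out as a quick Python sketch. It uses only the numbers quoted in this post (the 105% and 160% entries are the 2.05x and 2.6x multipliers expressed as percent increases, and the HD6970 vs. GTX580 gain is the 1600P figure). The `classify` helper and its labels are hypothetical, just a way of organizing the comparison.

```python
# Each entry: (comparison, {metric: theoretical advantage in %}, measured result).
# The measured result is a percent FPS gain for the first card, or None where
# the post only says that the card with the paper advantage lost.
COMPARISONS = [
    ("HD6970 vs. GTX580", {"GFLOPS": 71, "texture fillrate": 71}, 8),
    ("GTX680 vs. GTX580", {"GFLOPS": 105, "texture fillrate": 160}, 28),
    ("HD6970 vs. HD7870", {"GFLOPS": 5, "memory bandwidth": 14}, None),
    ("HD7970GE vs. HD6970",
     {"GFLOPS": 59, "texture fillrate": 59, "memory bandwidth": 63}, 69),
]


def classify(theoretical, measured):
    """Label a comparison by how the measured gain relates to the paper specs."""
    if measured is None:
        return "loses despite the paper advantage"
    if measured > max(theoretical.values()):
        return "gain exceeds every listed theoretical advantage"
    if measured < min(theoretical.values()):
        return "gain falls short of every listed theoretical advantage"
    return "gain is within the range of the listed advantages"


for name, theoretical, measured in COMPARISONS:
    print(f"{name}: {classify(theoretical, measured)}")
```

Only the GCN card lands in the "exceeds" bucket, which is the pattern that points at something other than the listed theoreticals.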
It's about recognizing that different games expose different bottlenecks in different GPU architectures and different SKUs. Different GPU architectures have different strengths. Compute Shader games expose a bottleneck in GK104. This is a trend in games that use Compute Shaders and in benchmarks that test Compute/GPGPU performance of GK104. Pixel shader heavy games expose a bottleneck in Tahiti XT, and Tessellation heavy games exposed one in Cypress/Cayman.
GK104 loses to Tahiti XT in every single game that has heavy Compute Shaders (Sniper Elite V2, Dirt Showdown, Hitman Absolution, Sleeping Dogs).
I don't know why it's so hard for some to accept that Tahiti XT is superior for DirectCompute. The evidence is everywhere. Last time no one had any problems seeing the correlation in games with Tessellation when comparing Fermi vs. Cypress/Cayman. This time, using the exact same method of elimination, we arrive at DirectCompute as a key differentiator in favour of GCN. There is another piece of evidence - AMD says so themselves! On the AMD blog, they go into detail on where DirectCompute was used to accelerate certain graphical effects (like HDAO/AO, post-processing, global illumination, contact hardening shadows); and furthermore, if you guys read up on VLIW-4 vs. GCN architecture, you'll see that all those deficiencies of traditional VLIW / SIMD architectures show up in DirectCompute.
So here is the deal. The more DirectCompute there is, the more HD7970 will crush architectures that weren't built for DirectCompute. The more DirectCompute becomes a huge bottleneck, the more the slower architectures will bunch closer together because they'll be running into the Compute bottleneck, and the more GCN parts will separate themselves from the group. All these advantages of texture fillrate and GFLOPS will be meaningless because the limiting factor will become the Compute Shader performance. Dirt Showdown shows this really well because it's even more Compute heavy than SE V2 is.
The DirectCompute-based GCN architecture is mopping the floor with HD6970 and GTX680. Dirt Showdown's contact hardening shadows and global illumination are very Compute heavy.
The greatest single theoretical difference between HD7970GE and GTX680/6970 is memory bandwidth - a 50% advantage over the 680 and a 64% increase over the 6970. Yet HD7970GE is more than doubling HD6970's performance and puts out 87% higher FPS than a stock 680!
Since theoretical memory bandwidth and GFLOPS alone cannot explain this Dirt Showdown result, the ONLY logical explanation is that the GCN Tahiti XT architecture crushes GK104/VLIW Cayman in DirectCompute / Compute Shader performance. This is obviously by design, since neither Kepler GK104 nor VLIW-4/5 was designed to excel in DirectCompute. And really, even if GK110 can match an HD7970GE in this benchmark, it's going to need a 51% larger die to do it (550mm2 vs. 365mm2). That still means that GCN Tahiti XT is more efficient than Kepler is for DirectCompute on a per mm2 basis.
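For anyone who wants to check the per mm2 arithmetic, here is a rough Python sketch. The die sizes are the ones quoted above, and the "GK110 merely matches HD7970GE" case is the hypothetical from this post, not a measured result. The function names are made up for illustration.

```python
def pct_larger(big, small):
    """How much larger 'big' is than 'small', in percent."""
    return (big / small - 1.0) * 100.0


def relative_perf_per_area(perf_ratio, area_a, area_b):
    """Perf/mm2 of chip A relative to chip B.

    perf_ratio is A's performance as a multiple of B's (1.0 = a tie).
    """
    return perf_ratio * area_b / area_a


GK110_MM2 = 550.0    # die size quoted in this post
TAHITI_MM2 = 365.0   # die size quoted in this post

extra_area = pct_larger(GK110_MM2, TAHITI_MM2)
# Hypothetical: GK110 exactly ties HD7970GE in the benchmark (perf_ratio = 1.0).
tie_efficiency = relative_perf_per_area(1.0, GK110_MM2, TAHITI_MM2)

print(f"GK110 die is {extra_area:.0f}% larger")
print(f"At a tie, GK110 delivers {tie_efficiency:.2f}x Tahiti XT's perf per mm2")
```

Put differently, under these assumptions GK110 would have to beat HD7970GE by about that same 51% just to draw level on a per mm2 basis.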
It's a mirror image of when NV caught AMD with its pants down on tessellation over the last 2 generations.
Thus far GTX680 has lost to HD7970GE in 4 of the last 4 compute games. Bioshock Infinite will likely be the 5th.