About the misconception of "compute" in games


Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
RS, you spent a lot of time analyzing Sniper Elite V2 bottlenecks, yet you neglected to look at last generation cards. 6970 is faster than GTX580, so I don't think compute is a bottleneck. The most glaring advantage of 6970 over GTX580 is its texturing performance, and its peak GFLOPS is also appreciably higher. So I would guess that those things are very important in Sniper Elite V2. Static scheduler doesn't seem to matter.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126

You can only somewhat accurately compare theoreticals within the same GPU architecture family (like Fermi/Kepler or across all HD7000 cards). Even in many of those cases you have to be very careful, because different GPU architectures have different efficiencies. Even if you are comparing the same GPU architecture, you still have to be cognizant of specific SKU bottlenecks (HD7950 vs. HD7870 benchmarks and HD7950 OC vs. HD7870 OC expose an ROP bottleneck in HD7950 that opens up with overclocking, while HD7870 doesn't scale as well with overclocking). Theoreticals may or may not translate to real world performance, and other bottlenecks in that specific SKU or its architecture could make theoreticals worthless (HD6970 vs. HD7970 pixel performance in the real world is worlds apart, and yet their theoretical pixel fillrate is nearly identical).

Comparing AMD to NV on theoretical metrics alone can lead to all kinds of incorrect conclusions. You have to be careful before correlating theoreticals with actual gaming performance gains. It's been shown many times that you cannot do this (GTX280 vs. GTX470, HD4890 vs. HD5770, etc.). Too many examples to list.

HD6970 vs. GTX580:
71% higher theoretical GFLOPS
71% higher theoretical texture fillrate
HD6970 is beating GTX580 in Sniper Elite V2 by only 3% at 1200P and just 8% at 1600P. Those theoretical advantages do not translate into real world meaningful ones.

If theoretical GFLOPS and texture fillrate were so important, HD6970 would be crushing the 580. It's not.

If GTX580 were so significantly held back by theoretical texture fillrate or GFLOPS, it would have shown up as a huge increase for GTX680, because Fermi and Kepler are basically the same architecture with minor tweaks in Kepler. NV increased texture fillrate in GTX680 by 2.6x, and GTX680 has 2.05x the GFLOPS of GTX580. Yet GTX680 is only 28% faster than GTX580 at 1600P. That means the bottleneck in Fermi/Kepler must be something else (memory bandwidth and/or compute).
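To put a number on how little of that uplift showed up, here is a minimal sketch using only the figures quoted above (2.05x GFLOPS, 2.6x texture fillrate, 28% faster at 1600P). The function name `realized_fraction` is made up for illustration, and the calculation assumes the simplest possible model, where a metric that was the sole limiter would turn all of its uplift into FPS.

```python
def realized_fraction(theoretical_ratio, measured_ratio):
    """Share of a theoretical uplift that showed up as measured speedup.

    Both arguments are ratios (2.05 means 2.05x). A value near 1.0 would
    suggest the metric scales almost directly into FPS; a value near 0
    suggests it is not the limiting factor.
    """
    return (measured_ratio - 1.0) / (theoretical_ratio - 1.0)


# GTX680 vs. GTX580, figures as quoted in the post above
gflops_share = realized_fraction(2.05, 1.28)
texture_share = realized_fraction(2.6, 1.28)

print(f"GFLOPS uplift realized:  {gflops_share:.1%}")
print(f"Texture uplift realized: {texture_share:.1%}")
```

Roughly a quarter of the GFLOPS uplift and under a fifth of the texture uplift turned into frame rate, which is the point being made: neither looks like the limiter on Fermi/Kepler in this game.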

Finally, if theoretical GFLOPS and memory bandwidth mattered in this game for AMD cards, HD6970 would have beaten the 7870.

HD6970 vs. HD7870
5% higher theoretical GFLOPS
14% higher memory bandwidth.
HD6970 loses.

HD7970GE vs. HD6970
59% higher theoretical Gflops
59% higher texture performance
63% higher memory bandwidth
HD7970GE is winning by 69%.

Eliminating possibilities: HD6970 lost to HD7870 despite more memory bandwidth and higher GFLOPS, and GTX680 only netted a 28% increase in performance over the 580 despite more than doubling texture fillrate and GFLOPS, while HD7970GE puts out a bigger performance lead over the 6970 than any of the theoretical differences between them can explain. An educated guess is that Compute Shader performance is the biggest factor between GCN and everyone else in Sniper Elite V2.
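The elimination step can be sketched as a simple comparison of the measured gain against the largest listed theoretical advantage. This is only an illustration of the reasoning, using the percentages quoted in this post; `unexplained_gain` is a hypothetical helper name, and a positive result does not prove what the missing factor is, only that none of the listed metrics covers the gap on its own.

```python
def unexplained_gain(theoretical_pct, measured_pct):
    """Measured gain minus the largest listed theoretical advantage.

    All values are percentages. A positive result means the measured gain
    is bigger than any single listed metric could account for on its own.
    """
    return measured_pct - max(theoretical_pct.values())


# HD7970GE vs. HD6970 in Sniper Elite V2 (figures from the post)
hd7970ge_vs_hd6970 = {"gflops": 59, "texture": 59, "bandwidth": 63}
gcn_gap = unexplained_gain(hd7970ge_vs_hd6970, 69)

# GTX680 vs. GTX580 (2.05x GFLOPS = +105%, 2.6x texture = +160%)
gtx680_vs_gtx580 = {"gflops": 105, "texture": 160}
kepler_gap = unexplained_gain(gtx680_vs_gtx580, 28)

print(gcn_gap)     # positive: gain exceeds every listed theoretical
print(kepler_gap)  # strongly negative: the theoreticals went mostly unused
```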

It's about recognizing that different games expose different bottlenecks in different GPU architectures and different SKUs. Different GPU architectures have different net strengths. Compute Shader games expose a bottleneck in GK104. This is a trend in games that use Compute Shaders and in benchmarks that test the GPGPU compute performance of GK104. Pixel shader heavy games expose a bottleneck in Tahiti XT, and Tessellation heavy games exposed one in Cypress/Cayman.

GK104 loses to Tahiti XT in every single game that has heavy Compute Shaders (Sniper Elite V2, Dirt Showdown, Hitman Absolution, Sleeping Dogs).

I don't know why it's so hard for some to accept that Tahiti XT is superior for DirectCompute. The evidence is everywhere. Last time no one had any problem seeing the correlation in games with Tessellation when comparing Fermi vs. Cypress/Cayman. This time, using the exact same method of elimination, we arrive at DirectCompute as a key differentiator in favour of GCN. There is another piece of evidence - AMD says so themselves! On the AMD blog, they go into detail about where DirectCompute was used to accelerate certain graphical effects (like HDAO/AO, post-processing, global illumination, contact hardening shadows); and furthermore, if you guys read up on VLIW-4 vs. GCN architecture, all those deficiencies of traditional VLIW / SIMD architectures show up in DirectCompute.

So here is the deal. The more DirectCompute there is, the more HD7970 will crush non-DirectCompute architectures. The more DirectCompute becomes the bottleneck, the more the slower architectures will bunch together, because they'll all be running into the Compute bottleneck, and the more GCN parts will separate themselves from the group. All these advantages in texture fillrate and GFLOPS will be meaningless because the limiting factor will become Compute Shader performance. Dirt Showdown shows this really well because it's even more Compute heavy than SE V2 is.

[Image: dirt-fps.gif]


The DirectCompute-based GCN architecture is mopping the floor with HD6970 and GTX680. Dirt Showdown's contact hardening shadows and global illumination are very Compute heavy.

The greatest single theoretical difference between HD7970GE and GTX680/6970 is memory bandwidth - a 50% advantage over the 680 and a 64% advantage over the 6970. Yet HD7970GE is more than doubling HD6970's performance and delivers 87% higher FPS than a stock 680!

Since theoretical memory bandwidth and GFLOPS alone cannot explain this Dirt Showdown result, the ONLY logical explanation is that the GCN Tahiti XT architecture crushes GK104/VLIW Cayman in DirectCompute / Compute Shader performance. This is obviously by design, since neither Kepler GK104 nor VLIW-4/5 was designed to excel in DirectCompute. And really, even if GK110 can match an HD7970GE in this benchmark, it's going to need a 51% larger die to do it (550mm2 vs. 365mm2). That still means that GCN Tahiti XT is more efficient than Kepler for DirectCompute on a per mm2 basis.
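The per-mm2 point is just a ratio, sketched below with the two die sizes quoted above. Equal performance for GK110 and HD7970GE is the hypothetical from this post, not a measured result, and `perf_per_area_ratio` is an illustrative helper name.

```python
GK110_DIE_MM2 = 550.0   # die size as quoted in the post
TAHITI_DIE_MM2 = 365.0  # die size as quoted in the post


def perf_per_area_ratio(perf_a, area_a, perf_b, area_b):
    """How many times more performance per mm2 chip A delivers than chip B."""
    return (perf_a / area_a) / (perf_b / area_b)


# Hypothetical scenario from the post: GK110 merely matches HD7970GE,
# so both get the same normalized performance of 1.0.
tahiti_vs_gk110 = perf_per_area_ratio(1.0, TAHITI_DIE_MM2, 1.0, GK110_DIE_MM2)

print(f"Tahiti XT perf/mm2 relative to GK110: {tahiti_vs_gk110:.2f}x")
```

Under that assumption the ratio is simply 550/365, about 1.51, which is where the 51% figure comes from.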

It's a mirror image of the situation when NV caught AMD with its pants down on tessellation over the last 2 generations.

Thus far GTX680 has lost to HD7970GE in 4 out of the last 4 compute games. Bioshock Infinite will likely be the 5th.
 

boxleitnerb

Platinum Member
Nov 1, 2011
2,605
6
81
Sorry I should have explained it better:

HD7970GE @ 1180mhz vs. HD7950 800mhz
Pixel fillrate advantage = 47.5%
Texture fillrate advantage = 68.5%
Gflops advantage = 68.5%
Memory bandwidth advantage = 20%

It's putting down 45.4% higher FPS than HD7950 in Sniper Elite V2. The most likely limiting factor in that game after Compute Shader performance is probably pixel fillrate.

Counter example:
The 7970GE has 8% more pixel fillrate than the 7870 LE but performs 40+% faster in Sniper Elite V2 (and Metro 2033, and Sleeping Dogs). Pixel fillrate definitely is not the issue.
Texel fillrate is tied to SP FLOPs with all GCN cards, so you really cannot separate the two.
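A quick sketch of why the two cannot be separated on GCN: both peak figures are the compute unit count times the clock times a constant, so the ratio between any two GCN cards comes out identical. The per-CU constants (64 stream processors and 4 texture units per compute unit) and the CU/ROP counts for the two cards (32 CUs for HD7970GE, 28 CUs for HD7950, 32 ROPs each) are not stated in this thread; they are my assumptions from the published specs, so check them before relying on the output.

```python
from dataclasses import dataclass

SP_PER_CU = 64   # assumed: stream processors per GCN compute unit
TMU_PER_CU = 4   # assumed: texture units per GCN compute unit


@dataclass(frozen=True)
class GcnCard:
    name: str
    compute_units: int  # assumed from published specs, not from the thread
    rops: int           # assumed from published specs, not from the thread
    core_mhz: float

    def sp_gflops(self):
        # 2 FLOPs (one fused multiply-add) per stream processor per clock
        return 2 * SP_PER_CU * self.compute_units * self.core_mhz / 1000

    def texel_rate(self):
        # Gigatexels/s: one texel per texture unit per clock
        return TMU_PER_CU * self.compute_units * self.core_mhz / 1000

    def pixel_rate(self):
        # Gigapixels/s: one pixel per ROP per clock
        return self.rops * self.core_mhz / 1000


def advantage_pct(a, b):
    """Percentage by which a exceeds b."""
    return (a / b - 1.0) * 100.0


hd7970ge = GcnCard("HD7970GE @ 1180MHz", compute_units=32, rops=32, core_mhz=1180)
hd7950 = GcnCard("HD7950 @ 800MHz", compute_units=28, rops=32, core_mhz=800)

pixel_adv = advantage_pct(hd7970ge.pixel_rate(), hd7950.pixel_rate())
texel_adv = advantage_pct(hd7970ge.texel_rate(), hd7950.texel_rate())
gflops_adv = advantage_pct(hd7970ge.sp_gflops(), hd7950.sp_gflops())

print(f"pixel {pixel_adv:.1f}%  texel {texel_adv:.1f}%  GFLOPS {gflops_adv:.1f}%")
```

With those assumed specs the pixel fillrate advantage comes out at 47.5% and the texel and GFLOPS advantages are equal to each other, in line with the quoted figures. That is why a benchmark delta between GCN cards cannot tell you which of the two is doing the work.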

If memory bandwidth was the main bottleneck for HD7970GE in Sniper Elite V2, it wouldn't have put down 45% more performance. It'd be limited to 20%. However, you could argue HD7950 has too much memory bandwidth since it's ROP limited.

I never said "main" bottleneck. Please read more carefully. I said that at a certain point memory bandwidth could begin to bottleneck (again, there are relative bottlenecks, too).

...for GCN architecture (what a surprise, the long talked about 32 ROP limitation of HD7970 that keeps rearing its head: HD7950 can't pull away from HD7870 by much at stock 800mhz GPU clocks in most games because it's ROP limited! HD7870 has more pixel fillrate performance over HD7950 at stock, and HD7950's 55% memory bandwidth advantage is mostly wasted until you open up the ROP bottleneck --> which is also why HD7950 screams at 1100-1200mhz overclocks).

Again a hasty conclusion. Don't forget that the 7950 also has only 12% more SP GFLOPs than the 7870, which increases with OC as well. The 7870 LE has almost 16% higher pixel fillrate than the 7950, but practically the same SP GFLOPs and less bandwidth. And it is slower than the 7950 all the time, especially in Sniper Elite V2, Sleeping Dogs, Metro 2033 etc.

Pixel fillrate is not as important as you make it out to be.

I still think Dirt Showdown is a very bad example due to AMD's direct involvement in writing/developing the renderer. You can always find ways to make a game run exceptionally well on your own hardware and exceptionally badly on your competitor's.

And didn't you say before, because the 7870 LE really isn't any better than the 670/680 in (according to you) heavy-compute titles:
My educated guess is most games are not mostly Compute Shader limited.
Now you're touting compute again for Sniper Elite 2, Sleeping Dogs etc.
I'm confused...you cannot have it both ways.

Look at the 7870 LE and the 7970 GE:
+11% pixel fillrate
+50% bandwidth
+44% SP GFLOPs
+44% texel fillrate

7970 GE is 37% faster on average in 1600p and up to 50% faster in those "compute-heavy" titles.

Only memory bandwidth and SP GFLOPs can explain the massive performance delta between the two. So why could that not also be true for GTX680 vs 7970 GE? Instead you claim GK104 is bad for compute, but neglect the possibility that it's actually because it has fewer resources to work with in the first place.
Texel fillrate could be a factor, but I find that highly unlikely, since the massive advantage there didn't help AMD in the pre-Kepler era, and despite a draw in texel fillrate between 680 and 7970 GE, the 7970 GE can pull ahead significantly at times. There is just no precedent to logically conclude that texel fillrate is any serious bottleneck for either architecture.
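The filtering behind this argument can be sketched as follows, using the 7870 LE vs. 7970 GE percentages listed above. It is a crude filter that assumes a single metric has to cover the whole measured delta by itself, which real workloads do not guarantee; `candidate_metrics` is a made-up name for illustration.

```python
def candidate_metrics(advantages_pct, measured_pct):
    """Metrics whose theoretical advantage is at least the measured gain."""
    return sorted(
        name for name, adv in advantages_pct.items() if adv >= measured_pct
    )


# 7970 GE over 7870 LE, percentages as listed in the post
hd7970ge_vs_7870le = {
    "pixel_fillrate": 11,
    "bandwidth": 50,
    "sp_gflops": 44,
    "texel_fillrate": 44,
}

average_case = candidate_metrics(hd7970ge_vs_7870le, 37)    # 1600p average
compute_titles = candidate_metrics(hd7970ge_vs_7870le, 50)  # "up to 50%" titles

print(average_case)
print(compute_titles)
```

Pixel fillrate drops out in both cases. Texel fillrate survives the numeric filter for the 37% average, so it has to be ruled out on the historical grounds given above rather than by the numbers alone.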
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
RS, do you think Fermi is really so bad at compute shaders that even Cayman wipes the floor with it? In your link 6970 is over 60% faster than GTX570 in Dirt Showdown, yet those cards were thought to have pretty much the same performance. Also it's almost on par with GTX680. That's very strange; if anything, Fermi should excel at compute compared to those two.
One example in a real-world app:
[Image: after-effects-benchmark.jpg]



What's also strange is that GTX680 is way faster than GTX570 in that game than it is on average. GTX680 on average is about 45% faster than GTX570 yet in Dirt Showdown it's over 75% faster.
 