Despite having superior or equal numbers on paper, the Asus Strix 970 couldn't come close to matching the GTX 980's delivered pixel throughput. I promptly raised an eyebrow upon seeing these results, but I didn't have time to investigate the issue any further.
Then, last week, an email hit my inbox from Damien Triolet at Hardware.fr, one of the best GPU reviewers in the business. He offered a clear and concise explanation for these results—and in the process, he politely pointed out why our numbers for GPU fill rates have essentially been wrong for a while. Damien graciously agreed to let me publish his explanation:
For a while, I've thought I should drop you an email about some pixel fillrate numbers you use in the peak rates tables for GPUs. Actually, most people got those numbers wrong as Nvidia is not crystal clear about those kind of details unless you ask very specifically.
The pixel fillrate can be linked to the number of ROPs for some GPUs, but it’s been limited elsewhere for years for many Nvidia GPUs. Basically there are 3 levels that might have a say at what the peak fillrate is :
The number of rasterizers
The number of SMs
The number of ROPs
On both Kepler and Maxwell each SM appears to use a 128-bit datapath to transfer pixels color data to the ROPs. Those appears to be converted from FP32 to the actual pixel format before being transferred to the ROPs. With classic INT8 rendering (32-bit per pixel) it means each SM has a throughput of 4 pixels/clock. With HDR FP16 (64-bit per pixel), each SM has a throughput of 2 pixels/clock.
On Kepler each rasterizer can output up to 8 pixels/clock. With Maxwell, the rate goes up to 16 pixels/clock (at least with the currently released Maxwell GPUs).
So the actual pixels/cycle peak rate when you look at all the limits (rasterizers/SMs/ROPs) would be :
GTX 750 : 16/16/16
GTX 750 Ti : 16/20/16
GTX 760 : 32/24/32 or 24/24/32 (as there are 2 die configuration options)
GTX 770 : 32/32/32
GTX 780 : 40/48/48 or 32/48/48 (as there are 2 die configuration options)
GTX 780 Ti : 40/60/48
GTX 970 : 64/52/64
GTX 980 : 64/64/64
Extra ROPs are still useful to get better efficiency with MSAA and so. But they don’t participate in the peak pixel fillrate.
That’s in part what explains the significant fillrate delta between the GTX 980 and the GTX 970 (as you measured it in 3DMark Vantage). There is another reason which seem to be that unevenly configured GPCs are less efficient with huge triangles splitting (as it’s usually the case with fillrate tests).