Then, last week, an email hit my inbox from Damien Triolet at Hardware.fr, one of the best GPU reviewers in the business. He offered a clear and concise explanation for these results, and in the process, he politely pointed out why our numbers for GPU fill rates have essentially been wrong for a while. Damien graciously agreed to let me publish his explanation:
For a while, I've thought I should drop you an email about some pixel fillrate numbers you use in the peak-rate tables for GPUs. Actually, most people got those numbers wrong, as Nvidia is not crystal clear about those kinds of details unless you ask very specifically.
The pixel fillrate can be linked to the number of ROPs for some GPUs, but it's been limited elsewhere for years on many Nvidia GPUs. Basically, there are three levels that might have a say in what the peak fillrate is:
The number of rasterizers
The number of SMs
The number of ROPs
On both Kepler and Maxwell, each SM appears to use a 128-bit datapath to transfer pixel color data to the ROPs. That color data appears to be converted from FP32 to the actual pixel format before being transferred to the ROPs. With classic INT8 rendering (32 bits per pixel), that means each SM has a throughput of 4 pixels/clock. With HDR FP16 (64 bits per pixel), each SM has a throughput of 2 pixels/clock.
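That per-SM arithmetic is simple enough to sketch. Here's a minimal illustration, assuming the 128-bit datapath Damien describes (the function name is hypothetical, purely for illustration):

```python
# Per-SM color throughput on Kepler/Maxwell, per Damien's description:
# each SM moves finished pixel color data to the ROPs over a 128-bit path.
SM_DATAPATH_BITS = 128

def sm_pixels_per_clock(bits_per_pixel: int) -> int:
    """Pixels per clock one SM can hand off at a given color format width."""
    return SM_DATAPATH_BITS // bits_per_pixel

print(sm_pixels_per_clock(32))  # classic INT8 rendering (32-bit pixels): 4
print(sm_pixels_per_clock(64))  # HDR FP16 rendering (64-bit pixels): 2
```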
On Kepler each rasterizer can output up to 8 pixels/clock. With Maxwell, the rate goes up to 16 pixels/clock (at least with the currently released Maxwell GPUs).
So the actual pixels/clock peak rate, when you look at all the limits (rasterizers/SMs/ROPs), would be:
GTX 750: 16/16/16
GTX 750 Ti: 16/20/16
GTX 760: 32/24/32 or 24/24/32 (two die configuration options exist)
GTX 770: 32/32/32
GTX 780: 40/48/48 or 32/48/48 (two die configuration options exist)
GTX 780 Ti: 40/60/48
GTX 970: 64/52/64
GTX 980: 64/64/64
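To make the "slowest stage wins" logic concrete, here's a small sketch that takes Damien's per-GPU limits and reports the resulting peak rate. The numbers come straight from his list above (the dual-configuration GTX 760 and GTX 780 are omitted for brevity); the code structure itself is just an illustration:

```python
# Peak pixel fillrate is gated by the lowest of the three per-clock limits:
# (rasterizers, SMs, ROPs), all in pixels/clock for INT8 rendering.
limits = {
    "GTX 750":    (16, 16, 16),
    "GTX 750 Ti": (16, 20, 16),
    "GTX 770":    (32, 32, 32),
    "GTX 780 Ti": (40, 60, 48),
    "GTX 970":    (64, 52, 64),
    "GTX 980":    (64, 64, 64),
}

for gpu, (raster, sm, rop) in limits.items():
    # Multiply the per-clock result by the core clock (in Hz) for pixels/second.
    print(f"{gpu}: {min(raster, sm, rop)} pixels/clock "
          f"(raster {raster}, SM {sm}, ROP {rop})")
```

By this math, the GTX 970 tops out at 52 pixels per clock to the GTX 980's 64, even though both have 64 ROPs.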
Extra ROPs are still useful to get better efficiency with MSAA and the like, but they don't participate in the peak pixel fillrate.
That's in part what explains the significant fillrate delta between the GTX 980 and the GTX 970 (as you measured it in 3DMark Vantage). There is another reason, which seems to be that unevenly configured GPCs are less efficient at splitting huge triangles (which is usually what fillrate tests draw).
So the GTX 970's peak potential pixel fill rate isn't as high as the GTX 980's, even though the two share the same ROP count, because the key limitation resides elsewhere. When Nvidia hobbles the GTX 970 by disabling SMs, the effective pixel fill rate suffers.