do gpus ever get faster clock for clock on the same amount of cores? if anything they seem to get slower. a 336sp gtx460 was no faster clock for clock than the 240sp gtx285. yes I know there are more things involved such as memory bandwidth and such.
We often just look at shader count because it's the easiest number to remember and compare. As a result, it looks like a 336 SP card should be far faster than a 240 SP card. In reality, shaders are no more important than ROPs or TMUs. A GPU needs to be well balanced, or at any given point it can have "too much" of one thing and "too little" of another. A classic example is the HD4870/4890, which carried bucket loads of memory bandwidth yet ended up barely faster than the HD5770 with nearly half the bandwidth. Another is the HD5830, which was way faster on paper than the HD4890 but completely ROP starved in the real world. And then there's the HD6970 with 71% more texture fill-rate than a GTX580, an advantage it can't really use because it's bottlenecked somewhere else.
ROPs can be just as critical. When the HD5830 cut the ROP count from the 5870's 32 down to 16, its performance plummeted: despite having 1120 shaders @ 800mhz vs. 800 shaders @ 850mhz for the HD4890, plus 56 TMUs and just as much memory bandwidth, it was barely faster than the 4890. The GPU became almost entirely ROP starved.
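To put rough numbers on that ROP starvation, here's a quick sketch. The ROP counts and core clocks below are the commonly listed specs for these cards, my addition rather than anything stated above:

```python
# Pixel fill-rate = ROPs x core clock (MHz) -> MPixels/sec
def pixel_fill(rops, core_mhz):
    return rops * core_mhz

# Commonly listed specs (my assumption, not from the post):
cards = {
    "HD5870": (32, 850),  # full 32 ROPs @ 850 MHz
    "HD5830": (16, 800),  # half the ROPs of the 5870, @ 800 MHz
    "HD4890": (16, 850),  # 16 ROPs @ 850 MHz
}
for name, (rops, mhz) in cards.items():
    print(name, pixel_fill(rops, mhz), "MPixels/sec")
# Despite 40% more shaders, the HD5830 actually ends up with a LOWER
# pixel fill-rate than the HD4890 (12800 vs. 13600 MPixels/sec).
```

So on the one metric that mattered most here, the "faster" card was actually behind.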
GTX460 vs. GTX285
Pixel fill-rate (# ROPs x clocks) = 21600 MPixels/sec vs. 20736 MPixels/sec
Texture fill-rate (# TMUs x clocks) = 37800 MTexels/sec vs. 51840 MTexels/sec
Memory bandwidth = 115.2 GB/sec vs. 158.976 GB/sec
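Those figures fall straight out of unit counts times clocks. A small sketch; the ROP/TMU counts, core clocks, and memory specs are the commonly published numbers for the GTX460 1GB and GTX285, my addition rather than anything in the post:

```python
# Throughput = functional units x clock. Clocks in MHz, so results come
# out in MPixels/sec and MTexels/sec, matching the figures above.
def pixel_fill(rops, core_mhz):
    return rops * core_mhz

def texture_fill(tmus, core_mhz):
    return tmus * core_mhz

# Memory bandwidth (GB/s) = bus width in bytes x effective rate (MT/s) / 1000
def mem_bw_gb(bus_bits, eff_mtps):
    return bus_bits / 8 * eff_mtps / 1000

# Commonly published specs (my assumption, not from the post):
# GTX460 1GB: 32 ROPs, 56 TMUs @ 675 MHz, 256-bit @ 3600 MT/s
# GTX285:     32 ROPs, 80 TMUs @ 648 MHz, 512-bit @ 2484 MT/s
print(pixel_fill(32, 675), "vs.", pixel_fill(32, 648))      # 21600 vs. 20736
print(texture_fill(56, 675), "vs.", texture_fill(80, 648))  # 37800 vs. 51840
print(mem_bw_gb(256, 3600), "vs.", mem_bw_gb(512, 2484))    # 115.2 vs. 158.976
```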
On paper, GTX285 should crush the GTX460 without question. And yet, GTX460 is pretty much as fast!
Clearly, Fermi is far more efficient than GT200b was. Put another way, GT200b had far more memory bandwidth and texture throughput than it could actually utilize. Comparing shader counts within the same generation makes a lot of sense (GTX460 vs. 470 vs. 570 vs. 580, etc.), but comparing shaders against previous generations can easily give misleading / erroneous results, since differences in ROP and TMU throughput can have a dramatic impact on performance.
Then there are small improvements that often go unnoticed (because most games don't benefit from them), even within the same generation. GF110 improved FP16 texture filtering performance, and some games benefited nicely: in Dirt 2, the GTX570 is 8% faster than a GTX480, even though looking at the specs of the two cards, there's no way it should be leading by 8%.
Similarly, cutting shaders doesn't necessarily cut performance: the HD6870 has noticeably fewer shaders than the HD5850 but compensates with much higher clock speeds.
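You can sanity-check that with a rough shader-throughput product. The shader counts and clocks below are the commonly listed specs for the two cards, my addition rather than anything stated above:

```python
# Very rough ALU throughput proxy = shader count x core clock (MHz).
# Ignores architectural differences, so only meaningful within a family
# (both cards here are VLIW5 Radeons).
def shader_throughput(shaders, core_mhz):
    return shaders * core_mhz

hd5850 = shader_throughput(1440, 725)  # commonly listed HD5850 specs
hd6870 = shader_throughput(1120, 900)  # commonly listed HD6870 specs
print(hd6870 / hd5850)  # the higher clock nearly closes the shader gap
```

Despite ~22% fewer shaders, the HD6870's ~24% higher clock brings its raw shader throughput to within a few percent of the HD5850's.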