Having a fast shader core can give a nice boost in shader-intensive games or even ALU-intensive apps, but when the rest of the chip is crawling along at a low frequency, it's obvious that the fast shader core gets bottlenecked by the rest of the chip (TMUs, ROPs, fillrates, etc.), i.e. downclocking your core clock will have a bigger impact on performance.
I don't know if you can say there's a hard limit on the number of shaders; it depends on just how well the GPU can keep all of its ALUs active. For instance, R600/RV670 is a VLIW architecture using a shader setup similar to vec5. That alone can affect its performance, since it takes more effort to keep the 320 ALUs busy, unlike nVIDIA's G80 architecture, where there's no need for game-specific compiler work or the related overhead, thanks to its scalar nature. R600/RV670 is kind of a driver nightmare for software engineers, since current games have massively varying shader instruction mixes, making it hard for the R600 scheduler to do its job efficiently unless they go out of their way to optimize each specific title. Not to mention AA being done through the shaders on R600/RV670. That's why it's hard to say that raising clock frequency ("brute force") is better than having more units; rather, it comes down to how the units are "fed" or "utilized" in order to keep all of the ALUs doing work. For G80, I think we are clearly seeing a bottleneck: a higher shader clock seems to give diminishing returns, which means other bottlenecks are clearly hurting it. One example is its triangle setup. But this architecture is almost 2 years old, so I think it's doing great.
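To illustrate the VLIW point, here's a toy sketch (nothing like a real shader compiler, and the op names/dependencies are made up): independent ops can share a 5-wide bundle, but a dependency chain forces ops into separate bundles, leaving slots empty and dragging ALU utilization down. A scalar machine just issues one op per slot and never wastes any.

```python
# Toy model of packing scalar ops into 5-wide VLIW bundles.
# Ops in the same bundle must be independent, so dependency
# chains leave slots empty and ALU utilization drops.

def pack_vliw5(ops, deps):
    """Greedily pack ops into 5-wide bundles. An op is ready only when
    every op it depends on was placed in an *earlier* bundle.
    ops: list of op ids; deps: dict op -> set of prerequisite ops."""
    bundles = []
    scheduled = set()
    remaining = list(ops)
    while remaining:
        bundle = []
        for op in list(remaining):
            if len(bundle) == 5:
                break  # bundle full
            if deps.get(op, set()) <= scheduled:
                bundle.append(op)
                remaining.remove(op)
        scheduled.update(bundle)  # close the bundle
        bundles.append(bundle)
    return bundles

# A chain of four dependent ops (a -> b -> c -> d) plus two independent ones.
ops = ["a", "b", "c", "d", "e", "f"]
deps = {"b": {"a"}, "c": {"b"}, "d": {"c"}}

bundles = pack_vliw5(ops, deps)
utilization = len(ops) / (5 * len(bundles))
print(bundles)      # [['a', 'e', 'f'], ['b'], ['c'], ['d']]
print(utilization)  # 0.3 -> only 30% of the VLIW slots do work
```

The same six ops on a scalar design would fill six issue slots at 100% utilization, which is basically why G80 doesn't need per-title packing heroics.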
Also, I think the whole "higher ALU:TEX ratio is the future" idea (kind of promoted by ATi, and visible in their 3~4:1 ALU:TEX designs) may also be untrue, and that texturing is still important even today. Texturing demand seems to grow roughly linearly with shader usage, not exponentially as many have been led to believe. Imo texturing is still important enough to create bottlenecks for GPUs that in fact lack texturing performance, which is clear with the R6x0 series architecture.
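For anyone wondering where the "4:1" figure comes from, here's the back-of-the-envelope math (unit counts are the commonly quoted specs for RV670; treat them as approximate). Note how much the ratio depends on whether you count vec5 units or individual ALUs:

```python
# RV670 (HD 3870), commonly quoted specs: 320 stream processors
# organized as 64 five-wide VLIW units, and 16 texture units.
rv670_alus, rv670_vec_width, rv670_tmus = 320, 5, 16

ratio_per_unit = (rv670_alus / rv670_vec_width) / rv670_tmus  # 64 vec5 units vs 16 TMUs
ratio_per_alu = rv670_alus / rv670_tmus                       # raw ALU count vs 16 TMUs

print(ratio_per_unit)  # 4.0  -> the quoted "4:1" ALU:TEX figure
print(ratio_per_alu)   # 20.0 -> counted per individual ALU
```

Either way, 16 TMUs is on the light side, which fits the texturing-bottleneck point above.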
G80 doesn't seem to be hurt much by lower bandwidth, especially going by the GTX/Ultra comparison. However, it would be interesting to see some G9x results (on how memory clock, i.e. bandwidth, affects its performance), since bandwidth seems to be the most obvious limiting factor for G9x. Even the 512MB of VRAM seems to be another bottleneck at 1920x1200 (and above) with AA/AF.
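Quick numbers on the bandwidth gap (clocks and bus widths below are the commonly quoted specs for these cards, so take them as approximate):

```python
# Rough memory bandwidth: (bus width in bytes) * effective data rate.
def bandwidth_gbs(bus_bits, effective_mts):
    """bus_bits: memory bus width in bits; effective_mts: effective
    transfer rate in MT/s (double the GDDR3 clock). Returns GB/s."""
    return bus_bits / 8 * effective_mts * 1e6 / 1e9

gtx   = bandwidth_gbs(384, 1800)  # 8800 GTX: 384-bit, 900 MHz GDDR3
ultra = bandwidth_gbs(384, 2160)  # 8800 Ultra: 384-bit, 1080 MHz
g92   = bandwidth_gbs(256, 1940)  # G92 8800 GTS 512: 256-bit, 970 MHz

print(gtx, ultra, g92)  # 86.4, 103.68, 62.08 GB/s
```

So a G92 GTS gives up roughly 28% of the GTX's bandwidth despite having comparable (or better) shader throughput, which is why memory clock scaling on G9x would be the interesting test.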
edit - just some food for thought.
