This has been par for the course with ATI/AMD for some time now. Nvidia performance at launch usually isn't all that different from what you get with later driver revisions. Sure, there are improvements, but for the most part NV doesn't leave much untapped performance on the table. With AMD, on the other hand, it seems to take a few driver revisions to get all the performance out of a new architecture. Nvidia has the pedal to the floor out of the gate; AMD needs a few laps to get the hang of the car.
That makes sense, since Kepler is basically Fermi with no shader hot clocks, while GCN was a brand new AMD architecture from the ground up (kinda like G80 was for NV). This explains why the HD7900 series had driver issues with DX9 games that the 6900 didn't have, why Kepler was so fast in launch reviews, and why NV hasn't been able to fix any of their performance issues in the titles where AMD leads, besides Shogun 2. Those issues come down to the GK104 SKU itself: limited memory bandwidth and weak DirectCompute performance.
The list is not short either: Alan Wake, Anno 2070, Arma II DayZ/Operations, Risen 2, Batman AC + MSAA, Skyrim + Mods + MSAA, Sleeping Dogs, Dirt Showdown, Sniper Elite V2, Battleforge, Metro 2033, Crysis 1/Warhead games, Bulletstorm, Serious Sam 3, etc. NV is unlikely to regain performance in these games until they get to GTX780 and add more physical/functional units/bandwidth/compute.
I don't dispute Kepler's current bandwidth issues. I'm just saying, as I have been for about the last eight weeks now, that 8x MSAA is about the dumbest graphical setting to enable, given the performance hit and the lack of noticeable image quality improvement.
If a game can run smoothly with 8xMSAA, why not turn it on? I get your point, but with so many games being console ports, we almost have to raise the resolution or pile on higher settings or we'll be CPU limited (or, well, then a $240 7870 is fast enough most of the time).
What they tried to do in this article was test the GTX660Ti's memory bandwidth limitations by cranking AA as high as possible. They could have overclocked the memory too. The problem is that MSAA is both ROP and memory bandwidth dependent. So really, the reason the 660Ti is so poor with MSAA is not only memory bandwidth but also those 24 ROPs. I realize that you may never play your games with 8xMSAA (especially at 2560x1440) and I understand your preference. However, the article wanted to show the performance hit from memory bandwidth, which is why they used 8xMSAA. It showed the 660Ti has issues, though admittedly those are combined ROP/memory bandwidth limitations, not just a memory bandwidth limitation.
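If anyone wants to put rough numbers on that, here is a quick Python sketch of the two theoretical peaks involved. Only the 24 ROPs figure comes from this thread. The 192-bit bus, 6008 MT/s effective memory rate, 915 MHz base clock, and all of the GTX 670 figures are my assumptions from the commonly published spec sheets, so swap in your own if they differ. The function names are just illustrative, and these are paper peaks, not measured throughput.

```python
def memory_bandwidth_gbs(bus_width_bits: int, effective_rate_mtps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bytes per transfer (bus width / 8) times transfers per second.
    """
    return bus_width_bits / 8 * effective_rate_mtps * 1e6 / 1e9


def pixel_fill_rate_gpix(rops: int, core_clock_mhz: float) -> float:
    """Theoretical peak pixel fill rate in Gpixels/s: one pixel per ROP per clock."""
    return rops * core_clock_mhz * 1e6 / 1e9


# Assumed specs (not from the article): bus width, memory rate, ROPs, base clock.
cards = {
    "GTX 660 Ti": {"bus": 192, "mem": 6008, "rops": 24, "clock": 915},
    "GTX 670": {"bus": 256, "mem": 6008, "rops": 32, "clock": 915},
}

for name, c in cards.items():
    bw = memory_bandwidth_gbs(c["bus"], c["mem"])
    fill = pixel_fill_rate_gpix(c["rops"], c["clock"])
    print(f"{name}: {bw:.1f} GB/s, {fill:.2f} Gpix/s")
```

On those assumed specs the 660Ti lands at 75% of the 670 on both bandwidth and ROP throughput, which is why a heavy MSAA test alone can't tell you which of the two is the actual bottleneck.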