Yeah, G80 upped the GPGPU in favor of gaming, we all remember how 8800GTX was lackluster in terms of gaming...
I think you misinterpreted his comment. What he meant was that NV's performance/watt suffered significantly after they reworked the GPU architecture to suit both the gaming and GPGPU markets. Now that they are over that hurdle and the transition to a brand new GPGPU architecture is behind them, they can focus on scaling Fermi and improving its efficiency.
On the other hand, AMD is only now going the multi-purpose GPU route, and we don't know how that will translate into performance/watt or performance/transistor. It's not unreasonable to assume that what happened to NV is going to happen to AMD too (i.e., a reduction in performance/transistor).
Let's say HD7970 could have had 2560 SPs if it had simply gone to 28nm and reused VLIW-4. It could be that, because of the 30-32 CUs and all the additional GPGPU complexity, you can only fit 2048 SPs into the same die space. At the same time, there is a chance that AMD's scalar architecture turns out to be more efficient than Fermi's.
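To put a number on that hypothetical, here is a quick sketch in Python. Both SP counts are the made-up figures from this post, not real specs, and the variable names are just for illustration.

```python
# Hypothetical figures from the post above, not real specs.
vliw4_sps = 2560  # speculated SP count for a straight 28nm VLIW-4 shrink
gcn_sps = 2048    # speculated SP count once GPGPU logic takes up die space

# Fraction of SPs given up in the same die area.
sp_shortfall = 1 - gcn_sps / vliw4_sps

# Equivalent view: how much more die area each SP effectively costs.
area_per_sp_increase = vliw4_sps / gcn_sps - 1

print(f"{sp_shortfall:.0%} fewer SPs in the same area")
print(f"{area_per_sp_increase:.0%} more die area per SP")
```

So in this scenario the new architecture would have to make each SP noticeably more productive just to break even with the plain shrink.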
Obviously, this is just speculation on my part. Even once HD7970 launches, it will be difficult to assess how good it really is without NV's 28nm GPU to compare it against (unless of course we get 75-100% more performance over the HD6970).
Still, we must not underestimate the die shrink. The 55nm HD 4870 packed 956m transistors into a die size of 263mm², while the 40nm HD 5870 had 2.15bn transistors (2.25 times as many) in a die area of 334mm², which is only 27% bigger than the previous one.
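For anyone who wants to check that arithmetic, here is a small Python sketch using only the transistor counts and die sizes quoted above. The "ideal" line assumes density scales with the inverse square of the node name, which is a textbook idealization and my assumption, not something real processes guarantee.

```python
# Figures as quoted in this post (transistors in millions, die area in mm^2).
hd4870 = {"node_nm": 55, "transistors_m": 956, "die_mm2": 263}
hd5870 = {"node_nm": 40, "transistors_m": 2150, "die_mm2": 334}


def density(chip):
    """Transistor density in millions of transistors per mm^2."""
    return chip["transistors_m"] / chip["die_mm2"]


transistor_ratio = hd5870["transistors_m"] / hd4870["transistors_m"]
area_ratio = hd5870["die_mm2"] / hd4870["die_mm2"]
density_gain = density(hd5870) / density(hd4870)

# Assumption: ideal density gain = (old node / new node) squared.
ideal_gain = (hd4870["node_nm"] / hd5870["node_nm"]) ** 2

print(f"transistors: {transistor_ratio:.2f}x")
print(f"die area:    {area_ratio:.2f}x")
print(f"density:     {density_gain:.2f}x measured vs {ideal_gain:.2f}x ideal")
```

The measured density gain comes out a bit under the idealized figure, which is what you would expect since not everything on a die shrinks equally well.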
Even if GCN is less efficient than VLIW-4 on a performance/transistor basis, HD7970 can still be superior to the HD6970 in terms of performance/watt (not per transistor), simply because the shrink from 40nm to 28nm is so significant that they'll be able to fit more SP, TMU and ROP units even though some extra die space will be allocated to GPGPU features.