So Nvidia basically accomplished what AMD already had with the 5xxx series, and AMD got gains going from 58xx to 69xx similar to what Nvidia got going from 4xx to 5xx.
TSMC is the culprit.
agreed that TSMC dropped the ball here. amd had a more ambitious plan for the 69xx series but had to salvage what they had and make a 32nm blueprint fit on a 40nm node.
the X1950 XTX was on a completely different architecture than the 2900XT. along with all the video features introduced with the new architecture and DX10, the 2900XT and the derivative 3xxx series brought unified shaders, a design that could easily be scaled up by adding more shaders in later chips.
the 38xx series was a refinement: a die shrink of the same architecture, or a 'tick' for those familiar with intel's tick-tock strategy. the 48xx series went from 320 to 800 shaders (a 2.5x performance increase? i wonder why. they gave it 2.5x the shaders, i.e. 150% more, plus more memory bandwidth).
you saw the same with the 5xxx series, which doubled the shaders from 800 to 1600.
you would think that if amd wanted more performance from the 69xx series, they would have added more shaders. instead, they couldn't fit all the shaders they wanted, because they were stuck with a larger node than they had originally planned for (the 6970 actually has 1536 shaders, slightly fewer than the 5870's 1600).
amd also changed the composition of these shaders and the way they work: the 69xx groups them in fours (VLIW4) instead of the groups of five (VLIW5) used since the 2900XT. id say this change makes it more efficient per shader than the old architecture found in the 38xx, 48xx and 58xx series. id consider the 6xxx series a stopgap generation, much like the 38xx series, given TSMC's 32nm fab failure. in intel's terms it's a 'tock' (new shader design on the old node) rather than a 'tick', since the planned shrink never happened.
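Since the argument leans on shader-count ratios, here is a quick Python sketch of that arithmetic using the counts quoted in the post (320, 800, 1600). The generation labels are just shorthand, and raw shader count ignores clocks, memory bandwidth and per-shader efficiency, so treat this as a sanity check on the multipliers, not a performance model.

```python
# Shader counts per generation as quoted in the post (flagship parts).
SHADERS = {
    "2900XT/38xx": 320,
    "48xx": 800,
    "58xx": 1600,
}


def scaling(old: int, new: int) -> tuple[float, float]:
    """Return (ratio, percent increase) going from old to new shader count."""
    ratio = new / old
    return ratio, (ratio - 1) * 100


# Compare each generation with the one before it.
gens = list(SHADERS.items())
for (prev_name, prev), (name, count) in zip(gens, gens[1:]):
    ratio, pct = scaling(prev, count)
    print(f"{prev_name} -> {name}: {ratio:.1f}x shaders ({pct:.0f}% more)")
```

This prints a 2.5x (150% more) step for 38xx to 48xx and a 2.0x (100% more) step for 48xx to 58xx, which is why "2.5x the shaders" and "250% more shaders" are not the same claim.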