Okay, a 334mm2 part is barely faster than a 255mm2 one, and then: "Arguing that there are architectural improvements in the HD6000 outside of tessellation is very tough". You have to explain that to me, because it sounds like a contradiction in my view?
I guess I was less than clear in my post. My bad! OK, let me try to explain this again. The reason Barts XT is almost as fast as the previous Cypress chips is NOT because of the efficiencies of Barts, but rather because of the inefficiencies of Cypress. AMD simply overshot by including too many TMUs and SPs and saddled the chip with only 32 ROPs. The problem is that those 1600 SPs and the TMUs could never be fully used, since they are always being bottlenecked.
Take a look at an 850MHz HD5850 vs. an 850MHz HD5870. At the same speed, the HD5850 and HD5870 perform within 3% of each other, despite the HD5870 still having a 9% memory bandwidth advantage, an 11% shader throughput advantage and an 11% texture fill-rate advantage over the 850MHz HD5850. Basically, the extra bandwidth, shaders and texture units are being underutilized in the HD5870.
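To make the 11% figures concrete, here is a rough sketch of the arithmetic. The 1600 SP / 80 TMU numbers for the HD5870 are the ones quoted in this post; the 1440 SP / 72 TMU numbers for the HD5850 are assumed from the public spec sheets, and the variable and function names are just made up for illustration.

```python
def advantage_pct(a, b):
    """Percent by which a exceeds b."""
    return (a / b - 1) * 100

CLOCK_MHZ = 850  # both cards run at the same core clock in this comparison

# Unit counts: HD5870 figures are from the post, HD5850 figures are assumed
# from public spec sheets.
hd5870 = {"sps": 1600, "tmus": 80}
hd5850 = {"sps": 1440, "tmus": 72}

# Throughput is treated as (unit count x clock), a deliberately crude model.
shader_adv = advantage_pct(hd5870["sps"] * CLOCK_MHZ, hd5850["sps"] * CLOCK_MHZ)
texture_adv = advantage_pct(hd5870["tmus"] * CLOCK_MHZ, hd5850["tmus"] * CLOCK_MHZ)

print(f"shader advantage:  {shader_adv:.1f}%")
print(f"texture advantage: {texture_adv:.1f}%")
```

With equal clocks the clock term cancels out, so both advantages come down to the ratio of unit counts, which is roughly 11% here. An 11% paper advantage turning into a 3% real-world gap is the whole point.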
So what did AMD do? They simply removed the extra TMUs and shaders that were being underutilized (in other words, the major bottlenecks in Cypress prevented the chip from taking full advantage of its 80 TMUs and 1600 SPs). The next step was acknowledging that the memory bandwidth was also wasteful (just like it was on the HD4890). It was logical for them to scale back the memory controller design, since supporting slower 4Gbps memory chips still yielded more than satisfactory memory bandwidth. The less complex memory controller from Redwood took half the space of the Cypress memory controller on the GPU die.
Thus far, no one on the internet has published any evidence to support the view that the SPs, ROPs or TMUs have been redesigned. It looks like they didn't actually make the TMUs, SPs or ROPs themselves any more efficient. They simply removed all the extra stuff that wasn't doing much.
Also consider this. The 1440 SP @ 725MHz HD5850 has 28% more shaders than the HD6870, but the HD6870 has a 24% higher clock speed. So in essence, the two almost offset one another (but of course 900MHz ROPs > 725MHz ROPs, and since the ROPs are likely the bottleneck, it makes sense that the HD6870 is slightly faster than the HD5850).
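Here is the same back-of-the-envelope math as a quick sketch. The HD5850 numbers and the 900MHz HD6870 clock come from this post; the 1120 SP count for the HD6870 and the 32 ROPs on both cards are assumed from the public spec sheets. "Throughput" is just units x clock, a crude model and not a benchmark.

```python
def pct_more(a, b):
    """Percent by which a exceeds b."""
    return (a / b - 1) * 100

# HD6870 SP count and both ROP counts are assumptions from public spec sheets.
hd5850_card = {"sps": 1440, "clock_mhz": 725, "rops": 32}
hd6870_card = {"sps": 1120, "clock_mhz": 900, "rops": 32}

sp_gap = pct_more(hd5850_card["sps"], hd6870_card["sps"])
clock_gap = pct_more(hd6870_card["clock_mhz"], hd5850_card["clock_mhz"])

def throughput(card, unit):
    """Crude throughput model: unit count times core clock."""
    return card[unit] * card["clock_mhz"]

# How far apart the two cards end up once SPs and clocks are combined.
shader_ratio = throughput(hd5850_card, "sps") / throughput(hd6870_card, "sps")
rop_ratio = throughput(hd6870_card, "rops") / throughput(hd5850_card, "rops")

print(f"HD5850 has {sp_gap:.1f}% more SPs")
print(f"HD6870 has {clock_gap:.1f}% higher clock")
print(f"HD5850 / HD6870 shader throughput: {shader_ratio:.3f}")
print(f"HD6870 / HD5850 ROP throughput:    {rop_ratio:.3f}")
```

On these assumed numbers the shader throughput of the two cards lands within about 4% of each other, while the HD6870 keeps the full clock advantage of roughly 24% on the ROP side, which fits the ROP-bottleneck explanation.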
Rather than looking at Barts XT as an architectural "re-engineering", it's better seen as a more well-balanced design of TMUs:ROPs:SPs than Cypress was. You can see how the Fermi GTX470 with a whopping 448 SPs is not much faster than the 336 SP GTX460, as a result of having less texture fill-rate than the GF104 chip. Again, another chip that's not well balanced.