Thermal throttling on the HD58xx/69xx as an explanation? Now you're just grasping at straws.
The raw arithmetic/ALU performance delta between the HD5870/HD6970 and the HD7770 under traditional compute workloads is vast.
The HD6970 is 2.11x faster than the HD7770 in single-precision/ALU throughput, meaning it could have run at 440MHz and still tied an HD7770. Double-precision performance on the HD7770 is 8.44x slower than on an HD6970. No amount of real-world thermal throttling can explain how an HD7770 would outperform an HD6970 in any compute app; something else is the answer. Not only that, but the thermal-throttling theory itself is bogus, since cards like the HD5870/6970 don't thermal throttle at all if you adjust PowerTune in CCC.
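For anyone who wants to check the math, here's a quick back-of-the-envelope sketch in plain C using the published reference specs (HD6970: 1536 shaders at 880MHz with 1/4-rate DP; HD7770: 640 shaders at 1000MHz with 1/16-rate DP). The 2.11x and 8.44x figures fall straight out of those paper numbers:

```c
/* Back-of-the-envelope theoretical throughput comparison.
 * Assumed reference specs:
 *   HD6970 (Cayman):     1536 stream processors @ 880 MHz, DP = 1/4 of SP rate
 *   HD7770 (Cape Verde):  640 stream processors @ 1000 MHz, DP = 1/16 of SP rate
 * Each stream processor does 1 FMA = 2 FLOPs per clock.
 */
#include <stdio.h>

int main(void)
{
    const double hd6970_sp = 1536 * 2 * 0.880;   /* ~2703 GFLOPS single precision */
    const double hd7770_sp =  640 * 2 * 1.000;   /* ~1280 GFLOPS single precision */
    const double hd6970_dp = hd6970_sp / 4.0;    /* ~676 GFLOPS double precision  */
    const double hd7770_dp = hd7770_sp / 16.0;   /* ~80 GFLOPS double precision   */

    printf("SP ratio (6970/7770): %.2fx\n", hd6970_sp / hd7770_sp);   /* ~2.11x */
    printf("DP ratio (6970/7770): %.2fx\n", hd6970_dp / hd7770_dp);   /* ~8.4x  */

    /* Clock at which a fully enabled Cayman merely ties a stock HD7770 in SP: */
    printf("Break-even Cayman clock: %.0f MHz\n",
           1000.0 * hd7770_sp / (1536 * 2));                          /* ~417 MHz */
    return 0;
}
```

On paper, anything above roughly 417MHz keeps a full Cayman at least even with a stock HD7770 in single precision, which is where the 440MHz point comes from.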
I had an HD6950 unlocked to a 6970 at 880MHz and ran MilkyWay@Home 24/7 for more than half a year, fully loaded, and it didn't throttle once, despite that app loading the GPU to 99%. Stop spreading false information. Even your link only shows throttling in Metro 2033 from 880MHz to 850MHz, and a drop of that size cannot possibly explain why an HD7770 would outperform an HD6970 in this DC app. The FurMark throttling was implemented deliberately by both NV and AMD as early as the HD5800 and Fermi series to stop that power virus from overloading the VRMs and killing GPUs; this was a well-known and well-investigated scenario. It's no wonder FurMark throttled to 600MHz in that link.
If you increase the PowerTune limit to +20%, there is no throttling whatsoever on any AMD card, and there never has been on the HD6970, even at 99% load in games or distributed computing projects. Of course, you always seem keen to find negatives in AMD cards rather than objectively trying to arrive at a root cause. As someone who ran countless DC projects on my unlocked 6950, I can say the 6970 won't throttle at all.
Also, your BIT_ALIGN_INT logic holds little water, since the HD5800/6900 destroy the HD7770 in Bitcoin mining, the very app that benefits most from the BIT_ALIGN_INT instruction. So that can't be the answer either. Not to mention BIT_ALIGN_INT is a purely ALU-driven operation, and the HD6970 has 2.11x the ALU throughput of the HD7770. Anything that comes down to ALUs, shader speed, integer calculations, or clock speeds cannot be the answer, since the HD7770 is inferior to the HD5870/6970 in all of those areas. What's left? Architecture-specific optimizations that this DC app uses.
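To make that concrete, here's an illustrative OpenCL C sketch (my own example, not code from this DC app or any particular miner) of how BIT_ALIGN_INT gets used through AMD's cl_amd_media_ops extension: amd_bitalign(x, x, n) gives you a 32-bit rotate in a single instruction, which is exactly the operation the SHA-256 rounds in Bitcoin mining hammer on, and it's nothing but integer ALU work:

```c
// Illustrative OpenCL C fragment. On AMD hardware the amd_bitalign() built-in
// from the cl_amd_media_ops extension maps onto BIT_ALIGN_INT.
#pragma OPENCL EXTENSION cl_amd_media_ops : enable

// 32-bit rotate right: one BIT_ALIGN_INT instead of two shifts plus an OR.
inline uint rotr32(uint x, uint n)
{
#ifdef cl_amd_media_ops
    return amd_bitalign(x, x, n);        // ((x:x) >> (n & 31)) truncated to 32 bits
#else
    return (x >> n) | (x << (32u - n));  // generic fallback, three ALU ops
#endif
}

// Trivial kernel wrapper so the fragment is complete (hypothetical demo kernel).
__kernel void rotr_demo(__global const uint *in, __global uint *out, uint n)
{
    size_t i = get_global_id(0);
    out[i] = rotr32(in[i], n);
}
```

Since that is pure integer ALU throughput, it scales with exactly the shader and clock resources where Cypress/Cayman hold the advantage over Cape Verde.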
The only way the HD7770 can beat the HD5870/6970 in a non-traditional compute task (i.e., one not bound by pure ALU, single-precision, or double-precision throughput) is if the app uses specific instructions or code paths that favor one architecture under OpenCL or DirectCompute, and GCN happens to run a lot of these kinds of apps faster than Cypress/Cayman. If you look at compute performance under OpenCL, there are situations where the HD7770 outperforms the HD6970.