First, Turbo brings LESS on Sandy Bridge than on Lynnfield:
http://www.computerbase.de/artikel/...-sandy-bridge/47/#abschnitt_skalierungsrating
Core i7 2600: 2%
Core i7 870: 8%
Second, you seem to be confusing multi-threaded performance with single-thread IPC.
First, this is simply not the case, and that is quite obvious. Take the Core i7 2600: Turbo raises its clock by about 11%, and that is supposed to result in only 2% more performance? Computerbase is wrong here, though it is not entirely clear what their figure actually measures. My statement was about IPC, and therefore about single-thread analysis.
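A quick sanity check of that objection, as a minimal Python sketch. It assumes single-thread performance is simply per-clock throughput times clock frequency (ignoring memory-bound code, which does not speed up with the core clock). Under that assumption, +11% clock with only +2% performance would mean roughly 8% less work done per clock, which a pure clock increase should not cause. The function name is made up for illustration.

```python
def implied_per_clock_change(clock_gain: float, perf_gain: float) -> float:
    """Change in per-clock throughput implied by a clock gain and a measured perf gain.

    Both inputs are fractions, e.g. 0.11 for +11%.
    Simplifying assumption: perf = per_clock * clock,
    so per_clock_ratio = perf_ratio / clock_ratio.
    """
    return (1.0 + perf_gain) / (1.0 + clock_gain) - 1.0


# Core i7 2600 figures from the discussion: ~11% higher clock, 2% higher measured performance.
change = implied_per_clock_change(0.11, 0.02)
print(f"implied per-clock change: {change:+.1%}")
```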
Yes, I may have caused some confusion: my post as a whole is about overall performance, but that specific paragraph is about IPC (I pointed that out, but maybe not clearly enough).
So Sandy Bridge did bring performance improvements. They came from better scaling, better HT, higher clock boosts in Turbo mode, and higher IPC. However, the IPC gain is very small, because Intel had already reached a very high level. I made this point to explain why AMD did not focus on IPC gains (besides their being very costly in terms of R&D resources) but instead used other techniques to reach a high performance level. Intel, too, relied mainly on techniques other than IPC improvements for Sandy Bridge (despite its large R&D resources, and because its IPC was already very high). This was already true for Nehalem.
So over the past 3-4 years, IPC has largely stagnated, with Intel at a very high level and AMD at a high level, though both still squeeze out some 1-3% with each generation.
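To put 1-3% per generation into perspective, here is a small sketch of how such gains compound. Three generations over 3-4 years is an assumption for illustration: at 1% per generation the total is about 3%, and even at 3% per generation it stays below 10%.

```python
def compounded_gain(per_generation_gain: float, generations: int) -> float:
    """Total fractional gain after applying the same per-generation gain repeatedly."""
    return (1.0 + per_generation_gain) ** generations - 1.0


# Hypothetical: 3 generations, at the low and high end of the 1-3% range.
for gain in (0.01, 0.03):
    total = compounded_gain(gain, 3)
    print(f"{gain:.0%} per generation over 3 generations -> {total:.1%} total")
```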
Performance improvements in the last 3-4 years have come from more cores, higher clocks, or better scaling (meaning that a core is slowed down less when the other cores are also fully busy, or when threads migrate from one core to another).
Therefore, AMD's situation with Bulldozer was this:
a) Push IPC from high to very high, as Intel did -> very costly in terms of R&D resources
or
b) Improve core count/scaling/frequency
For Bulldozer they chose route b: double the core count, higher frequency, lower IPC (compared with the Stars core, compensated by the higher frequency), and better scaling.
Basically, you can assume that the lower IPC is offset by the higher frequency, so doubling the core count with this "module technique" still yields a ~80% performance boost over current Phenom CPUs. And that is enough to surpass Sandy Bridge.
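The estimate above can be written as a simple multiplicative model: throughput = per-core IPC × clock × effective core count. This is a rough sketch for perfectly parallel workloads, not a real performance model. The parameter values (IPC down 20%, clock up 25%, each added core in a module contributing 80% of a full core) are hypothetical, chosen so that the lower IPC is exactly offset by the higher frequency and the result reproduces the ~80% figure. In this scenario single-threaded performance would not change at all.

```python
def relative_throughput(ipc_ratio: float, freq_ratio: float,
                        core_ratio: float, extra_core_efficiency: float) -> float:
    """Multi-threaded throughput of a new design relative to an old one.

    ipc_ratio, freq_ratio: new/old per-core IPC and clock.
    core_ratio: new/old core count (2.0 = doubled).
    extra_core_efficiency: fraction of a full core that each added core contributes
        (hypothetical 0.8 for the second core in a shared module).
    """
    per_core = ipc_ratio * freq_ratio
    effective_cores = 1.0 + (core_ratio - 1.0) * extra_core_efficiency
    return per_core * effective_cores


# Hypothetical Bulldozer-vs-Phenom numbers: lower IPC offset by higher clock, doubled cores.
result = relative_throughput(ipc_ratio=0.8, freq_ratio=1.25,
                             core_ratio=2.0, extra_core_efficiency=0.8)
print(f"multi-threaded gain: {result - 1.0:+.0%}")
```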
As you can see, there is no way they could have achieved such a large performance improvement by improving IPC.
Or, to explain it another way:
Sandy Bridge Core superscalarity = 3
Bulldozer Core superscalarity = 2
Bulldozer Module superscalarity = 4 (2*2)*
*It is even better than 4, because it consists of two independent groups of 2, so any misprediction or pipeline stall affects only 2 pipelines in Bulldozer (versus all 3 in Sandy Bridge, though HT can hide some of these stall types).
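The footnote's stall argument can be illustrated with a toy model (an assumption, not how either chip actually schedules work): each thread independently stalls with probability p in a given cycle, and a stalled pipeline group issues nothing that cycle. If two threads shared one 4-wide group and any stall idled the whole group (the worst case, with no HT-style hiding), expected throughput would be 4(1-p)^2. With two independent 2-wide groups, as in a Bulldozer module, a stall idles only its own group, giving 4(1-p). The function names and p = 0.1 are hypothetical.

```python
def unified_core_throughput(width: int, stall_prob: float, threads: int) -> float:
    """Expected issue slots per cycle when all threads share one pipeline group
    and a stall in any thread idles the whole group (worst case: no SMT hiding)."""
    return width * (1.0 - stall_prob) ** threads


def split_module_throughput(width_per_core: int, stall_prob: float, cores: int) -> float:
    """Expected issue slots per cycle when each thread owns its own pipeline group,
    so a stall only idles that group."""
    return cores * width_per_core * (1.0 - stall_prob)


p = 0.1  # hypothetical per-thread stall probability per cycle
print("shared 4-wide group, 2 threads:", unified_core_throughput(4, p, 2))
print("split 2x2 module, 2 threads:", split_module_throughput(2, p, 2))
```

With p = 0.1 the split module keeps about 3.6 issue slots busy per cycle versus about 3.24 for the shared group, which is the sense in which the module is "better than 4" in this toy model.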