That slide is very vague. What exactly is "128 bit FP"? FMA4-optimized, or legacy SIMD? The leaked C11.5 numbers show that in legacy SIMD the "8C" Bulldozer is no faster than an X6 running at a lower clock.
According to this slide, the P2 X6 can do 48 flops/cycle, which is
quite respectable. That suggests CB can extract only half of BD's
peak throughput, i.e. 32 of its 64 flops/cycle, which would explain
a score close to the SB 2600K, which also does 32 flops/cycle.
So CB almost certainly has no FMA support, yet BD still manages
to do as well as the X6, which implies a much more efficient FPU
in terms of latency and execution speed.
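
For anyone who wants to check the arithmetic, here is a rough sketch of how those peak numbers come out. The unit counts are my own assumptions, not something stated on the slide: 128-bit pipes holding 4 single-precision lanes, two FP pipes per X6 core, per SB core (SSE only, no AVX) and per BD module, with an FMA counted as 2 flops per lane.

```python
# Peak single-precision flops per cycle for a whole chip.
# All unit counts below are assumptions for illustration, not slide data.

def peak_flops_per_cycle(units, pipes_per_unit, lanes, flops_per_lane):
    """units: cores or modules; pipes_per_unit: 128-bit FP pipes in each;
    lanes: SP values per 128-bit register; flops_per_lane: 1 for a plain
    add or mul, 2 for a fused multiply-add."""
    return units * pipes_per_unit * lanes * flops_per_lane

x6       = peak_flops_per_cycle(6, 2, 4, 1)  # 6 cores, FADD + FMUL pipes
sb_sse   = peak_flops_per_cycle(4, 2, 4, 1)  # 4 cores, SSE code only
bd_fma   = peak_flops_per_cycle(4, 2, 4, 2)  # 4 modules, 2 FMAC pipes, FMA
bd_nofma = peak_flops_per_cycle(4, 2, 4, 1)  # same pipes, no FMA in the code

print(x6, sb_sse, bd_fma, bd_nofma)
```

These are paper peaks only. They say nothing about clocks, latency or how well a given benchmark keeps the pipes fed.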
There's a little more to it: BD's FPU pipes can each execute
either an ADD or a MUL, while SB and the X6 have one pipe
dedicated to FADD and another to FMUL, and the code is
probably optimised for that latter arrangement.
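
A toy model of that difference, purely as a sketch: it assumes independent 128-bit ops, one op per pipe per cycle, and ignores latency, dependencies and scheduling. The function names are made up for illustration.

```python
# Cycles needed by one core/module to issue a mix of independent
# 128-bit adds and muls. Toy model: one op per pipe per cycle,
# no latency or dependency effects.

def split_pipe_cycles(n_add, n_mul):
    # One FADD pipe plus one FMUL pipe: the busier pipe is the bottleneck.
    return max(n_add, n_mul)

def flexible_pipe_cycles(n_add, n_mul):
    # Two pipes that each take either an add or a mul: total work is shared.
    return (n_add + n_mul + 1) // 2  # ceiling of (n_add + n_mul) / 2

balanced = (split_pipe_cycles(4, 4), flexible_pipe_cycles(4, 4))
add_heavy = (split_pipe_cycles(6, 2), flexible_pipe_cycles(6, 2))

print(balanced, add_heavy)
```

With an even add/mul mix both layouts take the same number of cycles in this model. The flexible pipes only pull ahead when the mix is lopsided, so code tuned for a balanced mix would not show that advantage.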