LOL, ninja-edit there :ninja:
My understanding is that Microsoft and Intel worked together to make the OS scheduler "aware" of logical vs. physical cores, so the scheduler can use the core/thread topology information to place threads in an advantageous way.
The issue with Bulldozer is that you want the threads piled onto the same module first, so as to increase the chances of that module's clock speed being turbo-boosted.
So rather than distributing one thread per module (less resource sharing within the module, so it should be faster) and operating all modules at the stock clock, the idea is to intentionally gang the threads onto a given module. That reduces performance because of resource sharing, but it enables the turbo clocks to kick in and, hopefully, elevates performance above and beyond the penalties that come from CMT-style resource sharing.
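To put rough numbers on that trade-off, here is a toy Python sketch comparing the two placement policies. Everything in it is an assumption for illustration: the function names (`place_threads`, `throughput`), the clock speeds, the sharing penalty, and the rule that turbo only engages when few enough modules are active. None of it is measured Bulldozer data or how any real scheduler is implemented.

```python
def place_threads(n_threads, n_modules, policy, cores_per_module=2):
    """Return the number of threads placed on each module (hypothetical model)."""
    if n_threads > n_modules * cores_per_module:
        raise ValueError("more threads than cores")
    load = [0] * n_modules
    for i in range(n_threads):
        if policy == "spread":
            # One thread per module first, then wrap around.
            load[i % n_modules] += 1
        elif policy == "pack":
            # Fill both cores of a module before touching the next one.
            load[i // cores_per_module] += 1
        else:
            raise ValueError("unknown policy: " + policy)
    return load


def throughput(load, base_ghz, turbo_ghz, sharing_penalty, turbo_max_active):
    """Crude throughput estimate in 'GHz-equivalents' summed over threads.

    Assumptions: turbo applies only when at most `turbo_max_active` modules
    are busy, and two threads sharing a module each lose `sharing_penalty`
    (a fraction between 0 and 1) of their speed.
    """
    active = sum(1 for n in load if n > 0)
    clock = turbo_ghz if active <= turbo_max_active else base_ghz
    total = 0.0
    for n in load:
        per_thread = clock * (1.0 - sharing_penalty) if n > 1 else clock
        total += n * per_thread
    return total


if __name__ == "__main__":
    # Made-up numbers: 4 threads on a 4-module chip.
    for penalty in (0.10, 0.20):
        for policy in ("spread", "pack"):
            load = place_threads(4, 4, policy)
            score = throughput(load, base_ghz=3.6, turbo_ghz=4.2,
                               sharing_penalty=penalty, turbo_max_active=2)
            print(policy, load, "penalty", penalty, "score", round(score, 2))
```

In this toy model, packing only wins when the turbo gain outweighs the sharing penalty, i.e. when turbo_ghz / base_ghz is greater than 1 / (1 - sharing_penalty). With the made-up clocks above, a 10% penalty favors packing and a 20% penalty favors spreading, which is why the "hopefully" matters.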