Let me explain clearly to everyone what is going on.
There are Bulldozer modules, not Bulldozer cores. Let's all get on the same page here and this will go a lot quicker. Half of the problem is people confusing a core with a module.
I will use Interlagos for this explanation since I am in the server business. (I will never comment on desktop; I don't know enough about that business.)
Interlagos is a 16-core processor. It will have 16 logical integer cores and it will appear to the hardware and OS as 16 cores.
An Interlagos will be made up of EIGHT Bulldozer modules. Each module will have 2 integer cores plus a shared 256-bit FPU (which we will get to in a second). 8 x 2 = 16.
Each integer core will run one thread (each core has 4 pipelines). That means 2 threads per module, one on each core, running simultaneously.
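
To make the module/core arithmetic concrete, here is a small sketch in Python. The numbers are the ones from this post; the constant and function names are made up for illustration and do not come from any AMD tool.

```python
# Figures taken from the post; all names here are hypothetical.
MODULES_PER_CHIP = 8        # Interlagos: eight Bulldozer modules
INT_CORES_PER_MODULE = 2    # two integer cores in each module
THREADS_PER_INT_CORE = 1    # each integer core runs one thread
FPUS_PER_MODULE = 1         # one shared 256-bit FPU per module


def chip_totals(modules=MODULES_PER_CHIP):
    """Return the totals for a chip built from `modules` modules."""
    cores = modules * INT_CORES_PER_MODULE
    return {
        "modules": modules,
        "integer_cores": cores,
        "threads": cores * THREADS_PER_INT_CORE,
        "shared_fpus": modules * FPUS_PER_MODULE,
    }


totals = chip_totals()
print(totals)
```

The point of the sketch is only that the OS-visible core count comes from integer cores (8 x 2), while the FPU count follows the module count.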
The FPU is 256-bit. In any given clock cycle it can work either as one 256-bit unit for one of the two cores OR as two 128-bit units, one for each core, simultaneously.
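
A rough way to picture the two FPU modes described above is as a lookup of how the 256 bits are split between the two cores in a cycle. This is a toy model under that assumption only: the names are hypothetical, and it says nothing about cases the post does not cover (for example, what the scheduler does when both cores want 256 bits at once).

```python
FPU_WIDTH_BITS = 256

# Per-cycle splits described in the post, as (core 0 bits, core 1 bits).
FPU_MODES = {
    "one core, full width": [(256, 0), (0, 256)],
    "both cores, half width": [(128, 128)],
}


def mode_for(core0_bits, core1_bits):
    """Return the name of the mode matching this split, or None if no mode matches."""
    for name, splits in FPU_MODES.items():
        if (core0_bits, core1_bits) in splits:
            return name
    return None


print(mode_for(256, 0))
print(mode_for(128, 128))
```

Every split in the table adds up to the full 256 bits, which is the point: the width is shared, not duplicated.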
Now, on to HT (Hyper-Threading). Proponents of HT claim "performance improvement with ~5% die space increase." The problem is that the performance increase is generally only ~10-20%. Sometimes it is negative (in which case they recommend that you turn off HT). So, as a tradeoff, 5% die space for ~20% performance increase seems fine, right?
Well, we had our engineers do the math on our core. If I took an Interlagos (16 cores, 8 modules) and pulled out half of the integer cores, I would save only ~5% of the die space. You see, there is a lot on the die other than the integer cores: cache, northbridge, FPU, etc. In our case, a 16-core Interlagos should perform ~80%+ faster than an 8-core Interlagos, with only ~5% more die space.
Some will still try to argue that HT is the better technology, but it boils down to this: if you are going to add 5% die space to a CPU, would you rather have a 10-20% performance increase (with the chance that it is negative) or an 80%+ performance increase?
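
Put as back-of-the-envelope arithmetic, the comparison looks like this. The sketch uses the round figures quoted in this post (~5% die space, 10-20% for HT, 80% for the second integer core) and treats them as exact, which they are not; the function name is made up for illustration.

```python
def gain_per_die_percent(perf_gain_pct, die_increase_pct):
    """Performance gain (in %) bought by each 1% of extra die space."""
    return perf_gain_pct / die_increase_pct


DIE_INCREASE_PCT = 5.0  # the same ~5% die cost is quoted for both approaches

ht_low = gain_per_die_percent(10.0, DIE_INCREASE_PCT)       # HT, low end
ht_high = gain_per_die_percent(20.0, DIE_INCREASE_PCT)      # HT, high end
second_core = gain_per_die_percent(80.0, DIE_INCREASE_PCT)  # second integer core

print(f"HT: {ht_low} to {ht_high} % gain per 1% die")
print(f"Second integer core: {second_core} % gain per 1% die")
```

On these figures the second integer core returns several times more performance per unit of die space than HT does, before even counting the cases where HT goes negative.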
We believe that real cores and real threads give you the best performance.