I also enjoy alternative (not meant in a bad way) approaches to solving problems. Look at each Bulldozer module. It could have been one integer unit and one FPU, and AMD could perhaps have developed some form of SMT, like Intel's Hyper-Threading, to make it work. Instead, they recognized that 80-90% of workloads are integer-heavy, so they added a second integer unit. If the integer cores and their components are well designed, that should provide better performance in threaded applications than an SMT/Hyper-Threading design. Compare 2 Bulldozer modules (4 integer cores) with a 2-core Sandy Bridge running 4 threads: if all else is equal, the 4 hardware integer cores should have the advantage most of the time.
AMD currently has 3 ALUs per core (as does Intel).
Now they're going to 2 ALUs per core. So they didn't add an integer unit; they removed one. Likewise, they went from 3 address generators to 2, so if you count both ALUs and AGUs as 'integer units', as AMD does, they went from 6 units to 4 units per core.
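To make the counting concrete, here is a small tally sketch. It assumes only the per-core figures stated in this thread (3 ALUs + 3 AGUs for AMD's current cores, 2 ALUs + 2 AGUs per Bulldozer integer core, 3 ALUs per Sandy Bridge core); the dictionary layout and function names are hypothetical, for illustration only.

```python
# Hypothetical tally of execution units, using only the per-core figures from this thread.
CORES = {
    "current AMD core": {"alu": 3, "agu": 3},
    "Bulldozer integer core": {"alu": 2, "agu": 2},
}

SNB_ALUS_PER_CORE = 3  # Sandy Bridge: 3 ALUs per core, as stated above


def integer_units(core: dict) -> int:
    """Count ALUs and AGUs together, the way AMD's 'integer unit' figure does."""
    return core["alu"] + core["agu"]


for name, core in CORES.items():
    print(f"{name}: {core['alu']} ALUs, {integer_units(core)} integer units")

# 2 Bulldozer modules x 2 integer cores each x 2 ALUs per core
bd_alus = 2 * 2 * CORES["Bulldozer integer core"]["alu"]
# 2 Sandy Bridge cores x 3 ALUs per core (Hyper-Threading adds threads, not ALUs)
snb_alus = 2 * SNB_ALUS_PER_CORE
print(f"2 Bulldozer modules: {bd_alus} ALUs; 2 Sandy Bridge cores: {snb_alus} ALUs")
```

Counted per core, the unit total drops from 6 to 4; counted per pair of modules versus a pair of Sandy Bridge cores, the raw ALU count still favors Bulldozer (8 vs. 6). That is exactly the "less per core, more cores" tradeoff, and raw counts alone say nothing about how well each ALU is kept busy.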
So I think they're going for a tradeoff: less performance per core, but more cores per die.
At this point it's difficult to say whether Bulldozer's module approach or Sandy Bridge's HT will deliver more integer performance per unit of die area.
The idea behind HT is the same, however: according to Intel, adding HT increased the total transistor count by only about 5%, while performance improved by around 10-30% on average.
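As a rough illustration of why that tradeoff is attractive, the sketch below turns Intel's quoted figures into performance per transistor, normalized to a non-HT core = 1.0. It assumes transistor count is a fair proxy for die area and that the gains apply uniformly; both are simplifications, and the function name is hypothetical.

```python
# Rough perf-per-transistor estimate for HT, using the figures quoted in the post.
# Assumption: transistor count stands in for die area; a non-HT core is the 1.0 baseline.
AREA_COST = 0.05          # ~5% more transistors for HT
GAINS = (0.10, 0.30)      # ~10-30% average performance gain


def perf_per_transistor(gain: float, area_cost: float) -> float:
    """Relative throughput divided by relative transistor budget."""
    return (1 + gain) / (1 + area_cost)


for g in GAINS:
    ratio = perf_per_transistor(g, AREA_COST)
    print(f"+{g:.0%} perf for +{AREA_COST:.0%} transistors -> {ratio:.3f}x perf per transistor")
```

Even at the low end of Intel's range, HT comes out slightly ahead per transistor; Bulldozer's second integer core has to clear a similar bar with a much larger area cost.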
At least the question of "How will they feed 4 integer units with one thread?" is now answered: only two of them are ALUs, so a single thread only has to keep 2 ALUs busy, which makes more sense.