Micro ops? Those are interesting stats if true, but it still wouldn't represent the width of the pipeline, but more the width*depth of the pipeline. And we all know dozer has a pipeline on the long side.
Micro-ops are either ALU or AGU operations; a macro-op carries both. Macro-ops are the internal RISC-like representation of the CISC AMD64 ISA.
The pipeline is 3-wide and 12 stages deep for K8+ (Greyhound, 00f_10h).
The pipeline is 4-wide and 15 stages deep for K10 (Bulldozer, 15f_00h).
The pipeline is 8-wide and 17 stages deep for Zen (17f-00h).
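To put a number on the width*depth idea from the quote, here is a rough sketch using only the width and depth figures listed above. The names `PIPELINES` and `pipeline_slots` are made up for illustration, and the product is just a crude count of pipeline slots, not a measured in-flight figure.

```python
# Width (ops per cycle) and depth (stages), taken from the figures in this post.
PIPELINES = {
    "Greyhound (10h)": (3, 12),
    "Bulldozer (15h)": (4, 15),
    "Zen (17h)": (8, 17),
}

def pipeline_slots(width, depth):
    """Crude width*depth slot count; hypothetical helper, not an AMD metric."""
    return width * depth

for name, (width, depth) in PIPELINES.items():
    print(f"{name}: {width}-wide x {depth}-deep = {pipeline_slots(width, depth)} slots")
```

Width and depth stay separate inputs here on purpose: a deeper pipeline raises the product without making the core any wider.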
On the FPU side, I think maybe yes, the FPU is wider (yet shared), so dozer might get higher FPU single thread.
10h -> 42 FPU instructions in flight: 14 entries each for FADD, FMUL, and FMISC.
15h -> 64 FPU instructions in flight: a unified scheduler for the two FMACs and two FADD(+1 FMISC)s.
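The arithmetic behind those two numbers, as a quick sketch. The variable names are made up for illustration, and the entry counts are the ones quoted in this post.

```python
# 10h: three separate FPU queues, 14 entries each (figures from this post).
FPU_10H_QUEUES = {"FADD": 14, "FMUL": 14, "FMISC": 14}
# 15h: one unified FPU scheduler (figure from this post).
FPU_15H_UNIFIED = 64

total_10h = sum(FPU_10H_QUEUES.values())
extra_15h = FPU_15H_UNIFIED - total_10h

print(f"10h total FPU entries: {total_10h}")
print(f"15h unified FPU entries: {FPU_15H_UNIFIED} ({extra_15h} more than 10h)")
```

Note that on 10h a single instruction type can only use its own 14-entry queue, while the 15h scheduler is one shared pool.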
But as far as integer width it's k10's 6 wide (3+3) vs dozer's 4 wide (2+2). The AGU can substitute sometimes as ALU, but I thought this was just the case for later generation dozers (I very well might be wrong on this guess).
Greyhound's LSU could only handle 2 AGUs, so only 2 ALUs would have been active in generic workloads. Bulldozer's LSU handles double what Greyhound's LSU does, so load/store frames are halved with Bulldozer: faster to work, quicker to finish.
Bulldozer requires no optimization other than targeting the ISA for FMAC usage. Beyond that, optimized code is not much faster on Bulldozer than unoptimized code.
Greyhound requires significant optimization to get around the load-store unit, dependencies, etc.
If one has the opportunity to go FX-4100 over Phenom II X4 945, they should take the FX-4100.
Upgrade priority: Deneb, then Zosma (Thuban-QC with Turbo Core 1.0), then FX-4100 (Zambezi-QC with Turbo Core 2.0). The farther to the right, the better the upgrade.