From an architecture standpoint I don't understand why it was a step backwards. On paper it seems like it would be better.
I doubt it was better on paper either. How it looks depends on who is looking. Here's my opinion:
1. Remember when AMD revealed the Bulldozer architecture at its analyst day? Few may remember them saying that both Bulldozer and Bobcat are designed for "Knee of the Curve" performance.
Knee of the Curve: On a graph of performance gained against design effort, there is a point where the steep gains end. Past that point the curve still rises a bit, but it has mostly leveled out.
In traditional CPU designs, engineers pulled off heroic feats to gain an extra 1% of performance. What AMD was saying is that with Bulldozer and Bobcat they aimed, more or less, for the low-hanging fruit. From that I figured they were not aiming for Intel's performance per clock. To be honest, I still thought it would do better than it did.
2. Related to #1, they focused on clocks. They simplified the design to save die size. Then suddenly they said the clock cycle time was reduced by something like 20%. The most likely way to do that is by adding pipeline stages. The problem with high-clock-speed designs comes when you can't actually clock them high.
In other words, AMD really had a choice between producing an 8-core Zambezi w/16MB of cache or an 8-core Llano (no GPU) w/16MB of cache for about the same die size (an octo-Llano would be slightly smaller).
I doubt they had a better choice. Remember, they seem to be having problems clocking Llano high at desktop frequencies. Now imagine that with 2x the cores.
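To put rough numbers on the clock argument in #2, here is a back-of-the-envelope sketch. It assumes the usual simplification that performance scales with IPC times clock frequency. The baseline clock and the IPC penalty for the deeper pipeline are made-up values for illustration, not AMD figures; only the "something like 20%" cycle time reduction comes from the discussion above.

```python
# Simplified model: performance ~ IPC * clock frequency.
# All concrete numbers below are hypothetical, chosen only for illustration.

def relative_performance(ipc, clock_ghz):
    """Performance in arbitrary units under the IPC * clock model."""
    return ipc * clock_ghz


def break_even_clock(base_ipc, base_clock_ghz, new_ipc):
    """Clock the new design must reach to match the baseline's performance."""
    return base_ipc * base_clock_ghz / new_ipc


# Hypothetical baseline (shorter pipeline) design.
base_ipc = 1.0
base_clock_ghz = 3.0

# Assumed IPC cost of the longer pipeline (not a measured value).
deep_ipc = 0.85

# A 20% shorter cycle time means the clock can rise by 1 / (1 - 0.20).
cycle_time_reduction = 0.20
target_clock_ghz = base_clock_ghz / (1 - cycle_time_reduction)

# Clock the deeper design needs just to tie the baseline.
needed_clock_ghz = break_even_clock(base_ipc, base_clock_ghz, deep_ipc)

# Performance if the target clock is reached vs. if it stalls at the old clock.
perf_baseline = relative_performance(base_ipc, base_clock_ghz)
perf_at_target = relative_performance(deep_ipc, target_clock_ghz)
perf_if_stuck = relative_performance(deep_ipc, base_clock_ghz)

if __name__ == "__main__":
    print(f"target clock:     {target_clock_ghz:.2f} GHz")
    print(f"break-even clock: {needed_clock_ghz:.2f} GHz")
    print(f"baseline perf:    {perf_baseline:.2f}")
    print(f"perf at target:   {perf_at_target:.2f}")
    print(f"perf if stuck:    {perf_if_stuck:.2f}")
```

With these invented inputs, the deeper design comes out ahead only if it actually reaches a clock above the break-even point. If it stays near the old clock, it ends up slower than the design it replaced, which is the risk described above.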