The way I see it, AMD made a judgment call 4-5 years ago: they assumed software would keep up with, and even outpace, ever-increasing core counts, so a design that favored moar coars over IPC would always have enough threads to feed it. It was a piss-poor call. Somewhere along the way somebody should have said, "Hey, wait a minute. Maybe we were overly optimistic in our assessment." Instead, they stuck with it, and the delays were due both to GloFo and to an attempt to cover up the poor performance. I think that's why it has so much damn cache and arrived about 2 years too late.
I think they can certainly do quite a bit to patch it up, but what that will require and how much time it will take is another story. They wanted to hold IPC steady with Bulldozer but missed that mark by about 10%. In certain cases, a 2500K beats an 8150 by 40-50% in single-threaded performance. That's absolutely atrocious, and they quite simply don't have a chance in hell of closing that gap with either Piledriver or Steamroller. My guess is they'll do their best by addressing the uberslow cache speeds and bumping up stock clocks. That should let Piledriver do well in highly-threaded workloads and get close to, or even slightly beat, the 2600K (remember, that was the whole intent of CMT anyway), but it will still show its weaknesses in lightly-threaded applications, just by a smaller margin.
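To put rough numbers on why that gap is so hard to close, here's a quick Python sketch using the usual first-order approximation that single-threaded performance scales with IPC × clock (it ignores turbo, memory, and everything else). The 40-50% gap is the figure above; the IPC gains in the loop are hypothetical what-ifs, not actual Piledriver or Steamroller numbers.

```python
def clock_gain_needed(perf_gap: float, ipc_gain: float) -> float:
    """Fractional clock increase needed to close a single-threaded gap.

    First-order model: performance ~= IPC * clock. Closing a gap of
    `perf_gap` (0.45 means the rival is 45% faster) with an IPC gain of
    `ipc_gain` requires (1 + ipc_gain) * (1 + needed) == 1 + perf_gap.
    """
    if ipc_gain <= -1.0:
        raise ValueError("ipc_gain must be greater than -100%")
    return (1.0 + perf_gap) / (1.0 + ipc_gain) - 1.0


if __name__ == "__main__":
    # The 40-50% gap comes from the post; the IPC gains are hypothetical
    # what-if values, not measured figures for any AMD core.
    for gap in (0.40, 0.50):
        for ipc in (0.0, 0.05, 0.10, 0.15):
            needed = clock_gain_needed(gap, ipc)
            print(f"gap {gap:.0%}, IPC +{ipc:.0%}: clocks must rise ~{needed:.0%}")
```

Even with a hypothetical 10% IPC gain, clocks would still have to rise roughly 27-36% to close a 40-50% gap, which is a big ask.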
What will be really interesting is whether something was missed in the shared fetch/decode stage of each module that can be improved, so that the ~20% performance hit when both cores in a module are busy can be mitigated. Any improvement on that end will help them tremendously in highly-threaded workloads. Basically, the goal is to make CMT look less like CMT in benchmarks while keeping its advantages on the fab and $$ side.
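Here's a back-of-the-envelope Python model of that within-module hit. It assumes the ~20% figure applies to each core whenever both cores in a module are busy, and that threads are spread one per module before doubling up; both are simplifications, not AMD data. The 4-module layout matches the 8150, and the 10% penalty is a purely hypothetical "improved front end" case.

```python
def cmt_throughput(threads: int, modules: int, penalty: float = 0.20) -> float:
    """Rough throughput, in single-core equivalents, of `threads` threads on
    `modules` two-core CMT modules.

    Simplifying assumptions (illustrative only):
      * threads go one per module first, then double up;
      * each core in a fully loaded module loses `penalty` of its throughput
        to contention in the shared front end (fetch/decode).
    """
    if not 0 <= threads <= 2 * modules:
        raise ValueError("threads must be between 0 and 2 * modules")
    if threads <= modules:
        return float(threads)          # every thread has a module to itself
    paired = threads - modules         # modules running two threads
    solo = modules - paired            # modules still running one thread
    return solo + paired * 2 * (1.0 - penalty)


if __name__ == "__main__":
    MODULES = 4  # an FX-8150 is 4 modules / 8 cores
    for penalty in (0.20, 0.10):  # 0.10 is a hypothetical improved front end
        row = [cmt_throughput(t, MODULES, penalty) for t in range(1, 2 * MODULES + 1)]
        print(f"penalty {penalty:.0%}: " + ", ".join(f"{x:.1f}" for x in row))
```

With the ~20% penalty, eight threads on four modules work out to 6.4 core-equivalents instead of 8; trimming the hit to a hypothetical 10% would bring that to 7.2, which is the kind of highly-threaded gain a better front end could buy.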
edit:
Just want to add that if you see Piledriver's stock clock speeds increase drastically (and judging by the Trinity APUs, that does look to be the case), I'd be extremely wary of claims of any significant IPC increase. What they did with Bulldozer was lengthen the pipeline and increase latencies (the cache being the biggie here), so high clock speeds were something Bulldozer required to make up for the intentionally flat IPC. Higher clock speeds at a lower TDP likely mean the cache speeds haven't been addressed; instead, they're ramping up clocks on a more refined process to make up some ground in per-core performance. Those would be the bad signs.
Unfortunately, it seems that's exactly what AMD did. I ran the numbers on Trinity's clock speed compared to Llano (both 65W TDP variants, not including Turbo), and AMD's estimates pointed to a ~20-30% increase in PCMark Vantage, which roughly matched the increase in clock speed from Llano to Trinity. In short, if we assume these xbit graphs and AMD's performance estimates are accurate, Trinity roughly equals Llano's IPC, and the performance improvement comes directly from higher clock speeds. In the short term that's actually a good sign, even though I just called it a bad one (!): it's essentially what Bulldozer should have delivered, since that was the plan for Bulldozer from the get-go. But it paints a poor picture going forward, because given their slow progress and limitations on the fab front, they'll HAVE to shift from chasing clock speeds to increasing IPC. It looks like they took the easy road and bumped up clock speeds, with a minor IPC increase over Bulldozer, to overcome Bulldozer's initial shortcomings. That's a step in the right direction for now, but long term they'll need a complete overhaul in approach.
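The reasoning above boils down to performance ≈ IPC × clock, so the IPC change is whatever is left after dividing out the clock increase. A small Python sketch of that arithmetic: the 20-30% performance range is AMD's estimate as cited above, while the clock gains in the loop are hypothetical placeholders, not actual Llano/Trinity SKU clocks.

```python
def implied_ipc_change(perf_gain: float, clock_gain: float) -> float:
    """Fractional IPC change implied by a performance gain and a clock gain.

    First-order model: performance ~= IPC * clock, so
    IPC_new / IPC_old = (1 + perf_gain) / (1 + clock_gain).
    """
    if clock_gain <= -1.0:
        raise ValueError("clock_gain must be greater than -100%")
    return (1.0 + perf_gain) / (1.0 + clock_gain) - 1.0


if __name__ == "__main__":
    # Performance gains: AMD's ~20-30% PCMark Vantage estimate from the post.
    # Clock gains: hypothetical placeholders, not real Llano/Trinity clocks.
    for perf in (0.20, 0.25, 0.30):
        for clock in (0.20, 0.25, 0.30):
            ipc = implied_ipc_change(perf, clock)
            print(f"perf +{perf:.0%}, clock +{clock:.0%} -> IPC {ipc:+.1%}")
```

When the performance gain matches the clock gain, the implied IPC change is zero, which is exactly the Trinity-equals-Llano conclusion.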
The good sign (better sign, imo) would be clock speeds leveling off on a smaller node (not happening until 2013 at the earliest) along with lower TDPs on CPUs rather than APUs. The APUs are harder to judge because AMD's graphics side is so strong, and it's hard to say how that affects the chip as a whole. There's also the question of whether we'll see any more AMD desktop CPUs after Vishera at all. According to their slides from Financial Analyst Day, they're planning to replace them all with APUs going forward.
adding even more to this
😛
http://www.ilsistemista.net/index.p...omparison-whats-wrong-with-amd-bulldozer.html
The grammar isn't great (the author doesn't seem to be a native English speaker), but he does a good job of gathering info from many sources and putting it together really well.