Actually, I'd be very surprised (and disappointed) if a 2-module BD couldn't beat an SB dual core in highly multi-threaded code. How it will do in single-threaded code is another, and far more interesting, question.
There are a lot of people who are expecting zero (or negative) IPC gains and zero increase in clock speed. I'm not sure why, since everything we've heard so far says that IPC has increased, and that BD is optimized for higher clocks on a smaller manufacturing process (K10.5 was not optimized for higher clocks, and it can easily hit 4 GHz). This suggests that, at the least, AMD will be competitive again in multithreaded code (as it was, very barely, against Nehalem).
I (like many) doubt BD will hit SB-level IPC, but I think roughly Nehalem-level IPC isn't too optimistic (the design has been out for years, so AMD has known that IPC level), and if they can do that plus higher clock speeds, they should be able to hold their own against SB and possibly IB.
I do not think that single-thread performance is that important. If an application does not need performance, it will start no more than one thread. If it needs plenty of performance, it will start many threads. So I think it doesn't matter that much, unless you have quite old software that you do not want to update.
Regarding IPC, it is also a bit complicated, especially for Bulldozer, as it splits a "MegaCore" (aka module) into two almost independent cores (front end/back end shared). So it depends on whether you ask for the IPC of a module, which is definitely higher than everything else on the market (including Sandy Bridge and possibly even its successors), or for the IPC of a single BD core, which is lower than that of a Sandy Bridge/Nehalem/Conroe core if Hyper-Threading is deactivated, but of course higher if Hyper-Threading is activated on them.
As you see, the IPC question is not a very suitable one, as there is no simple answer.
Let's take two 8-thread capable parts, Bulldozer and Sandy Bridge.
With Bulldozer you get 8 fast cores (as seen by the OS).
With Sandy Bridge you get 4 very fast cores, but only as long as you have no more than 4 threads, plus 4 slow cores. If you have more than 4 threads, the very fast cores drop in performance.
If you run only 1 thread on the system (alongside a few really minor threads, of course), then it will definitely be faster on Sandy Bridge if it is integer-only, and likely faster on Bulldozer if it heavily uses floating point.
Now to make all this even more complex: IPC is defined as instructions per clock. As Bulldozer cores will be clocked very high (~40% higher clock than Phenom), they need longer pipelines. These longer pipelines lose more time when they stall. Both together actually reduce IPC but increase performance. Weird, isn't it?
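To put rough numbers on that: a minimal sketch, assuming single-thread performance is simply IPC times clock. The 40% clock figure is the one quoted above; the 10% IPC loss is a made-up value for illustration, not a measurement of any real chip.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float) -> float:
    """Single-thread performance relative to a baseline, modelled as IPC x clock."""
    return ipc_ratio * clock_ratio


def break_even_ipc(clock_ratio: float) -> float:
    """IPC ratio at which the higher clock exactly cancels the IPC loss."""
    return 1.0 / clock_ratio


# Assumptions for illustration only:
clock_ratio = 1.40  # ~40% higher clock, the figure quoted in the post
ipc_ratio = 0.90    # hypothetical 10% IPC loss from the longer pipeline

perf = relative_performance(ipc_ratio, clock_ratio)
print(f"relative performance: {perf:.2f}x")                      # 1.26x
print(f"break-even IPC ratio: {break_even_ipc(clock_ratio):.2f}")  # 0.71
```

So under this simple model, IPC could fall by almost 30% before the higher clock stops paying off.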
And to make the confusion complete, Bulldozer has changed its superscalar width and capabilities. Integer IPC drops while float IPC increases (per half module). However, to round it off: IPC per pipeline increases heavily for integer as well as for float.
So the question about IPC is just wrong. It is wrong because of multi-core (a two-core CPU with high IPC is beaten by a 4-core CPU with lower IPC), because of Hyper-Threading (where IPC depends on which core it is measured on and under what conditions), and in future because of "module technology", where it is even unclear what IPC relates to (core or module).
You could only resolve this by defining IPC as instructions per clock for the whole CPU. But then it is meaningless as well, since adding two more cores increases IPC.
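A small sketch of why the number depends on where you draw the box. The `Cpu` class and all values in it are hypothetical; it only shows that the same per-thread figure yields different "IPC" per unit and per CPU, and that whole-CPU IPC grows just by adding units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    units: int             # modules or physical cores, depending on the design
    threads_per_unit: int  # hardware threads the OS sees per unit
    ipc_per_thread: float  # made-up value, assuming every thread is busy

    @property
    def ipc_per_unit(self) -> float:
        return self.threads_per_unit * self.ipc_per_thread

    @property
    def ipc_whole_cpu(self) -> float:
        return self.units * self.ipc_per_unit


# Same hypothetical design, only the unit count differs.
four_units = Cpu(units=4, threads_per_unit=2, ipc_per_thread=1.0)
six_units = Cpu(units=6, threads_per_unit=2, ipc_per_thread=1.0)

for cpu in (four_units, six_units):
    print(cpu.units, cpu.ipc_per_thread, cpu.ipc_per_unit, cpu.ipc_whole_cpu)
```

Nothing about the individual thread changed between the two, yet the whole-CPU figure goes from 8.0 to 12.0.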
There is one totally dominant parameter for CPUs today, and even more so in the future, and that is the number of cores.
Why?
You compare two CPUs with different clocks? You hardly get even a 10% difference.
You compare two CPUs with different "IPC"? You hardly get even a 10% difference.
You compare two CPUs with different core counts? It is easy to get a 50%-100% difference.
With a new process technology, e.g. the step from 45 nm to 32 nm, you could get either of the following:
10% more clock and 10% more IPC per core, or
double the core count.
The above is just the difference between what Intel and AMD did. Intel's Sandy Bridge takes the first option and delivers only slightly more performance than the previous generation. AMD follows the second path and delivers a performance gain of ~80%.
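For a rough feel of the two options, here is a sketch that compares them using Amdahl's law as the scaling model. That model is my assumption, not something stated above, and the parallel fractions and the starting count of 4 cores are illustrative values only.

```python
def option_clock_and_ipc(clock_gain: float = 0.10, ipc_gain: float = 0.10) -> float:
    """Speedup from raising clock and per-core IPC; applies to any workload."""
    return (1.0 + clock_gain) * (1.0 + ipc_gain)


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def option_double_cores(parallel_fraction: float, cores: int) -> float:
    """Speedup from going from `cores` to `2 * cores` under Amdahl's law."""
    return amdahl_speedup(parallel_fraction, 2 * cores) / amdahl_speedup(parallel_fraction, cores)


print(f"10% clock + 10% IPC: {option_clock_and_ipc():.2f}x")  # 1.21x

# Hypothetical workloads, starting from an assumed 4 cores.
for p in (0.0, 0.50, 0.95, 1.0):
    print(f"parallel fraction {p:.2f}: doubling cores gives {option_double_cores(p, 4):.2f}x")
```

With these assumptions, doubling the cores lands in the 50%-100% range only when almost all of the work runs in parallel; for a single-threaded program (parallel fraction 0) it gives nothing, while the first option still gives 21%.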