Intel is on record saying single-thread IPC is improved by 15% but multi-threaded performance is improved by "up to 200%", comparing Yorkfield vs Bloomfield.
Actually, Intel said 10-30% for single-thread and 20-100% for multi-thread performance improvement. If we average the single-thread range we get (10+30)/2 = 20%. Since Merom got a 20% IPC increase over Yonah, we can assume Nehalem will be about 20% faster than Merom.
I read (maybe from aigo) the other day that SMT is supposed to be MUCH better than HT was. I used to run SETI@home on an old P4 with HT and it only got about 10% more ppd than a normal P4 at 3.0 GHz. Supposedly SMT is going to be a LOT better than that. I guess we'll see...
There is reasonable evidence to believe that Nehalem will be MUCH better at SMT than Pentium 4 was (actually HT is Intel's term for SMT, but that's beside the point).
Pentium 4 had two major flaws that kept SMT from showing its true power.
1. Pentium 4 had 1 decoder, ONE decoder!! The Trace Cache was made to make up for it, but its hit rate was said to be low (50-60%). What happens when the hit rate is low? The CPU is effectively a 1-issue machine a lot of the time. And a 1-issue machine is nowhere near the point where ILP is the bottleneck: a single thread can already fill that one slot. So whenever the second SMT thread ran while the CPU was stuck in that 1-issue state, there were no spare slots left for it to use, and the benefit of SMT was greatly diminished.
2. Pentium 4 had a thing called Replay. Basically, because the pipeline was so long, Intel had to find a way to process instructions speculatively. Replay let the CPU issue an instruction before it knew the data was ready and simply "replay" it if the guess turned out wrong, potentially saving clock cycles and effectively hiding some of the pipeline stages... when it WORKED!!!
But when it didn't work (like on a cache miss), the replayed instruction would eat valuable execution cycles and the CPU would be kept busy doing basically nothing useful during that time. Sometimes the replay system would "replay" the same instruction numerous times!!
Because the CPU was already filled up with replaying instructions, adding another thread meant that the original thread lost performance. That's why SMT sometimes caused a drastic performance loss and sometimes gave pretty good gains.
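To put rough numbers on point 1, here is a toy sketch of why a narrow front end leaves nothing for a second thread. It is not a model of any real Pentium 4 or Nehalem pipeline: the function names, the "each thread can supply 1.5 instructions per cycle" figure, and the issue widths are all made up for illustration.

```python
# Toy model only: it ignores caches, replay, and everything else in a real core.

def throughput(issue_width, per_thread_ilp, threads):
    """Instructions per cycle when each thread can supply up to
    per_thread_ilp independent instructions and the core can issue
    at most issue_width instructions per cycle."""
    return min(issue_width, per_thread_ilp * threads)


def smt_gain(issue_width, per_thread_ilp):
    """Fractional throughput gain from running 2 threads instead of 1."""
    one = throughput(issue_width, per_thread_ilp, threads=1)
    two = throughput(issue_width, per_thread_ilp, threads=2)
    return two / one - 1.0


# Hypothetical workload: each thread can supply 1.5 instructions per cycle.
for width in (1, 2, 3):
    print(f"issue width {width}: SMT gain {smt_gain(width, 1.5):.0%}")
```

With these made-up inputs the 1-issue case shows no gain at all, because one thread already saturates the single slot, while the wider cases leave slots free for the second thread.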
Nehalem has NONE of the above-mentioned problems. Plus, Nehalem is wider and has substantially more memory bandwidth.
That's why even Intel's own claims for SMT gains are much, much better for Nehalem than they were for Pentium 4. Pentium 4 was claimed to get up to a 30% increase with HT. Intel says Nehalem will get a 20-100% increase in multi-threaded performance, but if we disregard the IPC increase, the actual improvement from SMT will be something like 0 (in single thread) to 50%.
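For anyone who wants to check the "disregard the IPC increase" arithmetic, here is a rough sketch. It assumes the single-thread and SMT gains multiply together, and that the low ends and high ends of the quoted ranges (10-30% single-thread, 20-100% multi-thread) pair up. Both are my assumptions, not anything Intel has stated, and `smt_only_gain` is just a helper name made up for this post.

```python
def smt_only_gain(multi_thread_gain, single_thread_gain):
    """Gain left over after dividing out the single-thread improvement.

    Both arguments are fractional gains (0.20 means +20%).
    Assumes the two effects compose multiplicatively.
    """
    return (1.0 + multi_thread_gain) / (1.0 + single_thread_gain) - 1.0


# Low ends of the quoted ranges: +20% multi-thread, +10% single-thread.
low = smt_only_gain(0.20, 0.10)
# High ends of the quoted ranges: +100% multi-thread, +30% single-thread.
high = smt_only_gain(1.00, 0.30)

print(f"SMT-only gain: roughly {low:.0%} to {high:.0%}")
```

Under those assumptions the range comes out at roughly 9% to 54%, which is in the same ballpark as the 0-50% estimate.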
However we look at it, there is something amazing coming on the horizon.
