Originally posted by: Nemesis 1
If you open the link, the second point says this: "1.1x-1.25x performance increase in single thread apps" = 10%-25% increase. Of course it's at the same clock. How else would Intel do it?
You are missing the point I was attempting to communicate.
If Intel sells a 3GHz Nehalem and you run a single-threaded application then the chip is going to automatically "turbo mode" the core running the single-threaded app (if you believe the marketing hype) to something >3GHz.
So if you compare single-threaded performance of a "3GHz" Bloomfield chip (albeit running 1 core at 3.5GHz and the remaining 3 cores at 2GHz to fit into its power envelope) to a 3GHz Yorkfield, are you comparing "clock to clock"? No, you are not.
So my question is how much of that 15-25% single-threaded performance boost is from the CPU up-clocking the loaded core by 15-25% versus how much is actually going to come from IPC improvements?
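To put rough numbers on that question, here is a minimal sketch. It assumes single-threaded performance scales as IPC x clock (which ignores memory-bound behavior), and it uses my hypothetical 3.5GHz turbo figure from above, not a published spec. The function name `implied_ipc_gain` is made up for illustration.

```python
def implied_ipc_gain(observed_speedup, base_ghz, turbo_ghz):
    """Split an observed speedup into a clock part and an IPC part.

    Assumes performance = IPC * clock, so
    observed_speedup = (turbo_ghz / base_ghz) * ipc_ratio.
    Returns ipc_ratio (1.0 means no IPC change at all).
    """
    clock_ratio = turbo_ghz / base_ghz
    return observed_speedup / clock_ratio


if __name__ == "__main__":
    # Hypothetical case: a "3GHz" part that turbos one core to 3.5GHz.
    for speedup in (1.10, 1.15, 1.25):
        ipc = implied_ipc_gain(speedup, base_ghz=3.0, turbo_ghz=3.5)
        print(f"observed {speedup:.2f}x -> implied IPC ratio {ipc:.3f}")
```

Under these assumptions a 3.0 to 3.5GHz turbo is already about a 1.17x clock ratio, so most of a claimed 1.15x-1.25x could come from clock speed alone. If the comparison really was made at equal clocks, the clock ratio is 1.0 and the whole gain is IPC.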
And if that is the case, what happens if I load a Bloomfield with 4 instances of a single-threaded application and turn them all on at once?
Because of TDP restrictions the chip won't operate any of the cores in turbo mode (as they are all fully loaded with single-threaded apps), so will I still see a 15-25% performance boost in my single-threaded apps on Bloomfield relative to loading a Yorkfield in similar fashion?
Have I clearly communicated my question now?
Originally posted by: Nemesis 1
Then you wrote this.
Because of hyperthreading you can get up to 2X more threads operating in parallel - so multi-threaded performance could potentially double on Bloomfield over Penryn simply because Bloomfield supports 8 threads and Penryn supports 4.
Exactly, and it scales from a 1.2x-2x multi-threaded arch improvement (first point on slide 1) = 20%-100% improvement in multi-threaded arch.
If the multi-threaded performance increase is "at best" a linear extrapolation of the number of threads on the chip despite the chip architecture changing from Yorkfield to Bloomfield then that further suggests there is little to no IPC improvement per thread.
If they doubled the threads
AND increased the IPC per thread
AND integrated the memory controller
then I would expect the upper end of the improvement range to be >2X and not just simply listed as "up to" 2X.
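The reasoning can be sketched as a simple multiplicative model. This is only an illustration under loud assumptions: that the gains from thread count, per-thread IPC, and the integrated memory controller combine by multiplication, and that hyperthreading ideally doubles throughput (real workloads rarely get that). The function and the example factors are hypothetical, not Intel figures.

```python
def best_case_speedup(thread_factor, ipc_factor=1.0, memory_factor=1.0):
    """Best-case multi-threaded speedup if independent gains multiply.

    thread_factor: ratio of hardware threads (8 / 4 = 2.0 in the ideal case)
    ipc_factor:    per-thread IPC ratio (1.0 = no improvement)
    memory_factor: gain attributed to the integrated memory controller
    """
    return thread_factor * ipc_factor * memory_factor


if __name__ == "__main__":
    # Threads alone, no IPC or memory gain: lands exactly at "up to 2X".
    print(best_case_speedup(2.0))
    # Any per-thread IPC gain on top pushes the ceiling past 2X.
    print(best_case_speedup(2.0, ipc_factor=1.1))
```

In this model the ceiling only stays at exactly 2X if the IPC and memory factors are both 1.0, which is the point I am making about the "up to" 2X wording.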