So I constantly hear everyone saying that the IPC is so horrible, yet I've never heard what IPC actually is. I know it's instructions per cycle, but I have no idea how to determine a CPU's IPC. What is the IPC of my 2500K?
Computing absolute IPC is easy if you are interested in the theoretical maximum for a given microarchitecture, but the theoretical maximum is rarely what matters.
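A rough way to picture where a theoretical maximum comes from: a pipeline can't sustain more instructions per cycle than its narrowest stage lets through. Here's a minimal Python sketch of that idea. The stage names and widths are made-up placeholders, not the 2500K's actual specs, and it ignores the difference between x86 instructions and the micro-ops the core really executes.

```python
# Sketch: theoretical peak IPC is capped by the narrowest pipeline stage.
# Stage widths are HYPOTHETICAL placeholders, not real 2500K figures.

def theoretical_max_ipc(stage_widths: dict[str, int]) -> int:
    """Peak sustainable IPC = width of the narrowest stage."""
    return min(stage_widths.values())


hypothetical_core = {
    "decode": 4,   # instructions decoded per cycle (assumed)
    "issue": 4,    # instructions issued per cycle (assumed)
    "execute": 6,  # execution ports (assumed)
    "retire": 4,   # instructions retired per cycle (assumed)
}

print(theoretical_max_ipc(hypothetical_core))  # 4
```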
What matters instead is actual IPC, which is always lower than the theoretical IPC for the very reason CPUs have caches: the dataset being created, analyzed, or changed has to get to the circuits in the CPU.
The latency of getting it there is what keeps actual IPC below theoretical IPC.
Take a 2500K, which has higher actual IPC than an FX-8150, and remove the L3$, L2$, and L1$: it will still have the same theoretical IPC, but its actual IPC will be worse than that of a Celeron 300A.
But a Celeron 300A's theoretical IPC is far below even the actual IPC of a 2500K, which is why Intel doesn't just bolt the 2500K's L1$, L2$, and L3$ onto a 300A and call it a day. Even with all that cache, the 300A's actual IPC would merely move closer to its own theoretical IPC, which is still far lower than the actual IPC of a 2500K with its cache.
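The cache argument above can be sketched with a simple textbook-style stall model, where every miss adds a fixed penalty on top of the core's best-case cycles per instruction. This is a simplification (real cores overlap misses with other work), and every number below is hypothetical, but it reproduces both points: a wide core with no cache falls below a narrow core with cache, and giving the narrow core cache only brings it closer to its own low ceiling.

```python
# Sketch: effective IPC under a simple memory-stall model.
# ASSUMPTION: misses fully stall the core and never overlap with other work.
# All parameter values are HYPOTHETICAL.

def effective_ipc(ipc_max: float, misses_per_instr: float, miss_penalty: float) -> float:
    """1 / (best-case CPI + stall cycles per instruction)."""
    base_cpi = 1.0 / ipc_max                     # cycles/instruction with no stalls
    stall_cpi = misses_per_instr * miss_penalty  # extra cycles/instruction from misses
    return 1.0 / (base_cpi + stall_cpi)


DRAM_PENALTY = 200  # cycles per trip to memory (assumed)

# Wide core with caches: only rare accesses go all the way to memory.
wide_cached = effective_ipc(4.0, 0.001, DRAM_PENALTY)
# Same wide core with caches removed: assume 30% of instructions touch memory.
wide_no_cache = effective_ipc(4.0, 0.30, DRAM_PENALTY)
# Narrow core given the same caches: closer to, but never above, its ceiling of 1.0.
narrow_cached = effective_ipc(1.0, 0.001, DRAM_PENALTY)

print(f"{wide_cached:.3f} {narrow_cached:.3f} {wide_no_cache:.3f}")  # 2.222 0.833 0.017
```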
Now then, as said above, knowing the theoretical IPC is easy if you have the microarchitecture details, but determining the actual IPC is not at all straightforward because (1) it depends on the instruction mix (there are more than 700 instructions in the ISA), which makes it software-application dependent, and (2) the dataset itself is both user-dependent and application-dependent.
That is a lot of instructions, and each one has its own theoretical IPC as well as an actual (effective) IPC that depends on the dataset (cache stalls, data dependencies, etc.).
So computing a single number for effective IPC is not at all straightforward.
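To show why instruction mix matters, here's a sketch that assumes each instruction class runs at its own sustained IPC and that the classes don't overlap, so their cycles simply add up. The class names and per-class figures are hypothetical. On real hardware you'd instead read the effective value from performance counters as retired instructions divided by core cycles (for example with Linux `perf stat`).

```python
# Sketch: effective IPC of an instruction mix.
# ASSUMPTION: each class runs at its own sustained IPC and classes don't overlap,
# so cycles per instruction are a weighted sum. All figures are HYPOTHETICAL.

def mix_ipc(mix: dict[str, float], class_ipc: dict[str, float]) -> float:
    """Effective IPC for a mix given as (possibly unnormalized) class fractions."""
    total = sum(mix.values())
    cpi = sum((frac / total) / class_ipc[cls] for cls, frac in mix.items())
    return 1.0 / cpi


class_ipc = {"int_alu": 3.0, "load": 2.0, "fp_mul": 1.0, "divide": 0.05}

integer_heavy = {"int_alu": 0.7, "load": 0.3}
fp_heavy = {"fp_mul": 0.5, "load": 0.4, "divide": 0.1}

print(f"{mix_ipc(integer_heavy, class_ipc):.2f}")  # 2.61
print(f"{mix_ipc(fp_heavy, class_ipc):.2f}")       # 0.37
```

Same core, same theoretical IPC, yet a few slow instructions in the mix drag the effective figure down by a factor of about seven.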
What is straightforward is making relative, clock-normalized comparisons of benchmark performance across different CPUs and microarchitectures.
Compare a 4GHz Thuban to a 4GHz Zambezi to a 4GHz Sandy Bridge. From that kind of clock-normalized analysis you can arrive at reasonably useful IPC numbers, which can then be used to discuss the underlying strengths and weaknesses of a given microarchitecture.
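A minimal sketch of turning same-clock benchmark results into relative IPC figures. The CPU labels and scores are hypothetical placeholders, not real Thuban/Zambezi/Sandy Bridge data. Dividing by GHz only lets the helper tolerate mismatched clocks; it assumes performance scales linearly with clock, which memory latency breaks, and that's exactly why locking every chip to the same clock is the better approach.

```python
# Sketch: relative IPC from benchmark results, normalized per GHz.
# CPU labels and scores are HYPOTHETICAL. Dividing by GHz assumes linear clock
# scaling; running every chip at the same clock avoids leaning on that.

def relative_ipc(results: dict[str, tuple[float, float]], baseline: str) -> dict[str, float]:
    """Map each CPU to (score / GHz) relative to the baseline's (score / GHz).

    results: {cpu_label: (benchmark_score, clock_ghz)}, higher score = faster,
    all measured with the same thread count.
    """
    base_score, base_ghz = results[baseline]
    base = base_score / base_ghz
    return {cpu: (score / ghz) / base for cpu, (score, ghz) in results.items()}


same_clock_results = {
    "cpu_a": (100.0, 4.0),
    "cpu_b": (120.0, 4.0),
    "cpu_c": (90.0, 4.0),
}

for cpu, ratio in relative_ipc(same_clock_results, baseline="cpu_a").items():
    print(cpu, round(ratio, 2))
```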
Performance = IPC x GHz x Threads x Thread_Scaling_Factor
Both IPC and Thread_Scaling_Factor depend on the application (its instruction mix) as well as on the dataset.
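Here's the formula above as a small Python function, with made-up inputs for a hypothetical quad-core with higher IPC and a hypothetical eight-core with lower IPC. IPC x GHz gives billions of instructions per second per thread, and the point is that the winner flips depending on how well the workload scales across threads.

```python
# Sketch of: Performance = IPC x GHz x Threads x Thread_Scaling_Factor
# All inputs are HYPOTHETICAL; real IPC and scaling come from relevant benchmarks.

def performance(ipc: float, ghz: float, threads: int, scaling: float) -> float:
    """Billions of instructions per second implied by the formula."""
    return ipc * ghz * threads * scaling


# Well-threaded workload: both chips scale well.
quad_well = performance(ipc=2.0, ghz=4.0, threads=4, scaling=0.95)
octo_well = performance(ipc=1.5, ghz=4.0, threads=8, scaling=0.95)

# Poorly threaded workload: the extra threads buy much less.
quad_poor = performance(ipc=2.0, ghz=4.0, threads=4, scaling=0.5)
octo_poor = performance(ipc=1.5, ghz=4.0, threads=8, scaling=0.25)

print(f"well-threaded: quad {quad_well:.1f}, octo {octo_well:.1f}")
# well-threaded: quad 30.4, octo 45.6
print(f"poorly threaded: quad {quad_poor:.1f}, octo {octo_poor:.1f}")
# poorly threaded: quad 16.0, octo 12.0
```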
Thus you must find benchmarks that serve as suitable proxies for IPC and thread scaling in the class of software applications relevant to the user (server apps for server markets, desktop apps for desktop markets, etc.).
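Once you've picked benchmarks that represent a given user class, one common way to roll their clock-normalized ratios into a single number is the geometric mean (SPEC, for example, summarizes ratios this way). The benchmark names and ratios below are hypothetical.

```python
# Sketch: one summary number per user class from clock-normalized ratios.
# Benchmark names and ratios are HYPOTHETICAL (CPU under test / baseline, same clock).
from statistics import geometric_mean


def suite_ratio(ratios: dict[str, float]) -> float:
    """Geometric mean of per-benchmark ratios."""
    return geometric_mean(list(ratios.values()))


desktop_suite = {"bench_encode": 1.15, "bench_compile": 1.30, "bench_game": 1.05}
server_suite = {"bench_db": 0.85, "bench_web": 1.10}

print(f"desktop: {suite_ratio(desktop_suite):.2f}")
print(f"server: {suite_ratio(server_suite):.2f}")
```

Keeping the suites separate is the point: the same chip can come out ahead for one class of software and behind for another.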
So analyzing actual IPC alone will still fail to capture a significant portion of the performance-impacting characteristics of the overall processor/platform package.