Apple does get 83% more IPC. Please don't spread technical nonsense which can be disproved easily by things such as performance counters. Bolding such false statements also isn't a good image from a CPU forum moderator.
While I agree that the A13 is a hell of a feat of engineering, I want to make sure we are being accurate, and not just consistent, with our grading of things. Because it's more complex than that, as you know.
Here's my diatribe:
About IPC, work done, uops, macroops, etc.
We don't actually have IPC. We have SPECint2006 scores. AT uses the same flags on x86 and mobile, so it doesn't matter whether it's SPECint2006 base or not. These scores are also normalized ratios, which I'm sure some people will hate (just kidding!).
The way SPECint2006 measures "speed" - it is not actually considering IPC. It is considering time it takes to complete a set of tasks relative to a reference computer. But it is NOT measuring IPC or anything really analogous to it. You know why, Andrei, but I'll spell it out a little more for others like Richie Rich who seem to conflate work done with IPC.
Many people conflate IPC with work per cycle, which is a mistake. IPC is the number of instructions the CPU can process per cycle. But what do we mean by instruction? Is it how many register transfers can be done per cycle (quite pure)? Or is it more complex: is it how many micro-ops can be done in a cycle, or how many macro-ops? And by whose definition (since even the x86 vendors use different definitions of both uop and macroop)? Or is it just how many program instructions can be burned through?
Perhaps we are most concerned with "work done" as a function of a program asking the CPU to do a task. So, if the benchmark asks the CPU to multiply the number at location x by location y and store it back at x, it sends different instructions to different ISAs:
As an example:
CISC:
MULT x, y
RISC:
LOAD A, x
LOAD B, y
PROD A, B
STORE x, A
For CISC that's one instruction and for RISC it's 4 instructions (if we take a pure "issue-based" count). If the task takes four cycles to complete on both, then that's 0.25 IPC for CISC and 1 IPC for RISC. But they're doing exactly the same amount of work; MULT x,y is just a container for 4 smaller instructions, uops. If we instead count uops, then RISC IPC = CISC IPC. But if we count pure CPU instruction issues, the IPC on the RISC chip will be artificially four times higher than on CISC.
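To make the counting problem concrete, here's a minimal Python sketch of the toy example above. The uop table is a made-up assumption that just mirrors the "MULT is a container for 4 uops" idea; it is not how any real decoder splits instructions, and the function names are mine.

```python
# Toy model of the MULT example: one task, counted in different ways.
# The uop counts are assumptions for illustration, not real decoder behavior.
CYCLES = 4  # the post assumes the task takes four cycles on both machines

CISC_PROGRAM = ["MULT x, y"]
RISC_PROGRAM = ["LOAD A, x", "LOAD B, y", "PROD A, B", "STORE x, A"]

# Hypothetical decode table: how many uops each instruction expands to.
UOPS_PER_INSTRUCTION = {"MULT": 4, "LOAD": 1, "PROD": 1, "STORE": 1}


def instructions_per_cycle(program, cycles):
    """Count program instructions as issued to the chip."""
    return len(program) / cycles


def uops_per_cycle(program, cycles):
    """Count decoded uops instead of program instructions."""
    uops = sum(UOPS_PER_INSTRUCTION[line.split()[0]] for line in program)
    return uops / cycles


for name, program in (("CISC", CISC_PROGRAM), ("RISC", RISC_PROGRAM)):
    print(
        name,
        "instr/cycle:", instructions_per_cycle(program, CYCLES),
        "uops/cycle:", uops_per_cycle(program, CYCLES),
        "tasks/cycle:", 1 / CYCLES,
    )
```

Same task, same cycle count, and the "IPC" differs by 4x or not at all depending purely on what you decide to count.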
So let's ask ourselves what we mean by IPC, and if it's even relevant: i.e. are we counting instructions issued to the chip, decoded instructions ("uops"), or macro ops? Or something else?
In the end, with the SPECint2006 scores, we are counting work done, not IPC. I think that work done is a more valid comparison. But that's up to each person. What is clear is that SPECint2006 is NOT IPC.
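For anyone who hasn't looked at how a SPEC-style score is built, here's a rough sketch. Each benchmark's ratio is the reference machine's run time divided by the measured run time, and the overall score is the geometric mean of those ratios. The timings below are invented placeholders, not real results, and the function names are mine. Note that no instruction count appears anywhere.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """Per-benchmark ratio: how much faster than the reference machine."""
    return reference_seconds / measured_seconds


def overall_score(ratios):
    """Geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference time, measured time) pairs in seconds.
timings = [(9000.0, 300.0), (8000.0, 400.0)]
ratios = [spec_ratio(ref, run) for ref, run in timings]
print("ratios:", ratios)
print("overall score:", overall_score(ratios))
```

The only inputs are wall-clock times, which is why the score tells you about work done and nothing about how many instructions (however defined) retired per cycle.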
So it is truly VERY difficult to say with any certainty at all what the true IPC of any chip is, based on the information we have, and it's even harder when comparing x86 vs ARM because of the complexities in comparison between chips that heavily use uops and those that don't. Granted, yes, some benchmarks will send instructions that don't need to be decoded, thus removing this limitation, but boy are we going to have to dive deep if we want to compare, on a program-by-program basis, what the true IPC is. And that's only after deciding what constitutes a true "instruction".
About SPECint2006 normalized to GHz
Per the basic calculations in Richie Rich's signature, which are accurate given what he claims to measure, the A13 can burn through 83% more work per clock than 9900K or 3900X.
However, this is heavily simplified. The A13 can clock up to 2.66 GHz, but doesn't all the time. Same for the 9900K and 3900X with their "boost" speeds. And on the SPECint2006 benchmarks, we don't actually know what the average clock speed was on a test that takes a very long time to run and may be thermally limited. So even TRYING to find out what the real IPC OR work per clock is for these chips is a fairly futile task unless we get more data.
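Here's a small sketch of how much the choice of clock matters to a score-per-GHz comparison. Only the 2.66 GHz figure comes from the discussion above; the scores and every other clock value are placeholders I made up to show the sensitivity, not measurements.

```python
def per_clock_advantage(score_a, clock_a_ghz, score_b, clock_b_ghz):
    """Fractional score-per-GHz advantage of chip A over chip B."""
    return (score_a / clock_a_ghz) / (score_b / clock_b_ghz) - 1.0


# Placeholder scores (NOT real results); equal scores keep the math readable.
SCORE_A = 50.0
SCORE_B = 50.0
CLOCK_A_GHZ = 2.66  # peak clock quoted in the post for the A13

# Hypothetical clocks for chip B: its rated boost vs. lower sustained averages.
for clock_b in (5.0, 4.6, 4.2):
    adv = per_clock_advantage(SCORE_A, CLOCK_A_GHZ, SCORE_B, clock_b)
    print(f"chip B at {clock_b} GHz -> chip A per-clock advantage {adv:.0%}")
```

The same pair of scores gives a noticeably different "per clock" gap depending on which clock you divide by, and the error can go either way, since chip A may not hold its peak clock either.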
My conclusion
1. We are all over the place with our discussion with respect to IPC vs SPECint2006 vs real work achieved vs work per socket vs work per watt.
2. Richie Rich's signature claims an IPC victory for A13, but it's completely false, as no one has compared anything analogous to IPC on those chips. It doesn't even accurately encompass "work done per cycle" because he uses boost frequency as the normalizer and doesn't consider average clock speed during the benchmark (which by the way isn't even published it seems, hence we cannot know for certain the true work done per cycle).
3. This is still very fun to talk about and very applicable to Graviton2's future.