Question for those in the know: does SPEC CPU 2006 run predominantly from the cache?
I've been doing some reading to try to understand how the A12X can be so performant in these benchmarks. My mind has tremendous difficulty accepting that such a low-power CPU can perform the way it does. Anyway, I came across an interesting Reddit thread during my research:
ARM vs x86 IPC
The OP of that post basically argues that Apple's ARM CPUs have surpassed Intel in terms of IPC, and spends most of the rest of the thread defending that assertion rather successfully... until another poster, EqualityofAutonomy, responds with a counterargument that, to me, finally renders the OP's assertion false in a logical manner.
I'll let EqualityofAutonomy's words speak for themselves rather than attempt to paraphrase him:
Lowering clocks increases IPC.
Raising clocks decreases IPC.
Most software, in practical use, is probably memory bound.
These are bad comparisons because you're likely just seeing a highly theoretical packed blob that fits neatly in L1 cache performing SIMD to maximize throughput.
Real world problems often don't fit in cache and aren't ridiculously simple and flawlessly optimized canned benchmarks and never get close to theoretical IPC.
You clearly don't even understand IPC. You can't manipulate it like that because it's not a linear relationship. It's a curve, as in higher frequencies produce less and less IPC. Lower frequencies produce greater and greater IPC. But it's not smooth. It's wild and rocky as boundaries of alignment are passed through. You'll see plateaus and seemingly unpredictable spikes. Because at the end of the day with all the background noise there's no reproducible test. Every run is slightly different. The scheduler dispatching similarly but differently. No run is truly identical to another.
The greater the frequency, the more stalls occur and the longer instructions can take to retire. That's the more important metric. Increasing frequency can (will) increase the number of cycles instructions take to retire, thus lowering IPC. The benefit is that sometimes that's okay, because the clock increase outweighs the IPC decrease.
Would you rather have a 1 GHz chip with 10 IPC or a 5 GHz chip with 3 IPC? That's 10 billion versus 15 billion instructions per second. That's the sad reality. Okay, those numbers are totally made up for an example. But underclocking is very real. Sometimes performance gains happen due to factors like better thermals and less throttling.
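Just to sanity-check that arithmetic (the IPC and clock numbers below are the made-up ones from the quote, not measurements of any real chip):

```python
# Throughput = IPC x clock frequency, in instructions retired per second.
# The numbers are the illustrative ones from the quote above.

def throughput(ipc, freq_hz):
    """Instructions retired per second for a given IPC and clock."""
    return ipc * freq_hz

slow_wide = throughput(ipc=10, freq_hz=1e9)   # 1 GHz, 10 IPC
fast_narrow = throughput(ipc=3, freq_hz=5e9)  # 5 GHz, 3 IPC

print(f"1 GHz @ 10 IPC: {slow_wide:.1e} instr/s")    # 1.0e+10, i.e. 10 billion
print(f"5 GHz @  3 IPC: {fast_narrow:.1e} instr/s")  # 1.5e+10, i.e. 15 billion

# IPC the 1 GHz design would need just to break even with the 5 GHz one:
print(f"Break-even IPC at 1 GHz: {fast_narrow / 1e9:.0f}")  # 15
```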
To me, his argument made sense. It's common knowledge that many PC workloads are sensitive to memory latency; gaming is a good example. In fact, that is exactly why Zen 2 comes furnished with such a large L3 cache: to reduce effective memory latency. What I did not know, however, assuming EqualityofAutonomy's assertion is true, was that raising clock speeds can decrease IPC. The reason he postulates is that the faster a CPU cycles, the more cycles it loses to memory latency and stalls. Both Intel and AMD have spent enormous numbers of transistors on minimizing the effects of memory latency and stalls, because the majority of code tends to be poorly written.
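Here's a rough back-of-the-envelope model of that effect (the base CPI, miss rate, and DRAM latency below are assumptions I picked for illustration, not measured figures): a cache miss costs a roughly fixed number of nanoseconds, so at a higher clock the same miss burns more core cycles, and the measured IPC drops even though the code hasn't changed.

```python
# Toy model: effective IPC when a fixed-nanosecond DRAM latency is charged
# in core cycles. All parameters are illustrative assumptions.

def effective_ipc(freq_ghz, base_cpi=0.5, miss_rate=0.01, dram_latency_ns=80):
    """base_cpi: cycles per instruction if memory were free.
    miss_rate: fraction of instructions that stall on a DRAM access.
    dram_latency_ns: time per miss, roughly fixed in wall-clock terms."""
    stall_cycles = dram_latency_ns * freq_ghz  # same nanoseconds = more cycles at a higher clock
    cpi = base_cpi + miss_rate * stall_cycles
    return 1.0 / cpi

for f in (1.0, 3.0, 5.0):
    ipc = effective_ipc(f)
    print(f"{f:.0f} GHz: IPC ~ {ipc:.2f}, throughput ~ {ipc * f:.2f} G instr/s")

# IPC falls as the clock rises, but total throughput still creeps up
# with diminishing returns, which is exactly the tradeoff described in the quote.
```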
But with Apple's closed ecosystem, they control both the hardware and the software from the top down, so they can realize an efficiency that Intel or AMD could never approach in the PC's open ecosystem. I mean, realistically, would an x86-64 CPU resembling the A12X's Vortex cores do well on real-world code that isn't hyper-optimized, compared to something like a Core i7? My gut tells me no.