- Jul 28, 2019
And this test chip proved that the A72 is capable of running at 4 GHz if it's manufactured on an HP process and the voltage is raised. Apple's A13 could run at 4 GHz too if manufactured the same way. This comparison is very useful when you know how CPU pipeline stages work. The number of stages a CPU has doesn't matter much (useful to know, but it can be misleading); what matters more is how many transistors sit in the critical path of each stage. How fast a signal propagates along that critical path is a known transistor characteristic (the frequency/voltage curve). Thanks to this we can indirectly compare how short (fast) the stages in a pipeline are. For example, if the A72 runs at 2.5 GHz @ 0.7 V and the A12 needs the same 0.7 V for 2.5 GHz, this basically means their stage lengths are very similar (no matter how many stages each core has in total). Similar stage length means similar scaling to 4 GHz for Apple CPUs too. That's physics.
That 4 GHz example was a test chip; I saw nothing in the news surrounding it to indicate any advantageous power consumption.
It's pointless comparing A72 to the Axx cores, they are completely different uArch designs by different engineering teams.
It looks like you don't want to understand. If the TDP limit for a desktop CPU is typically 65-105 W, then Intel and AMD have very different design constraints than ARM chips with a 4 W TDP, and very different low-hanging fruit too. With such a high TDP, higher frequency (by raising the voltage to 1.3 V) is a very simple way to raise performance (+77% for the 3950X). Frequency is the low-hanging fruit for x86 desktop. Apple's mobile chips cannot waste that much energy, so they have to go the hardest way: increasing IPC. However, the hardest way always pays dividends (same as in real life with training and learning), and Apple ends up with the most advanced CPU core in the world, with a massive +82% IPC advantage over Zen 2.
Again, it's about power: it's not "low hanging fruit", it's sheer power consumption.
Zen and Core derivatives perform far more efficiently below 3 GHz; even 12nm could get you a 45 W 8C Zen+ CPU at 2.8 GHz (2700E).
I'd be interested to see someone do some tests on the 8C Zen 2 SKUs underclocked to that 2700E range and see how many watts they pull on average loads.
Apple's A13 is beating a very finely binned Ryzen 9 3950X @ 4.7 GHz. If you set Zen 2 to a clock every chip is capable of (like the A13 is), it would be somewhere around 4.2 GHz... and the A13 would win over Zen 2 by a higher margin. A 64-core EPYC with a 280 W TDP (4.4 W/core at 2.5 GHz) looks very competitive with the A13 (5 W/core @ 2.6 GHz). Until you realize the A13 has a massive +82% IPC advantage. EPYC would need more than 1000 W TDP to run at 4 GHz and would still be slower than the A13 @ 2.6 GHz. I tell you guys, a Nuvia server chip with IPC like the A13 would be a total killer for most of the x86 server world today. AMD needs a +25% IPC jump every year to stay competitive against Nuvia in 2024. That's why Zen 3 must be something very good, much better than the leaks suggest. If the +12% INT IPC gain is correct for Zen 3, then AMD will fall into mediocrity again (same as the K10.5/Barcelona/Thuban age).
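For anyone who wants to check the arithmetic in that post, here is a rough sketch in Python. It assumes performance scales as IPC x clock and that dynamic power scales as frequency x voltage squared; both are simplifications, and TDP is only a loose stand-in for per-core power since it also covers uncore and I/O. The function names and the 1.5x voltage ratio are hypothetical, picked only for illustration; the other figures are the ones quoted above.

```python
def relative_performance(ipc_ratio, clock_ghz, other_clock_ghz):
    """Performance of chip A relative to chip B, assuming perf = IPC * clock."""
    return ipc_ratio * clock_ghz / other_clock_ghz


def watts_per_core(tdp_watts, cores):
    """Naive per-core power: total TDP divided evenly across cores."""
    return tdp_watts / cores


def scaled_power(base_watts, base_ghz, target_ghz, voltage_ratio):
    """Power at a new clock, assuming P is proportional to f * V^2."""
    return base_watts * (target_ghz / base_ghz) * voltage_ratio ** 2


if __name__ == "__main__":
    # A13 @ 2.6 GHz with the claimed +82% IPC vs a 3950X @ 4.7 GHz.
    print(f"A13 vs 3950X: {relative_performance(1.82, 2.6, 4.7):.3f}x")

    # 64-core EPYC, 280 W TDP.
    print(f"EPYC: {watts_per_core(280, 64):.3f} W/core")

    # EPYC pushed from 2.5 GHz to 4 GHz.
    # Voltage ratios are hypothetical, not measured values.
    for v_ratio in (1.0, 1.5):
        watts = scaled_power(280, 2.5, 4.0, v_ratio)
        print(f"EPYC @ 4 GHz, voltage x{v_ratio}: {watts:.0f} W")
```

Under this model the "more than 1000 W" figure only appears if the voltage has to rise by roughly 1.5x to reach 4 GHz; at constant voltage the same clock bump costs far less. How much voltage a real EPYC would need is exactly the part that is not known from the numbers in the post.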
Nice argument and I agree with that. However, when you take Apple's A13 instead of Atom, there is the same level of absolute performance as SKL, just achieved at a much lower clock. This means the A13 core has to deal with the same number of instructions per unit time as SKL @ 4.5 GHz... which means both cores (A13 and SKL) suffer from the same latency issues. So the memory and cache system is probably very similar for both cores.
Because you need every part of the core to be synchronous. The faster the clock frequency, the harder it is to reach the same perf/clock, because the same circuits need to be driven harder. Not only that, sometimes you need entirely different, more complex circuitry just to do the same thing that simpler circuitry does at lower frequencies.
Intel's Atom-based Goldmont Plus has an L1 cache latency of 3 cycles. Skylake and Ice Lake are at 5 cycles (Skylake is 4 in some corner-case scenarios). Goldmont Plus aims for 3 GHz, and SKL/ICL for 4-5 GHz.
You see, 5 cycles @ 5GHz is same as 3 cycles @ 3GHz.
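To spell that out: latency in wall-clock time is just cycles divided by clock frequency. A tiny Python sketch using the cycle counts and clocks quoted above (the helper name `latency_ns` is made up for this example):

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in cycles to nanoseconds at a given clock (GHz)."""
    return cycles / clock_ghz


if __name__ == "__main__":
    # Cycle counts and target clocks as quoted in the post above.
    print(f"Goldmont Plus, 3 cycles @ 3 GHz: {latency_ns(3, 3.0):.2f} ns")
    print(f"Skylake/Ice Lake, 5 cycles @ 5 GHz: {latency_ns(5, 5.0):.2f} ns")
    print(f"Skylake/Ice Lake, 5 cycles @ 4 GHz: {latency_ns(5, 4.0):.2f} ns")
```

So the higher-clocked design spends more cycles on the same access but takes about the same absolute time at the top of its clock range, and slightly longer at the bottom.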
But the main question was whether there is any hard obstacle to designing a very wide 6xALU core operating at a clock as high as SKL's. There isn't. Yes, it would need a good cache system and yes, it would still suffer on some types of code. But look at how Apple handled it. Their first 6xALU core, A11 Monsoon, was faster than their last 4xALU core, A10 Hurricane, but not by as much as you would expect from a +50% ALU increase. Apple kept working hard on the cache and memory system with the A12 and A13 and finally got great IPC. The good old recipe for success is also my wish for everybody here for the new year of 2020: much success through hard work, no excuses.