Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

There is a first rumor about an Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with new-gen MacBook Air in mid 2020 (considering also MacBook Pro and iMac)
massive AI accelerator

Source Coreteks:

Doug S · Mar 1, 2020

eek2121 said:
AMD has 14nm parts with a 6 watt TDP. They also have Renoir with a 15 watt TDP. It stands to reason that a 7nm part could easily operate in half the TDP. Of course, AMD won’t do this because there is no demand.

As far as other comments about Apple chips being “faster”, I will believe it when I see it. Running benchmarks is one thing. Doing development work or video/graphics work is something else. They would need to drastically increase the core count, which would negate any power savings.

Apple isn't using all that much less power per core than Intel and AMD are in their big CPUs. This isn't about saving power; maybe they can gain a little bit of efficiency over Intel's offerings, but that's not why they would want to do it.

They only need to "drastically" increase core count in the "Pro" line. It isn't like they are selling MacBook Airs with 16 x86 cores, or that there is demand for such. For most of the Mac line, they already have the performance they need from the 4 core 'X' version in the iPad. It performs better than the CPUs currently available in those models and doesn't need to change at all, in particular in clock rate or number of cores (other than adding a few blocks to support stuff Macs use that iPads don't, like Thunderbolt).

They'll need something different for the Pro line to reach the number of cores required, and there are several ways to accomplish that and several years to get it done - they wouldn't transition the entire Mac line top to bottom all at once. They'd do the low end first and work their way towards the high end over a couple years or so. The high end customers are more risk averse so they will want to wait until their favorite software has been ported and running on ARM for a while and the bugs worked out, and since Apple just introduced a brand new x86 Mac Pro they'd probably want to wait until 2023 at the earliest before they replaced it anyway.

Nothingness · Mar 2, 2020

eek2121 said:
It’s less to do with demand and more to do with margins. Android runs fine on x86, and I suspect iOS would as well. However, Intel and AMD build big, fast, high margin chips.

Ha, yes, it works fine, just as Windows runs fine on ARM. But then there's legacy, you know, the thing x86 apologists claim will prevent ARM from being used? That's part of what killed Intel's laughable attempt at entering the smartphone market.

Richie Rich · Mar 2, 2020

eek2121 said:
Its memory subsystem is horrible, and floating point operations are also much slower than x86.

Read Andrei's article again, please.

iPhone 11 Pro review

The best 2019 iPhone you can wrap one hand around

www.anandtech.com

Memory subsystem:

- 128kB L1$ - 4x bigger than Ryzen (32kB)
- 8MB shared L2$ - 8x bigger than Ryzen (512kB)
- using LPDDR4X at 4200MHz, this year A14 will have LPDDR5 as far as I know (Zen3 will stay at DDR4, DDR5 is expected in Zen4)
- look at the $ latencies

Floating point:

- A13@2.65GHz ….. 65.27 pts
- Ryzen@4.7GHz ... 74.52 pts …. +14% faster than A13

FPU IPC/PPC per GHz shows:

- A13 …... 24.63 pts / GHz … +55% IPC over Zen2
- Ryzen... 15.86 pts / GHz

A13 has 55% higher FPU IPC than Zen2. That's not the +80% seen for integer, but it's still super strong against any x86 CPU running at almost 5GHz with a 100W desktop TDP.
For 15W TDP laptops, which typically operate around 3GHz, Apple's A13 core has no competitor in the x86 world. And the new A14X will extend this domination even further.
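
For anyone who wants to check the per-GHz arithmetic, here is a minimal Python sketch. The scores and clocks are the ones listed above; the helper name `per_ghz` is just made up for this illustration, and points per GHz is only a rough proxy for IPC/PPC.

```python
# Scores and clocks are the figures quoted in the post above.
# per_ghz is a hypothetical helper name used only for this illustration.
def per_ghz(score: float, clock_ghz: float) -> float:
    """Return benchmark points per GHz, a rough proxy for IPC/PPC."""
    return score / clock_ghz

a13 = per_ghz(65.27, 2.65)     # Apple A13 at 2.65 GHz
ryzen = per_ghz(74.52, 4.7)    # Ryzen at 4.7 GHz

raw_lead = 74.52 / 65.27 - 1   # Ryzen's lead in absolute score
ipc_lead = a13 / ryzen - 1     # A13's lead per clock

print(f"A13:   {a13:.2f} pts/GHz")
print(f"Ryzen: {ryzen:.2f} pts/GHz")
print(f"Ryzen raw lead:     {raw_lead:+.0%}")
print(f"A13 per-clock lead: {ipc_lead:+.0%}")
```

The same two-line calculation works for any score/clock pair, which makes it easy to redo when different benchmark runs are compared.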

The only competitor for Apple's core will be ARM's Cortex A78, as it is cheaper, available to any manufacturer and more powerful than x86. IMHO Apple is being pushed to react with their ARM MacBook before Cortex A78 + Win10 on ARM floods the laptop market (2021/22).
An interesting thing about A77 is that ARM used +17% more transistors compared to A76 while gaining +20% more performance.
By comparison, with Ice Lake Intel used +38% more transistors compared to Coffee Lake while gaining only +18% performance. Not to mention that Intel needed 4 years for this iteration and ARM only 1 year. This shows how horribly lazy Intel became during the Bulldozer period.

soresu · Mar 2, 2020

Richie Rich said:
RPi 6 could have A78
RPi 5 could be realistically something 14nm A76 based in 2021/22. Even A76 is huge jump from slow A72.

Like I said don't hold your breath, they aren't concerned with performance nearly as much as you think they are - if so they would have abandoned Broadcom's VideoCore GPU and gone with PowerVR, or Mali for RPi 4.

They stayed on 40nm for ages, I don't expect them to move from 28nm for years, if at all.

Amlogic and Rockchip will have to satisfy ARM SBC performance enthusiasts - I don't see anything higher than A75 (if that) in the RPi future for a loooong time.

Notice even nVidia still haven't upgraded the SHIELD Tegra X1 SoC from 20nm A57 after all this time? That's because it's cheap as sin, even more so because of the huge orders they place for it to fulfill obligations to Nintendo - economics plays a big role in these things.

soresu · Mar 2, 2020

Thala said:
Efficiency does not decrease when you have more cores.

Doesn't that depend on your interconnect and uncore efficiency?

soresu · Mar 2, 2020

Richie Rich said:
Not mentioning Intel needed 4 years of development and ARM only 1 year. This shows how Intel became horribly lazy during Bulldozer period.

As much as I love to dump on Intel and their overpriced laxity - you really have the wrong end of the stick there.

A77/Deimos is an iteration on the "ground up" work of A76, which was a completely new core that took years to develop (first mentioned the year A72 was announced I believe) - and like Zen2 alongside Zen1, it was likely in development alongside A76 for years, as A78 will have been.

Thala · Mar 2, 2020

soresu said:
Doesn't that depend on your interconnect and uncore efficiency?

It depends on many things, but in the worst case switching capacitance in the uncore doubles when you double the number of cores. However, since many data paths are shared between the cores, this is typically not the case. In addition, if we assume we are running a multithreaded application, there is significant code sharing between cores. In essence, your last-level cache miss rate will not increase linearly with the number of cores.
In summary, assuming we take a 7W TDP 4-core iPad SoC and double the number of cores, we would end up well below 14W at iso voltage/frequency. If the plan were to design a 15W TDP SoC, we could use some of the power to increase voltage and frequency on top of having 8 cores.
Going down the route of having more power at our disposal, at some point we would use ulvt cells before increasing the voltage further, because that gives us more frequency at the cost of leakage.
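
To make the scaling argument concrete, here is a toy Python model. Everything in it is an assumption for illustration: the split of the 7W budget into 1.25W per core plus 2W of uncore is invented, and so is the idea of modelling shared data paths with an exponent below 1 on the uncore term. It only shows the shape of the argument, not real SoC numbers.

```python
def soc_power(n_cores: int, core_w: float, uncore_w_base: float,
              base_cores: int = 4, uncore_exp: float = 1.0) -> float:
    """Toy estimate of SoC power at fixed voltage and frequency.

    Core power scales linearly with core count. Uncore power scales as
    (n_cores / base_cores) ** uncore_exp, so uncore_exp = 1.0 is the worst
    case (uncore switching capacitance doubles with the cores) and values
    below 1.0 stand in for shared data paths.
    """
    cores = n_cores * core_w
    uncore = uncore_w_base * (n_cores / base_cores) ** uncore_exp
    return cores + uncore

# Hypothetical split of a 7 W 4-core SoC: 1.25 W per core plus 2 W uncore.
base = soc_power(4, 1.25, 2.0)
worst = soc_power(8, 1.25, 2.0, uncore_exp=1.0)   # uncore doubles too
shared = soc_power(8, 1.25, 2.0, uncore_exp=0.5)  # sub-linear uncore growth

print(f"4 cores:             {base:.2f} W")
print(f"8 cores, worst case: {worst:.2f} W")
print(f"8 cores, shared:     {shared:.2f} W")
```

In this sketch the worst case lands exactly on double the baseline, and any sharing in the uncore pulls the 8-core figure under it. Any part of the budget that does not scale with core count at all would pull it down further.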

Richie Rich · Mar 2, 2020

soresu said:
As much as I love to dump on Intel and their overpriced laxity - you really have the wrong end of the stick there.

A77/Deimos is an iteration on the "ground up" work of A76, which was a completely new core that took years to develop (first mentioned the year A72 was announced I believe) - and like Zen2 alongside Zen1, it was likely in development alongside A76 for years, as A78 will have been.

I think I have the right end of the stick here:

- ARM, after the ground-up A76, introduced iterations with a +20% IPC increase every year, as we can see with A77.
- Intel, after the ground-up work on Sky Lake, introduced its next iterations with a 3% IPC increase every year.

I know that a new uarch takes approximately 4 years to develop, so a lot of parallel development is needed. If you want to release a new core every year, you need to develop 4 cores in parallel, like Apple and ARM do (and AMD is trying that too). There is no excuse for Intel and its poor execution, especially when Ice Lake is just a modified Sky Lake and not a ground-up design at all (that should be Golden Cove). During the Dozer period Intel had a monopoly, which means they had a lot of money and no motivation to invest in IPC development. Corporate got greedy and lazy; this has happened many times in history when a CEO lacks vision. Good for AMD though. But compare Apple's A7 and Intel Haswell: both were 4xALU, and Intel still led performance-wise. And look at today: Apple has a 3rd gen 6xALU core while Intel has been stuck at the historical 4xALU for a decade with half the IPC. This was hidden as Intel tried to say "we hit the wall of scalar IPC" (supported by AMD's "let's go backwards with IPC" Dozer). Unfortunately for that claim, Apple demonstrated there is plenty of scalar performance left. Sadly, even here there are some people who still don't see that. Sadly, because ARM and Apple engineers did tremendous work at moving humankind forward (it will indirectly help speed up x86 development too) and yet are receiving so much hate instead of appreciation.

soresu · Mar 2, 2020

Richie Rich said:
I think I have the right end of the stick here:

- ARM, after the ground-up A76, introduced iterations with a +20% IPC increase every year, as we can see with A77.

- Intel, after the ground-up work on Sky Lake, introduced its next iterations with a 3% IPC increase every year.

I know that a new uarch takes approximately 4 years to develop, so a lot of parallel development is needed. If you want to release a new core every year, you need to develop 4 cores in parallel, like Apple and ARM do (and AMD is trying that too). There is no excuse for Intel and its poor execution, especially when Ice Lake is just a modified Sky Lake and not a ground-up design at all (that should be Golden Cove). During the Dozer period Intel had a monopoly, which means they had a lot of money and no motivation to invest in IPC development. Corporate got greedy and lazy; this has happened many times in history when a CEO lacks vision. Good for AMD though. But compare Apple's A7 and Intel Haswell: both were 4xALU, and Intel still led performance-wise. And look at today: Apple has a 3rd gen 6xALU core while Intel has been stuck at the historical 4xALU for a decade with half the IPC. This was hidden as Intel tried to say "we hit the wall of scalar IPC" (supported by AMD's "let's go backwards with IPC" Dozer). Unfortunately for that claim, Apple demonstrated there is plenty of scalar performance left. Sadly, even here there are some people who still don't see that. Sadly, because ARM and Apple engineers did tremendous work at moving humankind forward (it will indirectly help speed up x86 development too) and yet are receiving so much hate instead of appreciation.

No, you still had the wrong end of the stick saying A77 took 1 year to develop - nothing that complicated takes one year of development, you have confused the release/announcement cadence of ARM's big cores with the development time.

Richie Rich · Mar 2, 2020

soresu said:
No, you still had the wrong end of the stick saying A77 took 1 year to develop - nothing that complicated takes one year of development, you have confused the release/announcement cadence of ARM's big cores with the development time.

I wrote:
- it takes 4 years to develop a core
- parallel development of 4 cores is needed to release a new core every year

I never wrote such nonsense as core dev taking one year.
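
The cadence-versus-development-time distinction can be put into a few lines of Python. This is only a sketch: the function names are invented for the illustration, and it assumes evenly spaced releases and a fixed development time per core, which real roadmaps do not guarantee.

```python
import math

def teams_needed(dev_years: float, cadence_years: float) -> int:
    """Number of core projects that must overlap to hit the release cadence."""
    return math.ceil(dev_years / cadence_years)

def projects_in_flight(year: int, dev_years: int, cadence_years: int) -> list[int]:
    """Release years of the projects under development during `year`.

    Assumes one release every `cadence_years` (aligned to year 0) and that a
    project is in development from (release - dev_years) up to its release.
    """
    return [r for r in range(year + 1, year + dev_years + 1)
            if r % cadence_years == 0]

print(teams_needed(4, 1))            # overlapping projects for a yearly cadence
print(projects_in_flight(10, 4, 1))  # release years being worked on in year 10
```

With a 4-year development time and a yearly cadence, four projects are in flight in any given year, each targeting a different release.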

soresu · Mar 2, 2020

Richie Rich said:
I wrote:
- it takes 4 years to develop core
- parallel development of 4 cores needed to have every year new core release

I never wrote such nonsense as core dev taking one year.

Ah, well, if you want to get technical ARM haven't "released" any cores at all (or at least in a very long time) seeing as they are not only a fabless company, but they also lack physical products as they license cores to other companies.

Intel on the other hand let themselves become completely dependent on their fabs, which AMD partially sidestepped when they spun out theirs to GF.

Had their 10nm strategy gone to plan, even with Bulldozer's children bearing no fruit Intel would still have pushed the offensive - alas their greatest power became their greatest weakness, and for now at least AMD's benefit.

If Intel were doing nothing but designing synthesizable cores as ARM does, it would be a different ballgame entirely in x86 land.

mikegg · Mar 3, 2020

Doug S said:
Apple isn't using all that much less power per core than Intel and AMD are in their big CPUs. This isn't about saving power; maybe they can gain a little bit of efficiency over Intel's offerings, but that's not why they would want to do it.

They only need to "drastically" increase core count in the "Pro" line. It isn't like they are selling MacBook Airs with 16 x86 cores, or that there is demand for such. For most of the Mac line, they already have the performance they need from the 4 core 'X' version in the iPad. It performs better than the CPUs currently available in those models and doesn't need to change at all, in particular in clock rate or number of cores (other than adding a few blocks to support stuff Macs use that iPads don't, like Thunderbolt).

They'll need something different for the Pro line to reach the number of cores required, and there are several ways to accomplish that and several years to get it done - they wouldn't transition the entire Mac line top to bottom all at once. They'd do the low end first and work their way towards the high end over a couple years or so. The high end customers are more risk averse so they will want to wait until their favorite software has been ported and running on ARM for a while and the bugs worked out, and since Apple just introduced a brand new x86 Mac Pro they'd probably want to wait until 2023 at the earliest before they replaced it anyway.

Did Apple release Intel Macs for the low end first or did they just do a grand replacement?

avAT · Mar 3, 2020

senttoschool said:
Did Apple release Intel Macs for the low end first or did they just do a grand replacement?

They released everything over 6 months, but it’s kind of interesting that basically everything shipped with a variant of the T2400 “Yonah” (besides the Xeon Mac Pro). Forgot that the Mac line used to have less variation in capability.

USER8000 · Mar 3, 2020

Richie Rich said:
Not mentioning Intel needed 4 years for this iteration and ARM only 1 year. This shows how Intel became horribly lazy during Bulldozer period.

Richie Rich said:
I wrote:
- it takes 4 years to develop core
- parallel development of 4 cores needed to have every year new core release

I never wrote such nonsense as core dev taking one year.

That is from your own post - you implied that ARM took one year to develop cores. I know people who had friends who worked at ARM and I am from the UK, you really seem to be overegging things a bit mate.

soresu · Mar 3, 2020

USER8000 said:
That is from your own post - you implied that ARM took one year to develop cores. I know people who had friends who worked at ARM and I am from the UK, you really seem to be overegging things a bit mate.

Greetings from rainy Cumbria fellow Brit.

USER8000 · Mar 3, 2020

soresu said:
Greetings from rainy Cumbria fellow Brit.

I would say greetings from the sunny south of the country, but it's raining here too!

eek2121 · Mar 3, 2020

Richie Rich said:
Read that Andrei's article again please.

iPhone 11 Pro review

The best 2019 iPhone you can wrap one hand around

www.anandtech.com

Memory subsystem:

- 128kB L1$ - 4x bigger than Ryzen (32kB)

- 8MB shared L2$ - 8x bigger than Ryzen (512kB)

- using LPDDR4X at 4200MHz, this year A14 will have LPDDR5 as far as I know (Zen3 will stay at DDR4, DDR5 is expected in Zen4)

- look at the $ latencies

Floating point:

- A13@2.65GHz ….. 65.27 pts

- Ryzen@4.7GHz ... 74.52 pts …. +14% faster than A13

FPU IPC/PPC per GHz shows:

- A13 …... 24.63 pts / GHz … +55% IPC over Zen2

- Ryzen... 15.86 pts / GHz

A13 has 55% higher FPU IPC than Zen2. That's not the +80% seen for integer, but it's still super strong against any x86 CPU running at almost 5GHz with a 100W desktop TDP.
For 15W TDP laptops, which typically operate around 3GHz, Apple's A13 core has no competitor in the x86 world. And the new A14X will extend this domination even further.

The only competitor for Apple's core will be ARM's Cortex A78, as it is cheaper, available to any manufacturer and more powerful than x86. IMHO Apple is being pushed to react with their ARM MacBook before Cortex A78 + Win10 on ARM floods the laptop market (2021/22).
An interesting thing about A77 is that ARM used +17% more transistors compared to A76 while gaining +20% more performance.
By comparison, with Ice Lake Intel used +38% more transistors compared to Coffee Lake while gaining only +18% performance. Not to mention that Intel needed 4 years for this iteration and ARM only 1 year. This shows how horribly lazy Intel became during the Bulldozer period.

YOU should read the article again.

Specifically this page for memory latency and bandwidth: https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/3

After you are done showing yourself out, then I'll leave it up to you to find the floating point results.

I'm objective, but calling Apple's ARM CPUs "competitive" with high end is a stretch.

Richie Rich · Mar 4, 2020

USER8000 said:
That is from your own post - you implied that ARM took one year to develop cores. I know people who had friends who worked at ARM and I am from the UK, you really seem to be overegging things a bit mate.

If you don't understand such a simple thing as the difference between iteration (cadence) and total development time, then you better stay isolated on your island, mate.
Numbers:
- Intel: 18% IPC jump at 4 years (4.5% IPC/year), transistors +38% (0.47 IPCpp/tr), clocks -10%
- ARM: 20% IPC jump at 1 year (20% IPC/year), transistors +17% (1.17 IPCpp/tr), clocks same

Clearly ARM's development speed is 4.4x faster than Intel's. That's too huge to be sustainable in the long term. The experts at AMD even went backwards on IPC with Dozer and still survived, but that was a different time, when there were no powerful Cortex cores.

If you take into account how effectively ARM used transistors, you can see ARM is 2.5x more efficient than Intel (1.17 IPCpp/tr versus 0.47 IPCpp/tr). Intel's lower efficiency consequently leads to burning more power and limiting max clocks (Ice Lake has no major power advantage over the old 14nm parts).

If you multiply 4.4 by 2.5, ARM gets a roughly 11x higher total architectural improvement rate (not a clear-cut comparison though). They went from 2xALU A73 to ~~4xALU~~ 3xALU+1xJump A76 in 2 years, and to 4xALU+2xjump A77 in 3 years. Apple needed only 4 years to go from 4xALU A7 to 6xALU A11. A similarly insane rate of improvement. Intel has been stuck at 4xALU from 2013 Haswell until at least 2020 Tiger Lake, for 7 long years.
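
The ratios above can be reproduced with a short Python sketch. The inputs are the percentages listed under "Numbers"; the dictionary keys and function names are made up for the illustration. Note that dividing a multi-year gain by the number of years ignores compounding, so the per-year figure is a simplification.

```python
# Figures as quoted above; key and function names are hypothetical.
intel = {"ipc_gain_pct": 18, "years": 4, "transistor_gain_pct": 38}
arm = {"ipc_gain_pct": 20, "years": 1, "transistor_gain_pct": 17}

def per_year(d: dict) -> float:
    """IPC gain in percentage points per year (simple average, no compounding)."""
    return d["ipc_gain_pct"] / d["years"]

def per_transistor(d: dict) -> float:
    """IPC percentage points gained per percent of added transistors."""
    return d["ipc_gain_pct"] / d["transistor_gain_pct"]

speed_ratio = per_year(arm) / per_year(intel)
efficiency_ratio = per_transistor(arm) / per_transistor(intel)
combined = speed_ratio * efficiency_ratio

print(f"Intel: {per_year(intel):.1f} %/year, {per_transistor(intel):.2f} IPCpp/tr")
print(f"ARM:   {per_year(arm):.1f} %/year, {per_transistor(arm):.2f} IPCpp/tr")
print(f"speed ratio {speed_ratio:.1f}x, efficiency ratio {efficiency_ratio:.1f}x, "
      f"combined {combined:.0f}x")
```

The product of the two ratios lands near 11, in the same ballpark as the combined figure quoted above.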

eek2121 said:
YOU should read the article again.

Specifically this page for memory latency and bandwidth:

The bandwidth is pretty comparable if you exclude 256-bit vectors, which ARM doesn't have. Just look at the 16MB test depth, where the 9900K and 3950X are falling down and the Apple A13 still holds high bandwidth. I'd say that's not bad for a 5W iPhone, having comparable bandwidth to top desktops.

"The A13 here again remains quite unique in its behavior, which is vastly more complex that what we see in any other microarchitecture."
"In general, Apple’s MLP ability is only second to AMD’s Zen processors, and clearly trounces anything else in the mobile space." - in other words A13 at 2.6GHz and 5W TDP is outperforming Intel's 9900K at 5GHz and 100W TDP.
Apple's 16MB System Level Cache, which is an L3 cache shared with the GPU, is a very advanced design. AMD Renoir cannot use its L3 cache for the GPU, for example.

mikegg · Mar 4, 2020

avAT said:
They released everything over 6 months, but it’s kind of interesting that basically everything shipped with a variant of the T2400 “Yonah” (besides the Xeon Mac Pro). Forgot that the Mac line used to have less variation in capability.

Interesting.

Doug S said:
...they wouldn't transition the entire Mac line top to bottom all at once. They'd do the low end first and work their way towards the high end over a couple years or so. The high end customers are more risk averse so they will want to wait until their favorite software has been ported and running on ARM for a while and the bugs worked out, and since Apple just introduced a brand new x86 Mac Pro they'd probably want to wait until 2023 at the earliest before they replaced it anyway.

This makes sense then. They'll probably transition their entire laptop lineup to ARM over one year, then transition their Mac Pro computers over a few years as professional software catches up.

Laptops have more to gain since they benefit more from the increased efficiency of Apple-designed ARM chips and they're higher volume.

I'm holding out for ARM based Macbook Pros because the 2016-generation and on have disappointed me. They're a downgrade from the 2015 Retina Macbooks in many ways. No significant improvement in performance/battery life ratio, cost way more, bad keyboards, useless touch bar, unnecessarily large touchpads, dongle hell.

name99 · Mar 4, 2020

eek2121 said:
I don't understand why people have said that Apple's ARM CPUs are faster than x86. I have, in my possession, an iPhone Pro Max. I have run various benchmarks on it, and I've also looked up Benchmarks online. The CPU is still very much in line with a Core i3 or low end i5 when it comes to performance.

I have my doubt about Apple switching to ARM unless they can shore up the performance. Things are getting even more complicated, because Zen cores could, in theory, run in the same power envelope as a high end Snapdragon while providing better performance.

Which benchmarks?
The claim is the SINGLE-THREADED CPU benchmarks match pretty much the best Intel has to offer.
OBVIOUSLY multi-core benchmarks will do better on a system with more cores.
OBVIOUSLY OS benchmarks have fsckall to do with the issue.
OBVIOUSLY system benchmarks are testing many different things, not just the CPU.

So: SINGLE-THREADED CPU benchmarks, things like GB5, SPEC, or browser benchmarks (when comparing iOS Safari to Mac Safari, so that essentially the same code is being compared, not different browser engines).

Why is this THE metric of interest? Because single-threaded CPU performance is the hardest thing to make faster. Adding extra cores is easy.

So, once again, which benchmarks?
Here's GB5 against the fastest (single threaded) mac Apple ships, i9 9900K at 5GHz.

iMac (27-inch Retina Early 2019) vs iPhone 11 Pro Max - Geekbench

browser.geekbench.com

name99 · Mar 4, 2020

Richie Rich said:
He talks about Geekbench5 score in MT which is true:

- Apple A13 - 2899 pts iPhone 11

- i3-9100 - 3228 pts Dell Vostro

- i5-1035 - 3816 pts IceLake HP laptop

- Apple A12X - 4607 pts

- Cortex A77 - 3433 pts new Snapdragon 865

MT score is not fully comparable because A13 has only 2 big cores + 4 little cores and the Intels have 4 big cores + SMT.
You would need to compare with the GB5 score of the 4-big-core A12X from the iPad Pro. And bang, an old A12X has an MT score of 4607, outperforming any 4-core x86 laptop chip by a huge margin (while running at just 2.5 GHz).

If we isolate just single-core performance, then the results are very different and suddenly Apple A13 is outperforming Intel by a large margin:

- Apple A13 - 1332 pts iPhone 11

- i3-9100 - 1129 pts Dell Vostro

- i5-1035 - 1214 pts IceLake HP laptop

- Apple A12X - 1113 pts iPad Pro

- Cortex A77 - 928 pts Snapdragon 865

And Apple A13 is running within 5W TDP, A12X 7W TDP... compared to Intel laptops with 15W TDP and an ST core running over 4 GHz (and still losing in every way).
There is no point to answer some others - haters never write numbers and facts.

EDIT: Added Cortex A77 in the new Snapdragon to the comparison. It's 4+4 big.LITTLE and is on par with 4-core x86 Intels in MT load (with much lower power consumption). ST load is lower due to the max clock of 2.8 GHz. IMHO the biggest danger for Intel and AMD will come from generic Cortex cores such as this A77 or the new A78. Don't forget that A77 has slightly higher IPC/PPC than Zen2 (+8% according to SPECint2006). This is a huge milestone: a generic Cortex core having higher IPC/PPC for the first time in history. It's also very interesting from a uarch point of view - A77 has 4xALU + 2xjump units, which is halfway to Apple's 6xALU design, where Apple's core uses 4xALU + 2x simpleALU/jump units (and clearly wider than x86, stuck at 4xALU for a decade). I can speculate that A78 might upgrade those 2xjump units into 2x simpleALU/jump in Apple style (seems a logical and evolutionary step to me, but who knows). When every cheap Raspberry Pi has an A78 with IPC/PPC higher than Ice Lake and Zen3, that will be the biggest threat for x86 laptops and desktops. Apple is a great technological demonstrator for future ARM Cortex cores though.

Just FYI Apple cores are now 7-wide, not 6.
"Width" is an imprecise term for OoO CPUs because there are many different chokepoints, but the A11 onward appears to be able to SUSTAIN 7-wide operations (i.e. 7-wide decode and retire) with 6 ALUs (up from 4 in previous designs).
There's still scope for Apple to go wider without breaking the bank. Two obvious next steps are
- 4th NEON unit, giving either 4-way NEON or 2-way SVE on 256-bit registers
- aggressive I-fusion of pairs of the form rA= rB op rC; rA=rA op rD, the significant point being back-to-back re-use of rA requires only one register allocation.
(And similar single-destination-register allocation pairs, for example the obvious load-store extensions)
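
To illustrate the fusion pattern in the second bullet, here is a toy Python sketch that scans an instruction stream for adjacent pairs of the form rA = rB op rC; rA = rA op rD. The `Insn` tuple and both function names are invented for this example, and the greedy pairing is an assumption; it says nothing about how any real decoder detects or executes fused pairs.

```python
from typing import NamedTuple

class Insn(NamedTuple):
    """Hypothetical three-operand instruction: dest = src1 op src2."""
    dest: str
    src1: str
    op: str
    src2: str

def fusible(first: Insn, second: Insn) -> bool:
    """True if `second` overwrites first.dest and also reads it back-to-back."""
    return second.dest == first.dest and first.dest in (second.src1, second.src2)

def count_fusions(stream: list[Insn]) -> int:
    """Greedily pair adjacent fusible instructions; each insn joins one pair at most."""
    fused = 0
    i = 0
    while i < len(stream) - 1:
        if fusible(stream[i], stream[i + 1]):
            fused += 1
            i += 2      # both instructions are consumed by the fused pair
        else:
            i += 1
    return fused

stream = [
    Insn("r1", "r2", "add", "r3"),   # r1 = r2 + r3
    Insn("r1", "r1", "add", "r4"),   # r1 = r1 + r4  (fusible with the line above)
    Insn("r5", "r1", "mul", "r6"),   # r5 = r1 * r6
    Insn("r7", "r5", "sub", "r8"),   # r7 = r5 - r8  (different dest, not fusible)
]
print(count_fusions(stream))
```

In the fused pair the intermediate value of r1 is never visible to any other instruction, which is why a single register allocation is enough.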

For x86 the biggest difficulty in going wider is decode. For ARM decode is easy; the bigger difficulties in going wider are
- register rename (or more precisely resource allocation at the register rename stage)
- sustaining this wide I-fetch without stumbling, which requires a decoupled fetch engine, a very wide bus from I-cache to the I-queue, a very good branch predictor, and the ability to predict multiple branches per cycle (at the very least you need to be able to predict one fall-through and one taken per cycle, otherwise you're going to be killed by the branches, about 1/6 instructions, vs taken branches about 1/10 instructions).
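
A rough Python sketch of why multiple branch predictions per cycle matter at this width. It uses the branch rates quoted above (about 1 in 6 instructions is a branch, about 1 in 10 is a taken branch) and a 7-wide fetch group. The big assumption, which real code does not satisfy, is that branches are spread independently through the instruction stream, so treat the probabilities as ballpark only.

```python
from math import comb

def p_at_least(k: int, width: int, p: float) -> float:
    """P(at least k events among `width` instructions) under a binomial model."""
    below_k = sum(comb(width, i) * p**i * (1 - p)**(width - i) for i in range(k))
    return 1.0 - below_k

WIDTH = 7          # instructions fetched per cycle
P_BRANCH = 1 / 6   # fraction of instructions that are branches
P_TAKEN = 1 / 10   # fraction of instructions that are taken branches

branches_per_cycle = WIDTH * P_BRANCH
taken_per_cycle = WIDTH * P_TAKEN
p_two_branches = p_at_least(2, WIDTH, P_BRANCH)
p_one_taken = p_at_least(1, WIDTH, P_TAKEN)

print(f"expected branches per fetch group:       {branches_per_cycle:.2f}")
print(f"expected taken branches per fetch group: {taken_per_cycle:.2f}")
print(f"P(2+ branches in a group):               {p_two_branches:.0%}")
print(f"P(1+ taken branch in a group):           {p_one_taken:.0%}")
```

Under these assumptions more than one branch per fetch group is the average case, so a predictor limited to one prediction per cycle would stall a 7-wide front end regularly.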

But wider is not all there is to performance. You also want
- not to wait on RAM. This means very smart caches and very good prefetchers. Apple is stellar at both. ARM has stunning prefetchers (likely better than Apple A13, at least for some purposes. Presumably that will change with A14?) but their caches are more average.
- tricks in the core to defer work that is waiting on RAM. This is a HUGE topic, and it's unclear how much Apple (or anyone else) is doing. IMHO the best technique of immediate value is called Long Term Parking, published a few years ago. But it's unclear if Apple uses this yet.
- resource amplification. This refers to ways that you can get more value out of your core for given power, cycles, and area. The most common such technique is instruction fusion. Once again Apple is very aggressive here, with ARM lagging Apple a few years but still doing an OK job. There is still room for even more aggression by Apple, along with other techniques like instruction criticality prediction (which apparently no-one is yet using).
- additional types of speculation, for example load-value or load-address prediction. Qualcomm has been looking at these seriously for years now, they keep publishing (and their numbers keep getting better) but it's unclear their long-term plan. (Accumulate patents and try to license them to Apple or ARM?) Once again, it looks like no-one is actually using these yet.

Most of these ideas seem like they could perhaps be used by x86 as well (smarter caches, long term parking, value speculation, criticality). But x86 is so mired in complexity and baroque rules for exactly what has to be done before something else can be done (the memory ordering rules, which constrain various optimizations) that it's unclear if they can ever get there in the form of a CPU bug-free enough to be useful.

If you want to see a rough overview of where Apple is with respect to instruction units:

AnandTech Forums: Technology, Hardware, Software, and Deals

Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

www.anandtech.com

That gives A12 but A13 seems to be much the same. Of course the uncore matters as much as the core, which is why AnandTech gives as much space to describing the cache and memory controller for A12, then A13.
(Even so, AnandTech can't really get at either the cache smarts or the prefetchers, beyond the obvious issue of streams.)

name99 · Mar 4, 2020

Thunder 57 said:
Still not buying it. Let's see one do real work. As Mark (I think) said, why aren't Amazon/MS/etc running a bunch of phone chips in their servers? There's far more to it.

As for clocking higher, I was going by Richie Rich's own (optimistic, IMO) numbers.

1. Intel Core i9 9900K @5GHz ......... SPECint2006 score: 54.28 ...... 10.86 pts/GHz
2. Apple A13 @2.65 GHz .................. SPECint2006 score: 52.82 ...... 19.93 pts/GHz ...... +83 % IPC over 9900K
3. AMD Ryzen 3950X @4.6 GHz ...... SPECint2006 score:50.02 ...... 10.87 pts/GHz ...... + 0% IPC over 9900K .... fastest clocked Ryzen beaten by iPhone CPU
4. ARM Cortex A77@2.84 GHz ......... SPECint2006 score: 33.32 ...... 11.73 pts/GHz ...... + 8% IPC over 9900K

So Apple's A13 would need to clock considerably higher for it to take the massive lead he thinks it's capable of. If you're not going to clock it higher, what's the point? You'll get similar performance and lose compatibility. So no, don't tell me I "obviously" haven't been paying attention.

What the USER cares about is performance. Doesn't matter if that comes from IPC or frequency.

What the ARCHITECT cares about (and this includes internet amateur architects) is the split between IPC and frequency because these have different technology characteristics.

Frequency takes lots of power and is getting harder and harder to increase at smaller nodes. (Witness Intel's releasing 10nm cores at lower GHz than 14nm. Sure, that 14nm has been tuned to the moon, but doing so for 10nm will take years, and the end result will be what, maybe 5% faster than 14nm+++?)

IPC just takes designer brains and lots of transistors.

This is why people like myself and Richie Rich care about IPC, and have so much respect for Apple. Apple clearly understood where technology was headed around 2007 (or whenever they started designing the A7) and Apple bet all their chips on designs that achieved speed through IPC. As of the A12 (last time I did the numbers) A12 is a bit over 4x the speed of A7, ~half of that through doubled GHz, about half through doubled IPC. (And that's more impressive than it sounds because that doubled GHz didn't hit a worse memory wall! Much of the smarts in the A12 don't increase IPC, but they do maintain it flat at double the GHz.)
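
Since performance is (work per clock) times (clock), a speedup factors cleanly into an IPC part and a frequency part. Here is a small Python sketch of that split. The function name is invented, the 2x/2x inputs are the rounded A7-to-A12 figures from the paragraph above, and the A13 versus 9900K inputs are the SPECint2006 numbers from the quoted list. Attributing "shares" on a log scale is one convention among several, not something the post itself uses.

```python
import math

def split_speedup(ipc_ratio: float, freq_ratio: float) -> tuple[float, float]:
    """Return (total speedup, share of it attributable to IPC on a log scale)."""
    total = ipc_ratio * freq_ratio
    ipc_share = math.log(ipc_ratio) / math.log(total)
    return total, ipc_share

# A7 -> A12, using the rounded "doubled GHz, doubled IPC" figures.
total, ipc_share = split_speedup(ipc_ratio=2.0, freq_ratio=2.0)
print(f"A7 -> A12: {total:.1f}x total, {ipc_share:.0%} of it from IPC")

# A13 vs 9900K from the quoted SPECint2006 list: score = (pts/GHz) * GHz.
ipc_ratio = (52.82 / 2.65) / (54.28 / 5.0)
freq_ratio = 2.65 / 5.0
a13_vs_9900k = ipc_ratio * freq_ratio
print(f"A13 vs 9900K: {ipc_ratio:.2f}x per clock * {freq_ratio:.2f}x clock "
      f"= {a13_vs_9900k:.2f}x score")
```

The second case shows the same identity running the other way: a large per-clock lead and a large clock deficit multiply out to roughly equal scores.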

And this is just as relevant going forward. Intel appears to have dug themselves into a hole (same as with P4) optimizing every aspect of their operations for GHz, and are now screwed when they can't get higher GHz. Whereas Apple is happy to ride their annual 20% to 30% performance increases, both from new process (5nm should give maybe 10..15 % over 7nm), and ongoing IPC improvements of 10 to 20% per year.
(Sure INTC promised IPC improvements for IceLake. What they delivered was maybe 15% -- what Apple does every year, but 5 years after Skylake. They claim they'll do the same for Tigerlake. We'll see. And we'll see if doing so again limits how far they can push GHz.)
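
As a quick sanity check on how those yearly numbers combine, here is a small Python sketch. It assumes the process gain and the IPC gain simply multiply, which is a simplification, and the function names are made up for the example. The inputs are the ranges from the paragraph above.

```python
import math

def combined_gain(process_gain: float, ipc_gain: float) -> float:
    """Combined yearly gain if process and IPC improvements multiply."""
    return (1 + process_gain) * (1 + ipc_gain) - 1

def years_to_double(annual_gain: float) -> float:
    """Years needed to double performance at a constant annual gain."""
    return math.log(2) / math.log(1 + annual_gain)

low = combined_gain(0.10, 0.10)    # low end of both ranges
high = combined_gain(0.15, 0.20)   # high end of both ranges

print(f"combined yearly gain: {low:.0%} to {high:.0%}")
print(f"doubling time at 20%/year: {years_to_double(0.20):.1f} years")
print(f"doubling time at 30%/year: {years_to_double(0.30):.1f} years")
```

The low end of the combined range lines up with the 20% figure, and the high end comes out a little above 30% because the two gains compound.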

Nothing says "I don't understand CPUs" like demanding that Apple chase GHz...

name99 · Mar 4, 2020

eek2121 said:
AMD has 14nm parts with a 6 watt TDP. They also have Renoir with a 15 watt TDP. It stands to reason that a 7nm part could easily operate in half the TDP. Of course, AMD won’t do this because there is no demand.

As far as other comments about Apple chips being “faster”, I will believe it when I see it. Running benchmarks is one thing. Doing development work or video/graphics work is something else. They would need to drastically increase the core count, which would negate any power savings.

Development work? WTF do you think the LLVM sub-benchmark of GB does?

name99 · Mar 4, 2020

eek2121 said:
It is important to note that a Ryzen 3700X (65W TDP) gets around the same SC score as the Apple chip, however, Geekbench has already been shown to favor Apple’s desktop operating system, so we don’t know what optimizations that Apple has under the hood. Chip performance could very likely fall apart due to the “open” nature of the Mac. When you combine this with the fact that Apple would have to use an emulator, the prospects of having better performance go out the window.

I don’t use Intel chips here, because they only have very specific parts on 10nm.

citation needed.
claiming this repeatedly doesn't make it true.
neither does it explain SPEC, or browser benchmarks.

name99 · Mar 4, 2020

eek2121 said:
It’s less to do with demand and more to do with margins. Android runs fine on x86, and I suspect iOS would as well. However, Intel and AMD build big, fast, high margin chips.

I read the Anandtech article again and it baffles me why people think that this chip is ready for the desktop. The only thing it has going for it is integer performance. Its memory subsystem is horrible, and floating point operations are also much slower than x86.

Look, just stop. You're embarrassing yourself. Both of these are ridiculous claims and by insisting on them you're deciding you want to play with the kids interested in insulting each other, rather than with the adults interested in how this technology actually works.
