Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is a first rumor about Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
  • massive AI accelerator

Source Coreteks:
 
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.
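The arithmetic behind that claim can be sketched roughly as follows. The function name and the normalized inputs are illustrative assumptions, not measurements; the only figures taken from the post are "A13 ~= Intel" and the "about 2x" emulation tax.

```python
def emulated_speed(native_ratio: float, emulation_tax: float) -> float:
    """Speed of emulated x86 code on the ARM chip, relative to a native x86 machine.

    native_ratio  : ARM native performance / x86 native performance (1.0 = parity)
    emulation_tax : slowdown factor from emulation (2.0 = emulated code runs at half speed)
    """
    if emulation_tax <= 0:
        raise ValueError("emulation_tax must be positive")
    return native_ratio / emulation_tax


# Assumptions from the post: native parity, emulation costs about 2x.
print(emulated_speed(native_ratio=1.0, emulation_tax=2.0))  # 0.5
```

Under this simple model the ARM core would need to be about twice as fast natively before emulated x86 programs reached parity.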

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

DrMrLordX

Lifer
Apr 27, 2000
21,619
10,827
136
The amount of "Apple's cores absolutely can't keep up with Intel's, despite all benchmark evidence available" I'm seeing really surprises me. From an Occam's Razor perspective, it seems irrational. I'm fairly sure there are cases where Apple's clock-normalized wins over Intel are smaller than some of the ones I've seen, but every piece of evidence shows that Apple has developed a powerful and credible core, and that migrating the Mac, if they wanted to do it, is within technical reach.

I know this is an old(ish) post, but I think it bears response even at this point. Apple is, first-and-foremost, a lifestyle technology company. It is clear they have a superior CPU design team. They have the money and the chops to work on interconnects and push their products into nearly any segment they want: laptops, desktops, workstations, servers, HPC, etc. Or at least that's the way it seems. It is not clear how their ARM designs would hold up at higher core counts and/or higher clockspeeds. What is also unclear is how Apple makes more money expanding into markets they have mostly abandoned in the past. PC/laptop shipments are in decline. How much more of that market does Apple want for their own? Will they risk their incredible margins to capture buyers that wouldn't normally consider the Apple ecosystem? And do they even want to compete in the server market?

Apple's SoC designs are trapped inside a corporation that doesn't necessarily make its money by having the highest volume of shipments or by having the fastest hardware. Intel and AMD should fear their prowess, yes . . . but Apple is not a technology-first kind of company. I don't think they want to compete head-to-head with either company (per se).

On the flipside, I do see the possibility of (for example) A14-powered MacBooks being rational, especially if Intel can't offer them enough incentives to stay on x86. It may also make sense for Apple to work on their own cloud server architecture featuring heavily-modified versions of their SoCs. If their superior tech lets them push more software and services, then so be it.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
The amount of "Apple's cores absolutely can't keep up with Intel's, despite all benchmark evidence available" I'm seeing really surprises me. From an Occam's Razor perspective, it seems irrational. I'm fairly sure there are cases where Apple's clock-normalized wins over Intel are smaller than some of the ones I've seen, but every piece of evidence shows that Apple has developed a powerful and credible core, and that migrating the Mac, if they wanted to do it, is within technical reach. And the rest of the ARM ecosystem hasn't stopped moving either - Fujitsu A64FX is far from "slow and low power."

The issue is making 1:1 comparisons, which is currently impossible. A main factor affecting benchmarks is the operating system. It's obvious Apple can optimize much more, and Windows, for example, has been known to run a lot slower in many cases. Then there are the vector extensions: whether they are used matters greatly, and gimping a benchmark to not use them is clearly not a 1:1 comparison, as those extensions impact the design and power use of the CPU.

Does the A13 have higher IPC than Intel Skylake on "common code"? Certainly yes. Does it have higher IPC when Skylake can use AVX2? And higher IPC than Ice Lake, which can use AVX-512? Probably not. I admit AVX-512 is a niche, but it just shows that even IPC isn't a clear, hard definition.

Then there is the issue of diminishing returns. The Apple core clearly is more efficient, but that is what it was designed for, including the process node. Intel's cores and processes are usually geared more toward high frequency and performance. It's not certain Apple could just adjust the design to higher frequency and more cores and reach Intel's level of performance.

And then there is the often forgotten die size / core size. A wide relatively "large" core works fine if you only sell it in >$700 devices. It's however an issue if you must sell it also in $300 craptops. So core size/performance matters too.
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
The issue is making 1:1 comparisons, which is currently impossible. A main factor affecting benchmarks is the operating system. It's obvious Apple can optimize much more, and Windows, for example, has been known to run a lot slower in many cases.
AnandTech has run SPEC 2006 on the Apple chips, and SPEC 2006 has almost no dependency on the OS (though having huge page support might help, but this applies to any x86 OS and I don't know if iOS has such support [beyond using 16KB pages as a default]).

Then there are the vector extensions: whether they are used matters greatly, and gimping a benchmark to not use them is clearly not a 1:1 comparison, as those extensions impact the design and power use of the CPU.

Does the A13 have higher IPC than Intel Skylake on "common code"? Certainly yes. Does it have higher IPC when Skylake can use AVX2? And higher IPC than Ice Lake, which can use AVX-512? Probably not. I admit AVX-512 is a niche, but it just shows that even IPC isn't a clear, hard definition.
Agreed. Though remember that turbo speed is reduced when running AVX2/512.

Then there is the issue of diminishing returns. The Apple core clearly is more efficient, but that is what it was designed for, including the process node. Intel's cores and processes are usually geared more toward high frequency and performance. It's not certain Apple could just adjust the design to higher frequency and more cores and reach Intel's level of performance.
On this I definitely agree! But even a 4-core Axx at current speed would make a very nice chip for a laptop.

And then there is the often forgotten die size / core size. A wide relatively "large" core works fine if you only sell it in >$700 devices. It's however an issue if you must sell it also in $300 craptops. So core size/performance matters too.
Any ARM Apple-based laptop would be high-end and as expensive as an Intel-based one (and I'm not sure you can find an Intel-based laptop at $300 running a core CPU less than 3 years old).
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Any ARM Apple-based laptop would be high-end and as expensive as an Intel-based one (and I'm not sure you can find an Intel-based laptop at $300 running a core CPU less than 3 years old).

Exactly, and because of that Apple will have an advantage again, because they have a much smaller range of products. But I don't think Apple will switch away from x86 anytime soon, because they would also need to replace the Mac Pro (or discontinue the line).

$300 might be pushing it, but just yesterday there were links on this forum to a $450 Ice Lake-based laptop.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
But I don't think apple will switch away from x86 anytime soon because they would also need to replace the mac pro (or discontinue the line).

So what? Even a tiny company like AMD managed to develop something like EPYC based on much less performant cores. Last time I checked, Mac Pros just contain Intel Xeons - really low-hanging fruit to beat.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
So what? Even a tiny company like AMD managed to develop something like EPYC based on much less performant cores. Last time I checked, Mac Pros just contain Intel Xeons - really low-hanging fruit to beat.
Exactly. The A12 Vortex core is 2.07 mm² and Apple can fit a lot of them on a die. Once they have developed a powerful core, it's relatively easy to create several different CPUs.

On the 5nm EUV node we can expect a higher clock of around 2.8 GHz for the iPhone SoC. For a laptop it could be 3.2 GHz (the equivalent of a 5 GHz Skylake). That's insane performance for a laptop, and even more insane if the A14 brings some IPC improvement.
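To make the "3.2 GHz is the equivalent of a 5 GHz Skylake" claim explicit, here is a rough sketch of the per-clock ratio it implies. The function names are illustrative, and the model assumes performance scales linearly with clock at a fixed per-clock ratio, which real chips only approximate.

```python
def implied_per_clock_ratio(arm_ghz: float, x86_ghz: float) -> float:
    """Per-clock performance ratio implied by calling arm_ghz 'equivalent' to x86_ghz."""
    return x86_ghz / arm_ghz


def equivalent_x86_ghz(arm_ghz: float, per_clock_ratio: float) -> float:
    """Skylake clock that would match an ARM core at arm_ghz, given the ratio."""
    return arm_ghz * per_clock_ratio


# Figures from the post: 3.2 GHz laptop part ~ 5 GHz Skylake.
ratio = implied_per_clock_ratio(3.2, 5.0)
print(f"implied per-clock advantage: {ratio:.2f}x")  # about 1.56x

# Same ratio applied to the 2.8 GHz phone clock mentioned in the post.
print(f"2.8 GHz would match a Skylake at {equivalent_x86_ghz(2.8, ratio):.2f} GHz")
```

So the claim amounts to assuming a per-clock advantage of roughly 1.56x over Skylake.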
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
So what? Even a tiny company like AMD managed to develop something like EPYC based on much less performant cores. Last time I checked, Mac Pros just contain Intel Xeons - really low-hanging fruit to beat.

Exactly. The A12 Vortex core is 2.07 mm² and Apple can fit a lot of them on a die. Once they have developed a powerful core, it's relatively easy to create several different CPUs.

It's not the core that is the problem, it's the end products that are needed. I've written this a dozen times in these "Apple ditches x86" threads. From laptops up to the iMac, Apple might get away with a single 8-core chip/SoC design, as the lower-end laptops could reuse the iPad "X" chip version. But where it fails is the Mac Pro, which you can soon get with up to 28 cores. That market is far, far too small to warrant a custom chip just for it. And given that the new version will release shortly and stay around for at least a couple of years, I just don't see them ditching x86.
It's the small size of the Mac Pro market. Either Apple stays on x86, moves to ARM and kills the Mac Pro, or still moves and takes losses on a custom Mac Pro chip. Occam's razor: which is the easiest and most likely path? And note I'm saying x86, so this could be AMD too.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
@beginner99: I appreciate your constructive discussion and I agree with most of it. However, what if Apple used the same layout as EPYC 1 (4x CPU connected together by a bus)? That would be super easy to do: a 32-core CPU with 8 memory channels. There is the NUMA problem, but Apple has its own OS, so it can be handled OK.

Just imagine you are Apple's CEO. Would you prefer to use a competitor's CPU instead of your own (much better performance, power efficiency and much cheaper, since you pay only for silicon)? With an Intel CPU you pay for their margins and dividends. There are economic (cheaper = higher margins), performance (customer happiness) and political (no longer dependent on Intel) advantages.
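A back-of-the-envelope sketch of that package follows. The split into 8 cores and 2 memory channels per die is an assumption chosen to reproduce the 32 cores and 8 channels above, the 2.07 mm2 figure is the A12 Vortex core area quoted earlier in the thread, and the area total covers cores only (no caches, interconnect or IO).

```python
DIES = 4               # EPYC 1 style package: four dies (from the post)
CORES_PER_DIE = 8      # assumption: 4 x 8 gives the 32 cores mentioned
CHANNELS_PER_DIE = 2   # assumption: 4 x 2 gives the 8 memory channels mentioned
CORE_AREA_MM2 = 2.07   # A12 Vortex core area quoted earlier in the thread


def package_summary(dies: int, cores_per_die: int,
                    channels_per_die: int, core_area_mm2: float) -> dict:
    """Totals for a hypothetical multi-die package built from identical dies."""
    cores = dies * cores_per_die
    return {
        "cores": cores,
        "memory_channels": dies * channels_per_die,
        # Core silicon only; caches, fabric and IO are not included.
        "core_area_mm2": round(cores * core_area_mm2, 2),
        # Share of memory attached to a core's own die if capacity is spread
        # evenly; the remainder is remote, which is the NUMA issue.
        "local_memory_fraction": 1 / dies,
    }


print(package_summary(DIES, CORES_PER_DIE, CHANNELS_PER_DIE, CORE_AREA_MM2))
```

With these inputs only a quarter of memory is local to any given die, which is why the OS scheduler and allocator would need to be NUMA-aware.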
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
@beginner99: I appreciate your constructive discussion and I agree with most of it. However, what if Apple used the same layout as EPYC 1 (4x CPU connected together by a bus)? That would be super easy to do: a 32-core CPU with 8 memory channels. There is the NUMA problem, but Apple has its own OS, so it can be handled OK.

Just imagine you are Apple's CEO. Would you prefer to use a competitor's CPU instead of your own (much better performance, power efficiency and much cheaper, since you pay only for silicon)? With an Intel CPU you pay for their margins and dividends. There are economic (cheaper = higher margins), performance (customer happiness) and political (no longer dependent on Intel) advantages.

In addition you would have the advantage of the same ISA from iPhone to Mac Pro...you just do not have to bother about x86 at all anymore.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
When I was browsing the internet about the potential performance of Apple's laptop or desktop variant of their Bionic series processors, one common refrain I read was that Apple's high IPC is intertwined with low clock speed. Apparently, higher clock speeds lower IPC, mainly due to the CPU having to wait on memory more often. Another point was that wider architectures like the A13 may perform well in some tasks, while in other applications with limited parallelism, their performance would suffer much more than a more balanced architecture like Intel's Core or AMD's Zen series.

So to get a higher level of performance across a larger amount of applications, a potential laptop or desktop A13/A14 variant would have to scale up the clock speed, which means IPC would suffer. Not only that, but how would such an architecture be competitive against Intel or AMD in heavy workloads common on laptop or desktop PCs without SMT or wider SIMD vectors?

AFAIK, Apple only has Neon which if I'm not mistaken, is akin to SSE2 or something. AVX/AVX2/AVX-512 blows it out of the water. Or am I wrong on this?
 

soresu

Platinum Member
Dec 19, 2014
2,656
1,858
136
AFAIK, Apple only has Neon which if I'm not mistaken, is akin to SSE2 or something. AVX/AVX2/AVX-512 blows it out of the water. Or am I wrong on this?
No, you are correct - but the benefit of those wider SIMD instructions depends a lot on the workload.

Apple is limited to 128 bit NEON for now, but I imagine that the SVE2/TME announcement has them at least in design phase of a future core supporting wider vector lengths, perhaps even further depending on their foreknowledge of SVE2's development at ARM.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
No, you are correct - but the benefit of those wider SIMD instructions depends a lot on the workload.

Apple is limited to 128 bit NEON for now, but I imagine that the SVE2/TME announcement has them at least in design phase of a future core supporting wider vector lengths, perhaps even further depending on their foreknowledge of SVE2's development at ARM.

Wow, I didn't even know they were working on SVE2. Was the original SVE even implemented in any processor? If so, how did it compare to something like Intel's AVX SIMD?
 

soresu

Platinum Member
Dec 19, 2014
2,656
1,858
136
Wow, I didn't even know they were working on SVE2. Was the original SVE even implemented in any processor? If so, how did it compare to something like Intel's AVX SIMD?
All I know is that Fujitsu were going to use it in a custom ARM core for a "post-K" supercomputer CPU called A64FX. That was quite a while ago now, and I don't think the chips were ever intended for wider use.

Found some PDF links about it:

1.
2.
 

soresu

Platinum Member
Dec 19, 2014
2,656
1,858
136
Long story short A64FX targets a 2.7+ TFLOPS (DP) 48 core CPU, with the "post-K" system being projected at 150k+ nodes, so at least 405 DP PetaFLOPS.

This slide outlines the per core performance for A64FX:

[Image: 1573014340276.png]
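The system-level figure above is simple multiplication, sketched here. Both inputs are the lower bounds quoted in the post; the per-core number is just the node figure divided by 48 and is not taken from the slide.

```python
TFLOPS_PER_NODE = 2.7   # double precision, per 48-core A64FX (lower bound from the post)
NODES = 150_000         # projected "post-K" node count (lower bound from the post)
CORES_PER_NODE = 48


def system_petaflops(tflops_per_node: float, nodes: int) -> float:
    """Aggregate peak in PFLOPS (1 PFLOPS = 1000 TFLOPS)."""
    return tflops_per_node * nodes / 1000.0


def gflops_per_core(tflops_per_node: float, cores: int) -> float:
    """Peak per core in GFLOPS (1 TFLOPS = 1000 GFLOPS)."""
    return tflops_per_node * 1000.0 / cores


print(round(system_petaflops(TFLOPS_PER_NODE, NODES)))             # 405
print(round(gflops_per_core(TFLOPS_PER_NODE, CORES_PER_NODE), 2))  # 56.25
```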
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
When I was browsing the internet about the potential performance of Apple's laptop or desktop variant of their Bionic series processors, one common refrain I read was that Apple's high IPC is intertwined with low clock speed. Apparently, higher clock speeds lower IPC, mainly due to the CPU having to wait on memory more often.
That's not exactly the issue. What you describe already exists on any CPU: if you lower the clock speed you'll get higher IPC, because the memory latency in ns is constant but is reduced in terms of cycles, hence more work per cycle. But in the case of Apple's chip, the belief is that, due to the rather low clock speed, one can do more work in a cycle because a signal can traverse more logic (some people talk about FO4; you can read more about it here).

That's oversimplified but I hope you get the idea :)
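The first effect (constant latency in ns, variable latency in cycles) can be shown with a toy model. All parameter values are made-up illustrative assumptions, and the model ignores the FO4 / logic-depth point entirely.

```python
def ipc(freq_ghz: float, base_cpi: float = 0.25,
        misses_per_instr: float = 0.002, mem_latency_ns: float = 100.0) -> float:
    """Toy model: CPI = core CPI + memory stall cycles per instruction.

    Memory latency is fixed in nanoseconds, so its cost in cycles grows
    with the clock: ns * GHz = cycles.
    """
    stall_cycles = misses_per_instr * mem_latency_ns * freq_ghz
    return 1.0 / (base_cpi + stall_cycles)


def instr_per_ns(freq_ghz: float, **model_params: float) -> float:
    """Absolute throughput: IPC times clock."""
    return ipc(freq_ghz, **model_params) * freq_ghz


for f in (2.5, 3.5, 5.0):
    print(f"{f} GHz: IPC={ipc(f):.2f}, throughput={instr_per_ns(f):.2f} instr/ns")
```

In this model IPC falls as the clock rises, but absolute throughput still goes up, just by less than the clock increase. How much less depends entirely on the miss rate and latency you plug in.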

Something else too was that wider architectures like the A13 may perform well in some tasks, while in other applications with limited parallelism, its performance would suffer much more than a more balanced architecture like Intel's Core or AMD's Zen series.
I'm not sure this makes a lot of sense. SPECint 2006 results seem to show a rather balanced architecture.

So to get a higher level of performance across a larger amount of applications, a potential laptop or desktop A13/A14 variant would have to scale up the clock speed, which means IPC would suffer.
Yes, but the $1,000,000 question is by how much.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
@beginner99: I appreciate your constructive discussion and I agree with most of it. However, what if Apple used the same layout as EPYC 1 (4x CPU connected together by a bus)? That would be super easy to do: a 32-core CPU with 8 memory channels. There is the NUMA problem, but Apple has its own OS, so it can be handled OK.

This would mean that the same CPU in a laptop would carry a lot of unneeded "connection links", which wastes die area and maybe even power. Another problem is that it assumes Apple has such a "bus" or Infinity Fabric alternative available. Can the ARM interconnects work off-die? I don't really know.

They could also go with IO dies, but it is still a lot of effort for a very small market.

Just imagine you are Apple's CEO. Would you prefer to use a competitor's CPU instead of your own (much better performance, power efficiency and much cheaper, since you pay only for silicon)? With an Intel CPU you pay for their margins and dividends. There are economic (cheaper = higher margins), performance (customer happiness) and political (no longer dependent on Intel) advantages.

We don't know the actual costs, so it's impossible to judge. I'm sure Apple gets a far better deal from Intel than we think. It also depends on the actual market size of the MacBooks, iMacs and Mac Pro. Maybe they are larger than I think, especially the Mac Pro?

I think it's actually less costly for Apple to stay on x86 (cost isn't just money, it's also risk and complexity). Another chip of their own would mean another design team is needed. That is a lot of expensive hires. Software needs to be adjusted, although I think for Apple's own software they probably have that in-house already. The bigger problem is 3rd party software. Would everybody adjust, or would some just say "not worth it, too small a market"? Stuff from Adobe, MS, and other video/creation type software Macs are often used for. Will these all follow?
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
Stuff from adobe, MS, other video/creation type software macs are often used for. Will these all follow?
Given that Adobe will port some of their tools to Windows ARM and to iPad, I guess they'd have little issue porting them to Mac OS X ARM.

MS have already ported some of their tools to Windows ARM so again it will likely be easy to port to Mac OS X ARM.

Of course that's only two companies.

But beyond what you point out, some people insist on running Windows on their Mac, and that would be an issue I guess.
 

soresu

Platinum Member
Dec 19, 2014
2,656
1,858
136
The bigger problem is 3rd party software. Would everybody adjust or some just say "not worth it, too small market"? Stuff from adobe, MS, other video/creation type software macs are often used for. Will these all follow?
This is a bigger problem than some give it credit.

x86 Windows will not simply be dumped even if x86 Mac is, which means that cross platform DCC packages would need to maintain 2 backends permanently - not something I imagine would be popular for the owners of larger packages like Autodesk with their full suite of software, something they have already cut back spending on not so long ago.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I'm not sure this makes a lot of sense. SPECint 2006 results seem to show a rather balanced architecture.

OK this might seem like a dumb question but indulge me nevertheless. Theoretically, for a personal desktop PC which is used for everything from gaming to encoding to browsing to editing, what sort of CPU would you prefer assuming they have the same ISA:

1) A 10 wide CPU that runs at 2.5ghz (very high IPC but low clock speed)

2) A 5 wide CPU that runs at 5ghz (very high clock speed but moderate IPC)

It's likely nowhere near enough information to make a proper informed decision, but whatever thoughts you have would still be useful to me :cool:
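One way to frame the question is that throughput is roughly sustained IPC times clock, so the wide core only breaks even if it sustains twice the IPC of the narrow one. In the sketch below the sustained IPC values are hypothetical placeholders, since issue width is only an upper bound on IPC and real numbers depend on the workload.

```python
def throughput(sustained_ipc: float, ghz: float) -> float:
    """Instructions per nanosecond = sustained IPC * clock in GHz."""
    return sustained_ipc * ghz


def break_even_ipc_ratio(slow_ghz: float, fast_ghz: float) -> float:
    """IPC advantage the slower-clocked core needs to match the faster one."""
    return fast_ghz / slow_ghz


print(break_even_ipc_ratio(2.5, 5.0))  # 2.0

# Hypothetical sustained IPC values, for illustration only.
wide = throughput(sustained_ipc=4.0, ghz=2.5)    # option 1: 10 wide at 2.5 GHz
narrow = throughput(sustained_ipc=2.5, ghz=5.0)  # option 2: 5 wide at 5 GHz
print(wide, narrow)
```

With these made-up inputs the narrow, fast core wins. With a sustained IPC of 5.0 or more on the wide core the result flips, which is why the answer hinges on how much parallelism the workload actually exposes.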
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
OK this might seem like a dumb question but indulge me nevertheless. Theoretically, for a personal desktop PC which is used for everything from gaming to encoding to browsing to editing, what sort of CPU would you prefer assuming they have the same ISA:

1) A 10 wide CPU that runs at 2.5ghz (very high IPC but low clock speed)

2) A 5 wide CPU that runs at 5ghz (very high clock speed but moderate IPC)

It's likely nowhere near enough information to make a proper informed decision, but whatever thoughts you have would still be useful to me :cool:

Answer: A 10 wide CPU that runs 3.5-4Ghz - something which should be achievable with just upping voltage and frequency of A13 along with some additional buffers in the backend :)
Thing is the 5GHz CPU (some random Intel CPU) is already running at voltage and clock limits while the 2.5GHz CPU (in case of A13) is not.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Answer: A 10 wide CPU that runs 3.5-4Ghz - something which should be achievable with just upping voltage and frequency of A13 along with some additional buffers in the backend :)
Thing is the 5GHz CPU (some random Intel CPU) is already running at voltage and clock limits while the 2.5GHz CPU (in case of A13) is not.

My whole point in asking that question was to show that not all workloads would respond well to a wide architecture. I'm sure that Intel and AMD aren't stupid. If ultra wide architectures with high IPC but low clock speeds would allow them to dominate performance across a wide variety of workloads, they would have built that CPU already, I wager. Gaming in particular seems to prefer higher clock speeds along with low memory latency, which is why the 9900K is so dominant in gaming. A theoretical 10 wide CPU at say 2.5ghz would probably be destroyed in gaming by higher clocked CPUs with lower IPC. Factor in things like SMT and AVX, and the case for Apple being able to compete with Intel/AMD with more powerful variants of their Bionic series CPUs looks grim to me.

Now if they can clock it significantly faster like you mentioned, then yeah, that would be a real threat. But to do so would require changing a great deal of the microarchitecture from what I understand, to make it more suitable and performant at higher clock speeds.
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
My whole point in asking that question was to show that not all workloads would respond well to a wide architecture. I'm sure that Intel and AMD aren't stupid. If ultra wide architectures with high IPC but low clock speeds would allow them to dominate performance across a wide variety of workloads, they would have built that CPU already, I wager. Gaming in particular seems to prefer higher clock speeds along with low memory latency, which is why the 9900K is so dominant in gaming. A theoretical 10 wide CPU at say 2.5ghz would probably be destroyed in gaming by higher clocked CPUs with lower IPC.
I tend to agree: there are diminishing returns in getting wider, because dependencies in the instruction window tend to limit what you can do without speculating too much and thereby increasing the number of replays.

But I'm not sure your example of 9900K is any proof that a higher frequency is the way to go. There might be many factors that could explain why 9900K is better: better prefetchers, drivers better optimized for Intel, etc.

And I agree with @Thala that increasing frequency too much is not efficient (remember P4?).

Factor in things like SMT and AVX, and the case for Apple being able to compete with Intel/AMD with more powerful variants of their Bionic series CPUs looks grim to me.
If Apple wants it, they can add SVE2 which is miles ahead of AVX. And if they feel like SMT is the way to go, they can also add it.

Now if they can clock it significantly faster like you mentioned, then yeah, that would be a real threat. But to do so would require changing a great deal of the microarchitecture from what I understand, to make it more suitable and performant at higher clock speeds.
Yeah, the question being how large that "great deal" is :D
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Now if they can clock it significantly faster like you mentioned, then yeah, that would be a real threat. But to do so would require changing a great deal of the microarchitecture from what I understand, to make it more suitable and performant at higher clock speeds.

I was not talking about changing the microarchitecture, but just clocking the current architecture higher. Did you understand the point about the 5GHz CPU being at its voltage and frequency limits while the A13 is not?
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
And I agree with @Thala that increasing frequency too much is not efficient (remember P4?).

Yeah I definitely remember the P4. Thought about it even when I wrote that post. But the P4 had very low IPC compared to something like the Core series, which is much more balanced and robust in that respect as well as in clock speed.

If Apple wants it, they can add SVE2 which is miles ahead of AVX. And if they feel like SMT is the way to go, they can also add it.

Yeah, I did some reading about SVE2. Neither SVE nor SVE2 has been implemented yet though, so I guess we'll have to wait and see how they turn out.

By the time SVE/SVE2 is implemented, AVX-512 should be a mainstay in x86 vector computing, and Intel will probably be looking at AVX-1024. Judging by how aggressively Intel is implementing it in their consumer lineup, that is.
 