Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

Here is the first rumor about an Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with the new-generation MacBook Air in mid-2020 (MacBook Pro and iMac also under consideration)
massive AI accelerator

Source: Coreteks

DrMrLordX · Nov 4, 2019

SarahKerrigan said:
The amount of "Apple's cores absolutely can't keep up with Intel's, despite all benchmark evidence available" I'm seeing really surprises me. From an Occam's Razor perspective, it seems irrational. I'm fairly sure there are cases where Apple's clock-normalized wins over Intel are smaller than some of the ones I've seen, but every piece of evidence shows that Apple has developed a powerful and credible core, and that migrating the Mac, if they wanted to do it, is within technical reach.

I know this is an old(ish) post, but I think it bears response even at this point. Apple is, first and foremost, a lifestyle technology company. It is clear they have a superior CPU design team. They have the money and the chops to work on interconnects and push their products into nearly any segment they want: laptops, desktops, workstations, servers, HPC, etc. Or at least that's the way it seems. It is not clear how their ARM designs would hold up at higher core counts and/or higher clock speeds. What is also unclear is how Apple makes more money expanding into markets they have mostly abandoned in the past. PC/laptop shipments are in decline. How much more of that market does Apple want for itself? Will they risk their incredible margins to capture buyers who wouldn't normally consider the Apple ecosystem? And do they even want to compete in the server market?

Apple's SoC designs are trapped inside a corporation that doesn't necessarily make its money by having the highest volume of shipments or by having the fastest hardware. Intel and AMD should fear their prowess, yes . . . but Apple is not a technology-first kind of company. I don't think they want to compete head-to-head with either company (per se).

On the flipside, I do see the possibility of (for example) A14-powered MacBooks being rational, especially if Intel can't offer them enough incentives to stay on x86. It may also make sense for Apple to work on their own cloud server architecture featuring heavily-modified versions of their SoCs. If their superior tech lets them push more software and services, then so be it.

beginner99 · Nov 4, 2019

SarahKerrigan said:
The amount of "Apple's cores absolutely can't keep up with Intel's, despite all benchmark evidence available" I'm seeing really surprises me. From an Occam's Razor perspective, it seems irrational. I'm fairly sure there are cases where Apple's clock-normalized wins over Intel are smaller than some of the ones I've seen, but every piece of evidence shows that Apple has developed a powerful and credible core, and that migrating the Mac, if they wanted to do it, is within technical reach. And the rest of the ARM ecosystem hasn't stopped moving either - Fujitsu A64FX is far from "slow and low power."

The issue is that 1:1 comparisons are currently impossible. One of the main factors affecting benchmarks is the operating system. Apple can obviously optimize much more, and Windows, for example, has been known to run a lot slower in many cases. Then there are the vector extensions: whether they are used matters a great deal, and gimping a benchmark so it doesn't use them is clearly not a 1:1 comparison, since those extensions affect the design and power use of the CPU.

Does the A13 have higher IPC than Intel Skylake on "common code"? Certainly yes. Does it have higher IPC when Skylake can use AVX2? And higher IPC than Ice Lake, which can use AVX-512? Probably not. I admit AVX-512 is a niche, but it just shows that even IPC isn't a clear, hard definition.

Then there is the issue of diminishing returns. The Apple core is clearly more efficient, but that is what it was designed for, including the process node. Intel's cores and processes are usually geared more toward high frequency and performance. It's not certain Apple could simply adjust the design for higher frequency and more cores and reach Intel's level of performance.

And then there is the often-forgotten die size / core size. A wide, relatively "large" core works fine if you only sell it in >$700 devices. It is, however, an issue if you must also sell it in $300 craptops. So core size relative to performance matters too.

Nothingness · Nov 4, 2019

beginner99 said:
The issue is that 1:1 comparisons are currently impossible. One of the main factors affecting benchmarks is the operating system. Apple can obviously optimize much more, and Windows, for example, has been known to run a lot slower in many cases.

AnandTech has run SPEC2006 on the Apple chips, and SPEC2006 has almost no dependency on the OS (huge page support might help, but that applies to any x86 OS, and I don't know whether iOS has such support beyond using 16KB pages by default).

Then there are the vector extensions: whether they are used matters a great deal, and gimping a benchmark so it doesn't use them is clearly not a 1:1 comparison, since those extensions affect the design and power use of the CPU.

Does the A13 have higher IPC than Intel Skylake on "common code"? Certainly yes. Does it have higher IPC when Skylake can use AVX2? And higher IPC than Ice Lake, which can use AVX-512? Probably not. I admit AVX-512 is a niche, but it just shows that even IPC isn't a clear, hard definition.

Agreed, though remember that turbo frequency is reduced when running AVX2/AVX-512 code.

Then there is the issue of diminishing returns. The Apple core is clearly more efficient, but that is what it was designed for, including the process node. Intel's cores and processes are usually geared more toward high frequency and performance. It's not certain Apple could simply adjust the design for higher frequency and more cores and reach Intel's level of performance.

On this I definitely agree! But even a 4-core Axx at current speed would make a very nice chip for a laptop.

And then there is the often-forgotten die size / core size. A wide, relatively "large" core works fine if you only sell it in >$700 devices. It is, however, an issue if you must also sell it in $300 craptops. So core size relative to performance matters too.

Any ARM Apple-based laptop would be high-end and as expensive as an Intel-based one (and I'm not sure you can find a $300 Intel-based laptop running a Core CPU less than 3 years old).

beginner99 · Nov 4, 2019

Nothingness said:
Any ARM Apple-based laptop would be high-end and as expensive as an Intel-based one (and I'm not sure you can find a $300 Intel-based laptop running a Core CPU less than 3 years old).

Exactly, and because of that Apple will have an advantage again, since they have a much smaller range of products. But I don't think Apple will switch away from x86 anytime soon, because they would also need to replace the Mac Pro (or discontinue the line).

$300 might be pushing it, but just yesterday someone on this forum linked to a $450 Ice Lake-based laptop.

Thala · Nov 4, 2019

beginner99 said:
But I don't think Apple will switch away from x86 anytime soon, because they would also need to replace the Mac Pro (or discontinue the line).

So what? Even a tiny company like AMD managed to develop something like EPYC based on much less performant cores. Last time I checked, Mac Pros just contain Intel Xeons, which are really low-hanging fruit to beat.

Richie Rich · Nov 4, 2019

Thala said:
So what? Even a tiny company like AMD managed to develop something like EPYC based on much less performant cores. Last time I checked, Mac Pros just contain Intel Xeons, which are really low-hanging fruit to beat.

Exactly. The A12 Vortex core is 2.07 mm², and Apple can put a lot of them on a die. Once they have developed a powerful core, it's relatively easy to create several different CPUs.

On the 5nm EUV node we can expect a higher clock, around 2.8 GHz, for the iPhone SoC. For a laptop it could be 3.2 GHz (equivalent to a 5 GHz Skylake). That's insane performance for a laptop, and even more insane if the A14 brings some IPC improvement.
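Taking the 2.07 mm² Vortex figure at face value, here is a rough sketch of how core area alone scales with core count. It counts only the cores; caches, interconnect, memory controllers and I/O are deliberately left out (an assumption of the sketch), so a real die would be considerably larger.

```python
VORTEX_CORE_MM2 = 2.07  # A12 Vortex core area quoted in the post


def core_area_mm2(num_cores, core_mm2=VORTEX_CORE_MM2):
    """Area taken by the CPU cores alone (no L2/L3, fabric, or I/O)."""
    return num_cores * core_mm2


for n in (4, 8, 16, 32):
    print(f"{n:2d} cores -> {core_area_mm2(n):6.2f} mm^2 of core area")
```

Even 32 cores come to well under 100 mm² of core logic, which is the basis of the "put a lot of them" argument; the harder costs are the uncore and the products built around it.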
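The "3.2 GHz equals a 5 GHz Skylake" claim is just performance = IPC × frequency rearranged. A minimal sketch of that arithmetic, where the extra 10% IPC gain is a purely hypothetical number:

```python
def implied_ipc_ratio(clock_ghz, equivalent_ghz):
    """Per-clock performance ratio implied by 'clock_ghz on core A matches
    equivalent_ghz on core B', assuming performance = IPC * frequency."""
    return equivalent_ghz / clock_ghz


def equivalent_clock_ghz(clock_ghz, ipc_ratio):
    """Clock the reference core would need to match clock_ghz at ipc_ratio."""
    return clock_ghz * ipc_ratio


ratio = implied_ipc_ratio(3.2, 5.0)
print(f"implied IPC advantage: {ratio:.2f}x")  # about 1.56x, i.e. ~56% higher
# Hypothetical: a further 10% IPC gain in the next core
print(f"with +10% IPC: {equivalent_clock_ghz(3.2, ratio * 1.10):.2f} GHz Skylake-equivalent")
```

So the claim stands or falls on Apple's core sustaining roughly 56% more work per clock than Skylake at the higher laptop clock, which is exactly what later posts question.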

beginner99 · Nov 5, 2019

Thala said:
So what? Even a tiny company like AMD managed to develop something like EPYC based on much less performant cores. Last time I checked, Mac Pros just contain Intel Xeons, which are really low-hanging fruit to beat.

Richie Rich said:
Exactly. The A12 Vortex core is 2.07 mm², and Apple can put a lot of them on a die. Once they have developed a powerful core, it's relatively easy to create several different CPUs.

The core isn't the problem; the end products that are needed are. I've written this a dozen times in these "Apple ditches x86" threads. From laptops up to the iMac, Apple might get away with a single 8-core chip/SoC design, since the lower-end laptops could reuse the iPad "X" chip variant. But where it fails is the Mac Pro, which you can soon get with up to 28 cores. That market is far, far too small to warrant a custom chip just for it. And given that the new version will be released shortly and stay around for at least a couple of years, I just don't see them ditching x86.
It's the small size of the Mac Pro market. Either Apple stays on x86, moves to ARM and kills the Mac Pro, or moves anyway and takes losses on a custom Mac Pro chip. Occam's razor: which is the easiest and most likely path? And note I'm saying x86; this could be AMD too.

Richie Rich · Nov 5, 2019

@beginner99: I appreciate your constructive discussion and I agree with most of it. However, what if Apple used the same layout as first-generation EPYC (four CPU dies connected together by a bus)? That would be super easy to do: a 32-core CPU with 8 memory channels. There is the NUMA problem, but Apple has its own OS, so it can be handled OK.

Just imagine you are Apple's CEO. Would you prefer to use a competitor's CPU instead of your own (much better performance and power efficiency, and much cheaper, since you pay only for the silicon)? For an Intel CPU you pay their margins and dividends. There are economic (cheaper = higher margins), performance (customer happiness) and political (no longer dependent on Intel) advantages.
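A minimal sketch of the topology arithmetic behind the EPYC-style proposal, plus why "the OS can handle NUMA" matters. It assumes four identical dies with 8 cores and 2 memory channels each (the split that gives the 32 cores / 8 channels above); the `Die` type and the two placement policies are hypothetical illustrations, not any real Apple or OS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    cores: int
    memory_channels: int


def package_totals(dies):
    """Total cores and memory channels for a multi-die package."""
    return (sum(d.cores for d in dies), sum(d.memory_channels for d in dies))


def remote_access_fraction(num_nodes, policy):
    """Share of memory accesses that must cross to another die.

    "interleave": pages spread evenly over all nodes -> (n - 1) / n remote.
    "local": the OS keeps each thread's pages on its own die -> 0 (ideal case).
    """
    if policy == "local":
        return 0.0
    if policy == "interleave":
        return (num_nodes - 1) / num_nodes
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical EPYC-1-style layout: four identical 8-core dies, 2 channels each
dies = [Die(cores=8, memory_channels=2) for _ in range(4)]
print(package_totals(dies))                             # (32, 8)
print(remote_access_fraction(len(dies), "interleave"))  # 0.75
print(remote_access_fraction(len(dies), "local"))       # 0.0
```

With naive interleaving, three quarters of memory traffic crosses dies, so the gain from owning the OS lies in keeping real workloads close to the "local" case.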

Thala · Nov 5, 2019

Richie Rich said:
@beginner99: I appreciate your constructive discussion and I agree with most of it. However, what if Apple used the same layout as first-generation EPYC (four CPU dies connected together by a bus)? That would be super easy to do: a 32-core CPU with 8 memory channels. There is the NUMA problem, but Apple has its own OS, so it can be handled OK.

Just imagine you are Apple's CEO. Would you prefer to use a competitor's CPU instead of your own (much better performance and power efficiency, and much cheaper, since you pay only for the silicon)? For an Intel CPU you pay their margins and dividends. There are economic (cheaper = higher margins), performance (customer happiness) and political (no longer dependent on Intel) advantages.

In addition, you would have the advantage of the same ISA from iPhone to Mac Pro; you just don't have to bother with x86 at all anymore.

Carfax83 · Nov 5, 2019

When I was browsing the internet about the potential performance of a laptop or desktop variant of Apple's Bionic series processors, one common refrain I read was that Apple's high IPC is intertwined with low clock speed. Apparently, higher clock speeds lower IPC, mainly because the CPU has to wait on memory more often. Something else was that wide architectures like the A13 may perform well in some tasks, while in other applications with limited parallelism their performance would suffer much more than that of a more balanced architecture like Intel's Core or AMD's Zen series.

So to get a higher level of performance across a wider range of applications, a potential laptop or desktop A13/A14 variant would have to scale up the clock speed, which means IPC would suffer. Not only that, but how would such an architecture compete against Intel or AMD in the heavy workloads common on laptop and desktop PCs without SMT or wider SIMD vectors?

AFAIK, Apple only has NEON, which, if I'm not mistaken, is akin to SSE2 or something similar. AVX/AVX2/AVX-512 blow it out of the water. Or am I wrong on this?

soresu · Nov 5, 2019

Carfax83 said:
AFAIK, Apple only has NEON, which, if I'm not mistaken, is akin to SSE2 or something similar. AVX/AVX2/AVX-512 blow it out of the water. Or am I wrong on this?

No, you are correct, but the benefit of those wider SIMD instructions depends a lot on the workload.

Apple is limited to 128-bit NEON for now, but I imagine that, after the SVE2/TME announcement, they are at least in the design phase of a future core supporting wider vector lengths, perhaps even further along depending on their foreknowledge of SVE2's development at ARM.
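To put the width difference in numbers, here is a small sketch of how many elements fit in one vector register for each extension mentioned. It covers register width only; real throughput also depends on how many vector pipes a core has and on the clock it holds under vector load (the AVX downclocking mentioned earlier), neither of which is modeled. The SVE range follows the original specification, which allowed any multiple of 128 bits from 128 to 2048.

```python
VECTOR_BITS = {"NEON": 128, "AVX/AVX2": 256, "AVX-512": 512}
# Original SVE spec: vector length chosen by the implementation, 128 to 2048 bits in steps of 128
SVE_LENGTHS = list(range(128, 2048 + 1, 128))


def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


for name, bits in VECTOR_BITS.items():
    print(f"{name:9s} {bits:4d}-bit: {lanes(bits, 32):2d} x fp32, {lanes(bits, 64):2d} x fp64")
print(f"SVE       {SVE_LENGTHS[0]}-{SVE_LENGTHS[-1]}-bit: "
      f"{lanes(SVE_LENGTHS[0], 32)}-{lanes(SVE_LENGTHS[-1], 32)} x fp32")
```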

Carfax83 · Nov 5, 2019

soresu said:
No, you are correct, but the benefit of those wider SIMD instructions depends a lot on the workload.

Apple is limited to 128-bit NEON for now, but I imagine that, after the SVE2/TME announcement, they are at least in the design phase of a future core supporting wider vector lengths, perhaps even further along depending on their foreknowledge of SVE2's development at ARM.

Wow, I didn't even know they were working on SVE2. Was the original SVE even implemented in any processor? If so, how did it compare to something like Intel's AVX SIMD?

soresu · Nov 5, 2019

Carfax83 said:
Wow, I didn't even know they were working on SVE2. Was the original SVE even implemented in any processor? If so, how did it compare to something like Intel's AVX SIMD?

All I know is that Fujitsu was going to use it in a custom ARM core for the "post-K" supercomputer CPU called A64FX. That was quite a while ago now, and I don't think the chips were ever intended for wider use.

Found some PDF links about it:

1.
2.

soresu · Nov 5, 2019

Long story short, A64FX targets a 48-core CPU delivering 2.7+ TFLOPS (DP), with the "post-K" system projected at 150k+ nodes, so at least 405 DP PetaFLOPS.

This slide outlines the per-core performance for A64FX:
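The 405 PetaFLOPS figure is just per-CPU throughput times node count. The sketch below reproduces it and also backs out per-core numbers. The per-cycle figure rests on an assumption not stated in the post: two 512-bit FMA pipes per core, with an FMA counted as 2 FLOPs, and one CPU per node.

```python
TFLOPS_PER_CPU = 2.7    # DP target quoted above
CORES_PER_CPU = 48
NODES = 150_000         # "150k+" nodes; one CPU per node assumed

system_pflops = TFLOPS_PER_CPU * NODES / 1_000
gflops_per_core = TFLOPS_PER_CPU * 1_000 / CORES_PER_CPU

# Assumption: two 512-bit FMA pipes per core; an FMA counts as 2 FLOPs
DP_LANES = 512 // 64
FLOPS_PER_CYCLE = 2 * DP_LANES * 2
implied_clock_ghz = gflops_per_core / FLOPS_PER_CYCLE

print(f"system:   {system_pflops:.0f} PFLOPS")     # 405
print(f"per core: {gflops_per_core:.2f} GFLOPS")   # 56.25
print(f"implied clock at {FLOPS_PER_CYCLE} FLOP/cycle: {implied_clock_ghz:.2f} GHz")
```

If the two-pipe assumption holds, the target works out to a clock under 2 GHz, which illustrates how wide vectors rather than high frequency carry this design.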

Nothingness · Nov 6, 2019

Carfax83 said:
When I was browsing the internet about the potential performance of a laptop or desktop variant of Apple's Bionic series processors, one common refrain I read was that Apple's high IPC is intertwined with low clock speed. Apparently, higher clock speeds lower IPC, mainly because the CPU has to wait on memory more often.

That's not exactly the issue. What you describe applies to any CPU: if you lower the clock speed, you get higher IPC, because memory latency in ns is constant but shrinks when measured in cycles, so more work gets done per cycle. In the case of Apple's chips, though, the belief is that the rather low clock speed lets each cycle do more work, because signals can traverse more logic per cycle (some people talk about this in terms of FO4 delays; you can read more about it here).

That's oversimplified, but I hope you get the idea.

Something else was that wide architectures like the A13 may perform well in some tasks, while in other applications with limited parallelism their performance would suffer much more than that of a more balanced architecture like Intel's Core or AMD's Zen series.
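The generic effect described above (fixed memory latency in ns becoming more cycles at a higher clock) can be shown with a crude CPI model. This sketch does not capture the FO4/logic-depth argument, and the workload numbers (IPC 4 with a perfect cache, 2 misses per 1000 instructions, 80 ns DRAM) are hypothetical.

```python
def ipc_and_throughput(freq_ghz, core_cpi, misses_per_instr, mem_latency_ns):
    """Crude CPI model: core CPI plus memory stalls.

    Memory latency is fixed in nanoseconds, so in cycles it grows with clock:
    stall cycles per miss = latency_ns * freq_ghz.
    """
    cpi = core_cpi + misses_per_instr * mem_latency_ns * freq_ghz
    ipc = 1.0 / cpi
    return ipc, ipc * freq_ghz  # (IPC, billions of instructions per second)


# Hypothetical workload: IPC 4 with a perfect cache, 2 misses per 1000 instructions, 80 ns DRAM
for f in (2.5, 3.2, 4.0):
    ipc, gips = ipc_and_throughput(f, core_cpi=0.25, misses_per_instr=0.002, mem_latency_ns=80)
    print(f"{f:.1f} GHz: IPC {ipc:.2f}, {gips:.2f} G instr/s")
```

IPC drops as the clock rises while throughput still increases, only sublinearly; how much depends entirely on the miss rate, which is the "by how much" question below.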

I'm not sure this makes a lot of sense. SPECint 2006 results seem to show a rather balanced architecture.

So to get a higher level of performance across a wider range of applications, a potential laptop or desktop A13/A14 variant would have to scale up the clock speed, which means IPC would suffer.

Yes, but the $1,000,000 question is by how much.

beginner99 · Nov 6, 2019

Richie Rich said:
@beginner99: I appreciate your constructive discussion and I agree with most of it. However, what if Apple used the same layout as first-generation EPYC (four CPU dies connected together by a bus)? That would be super easy to do: a 32-core CPU with 8 memory channels. There is the NUMA problem, but Apple has its own OS, so it can be handled OK.

This would mean that the same CPU in a laptop would carry a lot of unneeded "connection links", which wastes die area and maybe even power. Another problem is that it assumes Apple has such a "bus" or Infinity Fabric alternative available. Can the ARM interconnects work off-die? I don't really know.

They could also go with I/O dies, but that is still a lot of effort for a very small market.

Richie Rich said:
Just imagine you are Apple's CEO. Would you prefer to use a competitor's CPU instead of your own (much better performance and power efficiency, and much cheaper, since you pay only for the silicon)? For an Intel CPU you pay their margins and dividends. There are economic (cheaper = higher margins), performance (customer happiness) and political (no longer dependent on Intel) advantages.

We don't know the actual costs, so it's impossible to judge. I'm sure Apple gets a far better deal from Intel than we think. It also depends on the actual market size of the MacBooks, iMacs and Mac Pro. Maybe those markets are larger than I think, especially the Mac Pro?

I think it actually costs Apple less to stay on x86 (cost isn't just money, but also risk and complexity). Another in-house chip would mean another design team, and that is a lot of expensive hires. Software needs to be adjusted, although for Apple's own software I think they probably have that in-house already. The bigger problem is third-party software. Would everybody adjust, or would some just say "not worth it, too small a market"? Stuff from Adobe, MS, and the other video/creation software Macs are often used for: will these all follow?

Nothingness · Nov 6, 2019

beginner99 said:
Stuff from Adobe, MS, and the other video/creation software Macs are often used for: will these all follow?

Given that Adobe will port some of their tools to Windows ARM and to iPad, I guess they'd have little issue porting them to Mac OS X ARM.

MS have already ported some of their tools to Windows ARM so again it will likely be easy to port to Mac OS X ARM.

Of course that's only two companies.

But beyond what you point out, some people insist on running Windows on their Mac, and that would be an issue, I guess.

soresu · Nov 6, 2019

beginner99 said:
The bigger problem is third-party software. Would everybody adjust, or would some just say "not worth it, too small a market"? Stuff from Adobe, MS, and the other video/creation software Macs are often used for: will these all follow?

This is a bigger problem than some give it credit for.

x86 Windows will not simply be dumped even if the x86 Mac is, which means that cross-platform DCC packages would need to maintain two backends permanently. I don't imagine that would be popular with the owners of larger packages, like Autodesk with their full suite of software, something they already cut back spending on not so long ago.

Carfax83 · Nov 6, 2019

Nothingness said:
I'm not sure this makes a lot of sense. SPECint 2006 results seem to show a rather balanced architecture.

OK, this might seem like a dumb question, but indulge me nevertheless. Theoretically, for a personal desktop PC used for everything from gaming to encoding to browsing to editing, which CPU would you prefer, assuming both have the same ISA:

1) A 10-wide CPU that runs at 2.5 GHz (very high IPC but low clock speed)

2) A 5-wide CPU that runs at 5 GHz (very high clock speed but moderate IPC)

It's likely nowhere near enough information to make a properly informed decision, but whatever thoughts you have would still be useful to me.

Thala · Nov 6, 2019

Carfax83 said:
OK, this might seem like a dumb question, but indulge me nevertheless. Theoretically, for a personal desktop PC used for everything from gaming to encoding to browsing to editing, which CPU would you prefer, assuming both have the same ISA:

1) A 10-wide CPU that runs at 2.5 GHz (very high IPC but low clock speed)

2) A 5-wide CPU that runs at 5 GHz (very high clock speed but moderate IPC)

It's likely nowhere near enough information to make a properly informed decision, but whatever thoughts you have would still be useful to me.

Answer: a 10-wide CPU that runs at 3.5-4 GHz, something that should be achievable just by upping the voltage and frequency of the A13, along with some additional buffers in the backend.

The thing is, the 5 GHz CPU (some random Intel CPU) is already running at its voltage and clock limits, while the 2.5 GHz CPU (the A13, in this case) is not.
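A deliberately crude upper-bound model of the question: a core can retire at most min(width, available ILP) instructions per cycle, so throughput is that times the clock. Real IPC sits far below the width on both designs, and the ILP values here are hypothetical, so this only bounds the comparison.

```python
def peak_throughput_gips(width, freq_ghz, available_ilp):
    """Upper-bound model: a core retires at most min(width, ILP) instructions per cycle."""
    return min(width, available_ilp) * freq_ghz


cpus = {
    "10-wide @ 2.5 GHz": (10, 2.5),
    "5-wide @ 5.0 GHz": (5, 5.0),
    "10-wide @ 3.5 GHz": (10, 3.5),  # low end of the 3.5-4 GHz suggestion
}
for ilp in (2, 4, 8):  # hypothetical ILP a workload exposes
    row = ", ".join(
        f"{name}: {peak_throughput_gips(w, f, ilp):.1f}" for name, (w, f) in cpus.items()
    )
    print(f"ILP {ilp}: {row}")
```

Under this bound the 10-wide part at 2.5 GHz can at best tie the 5-wide part at 5 GHz (25 vs 25, only when ILP reaches 10), so the argument for the wide design depends on raising its clock, which is exactly what the next posts dispute.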

Carfax83 · Nov 6, 2019

Thala said:
Answer: a 10-wide CPU that runs at 3.5-4 GHz, something that should be achievable just by upping the voltage and frequency of the A13, along with some additional buffers in the backend.
The thing is, the 5 GHz CPU (some random Intel CPU) is already running at its voltage and clock limits, while the 2.5 GHz CPU (the A13, in this case) is not.

My whole point in asking that question was to show that not all workloads would respond well to a wide architecture. I'm sure that Intel and AMD aren't stupid. If ultra-wide architectures with high IPC but low clock speeds allowed them to dominate performance across a wide variety of workloads, I wager they would have built that CPU already. Gaming in particular seems to prefer higher clock speeds along with low memory latency, which is why the 9900K is so dominant in gaming. A theoretical 10-wide CPU at, say, 2.5 GHz would probably be destroyed in gaming by higher-clocked CPUs with lower IPC. Factor in things like SMT and AVX, and the case for Apple being able to compete with Intel/AMD using more powerful variants of their Bionic series CPUs looks grim to me.

Now if they can clock it significantly faster, like you mentioned, then yeah, that would be a real threat. But from what I understand, doing so would require changing a great deal of the microarchitecture to make it more suitable and performant at higher clock speeds.

Nothingness · Nov 7, 2019

Carfax83 said:
My whole point in asking that question was to show that not all workloads would respond well to a wide architecture. I'm sure that Intel and AMD aren't stupid. If ultra-wide architectures with high IPC but low clock speeds allowed them to dominate performance across a wide variety of workloads, I wager they would have built that CPU already. Gaming in particular seems to prefer higher clock speeds along with low memory latency, which is why the 9900K is so dominant in gaming. A theoretical 10-wide CPU at, say, 2.5 GHz would probably be destroyed in gaming by higher-clocked CPUs with lower IPC.

I tend to agree: there are diminishing returns in getting wider, because dependencies in the instruction window limit what you can do without speculating too much, which in turn increases the number of replays.

But I'm not sure your example of the 9900K is any proof that a higher frequency is the way to go. There might be many factors that explain why the 9900K is better: better prefetchers, drivers better optimized for Intel, etc.
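The diminishing-returns point can be illustrated with a toy scheduler: with unit latency and an unlimited window, extra width stops helping once it exceeds the parallelism the dependency graph exposes. The code shape (four interleaved, independent dependency chains of 16 instructions) is hypothetical, and greedy program-order list scheduling stands in for a real out-of-order core; speculation and replays are not modeled.

```python
def cycles_to_issue(deps, width):
    """Greedy list scheduling of a dependency graph.

    deps[i] lists earlier instructions that i depends on. Every instruction has
    1-cycle latency, the window is unlimited, and at most `width` instructions
    issue per cycle.
    """
    issue_cycle = []
    used = {}  # cycle -> instructions already issued in that cycle
    for srcs in deps:
        c = max((issue_cycle[s] + 1 for s in srcs), default=0)
        while used.get(c, 0) >= width:
            c += 1
        issue_cycle.append(c)
        used[c] = used.get(c, 0) + 1
    return max(issue_cycle) + 1


# Hypothetical code: 4 independent dependency chains, 16 instructions each, interleaved
CHAINS, LENGTH = 4, 16
deps = [[i - CHAINS] if i >= CHAINS else [] for i in range(CHAINS * LENGTH)]
for w in (1, 2, 4, 8, 16):
    cycles = cycles_to_issue(deps, w)
    print(f"width {w:2d}: {cycles:2d} cycles, IPC {len(deps) / cycles:.1f}")
```

IPC climbs 1, 2, 4 and then stays at 4 for widths 8 and 16: beyond the number of independent chains, the dependency length (16 cycles) is the limit, not the width.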

And I agree with @Thala that increasing frequency too much is not efficient (remember P4?).

Factor in things like SMT and AVX, and the case for Apple being able to compete with Intel/AMD using more powerful variants of their Bionic series CPUs looks grim to me.

If Apple wants, they can add SVE2, which is miles ahead of AVX. And if they feel SMT is the way to go, they can add it too.

Now if they can clock it significantly faster, like you mentioned, then yeah, that would be a real threat. But from what I understand, doing so would require changing a great deal of the microarchitecture to make it more suitable and performant at higher clock speeds.

Yeah, the question is how large that "great deal" is.
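One concrete difference between SVE and AVX is that SVE code is vector-length agnostic: the loop asks the hardware how many lanes it has and uses a per-lane predicate for the tail, so the same binary runs on 128-bit or 2048-bit implementations. In the ACLE intrinsics this roughly corresponds to `svcntw()` for the lane count and `svwhilelt_b32` for the predicate; the Python below is only an emulation of that idea, with `vector_bits` standing in for the hardware length, and makes no claim about performance.

```python
def vla_saxpy(a, x, y, vector_bits):
    """Emulate a vector-length-agnostic loop computing y = a*x + y on 32-bit elements.

    vector_bits stands in for the hardware vector length; the loop body never
    hard-codes it, so the same "binary" works for any supported length.
    """
    lanes = vector_bits // 32
    out = list(y)
    n = len(x)
    i = iterations = 0
    while i < n:
        # Predicate: lane j is active only while i + j < n, so no scalar tail loop is needed
        active = [i + j < n for j in range(lanes)]
        for j, on in enumerate(active):
            if on:
                out[i + j] = a * x[i + j] + out[i + j]
        i += lanes
        iterations += 1
    return out, iterations


x = list(range(100))
y = [1] * 100
for bits in (128, 512, 2048):
    result, iters = vla_saxpy(2, x, y, bits)
    print(f"{bits:4d}-bit vectors: {iters:2d} iterations, result[:4] = {result[:4]}")
```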

Thala · Nov 7, 2019

Carfax83 said:
Now if they can clock it significantly faster, like you mentioned, then yeah, that would be a real threat. But from what I understand, doing so would require changing a great deal of the microarchitecture to make it more suitable and performant at higher clock speeds.

I was not talking about changing the microarchitecture, just clocking the current architecture higher. Did you understand the point about the 5 GHz CPU being at its voltage and frequency limits while the A13 is not?

beginner99 · Nov 7, 2019

Thala said:
Did you understand the point about the 5 GHz CPU being at its voltage and frequency limits while the A13 is not?

And how do you know that? I bet the A13 can't OC anywhere near 4 GHz within reasonable limits (voltage, heat).
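Why "within reasonable limits" is the crux: dynamic power scales roughly with C·V²·f, and higher clocks generally need higher voltage, so power grows much faster than frequency. A minimal sketch with hypothetical operating points (not measured A13 values); leakage power is ignored, and including it would make the high-clock case worse.

```python
def relative_dynamic_power(f_ghz, volts, f0_ghz, v0):
    """Dynamic power scales roughly with C * V^2 * f; C and activity held constant."""
    return (volts / v0) ** 2 * (f_ghz / f0_ghz)


# Hypothetical operating points (not measured A13 values)
BASE_F, BASE_V = 2.5, 0.80
for f, v in [(2.5, 0.80), (3.2, 0.90), (4.0, 1.10)]:
    print(f"{f:.1f} GHz @ {v:.2f} V: {relative_dynamic_power(f, v, BASE_F, BASE_V):.2f}x power")
```

With these made-up points, 60% more clock costs about 3x the dynamic power, which is why both the headroom claim and its rebuttal come down to the real voltage/frequency curve nobody outside Apple has.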

Carfax83 · Nov 7, 2019

Nothingness said:
And I agree with @Thala that increasing frequency too much is not efficient (remember P4?).

Yeah, I definitely remember the P4. I even thought about it when I wrote that post. But the P4 had very low IPC compared to something like the Core series, which is much more balanced and robust in that respect as well as in clock speed.

If Apple wants, they can add SVE2, which is miles ahead of AVX. And if they feel SMT is the way to go, they can add it too.

Yeah, I did some reading about SVE2. Neither SVE nor SVE2 has been implemented yet, though, so I guess we'll have to wait and see how they turn out.

By the time SVE/SVE2 is implemented, AVX-512 should be a mainstay in x86 vector computing, and Intel will probably be looking at AVX-1024. That is, judging by how aggressively Intel is rolling it out in their consumer lineup.
