Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is a first rumor about Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
  • massive AI accelerator

Source Coreteks:
 
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

Steelbom

Senior member
Sep 1, 2009
438
17
81
I think XCode will already compile applications for both architectures so it's not difficult. The biggest problems would be software that takes advantage of or is designed around architecture specific features or quirks.

The other side of this is that if Apple is building their own ARM SoC they can always add in some hardware accelerators or special instructions of their own that those companies would want to target.
I see I see. I'm not too familiar with how apps on different architectures work.
Why would Intel sell them a license and why would Apple even want to pay what something like that would cost? They'd probably need a separate license from AMD as well for the 64-bit instruction set. Then they'd have to design an entirely new CPU that goes into a few millions of Macs each year instead of hundreds of millions of iDevices.
Actually, yes you're probably right about that. I'm not sure about the cost of licensing x86 but if it is expensive then selling a few million Macs a year may not be worth the fees.
If they do anything I suspect it's just a straight jump to ARM with some of their hardware models and a gradual shift over time. That's the easiest and requires no dependence on other companies, their road maps, etc. It wouldn't be the first time Apple has done something like this so they have a pretty good idea of what the problems will be and how to handle them.

Partnering with someone like AMD to make a hybrid chip is an outside chance at best. There are some reasons for something like that to happen, but there are plenty not to do it as well. But I'd still rate it as far more likely than Apple building their own x86 CPU.
How powerful can an ARM chip be though? Obviously it's very power efficient, but can it scale to high clock speeds in time to like 4-5GHz to match some of the more powerful x86 CPUs? And if so, is it still more power efficient than an equivalent x86 CPU?
 

jpiniero

Lifer
Oct 1, 2010
14,600
5,221
136
How powerful can an ARM chip be though? Obviously it's very power efficient, but can it scale to high clock speeds in time to like 4-5GHz to match some of the more powerful x86 CPUs? And if so, is it still more power efficient than an equivalent x86 CPU?

Doesn't really need as high of clock speeds when the IPC is so much higher.
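As a rough sketch of that point, assume the usual first-order model that single-thread performance is roughly IPC times clock frequency. The function name and the numbers below are made up for illustration; they are not measurements of any real chip.

```python
def required_clock_ghz(reference_clock_ghz: float, ipc_ratio: float) -> float:
    """Clock a core needs to match a reference core under perf = IPC * clock.

    ipc_ratio is (this core's IPC) / (reference core's IPC).
    """
    if ipc_ratio <= 0:
        raise ValueError("ipc_ratio must be positive")
    return reference_clock_ghz / ipc_ratio


# Hypothetical example: a core with 1.8x the IPC of a 5 GHz reference core.
print(f"{required_clock_ghz(5.0, 1.8):.2f} GHz needed to match")
```

The model ignores memory latency and turbo behaviour, so treat it as a back-of-the-envelope estimate only.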
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
How powerful can an ARM chip be though? Obviously it's very power efficient, but can it scale to high clock speeds in time to like 4-5GHz to match some of the more powerful x86 CPUs? And if so, is it still more power efficient than an equivalent x86 CPU?
Clock scaling is not ISA dependent so much as it is uArch dependent (obviously process too of course).

Point of fact I'm not even aware of anything intrinsically ISA dependent where clock scaling is concerned, although I imagine ISA standard register numbers probably factor in there or something.

An ARM uArch could potentially be just as much of an efficiency dumpster fire as Bulldozer or NetBurst in theory.
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
How powerful can an ARM chip be though? Obviously it's very power efficient, but can it scale to high clock speeds in time to like 4-5GHz to match some of the more powerful x86 CPUs? And if so, is it still more power efficient than an equivalent x86 CPU?

ARM, like Power, x86, or Alpha, is a standard that instruction sets are built around. Some ISAs are more efficient than others. But at the end of the day, the question is how many transistors you can throw at your problem and still make a profit on the die size. x86 as an ISA sucks to be honest. The big BUT is that it is everywhere. Gawd, how deep should I go on this?
 

Steelbom

Senior member
Sep 1, 2009
438
17
81
Clock scaling is not ISA dependent so much as it is uArch dependent (obviously process too of course).

Point of fact I'm not even aware of anything intrinsically ISA dependent where clock scaling is concerned, although I imagine ISA standard register numbers probably factor in there or something.

An ARM uArch could potentially be just as much of an efficiency dumpster fire as Bulldozer or NetBurst in theory.
I see, I see. Good to know.
ARM, like Power, x86, or Alpha, is a standard that instruction sets are built around. Some ISAs are more efficient than others. But at the end of the day, the question is how many transistors you can throw at your problem and still make a profit on the die size. x86 as an ISA sucks to be honest. The big BUT is that it is everywhere. Gawd, how deep should I go on this?
Ah really, didn't know that. I guess it would be interesting to see use ARM and mix things up.
 

naukkis

Senior member
Jun 5, 2002
706
578
136
How powerful can an ARM chip be though? Obviously it's very power efficient, but can it scale to high clock speeds in time to like 4-5GHz to match some of the more powerful x86 CPUs? And if so, is it still more power efficient than an equivalent x86 CPU?

So you haven't noticed that the Apple A13 in a freakin phone is as powerful as the fastest desktop x86 CPUs? That's the whole point of the discussion: Apple could emulate x86 with their desktop ARM CPU and still have almost comparable performance to the fastest x86 CPUs, and with native ARM code easily have the fastest desktop machine. It's so much better than x86 CPUs that the question actually is why Apple hasn't made the switch yet.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
So you haven't noticed that the Apple A13 in a freakin phone is as powerful as the fastest desktop x86 CPUs? That's the whole point of the discussion: Apple could emulate x86 with their desktop ARM CPU and still have almost comparable performance to the fastest x86 CPUs, and with native ARM code easily have the fastest desktop machine. It's so much better than x86 CPUs that the question actually is why Apple hasn't made the switch yet.
Exactly. The current A13 is a 6 core (2+4) with 4 core GPU and 8 core NPU. We can see that the iPhone is capable of nearly 1080p gaming at 60fps. I mean, to me, it's a no-brainer that if they can do all this on a phone battery, with the limited size of the phone itself and its inherent heat/efficiency issues, they could do some really interesting things in the laptop space with an "L13" laptop chip based on a similar design.

The techie in me is actually super-excited to see consumer-space upscaling of ARM. May not be long, if it works well, before we see it in the iMac as well.
 

Nothingness

Platinum Member
Jul 3, 2013
2,420
750
136
x86 as an ISA sucks to be honest
What an understatement :D And it looks like it doesn't want to die. Yet.

So you haven't noticed that the Apple A13 in a freakin phone is as powerful as the fastest desktop x86 CPUs? That's the whole point of the discussion: Apple could emulate x86 with their desktop ARM CPU and still have almost comparable performance to the fastest x86 CPUs, and with native ARM code easily have the fastest desktop machine. It's so much better than x86 CPUs that the question actually is why Apple hasn't made the switch yet.
Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.
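To put the emulation-tax arithmetic in one place, here is a minimal sketch. It assumes the roughly 2x slowdown stated above and treats native A13 and Intel speed as equal (1.0); the function name is hypothetical.

```python
def emulated_speed(native_speed: float, emulation_tax: float = 2.0) -> float:
    """Relative speed of emulated x86 code on a host with the given native speed.

    emulation_tax is the slowdown factor (2.0 means emulated code runs at half speed).
    """
    if emulation_tax <= 0:
        raise ValueError("emulation_tax must be positive")
    return native_speed / emulation_tax


# A13 ~= Intel natively (both normalised to 1.0), with the assumed 2x tax.
print(emulated_speed(1.0))

# To break even on emulated code, the host would need a native speed equal to the tax.
print(emulated_speed(2.0))
```

The tax varies a lot by workload, so a single factor is only a rough stand-in.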
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Another reason is that it would prevent the use of Windows on their machines, something some say is very important.
Is Windows ARM dead in the water? I thought they were still pushing work on it.

Regardless, we're starting to see the two ISAs converge - Apple is scaling ARM up toward desktop-level performance maintaining significant efficiency, while Intel and AMD are scaling down power draw while maintaining significant performance. I don't see much difference in the two other than in dogma and compatibility. An x86-64 Zen2 core and an ARMv8.2 Cortex-A77 core are more similar to each other than an Opteron and the ARMv7-based CPUs.

The big issue is always going to be software, which is where ARM is going to have the most trouble in the laptop and desktop marketplace. For glorified tablets, like chromebooks, ARM is fine. But if you're looking at Office, Adobe, or many other professional applications, it's hard to see ARM making inroads any time soon in a landscape still dominated by Windows.
 

naukkis

Senior member
Jun 5, 2002
706
578
136
Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

You forgot that you are comparing a phone chip against a desktop chip. Yes, Apple's phone chip is as fast as the best Intel desktop chip, but it's still a chip in a phone, which has something like a 4W burst and an even lower sustained power budget for the whole SoC. Give that same A13 chip a bigger power budget and it will be faster. If the whole design is optimized for desktop, or even for laptop, performance will be much higher.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The emulation tax aside, which is a stop-gap solution, an A13 would easily beat an Intel/AMD chip at the same power budget. As i said, native apps are just a matter of compilation without any actual (or very minimal) code changes.
In any case the notion that you get the same speed of an equivalent Intel machine (say both 15W TDP) is just wrong.

If Boot Camp continues to exist, you might be able to boot native ARM Windows without bigger issues*. And once you have ARM Windows running you can boot Linux as well via WSL. The thing, of course, is that unless it is a Qualcomm ARM CPU the drivers will be lacking.
So if you want to run Windows on an ARM SoC your best bet today is a Qualcomm device if you want to have full driver support.

*You need proper implementation of EFI in order to boot Windows.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
As i said, native apps are just a matter of compilation without any actual (or very minimal) code changes.
Nooooo, a massive codebase like Autodesk Maya for example could not simply be just recompiled like so.

The ease of recompilation you speak of sounds more like some sort of as yet unmade machine learning transpiler that has been trained on ISA and ABI semantics for translating between them in a similar fashion to human language translation.

I suggest you have a conversation with any halfway decent emulator coder to find out just how difficult it is to get one running on a new ISA, especially if it was never designed to be ISA cross platform from the beginning (ie program structure ABI abstractions and so forth).
 

Nothingness

Platinum Member
Jul 3, 2013
2,420
750
136
The emulation tax aside, which is a stop-gap solution, an A13 would easily beat an Intel/AMD chip at the same power budget. As i said, native apps are just a matter of compilation without any actual (or very minimal) code changes.
In any case the notion that you get the same speed of an equivalent Intel machine (say both 15W TDP) is just wrong.
According to @Andrei. the iPhone consumes up to 6.3W when running single threaded SPEC 2006. The test that consumes the least is 3.8W. So we have at least 3W per core (I don't know if Andrei isolates CPU power, I think he can't do that). Let's put 4 such cores on a SoC with a GPU and we easily get to 15W.
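The estimate can be written out as a quick sketch. The 3W-per-core figure is the one from the post; the GPU figure is a placeholder assumption, since no GPU power number is given in the thread.

```python
def soc_power_w(cores: int, per_core_w: float, gpu_w: float) -> float:
    """Naive SoC power estimate: all cores fully loaded plus a fixed GPU budget."""
    if cores < 0 or per_core_w < 0 or gpu_w < 0:
        raise ValueError("inputs must be non-negative")
    return cores * per_core_w + gpu_w


# 4 cores at the ~3W lower bound from the post; 3W for the GPU is an assumed value.
print(soc_power_w(cores=4, per_core_w=3.0, gpu_w=3.0), "W")
```

This assumes power scales linearly with core count and that the measured figure is all CPU, which the post itself flags as uncertain.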

Also you can't just push voltage to increase frequency and hope everything will be well. If Apple really wants to get to 3 GHz and up, they will likely have to leave some IPC on the table by increasing pipeline depth and/or reducing width.

If Boot Camp continues to exist, you might be able to boot native ARM Windows without bigger issues*. And once you have ARM Windows running you can boot Linux as well via WSL. The thing, of course, is that unless it is a Qualcomm ARM CPU the drivers will be lacking.
So if you want to run Windows on an ARM SoC your best bet today is a Qualcomm device if you want to have full driver support.

*You need proper implementation of EFI in order to boot Windows.
And you'll still be paying the emulation tax to run Windows x86 binaries with the added pain that x86-64 is unlikely to ever be emulated.

I am just playing the devil's advocate here: I don't care about Windows, I don't care about legacy and would buy an Axx device running MacOS X and Xcode natively without even thinking twice.
 

Nothingness

Platinum Member
Jul 3, 2013
2,420
750
136
Nooooo, a massive codebase like Autodesk Maya for example could not simply be just recompiled like so.

The ease of recompilation you speak of sounds more like some sort of as yet unmade machine learning transpiler that has been trained on ISA and ABI semantics for translating between them in a similar fashion to human language translation.

I suggest you have a conversation with any halfway decent emulator coder to find out just how difficult it is to get one running on a new ISA, especially if it was never designed to be ISA cross platform from the beginning (ie program structure ABI abstractions and so forth).
As far as I can see @Thala was talking about source recompilation, not dynamic translation. That's not trivial in all cases (intrinsics, assembly language, assumptions about memory order rules especially when multi-threading), but it's quite often not that difficult.

I remember the 32-bit to 64-bit transition when MIPS and Alpha went out. Now that was a pain because programmers assumed the size of an integer and a pointer were the same. But in most cases it was not that hard to fix. Just a pain, that was worth the cost :)
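For anyone who didn't live through it, the int-versus-pointer assumption can be illustrated roughly like this. The helper name and the address are made up; the sketch just mimics C code stashing a pointer in a 32-bit unsigned integer.

```python
import ctypes

# Sizes of C `int` and `void *` on the machine running this script.
INT_BITS = ctypes.sizeof(ctypes.c_int) * 8
POINTER_BITS = ctypes.sizeof(ctypes.c_void_p) * 8


def store_pointer_in_int(address: int, int_bits: int = 32) -> int:
    """Mimic storing a pointer in a fixed-width unsigned integer (high bits are lost)."""
    return address & ((1 << int_bits) - 1)


addr = 0x0000_7FFF_1234_5678  # hypothetical 64-bit address
truncated = store_pointer_in_int(addr)

print(f"int: {INT_BITS} bits, pointer: {POINTER_BITS} bits")
print(hex(addr), "->", hex(truncated), "| survives round trip:", truncated == addr)
```

On a 32-bit target the two sizes match and the bug stays hidden, which is why it only showed up once code was rebuilt for 64-bit.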
 

naukkis

Senior member
Jun 5, 2002
706
578
136
According to @Andrei. the iPhone consumes up to 6.3W when running single threaded SPEC 2006. The test that consumes the least is 3.8W. So we have at least 3W per core (I don't know if Andrei isolates CPU power, I think he can't do that). Let's put 4 such cores on a SoC with a GPU and we easily get to 15W.

With performance equal to Intel 4c desktop chips, not laptop.....

Also you can't just push voltage to increase frequency and hope everything will be well. If Apple really wants to get to 3 GHz and up, they will likely have to leave some IPC on the table by increasing pipeline depth and/or reducing width.

Only if they have critical hotspots in the design. And given that they could clock their chip to 2.66GHz in a phone, you estimate that 3GHz with a much greater thermal and power envelope would already be a problem?
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Nooooo, a massive codebase like Autodesk Maya for example could not simply be just recompiled like so.

The ease of recompilation you speak of sounds more like some sort of as yet unmade machine learning transpiler that has been trained on ISA and ABI semantics for translating between them in a similar fashion to human language translation.

I suggest you have a conversation with any halfway decent emulator coder to find out just how difficult it is to get one running on a new ISA, especially if it was never designed to be ISA cross platform from the beginning (ie program structure ABI abstractions and so forth).

The question of compilation is not related to the size of the code base - at all.
I am also not talking about re-compilation at runtime - just static compilation of the source code.

That's not trivial in all cases (intrinsics, assembly language, assumptions about memory order rules especially when multi-threading), but it's quite often not that difficult.

Let me quickly address the issues you mentioned:

1) intrinsics and assembly - many programs indeed use intrinsics and assembly. However in every case known to me there is a code path available without intrinsics and assembly. The modification typically consists of enabling the code path in the sources - which is typically very simple.*

2) memory ordering is (typically) not visible at application level - it is hidden in the implementation of OS primitives. So your multi-threaded code just runs on ARM without any modification. If you do synchronization without any OS primitives (which you should not do anyway, and i do not know any app which behaves like this) then you need to add memory barriers on ARM.

*Just last week i compiled 7-zip for ARM64 and it did contain assembly for x86, which i could easily switch over to the available C functions. And while i was at it, i added ARM64 intrinsics for CRC calculation. The slightly modified source code now compiles for x86, x64 and ARM64 :)
 

Nothingness

Platinum Member
Jul 3, 2013
2,420
750
136
With performance equal to Intel 4c desktop chips, not laptop.....
Well single thread performance for short periods of time is not that far between laptops and desktops: i7-1065G7 is within 15% of i7-9900K on CINT2006. OTOH for sustained performance and number of cores desktop is way better.

A13 with 52.8 is between i7-1065G7 47.7 score and i7-9900K 54.3. That's extremely impressive but I would not be surprised if a 4 core A13 was rated at more than 10W TDP.
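As a sanity check on the "within 15%" claim, the gaps can be computed directly from the three scores quoted above. The helper name is made up, and the scores are simply the ones from the post.

```python
# CINT2006 scores as quoted in the post.
scores = {"i7-1065G7": 47.7, "A13": 52.8, "i7-9900K": 54.3}


def percent_below(score: float, reference: float) -> float:
    """How far `score` falls below `reference`, as a percentage of the reference."""
    return (reference - score) / reference * 100.0


reference = scores["i7-9900K"]
for name, score in scores.items():
    print(f"{name}: {percent_below(score, reference):.1f}% below the i7-9900K")
```

These are single-thread numbers only, so they say nothing about sustained or multi-core performance.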

Only if they have critical hotspots in the design. And given that they could clock their chip to 2.66GHz in a phone, you estimate that 3GHz with a much greater thermal and power envelope would already be a problem?
Given how wide the chip is, I'm pretty sure they have many critical paths.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Well single thread performance for short periods of time is not that far between laptops and desktops: i7-1065G7 is within 15% of i7-9900K on CINT2006. OTOH for sustained performance and number of cores desktop is way better.

That is because a single core does not use close to the power available to desktop CPUs from TDP perspective. So a much lower rated mobile CPU does achieve similar performance in single core indeed. However if you are running all cores, it suddenly shows that the desktop TDP is actually required for high performance.

A13 with 52.8 is between i7-1065G7 47.7 score and i7-9900K 54.3. That's extremely impressive but I would not be surprised if a 4 core A13 was rated at more than 10W TDP.

My take is 8xA13 @3GHz < 15W. Not sure where you got the idea from that a single A13 core uses 3W? In this case it would be much less efficient than say Cortex A76...which consumes roughly 750mW@2.8GHz.
Likewise Microsoft put 4xCortex A76@3GHz + 4xCortex A55@??GHz into a 7W TDP machine (Surface Pro X).

Given how wide the chip is, I'm pretty sure they have many critical paths.

It does not really matter how many critical paths you have - they all benefit equally from higher voltage. The fact that the most critical path allows 2.6GHz at something like 0.8V VCC shows how much frequency headroom there is. And this is without significant binning - the fastest chips are most likely much better.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
The question of compilation is not related to the size of the code base - at all.
If it is so easy then why have the big software packages like Adobe CC (sans Photoshop/Premiere) and Autodesk's assorted mammoths not been turned out for WARM already?

Where are all the ARM ports for AAA games?

Yes, a handful have made it to the Switch, but these are big earners so quelle surprise that they would invest the time to port them.

God knows that WARM would be a much easier sell for Microsoft and Qualcomm if those things were in abundance natively on the platform.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
If it is so easy then why have the big software packages like Adobe CC (sans Photoshop/Premiere) and Autodesk's assorted mammoths not been turned out for WARM already?

Where are all the ARM ports for AAA games?

Yes, a handful have made it to the Switch, but these are big earners so quelle surprise that they would invest the time to port them.

God knows that WARM would be a much easier sell for Microsoft and Qualcomm if those things were in abundance natively on the platform.

There is a big difference between something being easy and the management of SW companies deciding to actually do it.

Besides, you still do not understand the difference between porting something and just targeting another architecture. For the Switch you really have to port something, because the Switch does not run Windows and as such does not support Windows APIs. The fact that the Switch features an ARM CPU is almost irrelevant when it comes to porting effort - the API differences, however, matter a lot.
Corollary if the APIs are identical but the target architecture is different - then you just compile for the new target architecture without any (or very small) code modification.

Case in point: 7-Zip, which i just recompiled for ARM64. It now runs natively on any Windows ARM machine. Does it run on the Switch? Not even remotely - someone would need to port it.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Corollary if the APIs are identical but the target architecture is different - then you just compile for the new target architecture without any (or very small) code modification.
So Firefox took years to move to x64 on Windows because of API differences from 32 bit Windows then?