Solved! ARM Apple High-End CPU - Intel replacement

Page 7 of the AnandTech forum thread.

Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is a first rumor about Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
  • massive AI accelerator

Source Coreteks:
 
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.
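That arithmetic can be written down in one line. A minimal sketch, assuming the figures above (native A13 roughly on par with Intel, and a ~2x emulation tax); `emulated_relative_speed` is a hypothetical helper, not anything from a real emulator:

```python
# Back-of-envelope: effective speed of emulated x86 code on an ARM chip,
# assuming native parity (A13 ~= Intel) and a ~2x emulation tax.
def emulated_relative_speed(native_ratio: float, emulation_tax: float) -> float:
    """native_ratio: ARM-native speed relative to the x86 chip (1.0 = parity).
    emulation_tax: slowdown factor when running x86 code under emulation."""
    return native_ratio / emulation_tax

# Parity native performance, 2x tax -> emulated x86 apps run at half speed.
print(emulated_relative_speed(1.0, 2.0))  # 0.5
```

By the same sketch, the ARM core would need to be about 2x faster natively before emulated x86 software merely breaks even.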

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

Nothingness

Platinum Member
Jul 3, 2013
2,405
736
136
Google seems to be on a similar track with Chrome OS; I think they ported Android Studio to run on it?

If so, I would be surprised if Apple were not pursuing similar efforts with iPadOS.
If they did that, it would also mean allowing apps to generate code natively. I don't think Apple wants that.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,674
3,796
136
And desktop CPUs will lose to server farms. I thought we were comparing single-core performance, for which the best available measuring tool (also used by CPU manufacturers) is SPEC. And those AnandTech SPEC tests show that in SPECint the A13 and the 9900K integer scores are equal. Equal SPEC scores from a phone and the best-performing desktop CPU is a pretty funny thing to happen; most people still refuse to believe it's actually happening. And it should not happen: the only explanation for such weird results is that Apple has an extremely good CPU arch for phones while, at the same time, the x86 manufacturers have piss-poor archs for desktop.

:rolleyes: Piss poor doesn't work in a competitive environment. Do you (and others) really think you're the first to think "ARM awesome! x86 sucks!, Let's scale ARM up"? I'd be willing to bet that in five years ARM is still limited to low power devices while x86 is used in anything where performance matters.
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
:rolleyes: Piss poor doesn't work in a competitive environment. Do you (and others) really think you're the first to think "ARM awesome! x86 sucks!, Let's scale ARM up"? I'd be willing to bet that in five years ARM is still limited to low power devices while x86 is used in anything where performance matters.
It's more a chicken and the egg thing. x86 isn't good, but it's deeply embedded everywhere. It's hard enough to get companies to move from one vendor to another within x86, let alone to a different ISA. Just the Apple A series chips indicate there is a lot available under the hood, if you throw the transistor count at it. But moving the world to a different ISA? That won't happen anytime soon. It has nothing to do with good or bad, and everything to do with inertia.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Do you (and others) really think you're the first to think "ARM awesome! x86 sucks!, Let's scale ARM up
Considering Neoverse N1 and China's Phytium Xiaomi/Mars designs are already in the region of 64 cores, I'd say it's already scaled up.

I'm less certain about the Phytium design, but the N1 is a server oriented A76 variant, so not exactly a slouch at 64 cores, even with NEON's intrinsic vector length limiting N1's SIMD code viability vs Epyc alternatives.

So long story short, for scalar code and less vector/SIMD heavy code you will get a pretty efficient chip.

Obviously being A76 based it also lacks SMT.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
And equal spec scores from phone and best performing desktop cpu is pretty funny thing to happen

It isn't if the test is short enough. The main difference between your magical phone and a desktop is cooling ability; nothing stops you from putting whatever performant chip you like into a phone, it will just throttle sooner. Said otherwise, the difference is power management, not peak performance. Just look at Intel laptops, where the exact same CPUs go as in the desktop: the laptop version will run short tests at the same speed as the desktop (at the same clock) because it's the same CPU. It will quickly fall behind on longer runs, though, because it needs to throttle.

In fact, a large single core that is good at burst loads would be ideal for a phone (not many things running in parallel), compared to a desktop/server.
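The burst-vs-sustained point above can be sketched as a toy model: the same core at the same peak clock finishes a short benchmark identically in a phone or a desktop, but a long run exposes the thermally limited chassis. All numbers here are illustrative, not measurements:

```python
# Toy throttling model: run at peak until the thermal budget is spent,
# then drop to the chassis's sustainable performance level.
def run_benchmark(work_units: float, peak_perf: float, sustained_perf: float,
                  burst_seconds: float) -> float:
    """Seconds to finish `work_units` of work."""
    burst_work = peak_perf * burst_seconds
    if work_units <= burst_work:
        return work_units / peak_perf          # short test: never throttles
    remaining = work_units - burst_work
    return burst_seconds + remaining / sustained_perf

# Same silicon, different cooling: phone sustains 40% of peak, desktop 100%.
short_job, long_job = 50.0, 5000.0
phone_short = run_benchmark(short_job, peak_perf=10, sustained_perf=4,  burst_seconds=10)
desk_short  = run_benchmark(short_job, peak_perf=10, sustained_perf=10, burst_seconds=10)
phone_long  = run_benchmark(long_job,  peak_perf=10, sustained_perf=4,  burst_seconds=10)
desk_long   = run_benchmark(long_job,  peak_perf=10, sustained_perf=10, burst_seconds=10)

print(phone_short == desk_short)   # True: short tests look identical
print(phone_long > desk_long)      # True: long tests expose the throttling
```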
 

Nothingness

Platinum Member
Jul 3, 2013
2,405
736
136
:rolleyes: Piss poor doesn't work in a competitive environment. Do you (and others) really think you're the first to think "ARM awesome! x86 sucks!, Let's scale ARM up"? I'd be willing to bet that in five years ARM is still limited to low power devices while x86 is used in anything where performance matters.
I invite you to look at the system ranked #156 on the Top500 supercomputer list. Also read about Fujitsu A64FX. And various workstations and blades built.

In fact, what is sorely missing is a good laptop/desktop part. But claiming ARM is only going into low-power devices, I'm afraid, just shows you don't really know what you are talking about.
 

Nothingness

Platinum Member
Jul 3, 2013
2,405
736
136
In fact a large single-core good at burst loads would be ideal for a phone (not many things running parallel) compared to desktop/server.
That's not entirely correct: many small tasks have to run continuously on a phone. That's why having one big core and several smaller ones makes a lot of sense. For a desktop that'd be useless, but I wonder whether small cores could be of use in a laptop.
 

SarahKerrigan

Senior member
Oct 12, 2014
361
514
136
Never thought about it like that. Yes, x86 assembly sucks. That's probably why they teach MIPS or ARM assembly at most universities. I did MIPS, but they did spend some time on x86 as well. So the assembly may suck, but x86 is still good.



Because it's not. Let's see that phone transcode some h264.

You may be surprised to learn that h264 is included in SPEC (subtest 464.h264ref). Of course, it's a reference C encoder, which means we can't resort to "See??? Hand-tuned assembly on x86 beats C on ARM!" arguments, but the A13 does rather well at it.

The amount of "Apple's cores absolutely can't keep up with Intel's, despite all benchmark evidence available" I'm seeing really surprises me. From an Occam's Razor perspective, it seems irrational. I'm fairly sure there are cases where Apple's clock-normalized wins over Intel are smaller than some of the ones I've seen, but every piece of evidence shows that Apple has developed a powerful and credible core, and that migrating the Mac, if they wanted to do it, is within technical reach. And the rest of the ARM ecosystem hasn't stopped moving either - Fujitsu A64FX is far from "slow and low power."
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
For laptops it can work, yes, maybe even iMac with some compromise. But the Mac Pro is not doable. You can get Mac Pros with dual Xeons, and with the next Mac Pro they could go with Zen 2 with up to 64 cores.
Even if we assume Apple somehow can't just go the AMD Chiplet route for higher performance applications, a large monolithic die of ~250mm2 on 7nm (Navi 10 size) would fit a buttload of these cores, should Apple really want to pursue such a path (assuming they don't just let the mac pro rot on the vine).

Proof needed. Single-threaded in what, Geekbench? How do you get the power numbers per core? All the non-core stuff, which desktop CPUs have a lot more of, uses power too. The A13 doesn't have PCIe, SATA or Infinity Fabric power use, and we have seen how much power that stuff consumes.
The only thing I can say is that Apple is known to offer subpar connectivity for the price and makes additional profit on that by selling dongles. That might work for a laptop, where fanboys still buy even though it only has one USB-C port, but on a Mac Pro? Nope.
[Chart: SPEC2006 single-threaded scores, Apple A13 vs desktop CPUs]

Notice the A13 scoring above the 3900X on SPECint2006 and just below it on SPECfp2006, a single-threaded test.
[Chart: Ryzen 9 3900X power consumption]

Correction: on a chip-by-chip comparison (the A13 in the iPhone is almost certainly measured at the chip level), the 3900X consumes ~50W, of which ~20W is attributed to the single active Zen 2 core, vs the A13's ~5W. I shouldn't have to remind you that the A13 SoC also integrates things like an LTE modem, iGPU and various ISPs which are absent from any Zen 2 chip.

Burst performance is one thing, sustained is completely different. Why can I run the same core in a 5W laptop that is in a full-blown server with >200W? Sustained performance. The jump from ULV to server CPUs in terms of wattage is far bigger than from the Axx to laptop CPUs. So your point isn't really of much value. The main difference simply is power management, or sustained performance.

And then there's the fact that Apple designs one SoC that goes into mobile devices only, in contrast to Intel's cores, which go into everything from laptops to desktops to servers. The cores in 5W CPUs in ULV laptops are identical to the cores in a 9900K using close to 150W. What's the difference? Cooling, or said otherwise, sustained performance.

Why are you penalizing the Apple A13 for being placed in a far more thermally limiting form factor than any x86 processor that's within the same galaxy in terms of performance? Place the same chip in a motherboard with an actively cooled heatsink and suddenly the sustained chip performance would be stellar.
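The perf-per-watt claim above can be reduced to simple arithmetic, assuming the figures quoted in this exchange (roughly equal single-thread SPEC2006 scores, A13 package ~5W during the run, ~20W attributed to the one active Zen 2 core). The numbers are this thread's estimates, not independent measurements:

```python
# Rough single-thread perf-per-watt ratio from the figures in the thread.
a13_score, a13_watts = 1.0, 5.0       # normalized SPEC score, package power
r9_score, r9_core_watts = 1.0, 20.0   # same score, core-attributed power

ratio = (a13_score / a13_watts) / (r9_score / r9_core_watts)
print(ratio)  # 4.0  -> ~4x perf/W on these core-level numbers
```

Using the 3900X's ~50W chip-level figure instead of the ~20W core figure would make the ratio larger still, which is the poster's point about the comparison being generous to the x86 side.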
 

the2199

Junior Member
Oct 17, 2019
13
4
81
It's more a chicken and the egg thing. x86 isn't good, but it's deeply embedded everywhere. It's hard enough to get companies to move from one vendor to another within x86, let alone to a different ISA. Just the Apple A series chips indicate there is a lot available under the hood, if you throw the transistor count at it. But moving the world to a different ISA? That won't happen anytime soon. It has nothing to do with good or bad, and everything to do with inertia.
Yes, this is so correct.

I cannot fathom how people think moving from x86 to ARM is a piece of cake, even for a company with a locked ecosystem like Apple.

Every company has some sort of limit, even if that company is Apple.

And even if there is a supercomputer running ARM, that doesn't mean every developer in the world will start optimizing their software for ARM.

Optimizing your software for a certain ISA is not done with a single press on the keyboard. It takes a lot of effort and hard work to optimize software for a specific ISA, not just get it running. And it's not only video encoding programs that lack optimization: compression libraries like libzip, and 3D rendering software like Blender, also lack it.

So suppose Apple starts using ARM in its laptops and someone tests one, running Blender, 7-Zip, x264, x265 or rav1e. Say it's an 8-core CPU, all big cores (laptops have a higher thermal budget), running at 3.5GHz on all cores and 4GHz single-core, and we underclock an Intel or AMD CPU to the same frequency. The result: the ARM CPU is more efficient than the x86 one, but the x86 CPU will be way faster in any CPU benchmark.

Using the most popular ISA in the world has to count for something, and that something is getting an insane amount of CPU optimization.

And Intel and AMD are going to improve their IPC year after year.

And I love ARM; I have 5 Raspberry Pi 4s.

ARM will be used in laptops for an extremely-high-battery-life mode, but the power-user laptop will be x86.

It will also be used in servers where the company doesn't need high CPU power and doesn't love paying a high electricity bill, and then they will hire developers to optimize their code for ARM.

People, when you think about hardware, think about software also, because hardware is worth nothing without software.
 

name99

Senior member
Sep 11, 2010
404
303
136
Yes this is so correct

I cannot fathom how people think moving from x86 to arm is a piece of cake even for a company with a looked ecosystem like apple

[...]

people, when you think about Hardware, think about software also. because hardware worth nothing without software.

Meanwhile in the real world:

Blender also exists for ARM. I've previously mentioned Wolfram Player/Mathematica.

Obviously optimization is not as advanced on ARMv8 as x86, but the companies involved can see the way the world is headed and are willing to make the investment to move their code bases across.
For more data points, of course the client MS (and Apple) properties have already moved over.

As for " if apple starts using arm for there laptops and someone tests it. and run blender and 7zip and x264 or x265 or RAV1E "; well the whole paragraph rapidly loses coherence, but I think what it is trying to claim is that
- these apps will run slower on an Apple ARM desktop than they do on current Macs AND
- that people will care.
To which I say bwah hah ha hah.

There comes a point at which reality denial transitions from amusing to irritating to just sad.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Blender also exists for ARM.
I'd love to see a Neoverse N1 64 core bench result from that.
As for " if apple starts using arm for there laptops and someone tests it. and run blender and 7zip and x264 or x265 or RAV1E "; well the whole paragraph rapidly loses coherence, but I think what it is trying to claim is that
The rav1e guys are in the midst of porting the dav1d 8 bit NEON optimisations to their codebase, so it's far from done yet, and way behind x264 or x265 - unsurprising given the age of AV1 itself.
 

TheGiant

Senior member
Jun 12, 2017
748
353
106
Meanwhile in the real world:

Blender also exists for ARM. I've previously mentioned Wolfram Player/Mathematica.

[...]

There comes a point at which reality denial transitions from amusing to irritating to just sad.
I don't think Apple will go this way. With Ice Lake and Tiger Lake, and the AMD alternative, they don't have the horsepower for big desktop/workstation computing.

IMO Apple is starting to create a home/office ecosystem that merges the computational abilities of all Apple devices on one network. It started with the option to use an iPad Pro as a second display for a MacBook Pro. It will continue with using the iPad as a computing resource, sharing multithreaded workloads between devices, and then on to iPhones.

We don't need more cores on the desktop.

I have at home a 6600K (my son's), a 3900X (mine), a Surface Pro 4 (i5-6300U) and my wife's ThinkPad (i5-7300U). Cell phones: a Huawei P20 Pro and two iPhone SEs. It will soon continue with both daughters (they'll probably get the SE iPhones while my wife and son get new SE2s).

All the phones spend most of the night happily idling on their cables, and so do the desktops.

So I have spare computing power for workloads that don't need any interactive behaviour (encoding, rendering, computing, simulations...). You just throw the load at the home farm. The only thing that remains is gaming, which cannot be transferred because of latency and interactivity, but we will see with xCloud etc.

IMO Apple has the HW, SW, platform, everything needed to use this advantage, and I will be very happy to join it.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Even if we assume Apple somehow can't just go the AMD Chiplet route for higher performance applications, a large monolithic die of ~250mm2 on 7nm (Navi 10 size) would fit a buttload of these cores, should Apple really want to pursue such a path (assuming they don't just let the mac pro rot on the vine).

[...]

Why are you penalizing the Apple A13 for being placed in a far more thermally limiting form factor than any x86 processor that's within the same galaxy in terms of performance? Place the same chip in a motherboard with an actively cooled heatsink and suddenly the sustained chip performance would be stellar.
This is correct, even though it doesn't tell the whole story.

We will need to see how the A13 or its laptop/desktop equivalent performs in sustained tasks such as Blender, Corona, Cinema4D - as well as narrowly-threaded usages like gaming - before we can judge its performance. SpecInt and SpecFP are impressive but I want to see what happens when Apple's design includes 8 Lightning cores and a decent thermal solution so we can test it in a way more similar to the real world.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
I cannot fathom how people think moving from x86 to arm is a piece of cake even for a company with a looked ecosystem like apple

Exactly. Most companies even have problems moving away from Oracle, and if you think x86 is expensive, think again: Oracle is sucking you dry, and still many think that's better than migrating all their software, or the migration literally takes a decade.
 

the2199

Junior Member
Oct 17, 2019
13
4
81
Meanwhile in the real world:

Blender also exists for ARM. I've previously mentioned Wolfram Player/Mathematica.

[...]

There comes a point at which reality denial transitions from amusing to irritating to just sad.
What a pathetic response, but that's to be expected coming from an Apple delusional. And I didn't say that Blender doesn't exist on ARM; I said that it is not optimized.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Apple new 5nm A14 samples delivered in late September: https://wccftech.com/a14-5nm-iphone-12-september/
Zen 4 at 5nm is planned for 2021, one year later than the A14 (Zen 2 was one year after the A12). It makes sense IMO.

The A14 will likely be the base for an ARM MacBook. Given that the A13 didn't upgrade the cores at all, the A14 should bring a new uarch with a significant performance jump, as the A11/A12 did. Nice coincidence.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Could someone help explain the A12 Vortex core imbalance, please?
We know that about 50% of instructions are loads/stores, mostly loads. Given that, the Vortex core has a significant imbalance between its 6 ALUs and 2 LSUs.

The second thing is performance. The two additional ALUs are of the shared simple/branch type. That could theoretically bring roughly +20-30% IPC? However, Vortex delivers +58% IPC over Skylake, which is 3x more than expected. The combination of imbalance and high performance is a mystery. There must be something smart inside.

Did Apple engineers develop some new advanced technique in the reorder buffer? Something like the load ROB predictor on Conroe? Or are they using such a large instruction window that they can extract very high ILP and eliminate these costly load/store instructions at the same time?
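The imbalance question above reduces to a classic issue-port bound. A quick sketch, using the 50% load/store mix figure from the post (illustrative fractions, not measurements; `ipc_bound` is a made-up helper):

```python
# Max sustainable IPC imposed by a class of execution units: with `units`
# pipes serving `mix_fraction` of the dynamic instruction stream, throughput
# cannot exceed units / mix_fraction instructions per cycle.
def ipc_bound(units: int, mix_fraction: float) -> float:
    return units / mix_fraction

lsu_bound = ipc_bound(2, 0.50)    # 2 LSUs, 50% memory ops -> IPC <= 4
alu_bound = ipc_bound(6, 0.50)    # 6 ALUs, 50% ALU ops    -> IPC <= 12
print(min(lsu_bound, alu_bound))  # 4.0: the LSUs are the structural limit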
 
  • Love
Reactions: wintercharm

Nothingness

Platinum Member
Jul 3, 2013
2,405
736
136
Random stuff for a better IPC:
- Shorter mispred replay latency
- Lower load-to-use latency
- Better branch prediction
- Longer pipe stages due to lower clocks