Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

There is a first rumor about Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
massive AI accelerator

Source Coreteks:

Glo. · Jun 29, 2020

Eug said:
I’m waiting for A14X before I buy.

Truthfully though, I don’t care about the CPU speed so much. I’m more interested in the other upgrades like mini-LED. I’m not buying another Mac anytime soon. I’ll be buying an iPad Pro.

I want a New iPad Mini, with Pro's design


Antey · Jun 29, 2020

Why is the Samsung Galaxy Book with the 8cx on Windows 10 ARM getting better results than the A12Z on macOS (compared to Android/iOS)? Isn't macOS already ARM-native on these machines? What's the difference?

SAMSUNG ELECTRONICS CO., LTD. Galaxy Book S - Geekbench

Benchmark results for a SAMSUNG ELECTRONICS CO., LTD. Galaxy Book S with an Intel Pentium II/III processor.

browser.geekbench.com

jpiniero · Jun 29, 2020

Antey said:
Why is the Samsung Galaxy Book with the 8cx on Windows 10 ARM getting better results than the A12Z on macOS (compared to Android/iOS)? Isn't macOS already ARM-native on these machines? What's the difference?

SAMSUNG ELECTRONICS CO., LTD. Galaxy Book S - Geekbench

Benchmark results for a SAMSUNG ELECTRONICS CO., LTD. Galaxy Book S with an Intel Pentium II/III processor.

browser.geekbench.com

Geekbench 5 does have an AArch64 version on Windows.

Eug · Jun 29, 2020

Antey said:
Why is the Samsung Galaxy Book with the 8cx on Windows 10 ARM getting better results than the A12Z on macOS (compared to Android/iOS)? Isn't macOS already ARM-native on these machines? What's the difference?

SAMSUNG ELECTRONICS CO., LTD. Galaxy Book S - Geekbench

Benchmark results for a SAMSUNG ELECTRONICS CO., LTD. Galaxy Book S with an Intel Pentium II/III processor.

browser.geekbench.com

The Geekbench scores for Mac ARM are a result of Rosetta translation.

gdansk · Jun 29, 2020

Richie Rich said:
This Rosetta 2 leak is 830 pts @ 2.4 GHz, resulting in 346 pts/GHz .... which is 78% of the performance of native A12.

If you check the JSON, it's closer to 2.5 GHz.
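The per-clock arithmetic in this exchange can be checked with a quick sketch. The 830 points, the 2.4 GHz and roughly 2.5 GHz clocks, and the 78% figure all come from the posts above; the function name is made up for illustration, and the rescaled percentage assumes the native points-per-GHz value stays fixed.

```python
def points_per_ghz(score: float, clock_ghz: float) -> float:
    """Geekbench points normalised by clock, a rough per-clock proxy."""
    return score / clock_ghz


rosetta_score = 830.0
at_2_4 = points_per_ghz(rosetta_score, 2.4)  # clock assumed in the quoted post
at_2_5 = points_per_ghz(rosetta_score, 2.5)  # clock suggested by the JSON

# Rescale the quoted 78%-of-native figure to the higher clock,
# assuming the native points-per-GHz value is unchanged.
fraction_at_2_5 = 0.78 * at_2_5 / at_2_4

print(f"{at_2_4:.0f} pts/GHz at 2.4 GHz")
print(f"{at_2_5:.0f} pts/GHz at 2.5 GHz")
print(f"{fraction_at_2_5:.0%} of native at 2.5 GHz")
```

At the higher clock the translated result works out to roughly three quarters of native per clock rather than 78%.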

Richie Rich said:
Another thing is that the all-core frequency for Zen 2 is around 4.2 GHz. That's far, far away from 4.7 GHz.

Yes, and will run all their cores at 3.3GHz too

Well, I suppose it'll be closer due to the new node. I find it *extremely* unlikely that one CPU with 4 ALUs and 3 AGUs will have 40% higher IPC than another CPU with 4 ALUs and 3 AGUs. The 15 execution units aren't going to help anything because 4 of them are extremely limited in function. How is it a benefit that an indirect memory load uses two execution slots rather than one? This is what X1 vs Zen 2 boils down to.

And Rich, how much higher IPC does A78 have compared to Zen 2?

Richie Rich · Jun 29, 2020

gdansk said:
Well, I suppose it'll be closer due to the new node. I find it *extremely* unlikely that one CPU with 4 ALUs and 3 AGUs will have 40% higher IPC than another CPU with 4 ALUs and 3 AGUs. The 15 execution units aren't going to help anything because 4 of them are extremely limited in function. How is it a benefit that an indirect memory load uses two execution slots rather than one? This is what X1 vs Zen 2 boils down to.

And Rich, how much higher IPC does A78 have compared to Zen 2?

A78 is about 8% higher in IPC (GeekBench5) than Zen2 while having 1.33 mm2 core vs. 3.6mm2 Zen2.
A78 is about 15% higher in IPC (SPEC).
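To put the area figures next to the IPC claim, here is a minimal sketch. The 8% Geekbench 5 IPC delta and the 1.33 mm² and 3.6 mm² core sizes are the numbers quoted above; treating Zen 2 as a 1.0 baseline and dividing IPC by core area is a simplification made here, and it ignores process node, cache and clock differences.

```python
def ipc_per_mm2(relative_ipc: float, core_area_mm2: float) -> float:
    """Relative IPC divided by core area (an illustrative metric only)."""
    return relative_ipc / core_area_mm2


# Zen 2 is the baseline; the A78 figure is the quoted +8% Geekbench 5 IPC.
zen2 = ipc_per_mm2(1.00, 3.6)
a78 = ipc_per_mm2(1.08, 1.33)

ratio = a78 / zen2
print(f"A78 delivers about {ratio:.1f}x the IPC per mm2 of Zen 2")
```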

Info - TOP 20 of the World's Most Powerful CPU Cores - IPC/PPC comparison

Added cores: A53 - little core used in some low-end smartphones in 8-core config (Snapdragon 450) A55 - used as little core in every modern Android SoC A72 - "high" end Cortex core used in Snapdragon 625 or Raspberry Pi 4 A73 - "high" end Cortex core A75 - "high" end Cortex core Bulldozer -...

forums.anandtech.com

*extremely* unlikely? I doubt ARM LLC is lying about performance projections. It's pretty easy to run a benchmark in SW simulation. In the automotive industry we have been able to simulate an engine dyno test in SW and export the engine sound to a WAV file for 20 years. I bet they run a lot of benchmarks in SW simulation during each development iteration. What about 82% higher IPC for a 6xALU, 2xAGU core like A13? That's the stone-hard reality that the lazy x86 world is facing right now.

gdansk · Jun 29, 2020

Richie Rich said:
*extremely* unlikely? I doubt ARM LLC is lying about performance projections. It's pretty easy to run a benchmark in SW simulation. In the automotive industry we have been able to simulate an engine dyno test in SW and export the engine sound to a WAV file for 20 years. I bet they run a lot of benchmarks in SW simulation during each development iteration. What about 82% higher IPC for a 6xALU, 2xAGU core like A13? That's the stone-hard reality that the lazy x86 world is facing right now.

I'm not implying they're lying. I'm saying that your numbers are not in line with their projections. On paper, I do not see any way that A78 would have higher IPC than Zen 2. Same number of ALUs and AGUs, half the FPU, equal 4-way decode, smaller caches. I argue that any compiler that can use AVX2 instructions will achieve higher IPC on Zen 2 than on A78.

Doug S · Jun 29, 2020

Glo. said:
eperm-d995af6e2ef02771 - Geekbench 5 CPU Search - Geekbench

800 Pts in Geekbench v5 on MacOS Big Sur single core score, 2600 pts Multicore score. A14Z, latest, and greatest from Apple. 4C/4C design.

2020 MacBook Air with 2C/4T:
1005 Pts, single threaded, 2000 pts multithreaded score.

Essentially means that ARM still has large disadvantage to x86, even in Apple designs.

Secondly, the scores in iOS platform, are extremely skewed by the platform's performance. On MacOs, the scores in GB5 are lower than on iOS. Which means, that simply iOS platform is extremely well optimized.

There is no more equal level comparison right now between both arch's on one platform, now. So yeah, ARM v9 at best will be tying with x86 Intel's. But still might be losing to AMD's designs.

I wonder what will Richie Rich say about those scores of A14Z under MacOS in GB5...

A solution Apple hacked up for developers (i.e. which will never be sold) using a two year out of date core, running a pre-release OS and a pre-release Rosetta produces results under x86 and you are going to argue it means "ARM still has a large disadvantage to x86". You should go into politics, you obviously have a pretty strong ability to twist reality into whatever form serves your pre-existing biases.

If you've ever used pre-release stuff, and from this it is apparent you have not, it is always loaded down with a lot of debugging code that saps performance. That's beyond the fact it is a two year old core, running pre-release software, and emulating x86. You think this means redacted for the performance of the Macs Apple will release in December? Wow, you really are clueless.

We have a zero tolerance policy for profanity in the tech sub-forums.
Don't do it again.

Iron Woode
Super Moderator

Thunder 57 · Jun 29, 2020

Richie Rich said:
There are some license limits. I doubt you can modify ARM's IP and use it for another ISA. AMD made a GREAT MISTAKE by canceling their K12 ARM core. No doubt about that today. Back in 2015 I considered that a good move, but not today. Keller was right again. The thing is that Nvidia is licensing Cortex A78 cores so Nvidia can finally beat AMD with Cortex CPUs. Isn't that funny? I guess some hardcore AMD fans could have a stroke from that.

I dare say (again) that AMD wouldn't exist today if they focused on K12 instead of Zen. You must think Lisa Su is an idiot. Sure, maybe Keller wanted to pursue K12 but AMD didn't have the resources to do both at the time.

No. Apple is the CPU tech leader. They have about a 5-year advantage over x86 now. They will offer double the performance per watt in laptops, and the new A14 will be the most powerful CPU in ST and will easily beat Zen 3 @ 4.6 GHz.

Hahaha. Let's wait until we see what Apple puts out before we make such bold statements. As always, you state unknown things as fact. People here don't care for that.

ARM is the ISA leader now. Upcoming ARMv9 and SVE2 2048-bit capable SIMDs. Fugaku super computers thanks to SVE can destroy GPU based super computers. And you will have smartphones with ARMv9 SVE2 in H2 2021.

Intel and AMD will have to fight very hard for survival. With the current speed of their development they are dead already IMO.

Thanks for another laugh. You are like a broken record. 6xALU, SMT4, SVE2 2048. I guess GPU's are dead since we can all just use SVE now? Oh, and AMD and Intel are already dead and just living on borrowed time. Sounds about right.

Glo. · Jun 29, 2020

Doug S said:
A solution Apple hacked up for developers (i.e. which will never be sold) using a two year out of date core, running a pre-release OS and a pre-release Rosetta produces results under x86 and you are going to argue it means "ARM still has a large disadvantage to x86". You should go into politics, you obviously have a pretty strong ability to twist reality into whatever form serves your pre-existing biases.

If you've ever used pre-release stuff, and from this it is apparent you have not, it is always loaded down with a lot of debugging code that saps performance. That's beyond the fact it is a two year old core, running pre-release software, and emulating x86. You think this means redacted for the performance of the Macs Apple will release in December? Wow, you really are clueless.

Butthurt or outraged, hmm?

Again, Shadow of the Tomb Raider was running on this very silicon with better performance than Renoir can deliver at 1080p. This is the game people used as an example, after the keynote, to prove how advanced and efficient Rosetta 2 was.

Now they turn the tables to prove it's not optimized yet?

I guess, Rosetta 2 has to be actually pretty efficient in translating the code, after all, hmm?

marcUK2 · Jun 29, 2020

If the rumoured 50% increase in fpu performance is real for zen3, we will find out in a few months, and I'm sure that levels the field to arm ipc somewhat in stuff that actually matters.

I'm not so knowledgeable on arm, apart from efficiency, are they strong in integer/float/vector ?

JasonLD · Jun 29, 2020

Glo. said:
Butthurt or outraged, hmm?

Again, Shadow of the Tomb Raider was running on this very silicon with better performance than Renoir can deliver at 1080p. This is the game people used as an example, after the keynote, to prove how advanced and efficient Rosetta 2 was.

Now they turn the tables to prove it's not optimized yet?

I guess, Rosetta 2 has to be actually pretty efficient in translating the code, after all, hmm?

75% of native performance looks pretty impressive for emulation. A14-based Apple Macs could be looking at performance almost as good as previous-generation Macs on non-native apps with an equal number of cores.

marcUK2 · Jun 29, 2020

I remember the G4... I had one, way faster than x86 because of AltiVec... for a few months, then it turned into a farce. Admittedly not Apple's fault, but not great for the platform. Nowadays I think processors are so fast that it's kind of irrelevant for most things. I don't get what Apple really stands to gain here. Loss of Boot Camp. Annoying developers, questionable performance, questionable long-term development...
Maybe it's just a sales trick... I guess they did this twice already, and the faithful upgrade. It might be a boost in sales during the recession, or could be a complete flop depending on how bad it gets.
No doubt they could have got a hell of a deal from AMD if they were annoyed with Intel, saving them whatever they may save by fabbing their own chips. And tbh, Renoir, 4000 and 5000 are surely beyond most people's laptop requirements. And losing 16-core 3950X iMacs and Zen 3 32/64-core Threadrippers this year... those could really have given the Mac a good kick up the performance charts.

gdansk · Jun 29, 2020

marcUK2 said:
I remember the G4... I had one, way faster than x86 because of AltiVec... for a few months, then it turned into a farce. Admittedly not Apple's fault, but not great for the platform. Nowadays I think processors are so fast that it's kind of irrelevant for most things. I don't get what Apple really stands to gain here. Loss of Boot Camp. Annoying developers, questionable performance, questionable long-term development...
Maybe it's just a sales trick... I guess they did this twice already, and the faithful upgrade. It might be a boost in sales during the recession, or could be a complete flop depending on how bad it gets.
No doubt they could have got a hell of a deal from AMD if they were annoyed with Intel, saving them whatever they may save by fabbing their own chips. And tbh, Renoir, 4000 and 5000 are surely beyond most people's laptop requirements. And losing 16-core 3950X iMacs and Zen 3 Threadrippers this year could really have given the Mac a good kick up the performance charts.

I think they gain a lot in the long run. They are no longer beholden to Motorola or IBM like they were in the PowerPC days. If TSMC hypothetically falls behind, they can change their contract manufacturer. They are already designing and verifying the A cores for their iPhones so it isn't a massive investment to design larger chips for laptops and desktops. They ship enough units that additional R&D cost will be repaid by lower unit costs. Plus they'll eliminate the GPU cost as well, by using their integrated graphics (at least in laptops). The cost savings are substantial and they will no longer have to deal with Intel's designs not having the characteristics they desire (namely in cooling solutions).

Doug S · Jun 29, 2020

JasonLD said:
75% of native performance looks pretty impressive for emulation. A14 based Apple Macs could be looking at performance almost as good as previous generation Macs on non-native Apps at equal amount of cores.

Well, that depends on how representative Geekbench is in this case, but yes, 75% of native is more than good enough, since I feel pretty certain Apple waited as long as they did to go ARM to ensure every ARM Mac was faster (in native) than the x86 Mac it will replace.

marcUK2 · Jun 29, 2020

Is there any possibility with the ARMchitecture that they can do a 4way chiplet design with an IO chip?

Thala · Jun 29, 2020

Doug S said:
Well, that depends on how representative Geekbench is in this case, but yes, 75% of native is more than good enough, since I feel pretty certain Apple waited as long as they did to go ARM to ensure every ARM Mac was faster (in native) than the x86 Mac it will replace.

Indeed. The very fact, that it outperforms Intel CPUs under emulation is an indication what is ahead of us!
I never thought that I would ever experience this: one SoC under pure SW emulation outperforming a native SoC under similar technology and TDP constraints. This comes after all the years of doing translation with HW support, as Transmeta did, and today we are getting this performance from pure SW emulation!

Thala · Jun 29, 2020

gdansk said:
I'm not implying they're lying. I'm saying that your numbers are not in line with their projections. On paper, I do not see any way that A78 would have higher IPC than Zen 2. Same number of ALUs and AGUs, half the FPU, equal 4-way decode, smaller caches. I argue that any compiler that can use AVX2 instructions will achieve higher IPC on Zen 2 than on A78.

If both were x86-64, I could see some validity to your argument. But x86-64 never was and never will be as efficient with respect to resource usage as ARMv8.

marcUK2 · Jun 29, 2020

Thala said:
Indeed. The very fact, that it outperforms Intel CPUs under emulation is an indication what is ahead of us!
I never thought that I would ever experience this: one SoC under pure SW emulation outperforming a native SoC under similar technology and TDP constraints.

Is it emulation, though? I've read in several places that Rosetta 2 actually recompiles into ARM code on install. Now, I'm sure it's not as good as a real compile from source, but it might be 95%+ as good.

Doug S · Jun 29, 2020

Thala said:
Indeed. The very fact, that it outperforms Intel CPUs under emulation is an indication what is ahead of us!
I never thought that I would ever experience this: one SoC under pure SW emulation outperforming a native SoC under similar technology and TDP constraints.

I kind of doubt Rosetta was truly using SW emulation in this case. To reach 75% of native, I'll bet it was using the "translation at install time" static translation Rosetta is capable of. The only way you could get anywhere near 75% with a JIT would be if the application in question was substantially inner-loop dominated. While I don't doubt Geekbench has some components that basically are (i.e. compression or encryption tests), there are others like Clang and SQLite which are going to suck on a JIT.

Glo. · Jun 29, 2020

Three things that we should all be excited from Apple's own Silicon.

1) Use of ARMv9 architecture.
2) LPDDR5 and its bandwidth. 6400 MT/s is giving 128 GB/s of memory bandwidth.
3) Apple essentially making a developer platform for ARM architecture, with way higher adoption, than ever before.
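The bandwidth figure in point 2 depends on the memory bus width, which the post does not state. As a rough sketch, peak bandwidth is the transfer rate times the bytes moved per transfer; the bus widths below are illustrative assumptions, not known specifications of any Apple chip, and the function name is made up for this example.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


RATE_MT_S = 6400  # LPDDR5-6400, the speed grade named in the post

# Hypothetical bus widths, chosen only to show how the figure scales.
for width in (64, 128, 256):
    print(f"{width:3d}-bit bus: {peak_bandwidth_gb_s(RATE_MT_S, width):.1f} GB/s")

# Bus width that would be needed to reach the 128 GB/s quoted in the post.
required_bits = 128e9 * 8 / (RATE_MT_S * 1e6)
print(f"128 GB/s would need a {required_bits:.0f}-bit bus")
```

Under these assumptions a 128-bit bus at 6400 MT/s tops out at 102.4 GB/s, so the 128 GB/s figure would imply a wider interface.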

And for only those three reasons, and the fact, that the most beneficial OS from all of this is going to be Linux, I for one am Rooting for Apple silicon team.

SarahKerrigan · Jun 29, 2020

Glo. said:
eperm-d995af6e2ef02771 - Geekbench 5 CPU Search - Geekbench

800 Pts in Geekbench v5 on MacOS Big Sur single core score, 2600 pts Multicore score. A14Z, latest, and greatest from Apple. 4C/4C design.

2020 MacBook Air with 2C/4T:
1005 Pts, single threaded, 2000 pts multithreaded score.

Essentially means that ARM still has large disadvantage to x86, even in Apple designs.

Secondly, the scores in iOS platform, are extremely skewed by the platform's performance. On MacOs, the scores in GB5 are lower than on iOS. Which means, that simply iOS platform is extremely well optimized.

There is no more equal level comparison right now between both arch's on one platform, now. So yeah, ARM v9 at best will be tying with x86 Intel's. But still might be losing to AMD's designs.

I wonder what will Richie Rich say about those scores of A14Z under MacOS in GB5...

Since that means that Rosetta is getting within 25% of native performance, that is frankly a phenomenal number. I expect most apps to be worse. That's basically okay; dynamic translation is firmly a game of good-enough.

gdansk · Jun 29, 2020

Thala said:
If both were x86-64, I could see some validity to your argument. But x86-64 never was and never will be as efficient with respect to resource usage as ARMv8.

Resource usage in this sense is die area, transistors and perhaps power. The front end of both designs can decode and issue a similar number of micro-ops. x86-64 instructions decode to more micro-ops but this isn't an issue as these instructions correspond to multiple ARMv8 instructions. On paper there is no way to expect A78 to have 15% higher IPC than both Skylake and Zen 2.

SarahKerrigan · Jun 29, 2020

gdansk said:
Resource usage in this sense is die area, transistors and perhaps power. The front end of both designs can decode and issue a similar number of micro-ops. x86-64 instructions decode to more micro-ops but this isn't an issue as these instructions correspond to multiple ARMv8 instructions. On paper there is no way to expect A78 to have higher IPC than both Skylake and Zen 2.

I've seen N1 have higher iso-clock perf than SKL-SP and Zen2 in my testing, and that's a narrower design in some respects than A78 is. It's not always immediately obvious from a glance over a frontend how a core is going to perform. Power8 was an 8-wide monster and I found it generally roughly matched Haswell at iso clock, or did a little worse; N1 is comparatively narrow but seems to do great. 64+64K L1 in particular helps a lot, according to the profiling I've done.

None of this is really germane to Apple, though.

gdansk · Jun 29, 2020

SarahKerrigan said:
I've seen N1 have higher iso-clock perf than SKL-SP and Zen2 in my testing, and that's a narrower design in some respects than A78 is. It's not always immediately obvious from a glance over a frontend how a core is going to perform. Power8 was an 8-wide monster and I found it generally roughly matched Haswell at iso clock, or did a little worse; N1 is comparatively narrow but seems to do great. 64+64K L1 in particular helps a lot, according to the profiling I've done.

None of this is really germane to Apple, though.

A78 has either a 64 or 32 KiB 4-way L1, which isn't going to help in most workloads compared to Zen 2's 32 KiB 8-way associative cache. It's germane to the subject of some supposed instruction set superiority. There is almost none.
