Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
Here is the first rumor about an Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
  • massive AI accelerator

Source Coreteks:
 
Reactions: vspalanki
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
Why is the Samsung Galaxy Book with the 8cx on Windows 10 ARM getting better results than the A12Z on macOS (compared to Android/iOS)? Isn't macOS already ARM-native on these machines? What's the difference?


Geekbench 5 does have an aarch64 version on Windows.
 

Eug

Lifer
Mar 11, 2000
23,583
996
126
Why is the Samsung Galaxy Book with the 8cx on Windows 10 ARM getting better results than the A12Z on macOS (compared to Android/iOS)? Isn't macOS already ARM-native on these machines? What's the difference?

The Geekbench scores for Mac ARM are a result of Rosetta translation.
 

gdansk

Golden Member
Feb 8, 2011
1,973
2,353
136
This Rosetta 2 leak is 830 pts @ 2.4 GHz, resulting in 346 pts/GHz, which is 78% of the performance of a native A12.
If you check the json it's closer to 2.5GHz
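As a quick sanity check on the arithmetic above, here is a minimal Python sketch. It only uses the numbers quoted in this post (830 pts, 2.4 GHz vs 2.5 GHz, the "78% of native" claim); the native A12 baseline is not a measurement, it is just what that 78% figure implies.

```python
def points_per_ghz(score: float, clock_ghz: float) -> float:
    """Geekbench points normalised by clock speed (a rough IPC proxy)."""
    return score / clock_ghz


rosetta_score = 830  # leaked Rosetta 2 single-core score quoted above

at_2_4 = points_per_ghz(rosetta_score, 2.4)  # clock assumed in the quote
at_2_5 = points_per_ghz(rosetta_score, 2.5)  # clock suggested by the json

# Assumption: native A12 pts/GHz as implied by the "78% of native" claim.
implied_native = at_2_4 / 0.78

print(round(at_2_4))                      # 346
print(round(at_2_5))                      # 332
print(round(implied_native))              # 443
print(round(at_2_5 / implied_native, 2))  # 0.75
```

So if the clock really is closer to 2.5 GHz, the same leak works out to roughly 75% of native per clock rather than 78%.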

Another thing is that the all-core frequency for Zen2 is around 4.2 GHz. That's far, far away from 4.7 GHz.
Yes, and will run all their cores at 3.3GHz too :rolleyes: Well I suppose it'll be closer due to the new node. I find it *extremely* unlikely that one CPU with 4 ALUs and 3 AGUs will have 40% higher IPC than another CPU with 4 ALUs and 3 AGUs. The 15 execution units aren't going to help anything because 4 of them are extremely limited in function. How is it a benefit that an indirect memory load uses two execution slots rather than one? This is what X1 vs Zen2 boils down to.

And Rich, how much higher IPC does A78 have compared to Zen 2?
 
Last edited:
Reactions: Tlh97

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Well I suppose it'll be closer due to the new node. I find it *extremely* unlikely that one CPU with 4 ALUs and 3 AGUs will have 40% higher IPC than another CPU with 4 ALUs and 3 AGUs. The 15 execution units aren't going to help anything because 4 of them are extremely limited in function. How is it a benefit that an indirect memory load uses two execution slots rather than one? This is what X1 vs Zen2 boils down to.

And Rich, how much higher IPC does A78 have compared to Zen 2?
A78 is about 8% higher in IPC (Geekbench 5) than Zen2 while having a 1.33 mm² core vs. 3.6 mm² for Zen2.
A78 is about 15% higher in IPC (SPEC).
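To put the IPC and die-area figures above side by side, here is a rough Python sketch of per-clock performance per mm². It takes the numbers in this post at face value and ignores clock speed, process node, and what exactly is counted in each area figure, so treat it as an illustration only.

```python
def perf_per_area_ratio(ipc_ratio: float, area_a_mm2: float, area_b_mm2: float) -> float:
    """How much more per-clock performance per mm^2 core A delivers than core B.

    ipc_ratio is IPC(A) / IPC(B); the areas are the core sizes in mm^2.
    """
    return ipc_ratio * area_b_mm2 / area_a_mm2


A78_AREA_MM2 = 1.33   # figure quoted above
ZEN2_AREA_MM2 = 3.6   # figure quoted above

geekbench = perf_per_area_ratio(1.08, A78_AREA_MM2, ZEN2_AREA_MM2)  # +8% IPC claim
spec = perf_per_area_ratio(1.15, A78_AREA_MM2, ZEN2_AREA_MM2)       # +15% IPC claim

print(f"Geekbench 5: {geekbench:.2f}x per-clock perf per mm^2")
print(f"SPEC:        {spec:.2f}x per-clock perf per mm^2")
```

Most of the resulting ratio comes from the area difference, not from the IPC difference.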


*extremely* unlikely? I doubt ARM LLC is lying about performance projections. It's pretty easy to run a benchmark in SW simulation. We have been able to SW-simulate an engine dyno test and export the engine sound to a WAV file for 20 years in the automotive industry. I bet they run a lot of benchmarks in SW simulation during development iterations. What about 82% higher IPC for a 6xALU, 2xAGU core like A13? That's the stone-hard reality that the lazy x86 world is facing right now.
 

gdansk

Golden Member
Feb 8, 2011
1,973
2,353
136
*extremely* unlikely? I doubt ARM LLC is lying about performance projections. It's pretty easy to run a benchmark in SW simulation. We have been able to SW-simulate an engine dyno test and export the engine sound to a WAV file for 20 years in the automotive industry. I bet they run a lot of benchmarks in SW simulation during development iterations. What about 82% higher IPC for a 6xALU, 2xAGU core like A13? That's the stone-hard reality that the lazy x86 world is facing right now.
I'm not implying they're lying. I'm saying that your numbers are not in line with their projections. On paper, I do not see any way that A78 would have higher IPC than Zen2. Same number of ALUs and AGUs, half the FPU, equal 4-way decode, smaller caches. I argue that any compiler that can use AVX2 instructions will achieve higher IPC on Zen2 than on A78.
 
Reactions: Tlh97

Doug S

Platinum Member
Feb 8, 2020
2,201
3,405
136

800 Pts in Geekbench v5 on MacOS Big Sur single core score, 2600 pts Multicore score. A14Z, latest, and greatest from Apple. 4C/4C design.

2020 MacBook Air with 2C/4T:
1005 Pts, single threaded, 2000 pts multithreaded score.

Essentially means that ARM still has large disadvantage to x86, even in Apple designs.

Secondly, the scores in iOS platform, are extremely skewed by the platform's performance. On MacOs, the scores in GB5 are lower than on iOS. Which means, that simply iOS platform is extremely well optimized.

There is no more equal level comparison right now between both arch's on one platform, now. So yeah, ARM v9 at best will be tying with x86 Intel's. But still might be losing to AMD's designs.

I wonder what will Richie Rich say about those scores of A14Z under MacOS in GB5...


A solution Apple hacked up for developers (i.e. which will never be sold) using a two year out of date core, running a pre-release OS and a pre-release Rosetta produces results under x86 and you are going to argue it means "ARM still has a large disadvantage to x86". You should go into politics, you obviously have a pretty strong ability to twist reality into whatever form serves your pre-existing biases.

If you've ever used pre-release stuff, and from this it is apparent you have not, it is always loaded down with a lot of debugging code that saps performance. That's beyond the fact it is a two year old core, running pre-release software, and emulating x86. You think this means redacted for the performance of the Macs Apple will release in December? Wow, you really are clueless.

We have a zero tolerance policy for profanity in the tech sub-forums.
Don't do it again.

Iron Woode

Super Moderator
 
Last edited by a moderator:

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
There are some license limits. I doubt you can modify ARM's IP and use it for another ISA. AMD made a GREAT MISTAKE by canceling their K12 ARM core. No doubt about that today. Back in 2015 I considered that a good move, but not today. Keller was right again. The thing is that Nvidia is licensing Cortex A78 cores, so Nvidia can finally beat AMD with Cortex CPUs. Isn't that funny? I guess some hard-core AMD fans could have a stroke from that.

I dare say (again) that AMD wouldn't exist today if they focused on K12 instead of Zen. You must think Lisa Su is an idiot. Sure, maybe Keller wanted to pursue K12 but AMD didn't have the resources to do both at the time.

No. Apple is the CPU tech leader. They have about a 5-year advantage over x86 now. They will offer double the performance per watt in laptops; the new A14 will be the most powerful CPU in ST and will beat Zen3@4.6 GHz easily.

Hahaha :rolleyes: . Let's wait until we see what Apple puts out before we make such bold statements. As always, you state unknown things as fact. People here don't care for that.

ARM is the ISA leader now. Upcoming ARMv9 and SVE2 2048-bit capable SIMDs. Fugaku super computers thanks to SVE can destroy GPU based super computers. And you will have smartphones with ARMv9 SVE2 in H2 2021.


Intel and AMD will have to fight very hard for survival. With the current speed of their development they are dead already IMO.

Thanks for another laugh. You are like a broken record. 6xALU, SMT4, SVE2 2048. I guess GPUs are dead since we can all just use SVE now? Oh, and AMD and Intel are already dead and just living on borrowed time. Sounds about right.
 
Reactions: pcp7 and Tlh97

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
A solution Apple hacked up for developers (i.e. which will never be sold) using a two year out of date core, running a pre-release OS and a pre-release Rosetta produces results under x86 and you are going to argue it means "ARM still has a large disadvantage to x86". You should go into politics, you obviously have a pretty strong ability to twist reality into whatever form serves your pre-existing biases.

If you've ever used pre-release stuff, and from this it is apparent you have not, it is always loaded down with a lot of debugging code that saps performance. That's beyond the fact it is a two year old core, running pre-release software, and emulating x86. You think this means redacted for the performance of the Macs Apple will release in December? Wow, you really are clueless.
Butthurt or outraged, hmm? ;)

Again, Shadow of the Tomb Raider was running on this very silicon with better performance than Renoir can deliver at 1080p. This is the game people used as an example after the keynote to prove how advanced and efficient Rosetta 2 was.

Now they turn the tables to argue it's not optimized yet?

I guess, Rosetta 2 has to be actually pretty efficient in translating the code, after all, hmm?
 
Last edited by a moderator:
Reactions: Tlh97

marcUK2

Member
Sep 23, 2019
74
39
61
If the rumoured 50% increase in FPU performance is real for Zen 3, we will find out in a few months, and I'm sure that levels the field with ARM IPC somewhat in stuff that actually matters.

I'm not so knowledgeable on ARM; apart from efficiency, are they strong in integer/float/vector?
 

JasonLD

Senior member
Aug 22, 2017
485
445
136
Butthurt or outraged, hmm? ;)

Again, Shadow of the Tomb Raider was running on this very silicon with better performance than Renoir can deliver at 1080p. This is the game people used as an example after the keynote to prove how advanced and efficient Rosetta 2 was.

Now they turn the tables to argue it's not optimized yet?

I guess, Rosetta 2 has to be actually pretty efficient in translating the code, after all, hmm?

75% of native performance looks pretty impressive for emulation. A14-based Macs could be looking at performance almost as good as previous-generation Macs on non-native apps at an equal number of cores.
 
Reactions: Tlh97 and Etain05

marcUK2

Member
Sep 23, 2019
74
39
61
I remember the G4... I had one, way faster than x86 because of AltiVec... for a few months, then it turned into a farce. Admittedly not Apple's fault, but not great for the platform. Nowadays I think processors are so fast, it's kind of irrelevant for most things. I don't get what Apple really stands to gain here. Loss of Boot Camp. Annoying developers, questionable performance, questionable long-term development...
Maybe it's just a sales trick... I guess they did this twice already, and the faithful upgrade; it might be a boost in sales during the recession, or could be a complete flop depending on how bad it gets.
No doubt they could have got a hell of a deal from AMD if they were annoyed with Intel, saving them whatever they may save by fabbing their own chips. And tbh, Renoir, 4000 and 5000 are surely beyond most people's laptop requirements. And losing 16-core 3950X iMacs, Zen 3 32/64-core Threadrippers this year... those could really have given the Mac a good kick up the performance charts.
 

gdansk

Golden Member
Feb 8, 2011
1,973
2,353
136
I remember the G4... I had one, way faster than x86 because of AltiVec... for a few months, then it turned into a farce. Admittedly not Apple's fault, but not great for the platform. Nowadays I think processors are so fast, it's kind of irrelevant for most things. I don't get what Apple really stands to gain here. Loss of Boot Camp. Annoying developers, questionable performance, questionable long-term development...
Maybe it's just a sales trick... I guess they did this twice already, and the faithful upgrade; it might be a boost in sales during the recession, or could be a complete flop depending on how bad it gets.
No doubt they could have got a hell of a deal from AMD if they were annoyed with Intel, saving them whatever they may save by fabbing their own chips. And tbh, Renoir, 4000 and 5000 are surely beyond most people's laptop requirements. And losing 16-core 3950X iMacs, Zen 3 Threadrippers this year, could really have given the Mac a good kick up the performance charts.
I think they gain a lot in the long run. They are no longer beholden to Motorola or IBM like they were in the PowerPC days. If TSMC hypothetically falls behind, they can change their contract manufacturer. They are already designing and verifying the A cores for their iPhones so it isn't a massive investment to design larger chips for laptops and desktops. They ship enough units that additional R&D cost will be repaid by lower unit costs. Plus they'll eliminate the GPU cost as well, by using their integrated graphics (at least in laptops). The cost savings are substantial and they will no longer have to deal with Intel's designs not having the characteristics they desire (namely in cooling solutions).
 

Doug S

Platinum Member
Feb 8, 2020
2,201
3,405
136
75% of native performance looks pretty impressive for emulation. A14-based Macs could be looking at performance almost as good as previous-generation Macs on non-native apps at an equal number of cores.

Well, that depends on how representative Geekbench is in this case, but yes, 75% of native is more than good enough, since I feel pretty certain Apple waited as long as they did to go ARM to ensure every ARM Mac was faster (in native) than the x86 Mac it will replace.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Well, that depends on how representative Geekbench is in this case, but yes, 75% of native is more than good enough, since I feel pretty certain Apple waited as long as they did to go ARM to ensure every ARM Mac was faster (in native) than the x86 Mac it will replace.

Indeed. The very fact that it outperforms Intel CPUs under emulation is an indication of what is ahead of us!
I never thought that I would ever experience this: one SoC under pure SW emulation outperforming a native SoC under similar technology and TDP constraints. This is after all the years of doing translation with HW support, like Transmeta, and today we are getting this performance by pure SW emulation!
 
Last edited:
Reactions: Etain05

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I'm not implying they're lying. I'm saying that your numbers are not in line with their projections. On paper, I do not see any way that A78 would have higher IPC than Zen2. Same number of ALUs and AGUs, half the FPU, equal 4-way decode, smaller caches. I argue that any compiler that can use AVX2 instructions will achieve higher IPC on Zen2 than on A78.

If both were x86-64 I could see some validity to your argument. But x86-64 never was and never will be as efficient with respect to resource usage as ARMv8.
 

marcUK2

Member
Sep 23, 2019
74
39
61
Indeed. The very fact that it outperforms Intel CPUs under emulation is an indication of what is ahead of us!
I never thought that I would ever experience this: one SoC under pure SW emulation outperforming a native SoC under similar technology and TDP constraints.
Is it emulation though? I've read in several places that Rosetta 2 actually recompiles into ARM code on install. Now I'm sure it's not as good as a real compile from source, but it might be 95%+ as good.
 
Reactions: Tlh97

Doug S

Platinum Member
Feb 8, 2020
2,201
3,405
136
Indeed. The very fact that it outperforms Intel CPUs under emulation is an indication of what is ahead of us!
I never thought that I would ever experience this: one SoC under pure SW emulation outperforming a native SoC under similar technology and TDP constraints.

I kind of doubt Rosetta was truly using SW emulation in this case. To reach 75% of native I'll bet it was using the "translation at install time" static translation Rosetta is capable of. The only way you could get anywhere near 75% with a JIT would be if the application in question was substantially inner-loop dominated. While I don't doubt Geekbench has some components that basically are (i.e. compression or encryption tests), there are others like Clang and SQLite which are going to suck on a JIT.
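To illustrate the inner-loop argument, here is a toy Python cost model. Everything in it is hypothetical: the 1.25x slowdown of translated code, the translation cost per block, and the two workloads are made-up numbers, not measurements of Rosetta or any real JIT. The only difference between the two modes is whether the one-time translation cost is paid at run time (JIT) or ahead of time at install (static).

```python
def fraction_of_native(exec_counts, slowdown, translate_cost=0.0):
    """Translated performance as a fraction of native performance.

    exec_counts    -- how often each unique code block is executed
    slowdown       -- run time of translated code relative to native (>= 1.0)
    translate_cost -- run-time cost of translating one block, in units of one
                      native block execution (0.0 models install-time
                      translation, where that cost is not paid while running)
    """
    native_time = sum(exec_counts)
    translated_time = slowdown * native_time + translate_cost * len(exec_counts)
    return native_time / translated_time


SLOWDOWN = 1.25          # hypothetical: translated code runs at 80% of native
TRANSLATE_COST = 1000.0  # hypothetical JIT cost per unique block

loop_heavy = [1_000_000] * 10  # few blocks, each executed many times
branchy = [1_000] * 10_000     # many blocks, each executed relatively rarely

for name, workload in (("loop-heavy", loop_heavy), ("branchy", branchy)):
    static = fraction_of_native(workload, SLOWDOWN)
    jit = fraction_of_native(workload, SLOWDOWN, TRANSLATE_COST)
    print(f"{name}: static {static:.0%} of native, JIT {jit:.0%} of native")
```

With these made-up numbers the loop-heavy workload barely notices the JIT cost, while the branchy one drops to under half of native; static translation gives the same fraction for both.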
 

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
Three things that we should all be excited about from Apple's own silicon:

1) Use of ARMv9 architecture.
2) LPDDR5 and its bandwidth. At 6400 MT/s it gives 128 GB/s of memory bandwidth.
3) Apple essentially making a developer platform for ARM architecture, with way higher adoption, than ever before.

For those three reasons alone, and the fact that the OS that benefits most from all of this is going to be Linux, I for one am rooting for the Apple silicon team.
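For reference, peak LPDDR5 bandwidth depends on bus width as well as transfer rate, and the post does not state a bus width. A minimal Python sketch of the calculation follows; the bus widths tried are assumptions for illustration, not known specs of any Apple chip.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    transfer_rate_mt_s -- transfers per second per pin, in MT/s (e.g. 6400)
    bus_width_bits     -- total memory bus width in bits
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * bytes_per_transfer / 1000  # MB/s -> GB/s


# Hypothetical bus widths; the post above does not say which one is meant.
for width in (64, 128, 160, 256):
    print(f"LPDDR5-6400, {width}-bit bus: {peak_bandwidth_gb_s(6400, width):.1f} GB/s")
```

At 6400 MT/s a 128-bit bus works out to 102.4 GB/s, and it would take a 160-bit bus to reach 128 GB/s.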
 

SarahKerrigan

Senior member
Oct 12, 2014
339
468
136

800 Pts in Geekbench v5 on MacOS Big Sur single core score, 2600 pts Multicore score. A14Z, latest, and greatest from Apple. 4C/4C design.

2020 MacBook Air with 2C/4T:
1005 Pts, single threaded, 2000 pts multithreaded score.

Essentially means that ARM still has large disadvantage to x86, even in Apple designs.

Secondly, the scores in iOS platform, are extremely skewed by the platform's performance. On MacOs, the scores in GB5 are lower than on iOS. Which means, that simply iOS platform is extremely well optimized.

There is no more equal level comparison right now between both arch's on one platform, now. So yeah, ARM v9 at best will be tying with x86 Intel's. But still might be losing to AMD's designs.

I wonder what will Richie Rich say about those scores of A14Z under MacOS in GB5...

Since that means that Rosetta is getting within 25% of native performance, that is frankly a phenomenal number. I expect most apps to be worse. That's basically okay; dynamic translation is firmly a game of good-enough.
 
Reactions: Tlh97 and Etain05

gdansk

Golden Member
Feb 8, 2011
1,973
2,353
136
If both were x86-64 I could see some validity to your argument. But x86-64 never was and never will be as efficient with respect to resource usage as ARMv8.
Resource usage in this sense is die area, transistors and perhaps power. The front end of both designs can decode and issue a similar number of micro-ops. x86-64 instructions decode to more micro-ops but this isn't an issue as these instructions correspond to multiple ARMv8 instructions. On paper there is no way to expect A78 to have 15% higher IPC than both Skylake and Zen 2.
 
Last edited:
Reactions: Tlh97

SarahKerrigan

Senior member
Oct 12, 2014
339
468
136
Resource usage in this sense is die area, transistors and perhaps power. The front end of both designs can decode and issue a similar number of micro-ops. x86-64 instructions decode to more micro-ops but this isn't an issue as these instructions correspond to multiple ARMv8 instructions. On paper there is no way to expect A78 to have higher IPC than both Skylake and Zen 2.

I've seen N1 have higher iso-clock perf than SKL-SP and Zen2 in my testing, and that's a narrower design in some respects than A78 is. It's not always immediately obvious from a glance over a frontend how a core is going to perform. Power8 was an 8-wide monster and I found it generally roughly matched Haswell at iso clock, or did a little worse; N1 is comparatively narrow but seems to do great. 64+64K L1 in particular helps a lot, according to the profiling I've done.

None of this is really germane to Apple, though.
 

gdansk

Golden Member
Feb 8, 2011
1,973
2,353
136
I've seen N1 have higher iso-clock perf than SKL-SP and Zen2 in my testing, and that's a narrower design in some respects than A78 is. It's not always immediately obvious from a glance over a frontend how a core is going to perform. Power8 was an 8-wide monster and I found it generally roughly matched Haswell at iso clock, or did a little worse; N1 is comparatively narrow but seems to do great. 64+64K L1 in particular helps a lot, according to the profiling I've done.

None of this is really germane to Apple, though.
A78 has either a 64 or 32 KiB 4-way L1 cache, which isn't going to help in most workloads compared to Zen 2's 32 KiB 8-way associative cache. It's germane to the subject of some supposed instruction set superiority. There is almost none.