Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is a first rumor about Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with the new-gen MacBook Air in mid 2020 (MacBook Pro and iMac also under consideration)
  • massive AI accelerator

Source Coreteks:
 
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

OriAr

Member
Feb 1, 2019
63
35
91
So the 1st ARM MacBook would use the A14X. I wonder if the 14" MBP will use Tiger Lake or an A-series SoC.
The first ARM Mac will almost certainly be an MBA (or a new consumer-focused product; to put it bluntly, a Facebook machine, which is why the new iPad Pro makes me think they might not even do that).

If there were an MBP coming this fall, you'd already have heard about software developers working on software for it.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
And why not? Look at how the A76, optimized for server workloads, cache and memory controllers, performs: its IPC and multicore scores improve a lot, IPC increases 30%, and so on. Don't expect Apple's chip to behave any differently. So a desktop-class version of the Apple core will probably have better IPC and much improved multicore speed compared with the phone chip.

That's my whole point. The Graviton 2 is substantially different from the A13 because it's not narrowly optimized for single-threaded burst performance like the A13 is. My whole problem with this thread is how certain people have been implying that the A13 core could be akin to a drop-in solution that is successful across a wide variety of workloads just as it is. Scaling the A13 up to be successful in more diverse and multithreaded workloads would probably require some serious architectural changes, which, from what I've read in this thread, would lower the single-thread/IPC performance substantially.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
At the other end you have a server chip from AWS that is competitive against Intel and AMD chips.

How competitive is it really though? The benchmarks were limited to Spec2006, and the competition was a 32-core first-generation Zen CPU and a 28-core Intel Cascade Lake CPU based on the Skylake architecture from nearly 5 years ago.

Yes this chip and other ARM CPU derivatives will get much better with each iteration, but AMD and Intel aren't going to be standing still either.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
How competitive is it really though? The benchmarks were limited to Spec2006, and the competition was a 32-core first-generation Zen CPU and a 28-core Intel Cascade Lake CPU based on the Skylake architecture from nearly 5 years ago.

Yes this chip and other ARM CPU derivatives will get much better with each iteration, but AMD and Intel aren't going to be standing still either.

The only Spec2017 numbers I think we have for Graviton2 are from Anandtech, and they didn't run Spec2017 on Rome. There are plenty of published and verified Rome results, but they use AOCC instead of GCC. It also needs to be noted that Graviton2 was running as a cloud server, whereas I'm sure the Rome test was running on bare metal. Taking that into consideration, I compared the numbers below.

https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd/7
Benchmark            Rome 7742    Graviton2    Rome vs Graviton2
500.perlbench_r      310.58       174.4        178.09%
502.gcc_r            334.34       176.9        189.00%
505.mcf_r            442.24       103.1        428.94%
520.omnetpp_r        159.59       85.6         186.43%
523.xalancbmk_r      374.43       131.4        284.95%
525.x264_r           845.29       304.4        277.69%
531.deepsjeng_r      368.39       202.7        181.74%
541.leela_r          359.44       204.4        175.85%
548.exchange2_r      1000.03      385.7        259.28%
557.xz_r             232.59       114.7        202.78%
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,432
136
The only Spec2017 numbers I think we have for Graviton2 are from Anandtech, and they didn't run Spec2017 on Rome. There are plenty of published and verified Rome results, but they use AOCC instead of GCC. It also needs to be noted that Graviton2 was running as a cloud server, whereas I'm sure the Rome test was running on bare metal. Taking that into consideration, I compared the numbers below.

https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd/7
Benchmark            Rome 7742    Graviton2    Rome vs Graviton2
500.perlbench_r      310.58       174.4        178.09%
502.gcc_r            334.34       176.9        189.00%
505.mcf_r            442.24       103.1        428.94%
520.omnetpp_r        159.59       85.6         186.43%
523.xalancbmk_r      374.43       131.4        284.95%
525.x264_r           845.29       304.4        277.69%
531.deepsjeng_r      368.39       202.7        181.74%
541.leela_r          359.44       204.4        175.85%
548.exchange2_r      1000.03      385.7        259.28%
557.xz_r             232.59       114.7        202.78%
So, do I read this right that Rome is 2-4 times faster than Graviton2? Maybe this will shut Richie Rich up...
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
There are some decent caveats here, but yeah, even taking those into account, it's going to be much faster.

And Zen 3 should be much more potent. I've always said that Zen 3 would be AMD's true breakaway moment, if it's ever going to happen. Zen 2 was playing catch-up with Intel, and luckily for AMD, Intel screwed up badly with their 10nm node, so it will be easier for Zen 3 to really do some damage.

Zen 3 is going to be a monster! :sunglasses:

ARM has their work cut out for them if they want to catch up with x86-64.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
So, do I read this right that Rome is 2-4 times faster than Graviton2? Maybe this will shut Richie Rich up...
523.xalancbmk .... 32c Zen1 ..... 53.7
523.xalancbmk .... 64c Zen2 ... 374.4 ........ that's 7x more than Zen1 and 3.5x more per core

505.mcf .... 32c Zen1 ..... 73.2
505.mcf .... 64c Zen2 ... 442.2 ........ that's 6x more than Zen1 and 3x more per core

I didn't notice that Zen2 has >200% higher IPC than Zen1. I thought it was about 15%.
According to Andrei's test, G2 is about 1.7x faster than Zen1 while having 2x the cores. That's to be expected.
If Rome were 2-4 times faster than G2, then Rome would also be 3.4-6.8 times faster than Zen1 Naples. And that is impossible.

Do you still believe those numbers are correct and fully comparable?
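
The arithmetic behind that sanity check can be sketched in a few lines of Python. This only reproduces the numbers quoted in this post. The dictionary layout and the `scaling` helper are illustrative, and the sketch says nothing about whether the two runs used comparable compilers or setups.

```python
# Multi-threaded scores as quoted in this post (32-core Zen1 vs 64-core Zen2).
zen1 = {"cores": 32, "xalancbmk": 53.7, "mcf": 73.2}
zen2 = {"cores": 64, "xalancbmk": 374.4, "mcf": 442.2}


def scaling(old, new, bench):
    """Return (total speedup, speedup per core) of `new` over `old`."""
    total = new[bench] / old[bench]
    per_core = total / (new["cores"] / old["cores"])
    return total, per_core


for bench in ("xalancbmk", "mcf"):
    total, per_core = scaling(zen1, zen2, bench)
    print(f"{bench}: {total:.1f}x total, {per_core:.1f}x per core")

# Chained estimate: if G2 is ~1.7x Zen1 and Rome were 2-4x G2,
# Rome would have to be this many times faster than Zen1.
g2_vs_zen1 = 1.7
implied_rome_vs_zen1 = [round(g2_vs_zen1 * claim, 1) for claim in (2, 4)]
print(implied_rome_vs_zen1)
```

A per-core jump of 3x or more between two generations is the red flag here: it points at a difference in test conditions, not at the silicon.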


@Carfax83
I agree Zen3 is gonna be much better than Zen2; however, A78 will have higher IPC. And after that comes the new ARMv9 core lineup with SVE2 (vectors up to 2048 bits). Well, you can see it isn't ARM who needs to catch up. Since A77 delivers 8% higher IPC than Zen2, ARM has become super dangerous for the x86 world.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,784
136
That's my whole point. The Graviton 2 is substantially different from the A13 because it's not narrowly optimized for single-threaded burst performance like the A13 is. My whole problem with this thread is how certain people have been implying that the A13 core could be akin to a drop-in solution that is successful across a wide variety of workloads just as it is. Scaling the A13 up to be successful in more diverse and multithreaded workloads would probably require some serious architectural changes, which, from what I've read in this thread, would lower the single-thread/IPC performance substantially.

If you look at Graviton2, even it struggles with performance whenever too many cores are engaged on the same workload. Anandtech went well out of their way to point out that Graviton2 performs better running multiple, low-resource VM instances.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,784
136
  • Like
Reactions: lightmanek

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
523.xalancbmk .... 32c Zen1 ..... 53.7
523.xalancbmk .... 64c Zen2 ... 374.4 ........ that's 7x more than Zen1 and 3.5x more per core

505.mcf .... 32c Zen1 ..... 73.2
505.mcf .... 64c Zen2 ... 442.2 ........ that's 6x more than Zen1 and 3x more per core

I didn't notice that Zen2 has >200% higher IPC than Zen1. I thought it was about 15%.
According to Andrei's test, G2 is about 1.7x faster than Zen1 while having 2x the cores. That's to be expected.
If Rome were 2-4 times faster than G2, then Rome would also be 3.4-6.8 times faster than Zen1 Naples. And that is impossible.

Do you still believe those numbers are correct and fully comparable?


@Carfax83
I agree Zen3 is gonna be much better than Zen2; however, A78 will have higher IPC. And after that comes the new ARMv9 core lineup with SVE2 (vectors up to 2048 bits). Well, you can see it isn't ARM who needs to catch up. Since A77 delivers 8% higher IPC than Zen2, ARM has become super dangerous for the x86 world.

The compiler can make a huge difference on certain tests. Look at the published Spec2006 results against Andrei's tests on Zen1. The libquantum score increases by 1700% on Zen1 with an AMD-optimized compiler. If you look at the published Spec2017 results, they all use AMD's open compiler; no one uses GCC.

As far as I'm aware, GCC performs roughly on par with ARM's own compiler (could be wrong, I don't follow ARM that much).

Edit: I should add that I believe later versions of GCC incorporate more of AMD's optimizations. Additionally, some tests (like libquantum) are very memory bound and vary greatly depending on how the system's memory is configured, so that test in a cloud instance could show really low performance compared to a bare-metal test. I'd love to see some bare-metal tests of Graviton2, but unfortunately I don't think Amazon wants this.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
There are some decent caveats here, but yeah, even taking those into account, it's going to be much faster.
I see two caveats: AOCC, like icc, is a SPEC compiler, so any comparison made with it is bound to be useless; and Rome turbos in single-thread runs. The second point matters: some (rightly) complain that SPEC isn't representative of server workloads, and single-thread is even less representative.

EDIT: my bad, you quoted MT results, great!

The compiler can make a huge difference on certain tests. Look at the published Spec2006 results against Andrei's tests on Zen1. The libquantum score increases by 1700% on Zen1 with an AMD-optimized compiler. If you look at the published Spec2017 results, they all use AMD's open compiler; no one uses GCC.
Why would system makers use a compiler that gets worse results when they can use a compiler that targets SPEC? Oh wait.

Why didn't you make a SPEC 2006 comparison where you have gcc results? And why not post this in the Graviton thread?

EDIT: I saw @amrnuke posted his results for SPEC 2006 Graviton2 vs 7742 with gcc. And that definitely paints a different picture from what you get, though Rome still has the lead by 30%.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
I see two caveats: AOCC, like icc, is a SPEC compiler, so any comparison made with it is bound to be useless; and Rome turbos in single-thread runs. The second point matters: some (rightly) complain that SPEC isn't representative of server workloads, and single-thread is even less representative.

EDIT: my bad, you quoted MT results, great!


Why would system makers use a compiler that gets worse results when they can use a compiler that targets SPEC? Oh wait.

Why didn't you make a SPEC 2006 comparison where you have gcc results? And why not post this in the Graviton thread?

EDIT: I saw @amrnuke posted his results for SPEC 2006 Graviton2 vs 7742 with gcc. And that definitely paints a different picture from what you get, though Rome still has the lead by 30%.

The latest versions of GCC have the Rome optimizations included now anyway. I agree that some of the results might be a little optimistic using AOCC, but using GCC8 also makes things a little pessimistic for Rome (see the libquantum results with GCC8 showing Rome regressing in perf compared to Naples). The reason I used the scores I did is that they are published and verified by Spec. This is also the reason I didn't use Spec2006: by the time Rome came out, Spec2006 was EOL, and no published Spec2006 results exist for Rome. If Amazon would allow published Spec results for Graviton2 we could make a better comparison, but so far they won't (and I doubt they ever will).

Even if we take Ampere's numbers and do some basic calculations to adjust them for Graviton2 (since they're using the same Arm design) you get that Rome is ~50% faster than Graviton2, and that's with them de-rating Rome's score assuming an older version of GCC and not even the highest score published for the Epyc 7742.

Ampere 3.3 GHz, 80 core N1 CPU est. Spec score = 1.04 * Epyc 7742.

Graviton2 = Ampere est. Spec score * (2.5/3.3) * (64/80) = 0.63 * Epyc 7742

Or in other words, Epyc 7742 is 58.7% faster than the estimated Graviton2 (1 / 0.63 ≈ 1.587), maybe closer to 50% given the lack of multi-core scaling with N1. That's using Ampere's own numbers, which are probably pretty optimistic for the Arm design. So is Epyc 2x - 4x faster than Graviton2? Probably not. But it's also probably faster than what we get using Ampere's numbers, so it could be 2x as fast or very close to it.
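
The back-of-the-envelope scaling above can be written out as a short Python sketch. It assumes, as the post does, that the score scales linearly with both clock speed and core count, which is a simplification. The `scale_estimate` helper and the variable names are illustrative only.

```python
def scale_estimate(relative_score, clock_ratio, core_ratio):
    """Scale a relative score linearly by clock and core-count ratios."""
    return relative_score * clock_ratio * core_ratio


# Ampere's own estimate for its 3.3 GHz, 80-core N1 part, relative to Epyc 7742.
ampere_vs_epyc = 1.04

# Graviton2 uses the same N1 core at 2.5 GHz with 64 cores.
g2_vs_epyc = scale_estimate(ampere_vs_epyc, 2.5 / 3.3, 64 / 80)

# Epyc's implied lead over the estimated Graviton2.
epyc_lead = 1 / g2_vs_epyc - 1

print(f"Graviton2 est. = {g2_vs_epyc:.2f} x Epyc 7742")  # 0.63
print(f"Epyc 7742 lead = {epyc_lead:.1%}")               # 58.7%
```

Real multi-core scaling is usually sub-linear, so the linear assumption is the weakest part of the estimate.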
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
So, do I read this right that Rome is 2-4 times faster than Graviton2? Maybe this will shut Richie Rich up...

Not surprised you are not questioning numbers as long as they follow your agenda. Or perhaps you did not notice that the compiler is different? Or that the number of HW threads differs by a factor of 2? Are the gate count or power roughly equal? Probably not.

The latest versions of GCC have the Rome optimizations included now anyway.

The problem is not the Rome optimizations; the problem is the SPEC optimizations. In any case, when comparing architectures always use the same compiler; everything else is speculation.
Why do you compare a 64-thread CPU with a 128-thread CPU anyway? Is it not within reasonable expectation that the latter is faster in multi-threaded workloads?
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
The latest versions of GCC have the Rome optimizations included now anyway. I agree that some of the results might be a little optimistic using AOCC, but using GCC8 also makes things a little pessimistic for Rome (see the libquantum results with GCC8 showing Rome regressing in perf compared to Naples). The reason I used the scores I did is that they are published and verified by Spec. This is also the reason I didn't use Spec2006: by the time Rome came out, Spec2006 was EOL, and no published Spec2006 results exist for Rome. If Amazon would allow published Spec results for Graviton2 we could make a better comparison, but so far they won't (and I doubt they ever will).
I agree we should use the latest compiler (as long as it's the same) and SPEC 2017. But what you're doing here is like comparing AMD vs Intel on benchmarks where it's been shown that Intel cheated. Don't you remember icc and AMD fans rightly crying that Intel was cheating?

Here is what AOCC and icc do on SPECrate2017 on a 7601 and a Gold 6148:


[Image: Cavium ThunderX2 SPEC Int Rate Peak, compiler-optimized results]

[Image: Cavium ThunderX2 SPEC Int Rate Peak, GCC7]

Vendor compilers should not be used on such benchmarks when doing cross-vendor comparisons. Period. And this has nothing to do with ARM vs x86.

Even if we take Ampere's numbers and do some basic calculations to adjust them for Graviton2 (since they're using the same Arm design) you get that Rome is ~50% faster than Graviton2, and that's with them de-rating Rome's score assuming an older version of GCC and not even the highest score published for the Epyc 7742.

Ampere 3.3 GHz, 80 core N1 CPU est. Spec score = 1.04 * Epyc 7742.

Graviton2 = Ampere est. Spec score * (2.5/3.3) * (64/80) = 0.63 * Epyc 7742

Or in other words, Epyc 7742 is 58.7% faster than the estimated Graviton2 (1 / 0.63 ≈ 1.587), maybe closer to 50% given the lack of multi-core scaling with N1. That's using Ampere's own numbers, which are probably pretty optimistic for the Arm design. So is Epyc 2x - 4x faster than Graviton2? Probably not. But it's also probably faster than what we get using Ampere's numbers, so it could be 2x as fast or very close to it.
Sorry, but this is again wild guessing (you have no way to know how Ampere's interconnect and memory controllers will behave) and meaningless computation.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
I agree we should use the latest compiler (as long as it's the same) and SPEC 2017. But what you're doing here is like comparing AMD vs Intel on benchmarks where it's been shown that Intel cheated. Don't you remember icc and AMD fans rightly crying that Intel was cheating?

Here is what AOCC and icc do on SPECrate2017 on a 7601 and a Gold 6148:


[Image: Cavium ThunderX2 SPEC Int Rate Peak, compiler-optimized results]

[Image: Cavium ThunderX2 SPEC Int Rate Peak, GCC7]

Vendor compilers should not be used on such benchmarks when doing cross-vendor comparisons. Period. And this has nothing to do with ARM vs x86.


Sorry, but this is again wild guessing (you have no way to know how Ampere's interconnect and memory controllers will behave) and meaningless computation.

The problem with using old GCC versions is that AMD doesn't upstream their architecture optimizations for GCC, so you also get a very unfair comparison. The best comparison, again, would be the same compiler, but the latest and best version so each company has its optimizations included. We don't have that because neither Ampere nor Amazon has really allowed for it (yet).

As far as the numbers go compared to Ampere, sure, it's obviously not perfect, but if you throw that out then we're left with basically nothing and the best you can say is that they both are server CPUs and that 64 core Graviton2 on 7nm is faster than 32 core Zen1 on 14 nm. Might as well not try to make any comparison to Rome at all.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
The problem with using old GCC versions is that AMD doesn't upstream their architecture optimizations for GCC, so you also get a very unfair comparison. The best comparison, again, would be the same compiler, but the latest and best version so each company has its optimizations included. We don't have that because neither Ampere nor Amazon has really allowed for it (yet).

This just means we cannot conclude yet. It does not mean we should start juggling unreasonable results.
In addition when comparing architectures, you need a meaningful metric. Just comparing a 128 thread implementation against a 64 thread implementation with respect to absolute performance is moot.
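
To make the point about metrics concrete, here is a small Python sketch that normalizes one multi-threaded score per core and per hardware thread. The scores are the 500.perlbench_r figures quoted earlier in the thread, and the core and thread counts (64c/128t for Epyc 7742, 64c/64t for Graviton2) are the ones mentioned in this discussion. The `Chip` class and helper names are made up for illustration, and this is only one possible normalization; per-watt or per-area would need data we don't have here.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int
    threads: int
    score: float  # multi-threaded benchmark score


def per_core(chip: Chip) -> float:
    """Score divided by physical core count."""
    return chip.score / chip.cores


def per_thread(chip: Chip) -> float:
    """Score divided by hardware thread count."""
    return chip.score / chip.threads


# 500.perlbench_r scores quoted earlier in the thread.
rome = Chip("Epyc 7742", cores=64, threads=128, score=310.58)
g2 = Chip("Graviton2", cores=64, threads=64, score=174.4)

for chip in (rome, g2):
    print(f"{chip.name:<10} per core: {per_core(chip):.2f}  "
          f"per thread: {per_thread(chip):.2f}")

print(f"Rome/G2 per core:   {per_core(rome) / per_core(g2):.2f}")
print(f"Rome/G2 per thread: {per_thread(rome) / per_thread(g2):.2f}")
```

Which denominator is the fair one is exactly what is being argued here; the sketch only shows how much the answer moves depending on the choice.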
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
This just means we cannot conclude yet. It does not mean we should start juggling unreasonable results.
In addition when comparing architectures, you need a meaningful metric. Just comparing a 128 thread implementation against a 64 thread implementation with respect to absolute performance is moot.

1) I never said my numbers were a conclusion. On the contrary, I said that they contained lots of caveats, but I wanted to show what we get with the numbers we have, understanding there are caveats either way we do it.

2) So should we compare a 2 rack solution of Graviton2 versus a 1 rack solution of Epyc since that's what it would take to reach thread parity between the two?
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
1) I never said my numbers were a conclusion. On the contrary, I said that they contained lots of caveats, but I wanted to show what we get with the numbers we have, understanding there are caveats either way we do it.
Indeed you made it clear. But people jumped on your data and made utterly stupid statements as if that was the Truth because it fits their beliefs.

It's sometimes much better not to provide data rather than juggling with computations (and I plead guilty as I sometimes do that myself).

2) So should we compare a 2 rack solution of Graviton2 versus a 1 rack solution of Epyc since that's what it would take to reach thread parity between the two?
IMHO the best comparison would be if AWS offered Rome and Andrei updated his review with it.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
I agree we should use the latest compiler (as long as it's the same) and SPEC 2017. But what you're doing here is like comparing AMD vs Intel on benchmarks where it's been shown that Intel cheated. Don't you remember icc and AMD fans rightly crying that Intel was cheating?

Here is what AOCC and icc do on SPECrate2017 on a 7601 and a Gold 6148:


Thanks for linking that. When we look at the Spec2017 GCC results, TX2 sure looks really strong, beating the 7601 by a decent margin and crushing the Gold 6148. How did the TX2 fare in the STH test suite though (also using GCC)?

[Image: Cavium ThunderX2 OpenSSL sign benchmarks]

[Image: Cavium ThunderX2 OpenSSL verify benchmarks]

[Image: Cavium ThunderX2 c-ray 8K benchmark comparison]

[Image: Cavium ThunderX2 7zip compression benchmarks]

[Image: Cavium ThunderX2 UnixBench Dhrystone 2 multi]


Hmm. So should we just throw away Spec entirely or. . .
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
Thanks for linking that. When we look at the Spec2017 GCC results, TX2 sure looks really strong, beating the 7601 by a decent margin and crushing the Gold 6148. How did the TX2 fare in the STH test suite though (also using GCC)?

[Image: Cavium ThunderX2 OpenSSL sign benchmarks]

[Image: Cavium ThunderX2 c-ray 8K benchmark comparison]

[Image: Cavium ThunderX2 7zip compression benchmarks]

[Image: Cavium ThunderX2 UnixBench Dhrystone 2 multi]


Hmm. So should we just throw away Spec entirely or. . .
Are you trying to compare microbenchmarks and domain specific benchmarks with SPEC? Really?
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
Are you trying to compare microbenchmarks and domain specific benchmarks with SPEC? Really?

So we can't compare a collection of individual benchmarks to form a custom suite, we have to stick to Spec's collection of tests?

The reality is these CPUs will be tested by customers on their own optimized setups with their own actual flow being tested. Everything else is just talking points but as just consumers on a consumer forum, that's all we really have, right? The only way we'd have any info to go off of would be someone to release their internal testing (which almost no one will do).

Look, I'm not trying to downplay what ARM is doing in the server space. They've made a ton of progress and are starting to become a real threat to x86. I'm just trying to bring some perspective to the marketing from ARM partners, which is all we have to go on because they haven't released (and maybe won't release) test systems for independent reviewers to publish results. My personal opinion is that ARM isn't quite there yet with this generation, but the next generation could be a whole different story, especially with Intel continuing to struggle to put anything really competitive out in this space. The next generation or two may come down to how much value system admins see in sticking with x86 and using AMD versus switching to an ARM ecosystem.
 