Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

There is a first rumor about Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
massive AI accelerator

Source Coreteks:

naukkis · May 22, 2020

Glo. said:
100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742 - Phoronix

www.phoronix.com

I hope this ends this debate whether x86 CPUs or ARM CPUs are faster, and I hope we won't see anymore that ARM is better ISA than x86, for High Performance Computing.

It's pretty obvisous, 7742 is about as big as AMD could make it, ~600mm2 7nm + 450mm2 14nm, 256mb L3 and 225w tdp - and it ends up being only average 33% faster than quite affordable Graviton2 ~400mm2 ~100W chip with only 32mb L3 in same 7nm process.

lobz · May 22, 2020

naukkis said:
It's pretty obvisous, 7742 is about as big as AMD could make it, ~600mm2 7nm + 450mm2 14nm, 256mb L3 and 225w tdp - and it ends up being only average 33% faster than quite affordable Graviton2 ~400mm2 ~100W chip with only 32mb L3 in same 7nm process.

Where did you find that 33% average? The review finds 51%. Also, an estimated power draw holds zero ground against a TDP in my eyes.
Until it's not possible to directly compare actual power consumption while both CPUs run the same workload, everything is just an assumption about the real efficiency difference, and certainly not 'pretty obvious'.
Still don't know where you've seen the 33% though, I'm curious, since you replied to this specific review.

Hitman928 · May 22, 2020

lobz said:
Where did you find that 33% average? The review finds 51%. Also, an estimated power draw holds zero ground against a TDP in my eyes.
Until it's not possible to directly compare actual power consumption while both CPUs run the same workload, everything is just an assumption about the real efficiency difference, and certainly not 'pretty obvious'.
Still don't know where you've seen the 33% though, I'm curious, since you replied to this specific review.

I like the circle chart that they do as it gives a better picture than the individual mean numbers. The tricky thing is that it's logarithmic so keep that in mind when viewing it.

soresu · May 22, 2020

Doug S said:
You can't just decide to "discount them" because Apple doesn't sell their cores on the open market. The point isn't that Apple doesn't sell server chips on the open market, the point is that Apple proves it is possible to design ARM cores that perform much better than ARM designed cores. If they can do it, someone else can do it too.

I can decide to think whatever I want where they are concerned, that's the virtue of not buying Apple in the first place!

FREEDOM!!

I discount them because their entire ecosystem is an Apple centric walled garden that makes nVidia look positively open by comparison (and boi do I hate them).

I discount them because those better cores are essentially worthless for being stuck inside that walled garden - I'm well aware that they are potentially better than ARM's big cores and have been since A6 or 7, not that I have seen much in the way of serious non benchmark SW run on them to truly judge that difference.

The entire thing is like the equivalent of putting an F22 jet engine into a basic prop plane that you can never fly outside of the airport boundaries - in short a depressing waste of potential that only makes it more annoying when experienced coders (like a Dolphin emu coder) wax lyrical about it when it does them absolutely no good because they can't make use of it outside of jailbroken hardware.

Yes the potential is there, but it is not currently in the market.

The cost of developing such a core to compete with performance derived from years of uArch R&D at Apple is not an insignificant thing whatever their pedigree, there are at least a few that crashed and burned in just the last few years, and Nuvia having former Apple Ax engineers on staff does not automatically guarantee them success.

The greatest irony would be for a new company like Nuvia to finally finish their new core, only for a Matterhorn derivative to meet or exceed it - and therein lies the problem with investment in new companies making ARM cores, ARM themselves now have SoftBank backing them to go wherever they want on R&D.

soresu · May 22, 2020

naukkis said:
It's pretty obvisous, 7742 is about as big as AMD could make it, ~600mm2 7nm + 450mm2 14nm, 256mb L3 and 225w tdp - and it ends up being only average 33% faster than quite affordable Graviton2 ~400mm2 ~100W chip with only 32mb L3 in same 7nm process.

I assume that average smooths over the SIMD heavy loads like rendering and video encoding?

Given Zen2's AVX2 performance that should certainly show a wider gap.

IntelUser2000 · May 22, 2020

lobz said:
Where did you find that 33% average? The review finds 51%. Also, an estimated power draw holds zero ground against a TDP in my eyes.

That 51% difference in performance will potentially enable AMD to make a part that's close in TDP to that Graviton 2 part.

AMD doesn't have such a comparison, but Intel's side does. The Xeon Platinum 8280 has 28 cores, 2.7GHz Base/4GHz Turbo with 205W TDP. But you could drop to a Xeon Gold 6262 with 24 cores 1.9GHz Base/3.6GHz Turbo with only 135W TDP. So a 50% drop in performance will get it much closer.

AMD/Intel also has features that'll increase power/die area that Graviton 2 doesn't. Multiple socket support? I/O?

coercitiv · May 23, 2020

Doug S said:
At that point one should admit they are biased towards x86 and won't accept any evidence that runs counter to their bias.

Maybe it's time some admit they're biased towards unicorns too.

Yes, Apple's current core has the best PPC money can buy in a human made device. It's done, discussed, set in holy forum stone. However, what does Apple's accomplishment with ARM cores prove about x86 potential? Because the way I see it, so far Apple has proven that faster CPU cores can be built.Their result says little about which ISA is superior, if any.

As for "discounting" Apple, it's only natural for this to happen as the argument repeatedly comes back to it's origin: extrapolating from a dual-core phone CPU to a many-core HPC silicon as the unicorn of ARM potential. It's almost as if some enthusiastic forumites want everybody to make a bet on the future of computing and those conservative enough to ask for patience and future product based evidence are labeled naysayers and x86 biased.

And the apex in this argument is this theory that it takes two to make a world class CPU: a genius to design the core and a monkey to follow the manual for the uncore. This is the insulting part, because it is used as proof for ease of scaling in HPC while at the same time ignoring the fact that Apple stands alone at the top of ARM ST performance. It's very very quiet up there, nobody else came even close. They're one of a kind, yet some insist that all ARM designs are on a convergent top performance path while x86 designs... are not.

Personally I'm more confident in the future of ARM than x86, but not because of "genetic" performance potential, but rather due to the business model behind it: it has faster evolution pace due to multiple entities being allowed to iterate on the designs. However, not all signs are encouraging, as multiple phone SoC makers are seemingly abandoning custom R&D for a more centralized and cost effective approach. Different start, but not necessarily a different ending.

DrMrLordX · May 23, 2020

soresu said:
Especially considering that Graviton2 is based on what will soon be on the 2 generation old core A76 - I'm expecting A78 to be announced next week.

At least to date, server ARM SoCs seem to favor older ARM cores. It took awhile to get A76 (or what is essentially A76) implemented and publicly-available in a server SoC. A77 still isn't represented at all.

soresu said:
Liquid cooling should allow you to do even better.

Not really, unless your liquid cooling setup sinks heat outdoors or into the ground (geocooling). If your radiators are indoors, your AC system still has to handle the heat.

Hitman928 said:
I like the circle chart that they do as it gives a better picture than the individual mean numbers. The tricky thing is that it's logarithmic so keep that in mind when viewing it.

The circle chart shows how lopsided is the comparison between Graviton2 and Rome. That one side of the circle represents an absolute massacre. One of the problems I would have with Graviton2 (and potentially server ARM in general) is that it's too niche for a broad set of workloads. It's hardware you target at specific applications. And I don't think it's all due to NEON vs. AVX2 either. That Linux kernel compile benchmark . . . woof.

One wonders if Apple's chips would show similarly-lopsided performance deltas in workloads that are common on Apple's own laptops.

Doug S · May 23, 2020

lobz said:
And I can bench 950 kg, I just don't feel like showing it.

That's a moronic comparison. Taking your example within my claim is basically like saying "it is obviously possible to bench press 950 kg if we can see someone doing it in front of our face". There isn't any "don't feel like showing it" here. Apple designed ARM cores are significantly outperforming ARM Ltd designed cores right in front of our faces. Are you now going to argue that the A77 cores in SD865 and so forth actually perform better than Apple's, but you just can't show it because you can't run benchmarks you prefer (i.e. any benchmark that shows Apple cores doing better than ARM Ltd designed cores must be flawed?)

Some people are stupid and will have their head in the sand even if Apple starts selling ARM based Macs that beat x86 Macs and x86 Windows on machines on the much wider variety of benchmarks they'll be able to run. Because there will be benchmarks they can't run because applications haven't been ported or whatever, and those will obviously be the ones that matter.

lobz · May 23, 2020

Doug S said:
That's a moronic comparison. Taking your example within my claim is basically like saying "it is obviously possible to bench press 950 kg if we can see someone doing it in front of our face". There isn't any "don't feel like showing it" here. Apple designed ARM cores are significantly outperforming ARM Ltd designed cores right in front of our faces. Are you now going to argue that the A77 cores in SD865 and so forth actually perform better than Apple's, but you just can't show it because you can't run benchmarks you prefer (i.e. any benchmark that shows Apple cores doing better than ARM Ltd designed cores must be flawed?)

Some people are stupid and will have their head in the sand even if Apple starts selling ARM based Macs that beat x86 Macs and x86 Windows on machines on the much wider variety of benchmarks they'll be able to run. Because there will be benchmarks they can't run because applications haven't been ported or whatever, and those will obviously be the ones that matter.

What may be obvious for you, can easily look moronic for others (I'm referring to extrapolating server CPU designing capabilities from the success of designing a 2-core CPU that's mainly used for single threaded integer workload with zero need for a quick and wide interconnect, and calling that subjective extrapolation obvious).

Doug S · May 23, 2020

coercitiv said:
However, what does Apple's accomplishment with ARM cores prove about x86 potential? Because the way I see it, so far Apple has proven that faster CPU cores can be built.Their result says little about which ISA is superior, if any.

I wasn't making any claims about which ISA is superior, I was making claims AGAINST those who claimed Graviton2's poor showing somehow provides proof that the x86 ISA is superior.

I don't think it is possible for ISA to matter anymore. Back in the days when chips had far fewer than a million transistors it mattered because the overhead of additional work required to decode a more complex and irregular CISC ISA like x86 meant fewer transistors available for execution. That's why RISC chips dominated at the time. Now that we have chips with billions of transistors, instruction decode is a tiny fraction of 1% of them and is meaningless in the overall picture.

Once you have instructions decoded they are all competing on equal footing and it is the implementation that matters - how many instructions can be in flight at once, how many loads/stores can happen at the same time, how good is its branch prediction, what clock rate is it able to run at, and so forth. I don't think it is possible for "ARM" to beat "x86" as an ISA, or vice versa. It is only possible for a particular implementation (Intel's, AMD's, Apple's, etc.) to beat another implementation. That's why the Graviton results don't tell us anything useful about which ISA is superior, it only demonstrates that the Graviton2 (and its ARM Ltd designed cores) are a poor implementation when compared to Intel and AMD implementations.

DrMrLordX · May 23, 2020

Doug S said:
I don't think it is possible for ISA to matter anymore.

Please remind the OP of this fact.

Doug S · May 23, 2020

DrMrLordX said:
Please remind the OP of this fact.

You mean Richie Rich? I already tried to tell him that IPC is not a reasonable comparison between CPUs when they operate at very different clock rates, he lives in his world just as those who believe x86 has some magical inherent superiority over ARM live in theirs.

In the real world what matters as far as the subject of "Apple replacing x86 Macs with ARM Macs" is how Apple's ARM CPU core directly compares with Intel's x86 CPU core. Now people can dispute all they want that SPEC2006 and GB5 are somehow overstating Apple CPU performance or understating Intel x86 CPU performance, but the room for dispute on both sides will vanish if Apple actually does it and more direct comparisons become possible.

soresu · May 23, 2020

DrMrLordX said:
Not really, unless your liquid cooling setup sinks heat outdoors or into the ground (geocooling). If your radiators are indoors, your AC system still has to handle the heat.

Didn't you hear about those datacenters that re use radiated heat to warm the buildings? I'm sure that in a large enough datacenter that you could probably even combine the waterflow with thermophotovoltaic power generation to even save some money on those cooling costs.

DrMrLordX said:
At least to date, server ARM SoCs seem to favor older ARM cores. It took awhile to get A76 (or what is essentially A76) implemented and publicly-available in a server SoC. A77 still isn't represented at all.

Going by last year I expected N2 in february (assuming it is A77 derived), I'm not sure if the pandemic crisis has affected ARM's release schedules or not yet.

I guess we will see if A78/G78 is not announced this month.

I think that there is more to each Neoverse gen than just melding the same uncore into a new CPU core - to stay competitive they are probably developing an uncore roadmap in parallel with the Cortex Axx roadmap, so I would expect improvements beyond just IO or core IPC alone in N2.

soresu · May 23, 2020

Doug S said:
Are you now going to argue that the A77 cores in SD865 and so forth actually perform better than Apple's

Going by raw performance, no obviously - going by performance per watt, yes.

Apple's big cores are certainly performant, but at a cost to power - not much for integer, but certainly significant for FP compared to A77.

OTOH A55 is left in the dust at the moment perf/watt wise, and not even that much of an improvement over the fast aging A53 after 4.5 years - I've seen few SoC's yet that use A55 on it's own, so I hope its consumer successor is a much bigger improvement to perf/watt (A65/E1 already succeeded it handsomely elsewhere).

Considering how few ARM SBC SoC's use big cores I do hope that SoftBank is driving ARM to push the little cores forward more aggressively than before.

Progression from A5 -> A7 -> A53 took place over far less time than A53 -> A55 with not so much to show for it.

DrMrLordX · May 24, 2020

Doug S said:
he lives in his world just as those who believe x86 has some magical inherent superiority over ARM live in theirs.

Maybe if you look at how people react to positions like his instead of imaging that people have some belief in the "magical inherent superiority" of x86, people's posts will start to make more sense.

soresu said:
Didn't you hear about those datacenters that re use radiated heat to warm the buildings? I'm sure that in a large enough datacenter that you could probably even combine the waterflow with thermophotovoltaic power generation to even save some money on those cooling costs.

That's NORAD-style cooling. Unless you are going to use heat pumps to achieve temps higher than what you get from server exhaust, the best you'll ever get from that is ambient heat, and you're fighting the server room AC systems at that point. If you build in a subterranean setting then it could make sense, but you won't get hot water from the servers or anything like that.

Doug S · May 24, 2020

soresu said:
Going by raw performance, no obviously - going by performance per watt, yes.

Apple's big cores are certainly performant, but at a cost to power - not much for integer, but certainly significant for FP compared to A77.

Performance per watt is equally as meaningless as IPC. If you want to increase your performance per watt, clock down by 50%, see your performance drop by 40-45%, and see your performance per watt increase by 3-4x. If Apple's SoCs were clocked down so their performance matched A77, they'd use less power than A77 (just look at those frequency/voltage curves for A13 - Apple can reduce power a LOT with a pretty small decrease in frequency)

soresu · May 24, 2020

Doug S said:
Performance per watt is equally as meaningless as IPC. If you want to increase your performance per watt, clock down by 50%, see your performance drop by 40-45%, and see your performance per watt increase by 3-4x. If Apple's SoCs were clocked down so their performance matched A77, they'd use less power than A77 (just look at those frequency/voltage curves for A13 - Apple can reduce power a LOT with a pretty small decrease in frequency)

You seem to think that is an Apple only trick, you can likely apply that same process to the ARM cores and end up back in the same spot.

It's like saying Navi 10 is awesome at perf/watt when it's never used at a low enough frequency to make a difference.

It doesn't matter what frequency it could be used at for perf/watt, it matters what frequency it IS used at in shipping products - Apple makes that choice to favor raw performance and the battery life suffers as a result, these are still mostly used in mobile devices after all.

soresu · May 24, 2020

DrMrLordX said:
That's NORAD-style cooling. Unless you are going to use heat pumps to achieve temps higher than what you get from server exhaust, the best you'll ever get from that is ambient heat, and you're fighting the server room AC systems at that point. If you build in a subterranean setting then it could make sense, but you won't get hot water from the servers or anything like that.

Depends on the radiator engineering and thermophotovoltaic cell tuning I guess - but on a sizable datacenter with thousands of processors that is a lot of waste heat going nowhere costing you money as it does.

It might be state of the art now, but as a datacenter operator thinking about cooling and server power costs you would have to be insane to ignore the possibilities of taking advantage of that waste heat to defray those costs, at the very least I would imagine that you could recoup the cooling power costs.

DrMrLordX · May 24, 2020

Harvesting heat for power is all about delta T - the difference between the cold side and hot side of your heat engine, regardless of whether you're using the Carnot cycle or a TEG. If you are geocooling - that is, if the earth is your heatsink - then you can get your cold side down to 4C and maybe harvest a decent amount of your waste heat as power. And under those circumstances your ambients may be 4C, so you can recoup 100% of that power to avoid using a heater of any kind (assuming you have humans in the facility; if you don't, then there's no benefit).

We're getting a bit off-topic though. Current ARM offerings don't seem to be chewing up enough heat for something like NORAD/Cheyenne Mountain to be on the table.

Doug S · May 25, 2020

soresu said:
You seem to think that is an Apple only trick, you can likely apply that same process to the ARM cores and end up back in the same spot.

It's like saying Navi 10 is awesome at perf/watt when it's never used at a low enough frequency to make a difference.

It doesn't matter what frequency it could be used at for perf/watt, it matters what frequency it IS used at in shipping products - Apple makes that choice to favor raw performance and the battery life suffers as a result, these are still mostly used in mobile devices after all.

If it was possible to overclock Qualcomm SD865 and perform as well as A13, you can be 100% sure that more than one Android OEM would do so. They don't, because the SD865 at its default clock is already running at pretty much the same point in the voltage/frequency curve as Apple - if not slightly higher in fact. The whole curve for the SD865 is below Apple's, because it has fewer active transistors (due to Apple's cores being larger)

The latest iPhones rank near the very top in battery life in Anandtech's testing, despite having smaller batteries than most Android flagships. Because Apple's SoCs don't run at the top speed all the time, they aren't burning up more power than SD865 all the time - they only need to run at full speed for short bursts when that higher performance is required. So despite being "less efficient" in a meaningless performance per watt benchmark, and having a smaller battery, they actually last as long or longer than just about every Android flagship. It is almost as if being able to go fast for less time when needed, while still going "fast enough" when running at a lower clock rate, leads to more real world efficiency. Fancy that!

DrMrLordX · May 25, 2020

@Doug S

Qualcomm released an "overclocked" version of the SD855 called the SD855+. I have one. All it does is stay at max clocks longer.

soresu · May 25, 2020

Doug S said:
If it was possible to overclock Qualcomm SD865 and perform as well as A13, you can be 100% sure that more than one Android OEM would do so. They don't, because the SD865 at its default clock is already running at pretty much the same point in the voltage/frequency curve as Apple - if not slightly higher in fact. The whole curve for the SD865 is below Apple's, because it has fewer active transistors (due to Apple's cores being larger)

Apple's core isn't 'overclocked' either, and that isn't what I meant in the first place - at this point I'm beginning to think that you are deliberately misinterpreting my statements.

soresu · May 26, 2020

Cortex A78 and X1 uArch's announcement in new Anandtech articles. Link here.

Also Mali G78. Link here.

I'll be back once I've read a bit.

A78 is 7% IPC gain from from A77, X1 is +30% IPC gain from A77.

A78 continues down the path of high performance while keeping power and area trim, while X1 (codename Hera, wife of Zeus hint hint) is cutting loose of the old restrictions to power and area focusing only on max performance.

So it seems as another (Thala) intimated ARM have diverged big core uArch's in favor of big and really big - what this means for little cores I don't know, considering the coming Exynos uses A78 and A76 it may eventually be X2 and A79 with little cores only used for very minimum background processes.

X1 also seems aimed at creating choice for custom implementations too - so I would expect Samsung and Qualcomm to spit out their own variants likely.

X1 is 5 wide apparently, and descrribed as a "super charged A78 design", also having double the NEON units, so AV1 encoded video should definitely not be a problem for this core with the current level of dav1d ARMv8-A optimisation.

trexfromouterspace · May 26, 2020

soresu said:
A78 is +20% IPC from A77, X1 is +30% IPC from A77.

A78 is doesn't have +20% IPC over the A77. What the article says is that an A78 has "20% sustained performance" over an A77, but that figure also includes the process-dependent gains you get going from 7nm to 5nm.

From the figures on Page 4, a 3 GHz A78 (5nm) has +20% performance compared to a 2.6 GHz A77 (7nm), while a 2.1 GHz A78 has the same performance as a 2.3GHz A77 at half the energy. So the IPC gain is actually somewhere in the 4-9% range.

Solved! ARM Apple High-End CPU - Intel replacement

Senior member

Golden Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Elite Member

Diamond Member

Lifer

Diamond Member

Platinum Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member

Member