Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

There is a first rumor about an Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with new gen MacBook Air in mid 2020 (considering also MacBook Pro and iMac)
massive AI accelerator

Source Coreteks:

soresu · May 19, 2020

DrMrLordX said:
Kunpeng is server, no? 64c and all that?

4/8C in this instance.

The Huawei URL seemingly lists them under servers, but the webpage says desktop board.

Link here.

Doug S · May 19, 2020

soresu said:
I keep seeing these weird mentions of Apple cores in the context of servers in this thread - if they were going to do servers seriously, they would have done so long ago with x86 CPU's.

I'm not sure if they did so before, but they certainly don't do so now for the wider market (ie outside of Apple).

If they aren't doing so now for x86, they aren't going to suddenly turn around and start making server chips out of the blue - they are still a consumer and workstation oriented company. Even then the Mac Pro's don't even have the best workstation platform on the market (EPYC 2) so it's all moot for that score.

I agree that Apple has shown no interest in the server market, and other than possibly making server chips for their own cloud I don't see any role.

However, there's little practical difference between a server CPU and a workstation CPU. Both have lots of cores, need plenty of memory & I/O bandwidth, etc., and have higher requirements for reliability (e.g. stuff like ECC memory).

My point all along has been that Apple's CURRENT cores at their CURRENT clock rates would do quite well in a workstation (or a cloud server if Apple ever went that way which we have no indication they will)

Obviously the memory and I/O need to be improved a lot to supply dozens of cores instead of just 2 or 4, and the inter core communication fabric is more complex, but making a world class CPU core is a lot harder than making a world class uncore so anyone trying to argue that Apple would somehow be unable to do this is arguing in an alternative reality.

You stick 32 of Apple's big cores in a Mac Pro and give it a bunch of memory channels and PCIe lanes and it will be a beast for pretty much anything you throw at it. Could they do better if they designed a new core from scratch for this? Probably, but we'd have to see if it was better "enough" to make it worth doing. I certainly see no reason why it would matter in anything aside from the Macbook Pro, iMac Pro and Mac Pro.

Your point about Apple not using the best workstation CPU currently is something I've also tried to highlight multiple times. Apple doesn't need to beat the best x86 CPUs, or the best Intel CPUs. They need to beat the CPUs in the x86 version being converted to ARM. So they need to beat whatever CPU is in the x86 Macbook Air with an ARM Macbook Air, which the A14/A14X would easily accomplish this year. And down the road when they did an ARM Mac Pro, they'd only need to beat the performance of the last x86 Mac Pro they sold. Hell, Apple has sometimes been several years out of date on their Macs compared to Intel's latest and greatest and it hasn't really hurt them.

IntelUser2000 · May 19, 2020

x86 isn't underperforming solely because of the ISA - it's because x86 vendors (both of them) severely missed their execution.

The easiest point of proof is looking at how well the iGPU performs in terms of perf/mm2 and perf/watt. The ARM GPUs do significantly better than both AMD/Intel's efforts. And Apple is at the top.*

If they had the same ISA there would have been other excuses.

*For Apple due to vertical integration they have execution advantages that aren't available to other vendors. Doesn't mean vertical integration itself is a magic bullet, but with the whole group doing so well, vertical integration is quite an advantage for them.

soresu · May 20, 2020

IntelUser2000 said:
The easiest point of proof is looking at how well the iGPU performs looking the perf/mm2 and perf/watt. The ARM GPUs do significantly better than both AMD/Intel's efforts. And Apple is at the top.

Ahem, IMG Tec is standing behind you looking very angry indeed.....

GPU and CPU are not comparable targets - I mean at all.

By AMD's own admission they would ideally use different optimised process nodes for CPU and GPU components within an APU, but obviously cannot do so in a monolithic design.

It does stand to reason that HW optimised for a mobile first use case would emphasize area and power efficiency - it took them years to catch up to x86 however, and the benefits are as yet not such a massive advantage truth be told.

The 105W of an N1 64C ref design is certainly impressive - but lacking SMT of EPYC 2, and still not dramatically more than 2x the power efficiency per core, to say nothing of the SIMD limitations of N1 vs EPYC 2 also.

Apple/PowerVR is "at the top" because they took a base IMG Tec design and tricked it out exactly as they wanted it, rather than using a synthesizable licensed OTS core that anybody can just drop in to an existing SoC design layout, coupled with having explicit control over the platform it is used on, not entirely dissimilar from game console full system stack optimisation. Fully controlling the stack has its perks.

The ARM community has the benefit of not suffering from the more direct and caustic competition that the main x86 vendors do - you can say that Apple competes with Snapdragon until you are blue in the face, but that doesn't make it any more true, because they are not competing on the same platform, and people who buy Apple are fairly unlikely to buy Android phones in general in my experience.

Huawei Kirin and Samsung Exynos do compete with Snapdragon, and now all 3 SoC brands will employ ARM Ltd CPU cores going forward - so in the end ARM/SoftBank receive all the financial benefit from their custom, just as they receive a benefit from the ISA license to Apple, unlike with Intel and AMD who have to cross license from each other to maintain license equilibrium post x64, so likely neither has any particular advantage from it anymore (I bet Intel curses the name Itanium now).

DrMrLordX · May 20, 2020

soresu said:
4/8C in this instance.

The Huawei URL seemingly lists them under servers, but the webpage says desktop board.

Link here.

Oh I remember that board! I didn't know it was under the Kunpeng lineup. I was thinking of this:

www.hisilicon.com

awesomedeluxe · May 20, 2020

Ignoring software concerns, I guess I'm left with two questions about this transition.

How fast can Apple make single-core performance?

Intel boosts to stupid clock speeds to keep single-core performance high. ARM isn't that efficient at high clock speeds, and part of its success in the server market has been due to server workloads being so well-suited to sprawling core counts. But you can't just shove 32 cores into a 16" laptop and say you're good to go. All those cores are not going to do a damn thing to speed up Excel, Photoshop, or Chrome.

How fast can Apple make the GPU?

I mean, they can certainly pay AMD to make a fast GPU. But then what is the point? You are not making a fanless machine with an AMD GPU inside. I can hardly imagine thinking an A14X tied to an RDNA 2-or-3 part is worth the hassle over whatever APU AMD is offering at the time.

But we don't really understand how Apple's GPUs work. The A12Z has... 8 cores... and what are those, exactly? Are these still clusters of shader cores? If so, how many are there? Would a larger notebook just have a supermassive die size to accommodate more GPU cores, or is that going to go in a separate part?

I think in the best case, your Macbook Air and your MBP 13 could be a rousing success. But I'm concerned that the 16" will never be able to compete with similarly-sized machines with Intel or AMD hardware.

Doug S · May 20, 2020

awesomedeluxe said:
ARM isn't that efficient at high clock speeds

No, that's not true at all. There's nothing special about "ARM" that makes it less efficient at high clock speeds. This is 100% a design decision, there's nothing stopping Apple or anyone else from designing an ARM core that targets frequencies of 5 GHz+. It just doesn't make sense in a cell phone where power draw is paramount - targeting more moderate frequencies and doing more per cycle saves power because power draw increases superlinearly versus frequency.
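
A rough way to see the "superlinear" point above: the usual textbook approximation for dynamic CMOS power is P ≈ C·V²·f, and if supply voltage has to rise roughly in step with frequency, power grows with about the cube of the clock. The sketch below only illustrates that simplified model. The function name and the assumption that voltage tracks frequency linearly are hypothetical, and nothing here is a measurement of any Apple, Intel or AMD core.

```python
def relative_dynamic_power(freq_ratio: float, voltage_tracks_frequency: bool = True) -> float:
    """Relative dynamic power under the simplified model P ~ C * V^2 * f.

    freq_ratio is the new clock divided by the baseline clock.
    If voltage_tracks_frequency is True, voltage is assumed to scale
    linearly with frequency (an illustrative assumption), which gives
    roughly cubic growth. Otherwise voltage is held fixed and power
    grows linearly with frequency.
    """
    voltage_ratio = freq_ratio if voltage_tracks_frequency else 1.0
    return voltage_ratio ** 2 * freq_ratio


# Compare a few clock increases against a 1.0x baseline.
for ratio in (1.0, 1.5, 2.0):
    power = relative_dynamic_power(ratio)
    print(f"{ratio:.1f}x clock -> {power:.2f}x dynamic power")
```

Under these assumptions, doubling the clock costs about eight times the dynamic power, which is why doing more per cycle at a moderate frequency tends to be the cheaper route in a phone.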

awesomedeluxe said:
How fast can Apple make the GPU?

Is that really relevant? Apple isn't using the on chip GPU on all its current Macs, so why would it do so for ARM Macs? Seems like the logical course would be that the "Pro" machines use a discrete GPU from AMD or Nvidia, while the consumer line uses Apple's on chip GPU. I don't know how it compares to Intel's, but GPU performance scales pretty well with more units/transistors/etc. especially at the lower end so if it isn't fast enough "make it bigger"

awesomedeluxe · May 20, 2020

Doug S said:
No, that's not true at all. There's nothing special about "ARM" that makes it less efficient at high clock speeds. This is 100% a design decision, there's nothing stopping Apple or anyone else from designing an ARM core that targets frequencies of 5 GHz+. It just doesn't make sense in a cell phone where power draw is paramount - targeting more moderate frequencies and doing more per cycle saves power because power draw increases superlinearly versus frequency.

...

Is that really relevant? Apple isn't using the on chip GPU on all its current Macs, so why would it do so for ARM Macs? Seems like the logical course would be that the "Pro" machines use a discrete GPU from AMD or Nvidia, while the consumer line uses Apple's on chip GPU. I don't know how it compares to Intel's, but GPU performance scales pretty well with more units/transistors/etc. especially at the lower end so if it isn't fast enough "make it bigger"

Thanks for your reply. I am wondering if you have any good reading material on ARM cores at higher clock speeds. My understanding was that, like all chips, clockspeed has an exponential relationship with power usage, but that ARM chips saw particularly poor return on clockspeed increase relative to power usage beyond a certain threshold.

As for the GPU, I think it is definitely relevant. Even in the base MBP16 available now, you are looking at a part that will likely consume more power than the CPU when it is on. You can't substantially redesign that machine using a part like that. Pairing it with an A14XYZ is not going to change the fact that this other part of the machine over here gets really really hot and needs space and fans to cool it down. If you are using a MBP16 that comes standard with an A14Y2K and some RDNA 3 part, what is the upside to this transition? You're left with a chassis that still has plenty of room to accommodate a Zen / *Lake part, so you're just being dragged through this for nothing?

I think Apple has to be willing to make a big GPU if they want to make "the best computer possible" in the 16" space. You are certainly right about how GPUs scale - I just think it remains to be seen how, if they decided to shove 16 of their mystery GPU cores in there, they would stack up against the competition.

soresu · May 20, 2020

awesomedeluxe said:
You are not making a fanless machine with an AMD GPU inside.

The new Exynos SoCs coming with an RDNA derived GPU say otherwise - they will certainly be in future Samsung Galaxy devices.

Perhaps even this "Whitechapel" custom SoC being developed by Samsung for Google devices will have it too, running either Android or Fuchsia - it could be either, given how long Fuchsia has had to gestate.

Though I'm still inclined to believe that Fuchsia will replace the main Java foundation while keeping the Android name to maintain branding continuity - ala the NT kernel replacing the old legacy one for Win 95-Me as the Windows main kernel.

The fact that the Fuchsia team is working on its own Android Runtime, coupled with the focus on OpenGL ES to Vulkan work at the ANGLE project, suggests that Fuchsia will have very high compatibility with Android apps, at least for apps that follow the more recent design rules for Android anyway - the recent stipulation that developers only submit new apps and updates that are ARMv8/A64 compatible may lessen the burden of compatibility some.

awesomedeluxe · May 21, 2020

soresu said:
The new Exynos SoC's coming with RDNA derived GPU say otherwise - they will certainly be in future Samsung Galaxy devices.

Interesting stuff -- I wasn't aware of that. Hard to guess what such a GPU looks like since Renoir uses Vega and there are no RDNA parts on SoCs, but...

techspot said:
The post also says the RDNA chip still needs work, especially in the area of power consumption, but the companies should have it ready by that 2021 date.

Not surprising -- Navi was designed around the needs of the PS5 and XSX after all. And of course, we know what RDNA looks like scaled up, because that's the 50W TDP part that's in the MBP16 right now.

For a fanless design, I think the maximum TGP of a discrete GPU would be around 15W. And this would still be a triumph of computer engineering. Frankly, nVidia is in a much better position to provide such a part - the MX330 already has a low power version that meets this requirement and is not that much worse than the 5300M. This disparity could easily be fixed by a modern manufacturing process. But of course, nVidia won't do this, and Apple won't work with nVidia.

We have not really heard anything about Apple working on a discrete GPU, but I don't see how they get around doing so if they want the 16" to be a compelling product. Just, I don't know, put 20 A14 GPU cores somewhere with some HBM2E stacks and pray on it.

Maybe the real conclusion is that the 16" is just going to be the ugly stepsister - some Apple CPU with too many cores matched with an AMD GPU that triples the machine's power consumption whenever it turns on.

soresu · May 21, 2020

awesomedeluxe said:
And of course, we know what RDNA looks like scaled up, because that's the 50W TDP part that's in the MBP16 right now.

Clockspeed/voltage has a dramatic effect on power efficiency - if you could run even Navi 10 at Polaris 10 clockspeeds you would notice a dramatic increase in power efficiency from RX 5700 XT, which is pushing the envelope of what the chip can do in order to stay power/perf competitive against the RTX 2070.

Given the current Samsung/RDNA GPU bench figures suggest a mere 2 TFLOP performance at most (2x SD 855) - coupled with the greater power efficiency of RDNA2, I would be extremely surprised if they do not hit their efficiency targets by the time it is finished considering it still has 9 to 12 months to mature.

Bear in mind that this is also their first truly mobile GPU since Imageon from ATI was sold to Qualcomm (becoming Adreno's foundation), so some teething problems are to be expected.

Glo. · May 22, 2020

100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742 - Phoronix

www.phoronix.com

I hope this ends the debate over whether x86 CPUs or ARM CPUs are faster, and I hope we won't see any more claims that ARM is a better ISA than x86 for High Performance Computing.

awesomedeluxe · May 22, 2020

Glo. said:
100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742 - Phoronix

www.phoronix.com

I hope this ends this debate whether x86 CPUs or ARM CPUs are faster, and I hope we won't see anymore that ARM is better ISA than x86, for High Performance Computing.

Certainly persuasive. We have very few examples of ARM competing with x86 in like-kind environments. This even uses the same number of cores, which is great!

Still, unless I am missing something... the caveat here is power, right? The word "power" comes up in that article exactly 0 times. AMD lists the TDP of the EPYC 7742 at 225W, while the estimated TDP of Amazon's Graviton2 is 80-110W. This is super relevant in most of the machines Apple will be designing.

x86 cores are certainly better than ARM cores at handling multiple tasks. But that might matter less if you are designing a machine that's thermally constrained such that you can literally fit twice as many ARM cores.

And of course, while AMD has recently edged out Intel to become the performance leader in many x86 segments, no one comes close to Apple in ARM design.

Glo. · May 22, 2020

awesomedeluxe said:
Still, unless I am missing something... the caveat here is power, right? The word "power" comes up in that article exactly 0 times. AMD lists the TDP of the EPYC 7742 at 225W, while the estimated TDP of Amazon's Graviton2 is 80-110W. This is super relevant in most of the machines Apple will be designing.

It's 80-110W for a 32 core design.

Edit: I mistook it for that 80 core ARM CPU from Ampere, which is 210W. The estimate is 80-110W for Graviton2's 64 core design.

awesomedeluxe · May 22, 2020

Glo. said:
Its 80-110W for 32 core design.

Edit. I mistake it for that 80 core ARM CPU from Ampere.

It would be helpful if Amazon just gave us the number. Here's where anandtech gives that estimate for the 64 core part.

name99 · May 22, 2020

awesomedeluxe said:
x86 cores are certainly better than ARM cores at handling multiple tasks.

Uh, wot???
Please justify this claim on technical grounds.

Hitman928 · May 22, 2020

awesomedeluxe said:
It would be helpful if Amazon just gave us the number. Here's where anandtech gives that estimate for the 64 core part.

It's right about 95 W - 100 W, maybe a hair under. There's plenty of documentation to calculate it out to a much smaller range than what Andrei gave. I posted about it here, here, and here.

Doug S · May 22, 2020

Glo. said:
100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742 - Phoronix

www.phoronix.com

I hope this ends this debate whether x86 CPUs or ARM CPUs are faster, and I hope we won't see anymore that ARM is better ISA than x86, for High Performance Computing.

No, that only ends any debate on whether Graviton2 or AMD/Intel x86 CPUs are faster. It says nothing about "ARM CPUs" in general any more than the performance of Bulldozer said something about "x86 CPUs" in general.

DrMrLordX · May 22, 2020

awesomedeluxe said:
But that might matter less if you are designing a machine that's thermal constrained such that you can literally fit twice as many ARM cores.

Server rooms are always thermally constrained at some level. The question is whether or not your application(s) can benefit from adding more cores. If Graviton2 is indeed a 110w SoC or less, I could deploy twice as many sockets for Graviton2 as EPYC 7742 assuming scaling is there. There are also issues like VM response time to consider. Sometimes your application requires higher-frequency cores at the expense of efficiency, and that's why both AMD and Intel provide server CPUs to fill that niche.
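
The "twice as many sockets" figure follows from the numbers quoted earlier in the thread (225W TDP for the EPYC 7742, an estimated 80-110W for Graviton2). A minimal sketch of that arithmetic, assuming a hypothetical fixed power budget for CPU sockets only and ignoring memory, storage, networking and cooling overhead:

```python
EPYC_7742_TDP_W = 225        # AMD's listed TDP, as quoted in the thread
GRAVITON2_TDP_W = (80, 110)  # estimated range quoted in the thread


def sockets_within_budget(budget_w: float, tdp_w: float) -> int:
    """Number of whole sockets whose combined TDP fits in the budget."""
    return int(budget_w // tdp_w)


# Hypothetical budget: exactly enough power for 10 EPYC 7742 sockets.
budget_w = 10 * EPYC_7742_TDP_W

print(f"EPYC 7742: {sockets_within_budget(budget_w, EPYC_7742_TDP_W)} sockets")
for tdp_w in GRAVITON2_TDP_W:
    count = sockets_within_budget(budget_w, tdp_w)
    print(f"Graviton2 at {tdp_w}W: {count} sockets")
```

Even at the top of the estimated range the budget holds twice the socket count, but as noted above that only helps if the workload actually scales across the extra cores.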

soresu · May 22, 2020

Doug S said:
No, that only ends any debate on whether Graviton2 or AMD/Intel x86 CPUs are faster. It says nothing about "ARM CPUs" in general any more than the performance of Bulldozer said something about "x86 CPUs" in general.

Especially considering that Graviton2 is based on the A76, which will soon be a 2 generation old core - I'm expecting A78 to be announced next week.

At the very least you get a 20% IPC boost from A77 (integer IPC boost which is typically weaker than FP), and probably somewhere between 10-20% for A78 on top of that.
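
To put those two steps together: generational gains multiply rather than add. The sketch below compounds the figures from this post (20% for A77 over A76, then a guessed 10-20% for A78). These are expectations stated in the thread, not measured results, and the helper name is made up for illustration.

```python
def compound_uplift(*gains: float) -> float:
    """Multiply successive fractional gains, e.g. 0.20 for +20%."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total


# A76 -> A77 (+20%), then A77 -> A78 (+10% to +20%, a guess from the post).
low = compound_uplift(0.20, 0.10)
high = compound_uplift(0.20, 0.20)
print(f"Expected A78 IPC vs A76: {low:.2f}x to {high:.2f}x")
```

So on these assumptions an A78-based part would sit somewhere around 1.3x to 1.4x the per-clock integer performance of the A76 core inside Graviton2.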

Not to mention that ARM is also long overdue for an overhaul to their SIMD performance, hopefully with SVE2 on Matterhorn next year, albeit I'd expect AMD to move towards AVX512 at least a year before ARM does 512 bit SVE2 on their big cores.

soresu · May 22, 2020

DrMrLordX said:
Server rooms are always thermally constrained at some level. The question is whether or not your application(s) can benefit from adding more cores. If Graviton2 is indeed a 110w SoC or less, I could deploy twice as many sockets for Graviton2 as EPYC 7742 assuming scaling is there.

Liquid cooling should allow you to do even better.

I'm honestly surprised that the physical setup around liquid cooling loops looks so clunky, even in servers where space is at a premium inside server racks.

Doug S · May 22, 2020

soresu said:
Especially considering that Graviton2 is based on what will soon be on the 2 generation old core A76 - I'm expecting A78 to be announced next week.

At the very least you get a 20% IPC boost from A77 (integer IPC boost which is typically weaker than FP), and probably somewhere between 10-20% for A78 on top of that.

Not to mention that ARM is also long overdue for an overhaul to their SIMD performance, hopefully with SVE2 on Matterhorn next year, albeit I'd expect AMD to move towards AVX512 at least a year before ARM does 512 bit SVE2 on their big cores.

Also not to mention that it is possible to do much better than ARM does core-wise, as Apple demonstrates.

soresu · May 22, 2020

Doug S said:
Also not to mention that it is possible to do much better than ARM does core-wise, as Apple demonstrates.

I discount them, it's pointless to bicker over Apple cores when they don't play with others at all. If you don't buy Apple products (and I don't) they are effectively an interesting/glamorous non entity.

If they ever decide to license the core I'll sing a merrier tune mind you - and on that day I will dance a jig for the flying pigs saluting Lucifer's frozen backside in the seventh circle of hell.

Part of the magic of the ARM license ecosystem is that anyone can tap in (well usually Huawei mutters...), unfortunately sometimes that ends with companies that did not properly consider the market before doing so.

I think it's very unlikely to happen anytime soon, but it would be interesting to see if ARM will go open in the next 15 years as POWER has.

Doug S · May 22, 2020

soresu said:
I discount them, it's pointless to bicker over Apple cores when they don't play with others at all. If you don't buy Apple products (as I do) they are effectively an interesting/glamorous non entity.

If they ever decide to license the core I'll sing a merrier tune mind you - and on that day I will dance a jig for the flying pigs saluting Lucifer's frozen backside in the seventh circle of hell.

Part of the magic of the ARM license ecosystem is that anyone can tap in (well usually Huawei mutters...), unfortunately sometimes that ends with companies that did not properly consider the market before doing so.

I think it's very unlikely to happen anytime soon, but it would be interesting to see if ARM will go open in the next 15 years as POWER has.

You can't just decide to "discount them" because Apple doesn't sell their cores on the open market. The point isn't that Apple doesn't sell server chips on the open market, the point is that Apple proves it is possible to design ARM cores that perform much better than ARM designed cores. If they can do it, someone else can do it too.

People trying to make arguments that Graviton2 benchmarks somehow "prove" that x86 is better than ARM are bad enough, but deciding the evidence of Apple's cores doesn't count because "they don't play well with others" is beyond ridiculous. At that point one should admit they are biased towards x86 and won't accept any evidence that runs counter to their bias.

lobz · May 22, 2020

Doug S said:
Also not to mention that it is possible to do much better than ARM does core-wise, as Apple demonstrates.

And I can bench 950 kg, I just don't feel like showing it.
