Solved! ARM Apple High-End CPU - Intel replacement


Doug S

Member
Feb 8, 2020
78
91
51
I keep seeing these weird mentions of Apple cores in the context of servers in this thread - if they were going to do servers seriously, they would have done so long ago with x86 CPUs.

I'm not sure if they did so before, but they certainly don't do so now for the wider market (ie outside of Apple).

If they aren't doing so now for x86, they aren't going to suddenly turn around and start making server chips out of the blue - they are still a consumer and workstation oriented company. Even then the Mac Pros don't even have the best workstation platform on the market (EPYC 2) so it's all moot on that score.
I agree that Apple has shown no interest in the server market, and other than possibly making server chips for their own cloud I don't see any role.

However, there's little practical difference between a server CPU and a workstation CPU. Both have lots of cores, need plenty of memory & I/O bandwidth, etc. and have higher level requirements for reliability (i.e. stuff like ECC memory)

My point all along has been that Apple's CURRENT cores at their CURRENT clock rates would do quite well in a workstation (or a cloud server if Apple ever went that way which we have no indication they will)

Obviously the memory and I/O need to be improved a lot to supply dozens of cores instead of just 2 or 4, and the inter core communication fabric is more complex, but making a world class CPU core is a lot harder than making a world class uncore so anyone trying to argue that Apple would somehow be unable to do this is arguing in an alternative reality.

You stick 32 of Apple's big cores in a Mac Pro and give it a bunch of memory channels and PCIe lanes and it will be a beast for pretty much anything you throw at it. Could they do better if they designed a new core from scratch for this? Probably, but we'd have to see if it was better "enough" to make it worth doing. I certainly see no reason why it would matter in anything aside from the Macbook Pro, iMac Pro and Mac Pro.

Your point about Apple not using the best workstation CPU currently is something I've also tried to highlight multiple times. Apple doesn't need to beat the best x86 CPUs, or the best Intel CPUs. They need to beat the CPUs in the x86 version being converted to ARM. So they need to beat whatever CPU is in the x86 Macbook Air with an ARM Macbook Air, which the A14/A14X would easily accomplish this year. And down the road when they do an ARM Mac Pro, they'd only need to beat the performance of the last x86 Mac Pro they sold. Hell, Apple has sometimes been several years out of date on their Macs compared to Intel's latest and greatest and it hasn't really hurt them.
 

IntelUser2000

Elite Member
Oct 14, 2003
6,673
1,235
126
x86 isn't underperforming solely because of the ISA - it's because x86 vendors (both of them) severely missed their execution.

The easiest point of proof is looking at how well the iGPU performs in terms of perf/mm2 and perf/watt. The ARM GPUs do significantly better than both AMD/Intel's efforts. And Apple is at the top.*

If they had the same ISA there would have been other excuses.

*For Apple due to vertical integration they have execution advantages that aren't available to other vendors. Doesn't mean vertical integration itself is a magic bullet, but with the whole group doing so well, vertical integration is quite an advantage for them.
 
  • Like
Reactions: coercitiv

soresu

Golden Member
Dec 19, 2014
1,124
380
136
The easiest point of proof is looking at how well the iGPU performs in terms of perf/mm2 and perf/watt. The ARM GPUs do significantly better than both AMD/Intel's efforts. And Apple is at the top.
Ahem, IMG Tec is standing behind you looking very angry indeed.....

GPU and CPU are not comparable targets - I mean at all.

By AMD's own admission they would ideally use different optimised process nodes for CPU and GPU components within an APU, but obviously cannot do so in a monolithic design.

It does stand to reason that HW optimised for a mobile first use case would emphasize area and power efficiency - it took them years to catch up to x86 however, and the benefits are as yet not such a massive advantage truth be told.

The 105W of an N1 64C ref design is certainly impressive - but lacking SMT of EPYC 2, and still not dramatically more than 2x the power efficiency per core, to say nothing of the SIMD limitations of N1 vs EPYC 2 also.

Apple/PowerVR is "at the top" because they took a base IMG Tec design and tricked it out exactly as they wanted it, rather than using a synthesizable licensed OTS core that anybody can just drop in to an existing SoC design layout, coupled with having explicit control over the platform it is used on, not entirely dissimilar from game console full system stack optimisation. Fully controlling the stack has its perks.

The ARM community has the benefit of not suffering from the more direct and caustic competition that the main x86 vendors do - you can say that Apple competes with Snapdragon until you are blue in the face, but that doesn't make it any more true, because they are not competing on the same platform, and people who buy Apple are fairly unlikely to buy Android phones in general in my experience.

Huawei Kirin and Samsung Exynos do compete with Snapdragon, and now all 3 SoC brands will employ ARM Ltd CPU cores going forward - so in the end ARM/SoftBank receive all the financial benefit from their custom, just as they receive a benefit from the ISA license to Apple, unlike with Intel and AMD who have to cross license from each other to maintain license equilibrium post x64, so likely neither has any particular advantage from it anymore (I bet Intel curses the name Itanium now).
 
Last edited:
  • Like
Reactions: Tlh97

awesomedeluxe

Junior Member
Feb 12, 2020
23
3
41
Ignoring software concerns, I guess I'm left with two questions about this transition.

How fast can Apple make single-core performance?

Intel boosts to stupid clock speeds to keep single-core performance high. ARM isn't that efficient at high clock speeds, and part of its success in the server market has been due to server workloads being so well-suited to sprawling core counts. But you can't just shove 32 cores into a 16" laptop and say you're good to go. All those cores are not going to do a damn thing to speed up Excel, Photoshop, or Chrome.

How fast can Apple make the GPU?

I mean, they can certainly pay AMD to make a fast GPU. But then what is the point? You are not making a fanless machine with an AMD GPU inside. I can hardly imagine thinking an A14X tied to an RDNA 2-or-3 part is worth the hassle over whatever APU AMD is offering at the time.

But we don't really understand how Apple's GPUs work. The A12Z has... 8 cores... and what are those, exactly? Are these still clusters of shader cores? If so, how many are there? Would a larger notebook just have a supermassive die size to accommodate more GPU cores, or is that going to go in a separate part?

I think in the best case, your Macbook Air and your MBP 13 could be a rousing success. But I'm concerned that the 16" will never be able to compete with similarly-sized machines with Intel or AMD hardware.
 

Doug S

Member
Feb 8, 2020
78
91
51
ARM isn't that efficient at high clock speeds
No, that's not true at all. There's nothing special about "ARM" that makes it less efficient at high clock speeds. This is 100% a design decision, there's nothing stopping Apple or anyone else from designing an ARM core that targets frequencies of 5 GHz+. It just doesn't make sense in a cell phone where power draw is paramount - targeting more moderate frequencies and doing more per cycle saves power because power draw increases superlinearly versus frequency.
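
A rough back-of-the-envelope sketch of the "superlinear" point, assuming the textbook CMOS dynamic power model (P proportional to C * V^2 * f) and assuming voltage has to rise roughly in step with frequency near the top of the curve. Both are simplifying assumptions, not measurements of any Apple or Intel core, and the function name and `voltage_exponent` parameter are made up for illustration:

```python
def relative_dynamic_power(freq_ratio: float, voltage_exponent: float = 1.0) -> float:
    """Dynamic power relative to baseline under P ~ C * V^2 * f.

    Assumption (hypothetical model): voltage scales as
    freq_ratio ** voltage_exponent. An exponent of 0 means the voltage
    stays fixed, in which case power only grows linearly with clock.
    """
    voltage_ratio = freq_ratio ** voltage_exponent
    return voltage_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Compare a few clock multipliers against the baseline clock.
    for ratio in (1.0, 1.25, 1.5, 2.0):
        power = relative_dynamic_power(ratio)
        print(f"{ratio:.2f}x clock -> {power:.2f}x dynamic power")
```

Under those assumptions doubling the clock costs about 8x the dynamic power, which is why a phone core tends to chase work per cycle rather than frequency. Real silicon has leakage and a voltage floor, so treat the exact exponent as illustrative only.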

How fast can Apple make the GPU?
Is that really relevant? Apple isn't using the on chip GPU on all its current Macs, so why would it do so for ARM Macs? Seems like the logical course would be that the "Pro" machines use a discrete GPU from AMD or Nvidia, while the consumer line uses Apple's on chip GPU. I don't know how it compares to Intel's, but GPU performance scales pretty well with more units/transistors/etc. especially at the lower end so if it isn't fast enough "make it bigger" :)
 

awesomedeluxe

Junior Member
Feb 12, 2020
23
3
41
No, that's not true at all. There's nothing special about "ARM" that makes it less efficient at high clock speeds. This is 100% a design decision, there's nothing stopping Apple or anyone else from designing an ARM core that targets frequencies of 5 GHz+. It just doesn't make sense in a cell phone where power draw is paramount - targeting more moderate frequencies and doing more per cycle saves power because power draw increases superlinearly versus frequency.

...

Is that really relevant? Apple isn't using the on chip GPU on all its current Macs, so why would it do so for ARM Macs? Seems like the logical course would be that the "Pro" machines use a discrete GPU from AMD or Nvidia, while the consumer line uses Apple's on chip GPU. I don't know how it compares to Intel's, but GPU performance scales pretty well with more units/transistors/etc. especially at the lower end so if it isn't fast enough "make it bigger" :)
Thanks for your reply. I am wondering if you have any good reading material on ARM cores at higher clock speeds. My understanding was that, like all chips, clockspeed has an exponential relationship with power usage, but that ARM chips saw particularly poor return on clockspeed increase relative to power usage beyond a certain threshold.

As for the GPU, I think it is definitely relevant. Even in the base MBP16 available now, you are looking at a part that will likely consume more power than the CPU when it is on. You can't substantially redesign that machine using a part like that. Pairing it with an A14XYZ is not going to change the fact that this other part of the machine over here gets really really hot and needs space and fans to cool it down. If you are using a MBP16 that comes standard with an A14Y2K and some RDNA 3 part, what is the upside to this transition? You're left with a chassis that still has plenty of room to accommodate a Zen / *Lake part, so you're just being dragged through this for nothing?

I think Apple has to be willing to make a big GPU if they want to make "the best computer possible" in the 16" space. You are certainly right about how GPUs scale - I just think it remains to be seen how, if they decided to shove 16 of their mystery GPU cores in there, they would stack up against the competition.
 

soresu

Golden Member
Dec 19, 2014
1,124
380
136
You are not making a fanless machine with an AMD GPU inside.
The new Exynos SoCs coming with an RDNA derived GPU say otherwise - they will certainly be in future Samsung Galaxy devices.

Perhaps even this "Whitechapel" custom SoC being developed by Samsung for Google devices will have it too, using either Android or Fuchsia, could be either after this much time for Fuchsia to gestate.

Though I'm still inclined to believe that Fuchsia will replace the main Java foundation while keeping the Android name to maintain branding continuity - à la the NT kernel replacing the old legacy one from Win 95-Me as the main Windows kernel.

The fact that the Fuchsia team is working on its own Android Runtime, coupled with the focus on OpenGL ES to Vulkan work at the ANGLE project, suggests that Fuchsia will have very high compatibility with Android apps, at least for apps that follow the more recent design rules for Android anyway - the recent stipulation that developers only submit new apps and updates that are ARMv8/A64 compatible may lessen the burden of compatibility some.
 

awesomedeluxe

Junior Member
Feb 12, 2020
23
3
41
The new Exynos SoCs coming with an RDNA derived GPU say otherwise - they will certainly be in future Samsung Galaxy devices.
Interesting stuff -- I wasn't aware of that. Hard to guess what such a GPU looks like since Renoir uses Vega and there are no RDNA parts on SoCs, but...
techspot said:
Not surprising -- Navi was designed around the needs of the PS5 and XSX after all. And of course, we know what RDNA looks like scaled up, because that's the 50W TDP part that's in the MBP16 right now.

For a fanless design, I think the maximum TGP of a discrete GPU would be around 15W. And this would still be a triumph of computer engineering. Frankly, nVidia is in a much better position to provide such a part - the MX330 already has a low power version that meets this requirement and is not that much worse than the 5300M. This disparity could easily be fixed by a modern manufacturing process. But of course, nVidia won't do this, and Apple won't work with nVidia.

We have not really heard anything about Apple working on a discrete GPU, but I don't see how they get around doing so if they want the 16" to be a compelling product. Just, I don't know, put 20 A14 GPU cores somewhere with some HBM2E stacks and pray on it.

Maybe the real conclusion is that the 16" is just going to be the ugly stepsister - some Apple CPU with too many cores matched with an AMD GPU that triples the machine's power consumption whenever it turns on.
 

soresu

Golden Member
Dec 19, 2014
1,124
380
136
And of course, we know what RDNA looks like scaled up, because that's the 50W TDP part that's in the MBP16 right now.
Clockspeed/voltage has a dramatic effect on power efficiency - if you could run even Navi 10 at Polaris 10 clockspeeds you would notice a dramatic increase in power efficiency from RX 5700 XT, which is pushing the envelope of what the chip can do in order to stay power/perf competitive against the RTX 2070.

Given the current Samsung/RDNA GPU bench figures suggest a mere 2 TFLOP performance at most (2x SD 855) - coupled with the greater power efficiency of RDNA2, I would be extremely surprised if they do not hit their efficiency targets by the time it is finished considering it still has 9 to 12 months to mature.

Bear in mind that this is also their first truly mobile GPU since Imageon from ATI was sold to Qualcomm (becoming Adreno's foundation), so some teething problems are to be expected.
 
  • Like
Reactions: Tlh97

awesomedeluxe

Junior Member
Feb 12, 2020
23
3
41

I hope this ends this debate whether x86 CPUs or ARM CPUs are faster, and I hope we won't see anymore that ARM is better ISA than x86, for High Performance Computing.
Certainly persuasive. We have very few examples of ARM competing with x86 in like-kind environments. This even uses the same number of cores, which is great!

Still, unless I am missing something... the caveat here is power, right? The word "power" comes up in that article exactly 0 times. AMD lists the TDP of the EPYC 7742 at 225W, while the estimated TDP of Amazon's Graviton2 is 80-110W. This is super relevant in most of the machines Apple will be designing.
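
To put rough numbers on that, here is a minimal sketch that divides the quoted TDP figures by core count. It assumes both parts are 64-core designs and takes the 80-110W Graviton2 figure as the estimate it is, not an official spec. TDP is also not measured power draw, so this is only a sanity check on the scale of the gap:

```python
CORES = 64  # assumption: both chips compared here are 64-core parts

EPYC_7742_TDP_W = 225.0                 # AMD's listed TDP, as quoted above
GRAVITON2_TDP_RANGE_W = (80.0, 110.0)   # estimated range, not an official figure


def watts_per_core(tdp_w: float, cores: int = CORES) -> float:
    """Naive TDP per core; ignores uncore, SMT and actual delivered performance."""
    return tdp_w / cores


if __name__ == "__main__":
    epyc = watts_per_core(EPYC_7742_TDP_W)
    for tdp in GRAVITON2_TDP_RANGE_W:
        graviton = watts_per_core(tdp)
        print(
            f"Graviton2 at {tdp:.0f}W: {graviton:.2f} W/core, "
            f"EPYC 7742: {epyc:.2f} W/core, ratio {epyc / graviton:.2f}x"
        )
```

On these figures the EPYC budget per core is somewhere around 2x to 2.8x the Graviton2 one, before accounting for SMT or per-core performance.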

x86 cores are certainly better than ARM cores at handling multiple tasks. But that might matter less if you are designing a machine that's thermally constrained such that you can literally fit twice as many ARM cores.

And of course, while AMD has recently edged out Intel to become the performance leader in many x86 segments, no one comes close to Apple in ARM design.
 

Glo.

Diamond Member
Apr 25, 2015
3,792
1,767
136
Still, unless I am missing something... the caveat here is power, right? The word "power" comes up in that article exactly 0 times. AMD lists the TDP of the EPYC 7742 at 225W, while the estimated TDP of Amazon's Graviton2 is 80-110W. This is super relevant in most of the machines Apple will be designing.
It's 80-110W for a 32 core design.

Edit: I mistook it for that 80 core ARM CPU from Ampere, which is 210W. The estimate is 80-110W for Graviton's 64 core design.
 
Last edited:

Doug S

Member
Feb 8, 2020
78
91
51

I hope this ends this debate whether x86 CPUs or ARM CPUs are faster, and I hope we won't see anymore that ARM is better ISA than x86, for High Performance Computing.
No, that only ends any debate on whether Graviton2 or AMD/Intel x86 CPUs are faster. It says nothing about "ARM CPUs" in general any more than the performance of Bulldozer said something about "x86 CPUs" in general.
 
  • Like
Reactions: scannall

DrMrLordX

Lifer
Apr 27, 2000
15,494
4,281
136
But that might matter less if you are designing a machine that's thermally constrained such that you can literally fit twice as many ARM cores.
Server rooms are always thermally constrained at some level. The question is whether or not your application(s) can benefit from adding more cores. If Graviton2 is indeed a 110w SoC or less, I could deploy twice as many sockets for Graviton2 as EPYC 7742 assuming scaling is there. There are also issues like VM response time to consider. Sometimes your application requires higher-frequency cores at the expense of efficiency, and that's why both AMD and Intel provide server CPUs to fill that niche.
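
The "twice as many sockets" arithmetic can be sketched as below. The 4500W rack budget and the function name are hypothetical, picked only to make the numbers easy to read, and the sketch counts CPU TDP alone (no memory, storage, fans or PSU losses):

```python
def sockets_within_budget(budget_w: float, socket_tdp_w: float) -> int:
    """How many whole sockets fit in a power budget, counting CPU TDP only."""
    if socket_tdp_w <= 0:
        raise ValueError("socket_tdp_w must be positive")
    return int(budget_w // socket_tdp_w)


if __name__ == "__main__":
    rack_budget_w = 4500.0  # hypothetical budget, not from any real deployment
    parts = (
        ("EPYC 7742 (225W listed TDP)", 225.0),
        ("Graviton2 (110W, upper estimate)", 110.0),
        ("Graviton2 (80W, lower estimate)", 80.0),
    )
    for name, tdp in parts:
        count = sockets_within_budget(rack_budget_w, tdp)
        print(f"{name}: {count} sockets")
```

Whether the extra sockets actually help still depends on the application scaling across them, which is the real question here.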
 

soresu

Golden Member
Dec 19, 2014
1,124
380
136
No, that only ends any debate on whether Graviton2 or AMD/Intel x86 CPUs are faster. It says nothing about "ARM CPUs" in general any more than the performance of Bulldozer said something about "x86 CPUs" in general.
Especially considering that Graviton2 is based on the A76, which will soon be a 2 generation old core - I'm expecting A78 to be announced next week.

At the very least you get a 20% IPC boost from A77 (integer IPC boost which is typically weaker than FP), and probably somewhere between 10-20% for A78 on top of that.
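
Since those gains stack multiplicatively rather than additively, a quick sketch of the cumulative uplift over A76 looks like this. The A78 range is the guess from the paragraph above, not an announced figure, and the helper name is made up for illustration:

```python
def compounded_uplift(*gains: float) -> float:
    """Multiply successive fractional gains, e.g. 0.20 then 0.10 -> about 1.32x."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total


if __name__ == "__main__":
    a77_gain = 0.20                  # integer IPC gain over A76 cited above
    a78_guess_range = (0.10, 0.20)   # speculative range for A78 over A77
    for a78_gain in a78_guess_range:
        total = compounded_uplift(a77_gain, a78_gain)
        print(f"A77 +{a77_gain:.0%}, A78 +{a78_gain:.0%} -> {total:.2f}x A76 IPC")
```

So a core two generations newer than the one in Graviton2 would land at roughly 1.3x to 1.45x the A76's integer IPC if that guess holds.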

Not to mention that ARM is also long overdue for an overhaul to their SIMD performance, hopefully with SVE2 on Matterhorn next year, albeit I'd expect AMD to move towards AVX512 at least a year before ARM does 512 bit SVE2 on their big cores.
 
  • Like
Reactions: Tlh97

soresu

Golden Member
Dec 19, 2014
1,124
380
136
Server rooms are always thermally constrained at some level. The question is whether or not your application(s) can benefit from adding more cores. If Graviton2 is indeed a 110w SoC or less, I could deploy twice as many sockets for Graviton2 as EPYC 7742 assuming scaling is there.
Liquid cooling should allow you to do even better.

I'm honestly surprised that the physical setup around liquid cooling loops looks so clunky even in servers, where space is at a premium inside server racks.
 

Doug S

Member
Feb 8, 2020
78
91
51
Especially considering that Graviton2 is based on the A76, which will soon be a 2 generation old core - I'm expecting A78 to be announced next week.

At the very least you get a 20% IPC boost from A77 (integer IPC boost which is typically weaker than FP), and probably somewhere between 10-20% for A78 on top of that.

Not to mention that ARM is also long overdue for an overhaul to their SIMD performance, hopefully with SVE2 on Matterhorn next year, albeit I'd expect AMD to move towards AVX512 at least a year before ARM does 512 bit SVE2 on their big cores.
Also not to mention that it is possible to do much better than ARM does core-wise, as Apple demonstrates.
 

soresu

Golden Member
Dec 19, 2014
1,124
380
136
Also not to mention that it is possible to do much better than ARM does core-wise, as Apple demonstrates.
I discount them, it's pointless to bicker over Apple cores when they don't play with others at all. If you don't buy Apple products (and I don't) they are effectively an interesting/glamorous non entity.

If they ever decide to license the core I'll sing a merrier tune mind you - and on that day I will dance a jig for the flying pigs saluting Lucifer's frozen backside in the seventh circle of hell.

Part of the magic of the ARM license ecosystem is that anyone can tap in (well usually Huawei mutters...), unfortunately sometimes that ends with companies that did not properly consider the market before doing so.

I think it's very unlikely to happen anytime soon, but it would be interesting to see if ARM will go open in the next 15 years as POWER has.
 
Last edited:
  • Like
Reactions: Tlh97

Doug S

Member
Feb 8, 2020
78
91
51
I discount them, it's pointless to bicker over Apple cores when they don't play with others at all. If you don't buy Apple products (and I don't) they are effectively an interesting/glamorous non entity.

If they ever decide to license the core I'll sing a merrier tune mind you - and on that day I will dance a jig for the flying pigs saluting Lucifer's frozen backside in the seventh circle of hell.

Part of the magic of the ARM license ecosystem is that anyone can tap in (well usually Huawei mutters...), unfortunately sometimes that ends with companies that did not properly consider the market before doing so.

I think it's very unlikely to happen anytime soon, but it would be interesting to see if ARM will go open in the next 15 years as POWER has.
You can't just decide to "discount them" because Apple doesn't sell their cores on the open market. The point isn't that Apple doesn't sell server chips on the open market, the point is that Apple proves it is possible to design ARM cores that perform much better than ARM designed cores. If they can do it, someone else can do it too.

People trying to make arguments that Graviton2 benchmarks somehow "prove" that x86 is better than ARM are bad enough, but deciding the evidence of Apple's cores doesn't count because "they don't play well with others" is beyond ridiculous. At that point one should admit they are biased towards x86 and won't accept any evidence that runs counter to their bias.
 
