Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is a first rumor about Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
  • massive AI accelerator

Source Coreteks:
 
Reactions: vspalanki
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.
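
A minimal sketch of the arithmetic behind that claim, taking the post's own rough figures as inputs (A13 about equal to Intel natively, about a 2x emulation tax). The function name and parameters are illustrative only, not measurements:

```python
def emulated_speed(native_ratio: float, emulation_slowdown: float) -> float:
    """Speed of emulated x86 code on the ARM chip, relative to a native x86 machine.

    native_ratio: ARM native speed / x86 native speed (1.0 means parity).
    emulation_slowdown: how many times slower emulated code runs than native code.
    """
    return native_ratio / emulation_slowdown


# Figures from the post: parity natively, roughly 2x emulation tax.
print(emulated_speed(1.0, 2.0))  # 0.5, i.e. half the speed of the x86 machine

# Under the same model, the ARM chip would need to be 2x faster natively
# just to break even on emulated software.
print(emulated_speed(2.0, 2.0))  # 1.0
```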

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
It's not about whether the last-level cache is L2, L3 or L4; it's still a shared last-level cache, slow compared to a core's private caches. Apple could easily extract more performance from their core by introducing a mid-level cache like Intel and AMD have done, something like a 1MB private L2 cache, which would have about half the latency of the last-level cache. In a phone SoC that performance can't be extracted within power limits, so it isn't implemented, but in a higher-performance, higher-power environment it is an easy way to increase performance.
I'm no match for spin doctors. I'm also not spending my Sunday linking articles to you. Let's just say you're right.
 

naukkis

Senior member
Jun 5, 2002
702
571
136
I'm no match for spin doctors. I'm also not spending my Sunday linking articles to you. Let's just say you're right.

Apple A13's cache arrangement is pretty much like the Intel Core Duo's; even the L2 latency seems to be the same. A 3-level cache architecture simply performs better, although it also uses more transistors. If Apple aims for higher performance with their cores, they will most likely switch to a 3-level cache architecture too.
 
Reactions: Richie Rich

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Apple A13's cache arrangement is pretty much like the Intel Core Duo's; even the L2 latency seems to be the same. A 3-level cache architecture simply performs better, although it also uses more transistors. If Apple aims for higher performance with their cores, they will most likely switch to a 3-level cache architecture too.
You're pretty much talking to yourself at this point, because I 100% disagree with you on the cache front.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Sadly RK3588 seems delayed to Q3+ 2020, but maybe with a new GPU uArch backing up the A76/A55 CPU cores:

Rockchip-Processor-Roadmap-2020-1.jpg

It says "Natt" GPU, probably meaning Nátt corresponding to Nótt, the Norse goddess of night - though this may simply refer to a lesser GPU core like G57 based on the Valhall uArch.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
Not sure what you're getting at here. Isn't all the lavish praise directed towards the Apple A series on account of its much higher IPC compared to x86 CPUs?
Yes the IPC is impressive, but it's far from telling the whole story. What matters is the performance including frequency. My point is that yes A13 is crushing the competition from the IPC point of view but also is competitive against some high-end x86 chips (at least for ST as @lobz rightly noted). That's what I find impressive.
 

naukkis

Senior member
Jun 5, 2002
702
571
136
You're pretty much talking to yourself at this point, because I 100% disagree with you on the cache front.

By what? I responded to the claim that Apple's CPUs have some unfair advantage in cache. They don't; they have a cache subsystem similar to the one x86 CPUs abandoned years ago for a better-performing solution with more levels. The L1 cache size advantage comes from different virtual memory page sizes. Intel, AMD and Apple CPUs all use a similar L1 cache layout, an 8-way associative VIVT cache, as it seems to be some kind of sweet spot. With x86's 4KiB pages that makes 32KiB caches; Apple didn't have to worry so much about backward compatibility, so they increased the page size to 16KiB and got 128KiB L1 caches with the same cache arrangement.
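
The size relationship in that post can be sketched as below. It assumes the L1 is virtually indexed and physically tagged (VIPT), the arrangement this limit normally applies to: the index bits have to fit inside the page offset, so one way can be no larger than one page, and capacity tops out at page size times associativity. The function name is made up for illustration, and the 8-way figure is simply taken from the post:

```python
KIB = 1024


def max_l1_size(page_size: int, ways: int) -> int:
    """Largest L1 capacity (bytes) where each way spans at most one page.

    With a virtually indexed cache, keeping the index inside the page
    offset means way_size <= page_size, so capacity <= page_size * ways.
    """
    return page_size * ways


# x86 with 4 KiB pages, 8-way L1 (figures from the post)
print(max_l1_size(4 * KIB, 8) // KIB)   # 32

# Apple with 16 KiB pages, same 8-way arrangement
print(max_l1_size(16 * KIB, 8) // KIB)  # 128
```

So the 4x larger page accounts for the 4x larger L1 without changing associativity, which is the point being argued.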
 

avAT

Junior Member
Feb 16, 2015
24
10
81
So according to you, Apple is going to dump x86-64 and go all in on ARM in roughly 5 years. How are they going to tackle compatibility with all of the x86 software?

I hope this is the year that Apple makes their plans (or lack of plans) clear, so the same discussion can stop going around in endless circles.

Last time Apple switched architectures it took 6 months of pre-announcement and then another 6 months of product rollouts. So, 1 year. Any software that wasn't ready was left to emulation (Rosetta), such as Photoshop which didn't officially support Intel for an additional 8 months and MS Office (16 months).

There are good arguments that it would be harder this time and good arguments that it would be easier this time. Only Apple knows what they will do and can change their minds up until the minute they actually make an announcement.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Yes the IPC is impressive, but it's far from telling the whole story. What matters is the performance including frequency. My point is that yes A13 is crushing the competition from the IPC point of view but also is competitive against some high-end x86 chips (at least for ST as @lobz rightly noted). That's what I find impressive.

Yeah but let's get real here. Intel's current desktop lineup is based on a core that is almost 5 years old. Almost 5 years is an eternity in the tech industry. And what of AMD? AMD has basically just attained parity with Intel more or less, depending on the type of workload. If AMD is really going to bypass Intel, Zen 3 will be their defining moment. Personally I'm hoping for another double digit IPC increase, and significantly reduced memory latency.

Though that's not to say that I'm defending Intel's ineptitude or inability to execute. All I'm saying is that if Intel had executed according to their usual pace, we probably wouldn't even be having this discussion.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
Yeah but let's get real here. Intel's current desktop lineup is based on a core that is almost 5 years old. Almost 5 years is an eternity in the tech industry. And what of AMD? AMD has basically just attained parity with Intel more or less, depending on the type of workload. If AMD is really going to bypass Intel, Zen 3 will be their defining moment. Personally I'm hoping for another double digit IPC increase, and significantly reduced memory latency.

Though that's not to say that I'm defending Intel's ineptitude or inability to execute. All I'm saying is that if Intel had executed according to their usual pace, we probably wouldn't even be having this discussion.
I believe that we have reached some kind of plateau and that we will never see speed increases as in the past (note I'm talking about speed, not IPC). A sign of that is that all microarchitectures have converged and are quite similar. Of course there are many details that can make a chip faster, and a few process nodes left before we hit a physical wall, but I don't expect large speedups any more.

Of course this is all speculation and I could be very wrong, only time will tell. The fact is that if I look at the current landscape the Apple chip is really close to the fastest Intel/AMD for single thread tasks (at least for the ones that were measured). The other fact is that it doesn't bring me any benefit as there's no usable computer with that chip.
 
Reactions: Carfax83

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
By what? I responded to the claim that Apple's CPUs have some unfair advantage in cache. They don't; they have a cache subsystem similar to the one x86 CPUs abandoned years ago for a better-performing solution with more levels. The L1 cache size advantage comes from different virtual memory page sizes. Intel, AMD and Apple CPUs all use a similar L1 cache layout, an 8-way associative VIVT cache, as it seems to be some kind of sweet spot. With x86's 4KiB pages that makes 32KiB caches; Apple didn't have to worry so much about backward compatibility, so they increased the page size to 16KiB and got 128KiB L1 caches with the same cache arrangement.
You didn't mean VIVT here :)
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
@soresu

That's really disappointing. I'm glad I didn't hold out for an SBC based on that chip. Even though it would be much easier to use than this phone . . .
Bear in mind that there are still 3.5 to 4 months until the next big round of ARM announcements. It could be that Natt/Nott is G78 and Rockchip are simply bound by NDA to stay quiet, like nVidia using the Hercules codename in place of the obvious A78 core for Tegra Orin in their announcement.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Bear in mind that there are still 3.5 to 4 months until the next big round of ARM announcements. It could be that Natt/Nott is G78 and Rockchip are simply bound by NDA to stay quiet, like nVidia using the Hercules codename in place of the obvious A78 core for Tegra Orin in their announcement.

Possibly. It might even be A77 instead of A76 if we're lucky. But I am not holding my breath.
 
Reactions: soresu

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
By what? I responded to the claim that Apple's CPUs have some unfair advantage in cache. They don't; they have a cache subsystem similar to the one x86 CPUs abandoned years ago for a better-performing solution with more levels. The L1 cache size advantage comes from different virtual memory page sizes. Intel, AMD and Apple CPUs all use a similar L1 cache layout, an 8-way associative VIVT cache, as it seems to be some kind of sweet spot. With x86's 4KiB pages that makes 32KiB caches; Apple didn't have to worry so much about backward compatibility, so they increased the page size to 16KiB and got 128KiB L1 caches with the same cache arrangement.
Good for you. I can't express myself more clearly.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Possibly. It might even be A77 instead of A76 if we're lucky. But I am not holding my breath.
Huawei should make Kirin 1000 at 5nm TSMC... but among Cortex cores only Hercules supports 5nm. Does it mean that Kirin 1000 will be the first A78/Hercules SoC?


OK there's the hard job of making it work with 4x the cores, with more memory bandwidth, with the IO that's missing compared to server and desktop chips, with software multithreading… but then?
The hard job is creating a high-IPC core like Apple's A13; scalar performance has always been the holy grail of processor architecture.
Copying, pasting and connecting cores together is a much easier job. You can see that with licensed Cortex cores. There are so many companies creating their own SoC configurations. Even a startup like Cavium, with ThunderX, was able to route 48 cores without any problem.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Huawei should make Kirin 1000 at 5nm TSMC... but among Cortex cores only Hercules supports 5nm. Does it mean that Kirin 1000 will be the first A78/Hercules SoC?
That's not the way ARM works, look at A72 on 28nm HKMG processes (RK3399, MT8173, RPi 4) instead of the 16/14nm finFET it was touted at on Kirin 950/955.

ARM designs its core IP to be used on several processes, which allows them to maximise possible licensees and market share.

Likewise they will port cores to newer processes after the initial core announcement, as with Kirin 970/A73 on 10nm, or Kirin 990/A76 on 7nm+.
Copying, pasting and connecting cores together is a much easier job. You can see that with licensed Cortex cores.
Weeelll, no.

It's not easier because their synthesizable cores have to appeal to multiple licensees with differing needs - Apple can do whatever they like because it's their own products the core is going in, a captive market.

Also, until Softbank acquired them ARM had a much lower cap on possible R&D funds compared to a leviathan like Apple.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Even a startup like Cavium, with ThunderX, was able to route 48 cores without any problem.
Cavium did not use ARM licensed core IP.

ThunderX 1st gen used their own custom 2-issue core, 2nd gen used the Broadcom Vulcan core, and 3rd gen (under Marvell management) is a Vulcan derivative called Triton.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Huawei should make Kirin 1000 at 5nm TSMC... but among Cortex cores only Hercules supports 5nm. Does it mean that Kirin 1000 will be the first A78/Hercules SoC?

Ehhhhh I dunno. Might be a stretch. Regardless, Huawei isn't making any of their recent cores available in the United States in the form of HiKey SBCs anyway, so unless I'm going to buy a Huawei phone, I doubt it's going to matter to me much. Not that Huawei is bothered by such things (they aren't).
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
I mean this roadmap:
Arm_Client_Compute_CPU_roadmap_lo-res_FINAL.JPG


This is the source:

In 2019, the CPU codenamed ‘Hercules’ will be available to Arm partners. ‘Hercules’, also based on DynamIQ technology, will be optimized for both the latest 5nm and 7nm nodes. ‘Hercules’ continues the trajectory of increased compute performance, while also improving power and area efficiency by 10 percent (in addition to the efficiency gains achievable from the 5nm process node).


Since the first 5nm users will be Apple and Huawei, according to this roadmap it looks like the new Kirin will (or will have to) use the new Hercules core. Nvidia will use the Hercules/A78 core in Orin too (but not on 5nm). It looks like Huawei is pushing development to the edge compared to Qualcomm with its 7nm A77 (Snapdragon 865). Do you think Huawei did a die shrink of the A77 to 5nm instead of using the A78?

Wouldn't it be nice to have a 16-core A78 laptop SoC instead of the 8cx (8-core A76)? On 5nm it would be almost the same size as the 8cx...
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Wouldn't it be nice to have a 16-core A78 laptop SoC instead of the 8cx (8-core A76)? On 5nm it would be almost the same size as the 8cx...

Sure! Though I'd rather have it in a bare SBC than a laptop for my own purposes. Either way, unless Qualcomm decides to sell us such a chip, I doubt we'll see one in the 'States anytime soon.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Do you think Huawei did a die shrink of the A77 to 5nm instead of using the A78?
HiSilicon jumped straight to A76 with the Kirin 980 after using A73 with K960 and K970.

It would not surprise me if they did the same this coming generation by jumping to A78.
Wouldn't it be nice to have a 16-core A78 laptop SoC instead of the 8cx (8-core A76)? On 5nm it would be almost the same size as the 8cx...
You are forgetting power consumption - 5nm does not bestow such a great advantage over 7nm in that arena to give you 2x the cores without a significant power increase.

Maybe at N5P or N3.

Even so it's pointless to make such a chip without an abundance of MT ARM64 software to take advantage of a 16C machine - sadly this is still a sticking point for ARM.

Also SD 8cx is a 4C/4C A76/A55 big little CPU configuration, not 8C A76.
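
A rough sketch of the power argument above. It assumes, purely for illustration, that a 5nm core draws 30% less power than the same core on 7nm at the same clock; that figure is a placeholder and not from this thread, and the model ignores uncore, GPU and clock changes:

```python
def relative_cpu_power(core_multiplier: float, per_core_power_scale: float) -> float:
    """CPU power of the new design relative to the old one.

    core_multiplier: how many times more cores (2.0 = twice the cores).
    per_core_power_scale: per-core power on the new node relative to the
        old node at the same clock (0.7 = 30% lower, an assumed figure).
    """
    return core_multiplier * per_core_power_scale


ASSUMED_5NM_POWER_SCALE = 0.7  # hypothetical, for illustration only

# Twice the cores on the new node: still noticeably more power than before.
print(round(relative_cpu_power(2.0, ASSUMED_5NM_POWER_SCALE), 2))  # 1.4

# Per-core scale needed to double the cores at the same total power.
print(1.0 / 2.0)  # 0.5, i.e. each core would have to use half the power
```

Under this simple model, doubling cores at constant power needs per-core power to halve, whether from the node, lower clocks, or both.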
 
Reactions: Tlh97

Thala

Golden Member
Nov 12, 2014
1,355
653
136
HiSilicon jumped straight to A76 with the Kirin 980 after using A73 with K960 and K970.

It would not surprise me if they did the same this coming generation by jumping to A78.

You are forgetting power consumption - 5nm does not bestow such a great advantage over 7nm in that arena to give you 2x the cores without a significant power increase.
Also SD 8cx is a 4C/4C A76/A55 big little CPU configuration, not 8C A76.

SQ1 is 7W TDP including the GPU (a 2.3 TFLOPS GPU, mind you). I don't think it is unreasonable to assume a 16-core A78 SoC within a 15W TDP at 5nm, possibly at lowered clocks when running all-core.
 
Reactions: Tlh97 and Etain05

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
The first Cortex-A77 scores are starting to appear in Geekbench: type 3341 (=0xD0D, the Cortex-A77 CPU ID) in the search window.

For instance a Samsung SM-G986N gets 925/3230. Frequency was about 2.8 GHz during the run.

That'd be, for single thread, about 20% above Cortex-A76. And (keeping in mind cross OS comparisons can be misleading) about the speed of Apple A11.
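
A quick check of the numbers in this post, as a sketch. The part ID conversion is exact; the implied Cortex-A76 score is only what "about 20% above" works out to, not a measured result, and the helper name is made up for illustration:

```python
# The search term is just the CPU part ID written in decimal.
CORTEX_A77_PART_ID = 0xD0D
print(CORTEX_A77_PART_ID)  # 3341


def points_per_ghz(score: float, freq_ghz: float) -> float:
    """Benchmark points per GHz, a crude per-clock comparison."""
    return score / freq_ghz


# Samsung SM-G986N result quoted above: 925 single / 3230 multi at ~2.8 GHz
single, multi, freq_ghz = 925, 3230, 2.8
print(round(points_per_ghz(single, freq_ghz), 1))  # 330.4

# Single-thread score implied for a Cortex-A76 if the A77 is ~20% faster
print(round(single / 1.2))  # 771
```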