Solved! ARM Apple High-End CPU - Intel replacement

Page 46 - AnandTech community forums

name99

Member
Sep 11, 2010
I suspect that the Dev machine is likely quite I/O hobbled. While I have no factual evidence to back this up, it would strike me as odd for Apple to have integrated into the A12Z the kind of storage and RAM throughput that a desktop system would normally have, so I expect it to be limited in several metrics compared with the production systems that ship later this year. I suspect that we might see low-end laptops ship with dual-channel LPDDR4X (about half the width of the dual-channel DDR4 in an equivalent volume laptop from AMD or Intel, as LPDDR4X has half the bits per channel) and perhaps only two PCIe lanes for the NVMe SSD. Higher-end MacBook Pro systems might come with a higher-end SoC that has twice the memory channels and perhaps 16 external PCIe lanes: 8 for a dGPU and 2 × 4 for NVMe SSDs.

The Dev platform, being based on the A12Z, is likely memory-bandwidth and storage-I/O constrained in a way that a production system won't be. That means it may not be a good measuring stick for the performance of shipping systems, while still giving usable performance for dev work.
Even for this initial dev machine, it's not as bad as you seem to expect :)

Storage:
- You do realize that the internal SSDs on any Mac with a T2 (so MacBook Pro, iMac Pro, Mac mini, even Mac Pro) are fed by the Apple SSD controller on the T2 chip, right? And they hit around 2.8 GB/s read or write last time I checked, certainly fast enough that no one's complaining. And the T2 is based on an A10; who knows what they did to boost the flash controller in the A11 and A12?

Memory bandwidth:
- Well, let's look at the numbers. GB5 no longer bothers to test bandwidth because it's not an especially useful number, especially in isolation. But we can look at the GB4 numbers.

So that's 31 GB/s for the iPad Pro (using a single memory channel) vs 38.5 GB/s for the iMac. (The numbers are basically the same for the 2019 iMac: 40 rather than 38.5.)
Not a catastrophe even today, and Apple can likely hit the iMac numbers in the A14X/Z just by using LPDDR5, let alone by moving to two memory channels...
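A rough sketch of the arithmetic behind that bandwidth argument, in Python. It computes the iPad Pro/iMac ratio from the GB4 figures quoted above, applies the textbook peak-bandwidth formula (bus width in bytes × transfer rate), and makes a naive LPDDR5 projection. The bus widths and the LPDDR4X-4266 / LPDDR5-6400 data rates are illustrative assumptions, not confirmed specs for any Apple SoC, and measured bandwidth does not scale perfectly with peak.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: bytes per transfer x million transfers/s."""
    return bus_width_bits / 8 * data_rate_mts / 1000


def naive_rate_scaling(measured_gbs: float, old_mts: int, new_mts: int) -> float:
    """Assumes measured bandwidth scales linearly with data rate at a fixed bus width."""
    return measured_gbs * new_mts / old_mts


# GB4 memory bandwidth figures quoted in the post (GB/s).
IPAD_PRO_GBS = 31.0
IMAC_GBS = 38.5

ratio = IPAD_PRO_GBS / IMAC_GBS

# Illustrative only: a 128-bit (two 64-bit channels) DDR4-3200 laptop configuration.
ddr4_laptop_peak = peak_bandwidth_gbs(128, 3200)

# Hypothetical: same bus width, LPDDR4X-4266 swapped for LPDDR5-6400.
lpddr5_projection = naive_rate_scaling(IPAD_PRO_GBS, 4266, 6400)

print(f"iPad Pro / iMac measured bandwidth: {ratio:.0%}")
print(f"Dual-channel DDR4-3200 peak: {ddr4_laptop_peak:.1f} GB/s")
print(f"Naive LPDDR5 projection: {lpddr5_projection:.1f} GB/s vs iMac {IMAC_GBS} GB/s")
```

Under those assumptions the iPad Pro is already at roughly four fifths of the iMac figure, and a pure data-rate bump would be enough to pass it on paper.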
 

marcUK2

Member
Sep 23, 2019
A12Z/X on first-generation 7nm was ~25% smaller in die size than Renoir on a more refined node, regardless of transistor count. That doesn't mean the extra area of Renoir won't be well spent in a desktop application, but it is far from an apples-to-apples comparison.

OTOH it's not even funny how far ahead ARM cores are in the low-power space. An x86 analogue that matches the A13 and Snapdragon 865 in performance and power doesn't exist; indeed, if the recent benchmarks of Intel's Lakefield are any indication, x86 has a long way to go before its ARM counterparts need to worry.

There is also no guarantee that Intel and AMD will be able to keep up their cadence of improvements. Zen 2 and Ice Lake are not exactly impressive architectures when compared to Apple's in-house work, and the next few years will bring only small refinements to them. That's not a high bar for Apple's team to clear to stay ahead and be competitive for at least the next 5 years or so.
But both the A12 and Renoir have 10 billion transistors and both are SoCs, so if the A12 uses 25% less die area when both are on 7nm, that only suggests Apple is packing the transistors more densely, which hurts clock speed and helps efficiency. That explains why A-series chips have a 2 GHz clock speed deficit versus x86, one that can't be overcome in this arrangement.
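Taking that premise at face value (equal transistor counts, 25% less area), the density gap works out as below. This is a hypothetical check of the arithmetic only; it says nothing about whether density, cell-library choice, or microarchitecture is what actually limits clock speed.

```python
def relative_density(transistors_a: float, area_a: float,
                     transistors_b: float, area_b: float) -> float:
    """Transistor density of chip A divided by that of chip B."""
    return (transistors_a / area_a) / (transistors_b / area_b)


TRANSISTORS = 10e9              # "both ... have 10 billion transistors" (the post's premise)
renoir_area = 1.0               # normalized unit; only the ratio matters here
a12_area = 0.75 * renoir_area   # "25% less die area"

density_ratio = relative_density(TRANSISTORS, a12_area, TRANSISTORS, renoir_area)
print(f"A12 density is {density_ratio:.2f}x Renoir's ({density_ratio - 1:.0%} denser)")
```

Note that 25% less area means about 33% higher density, not 25%.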

OTOH, as you say... yes, more efficient, but at half the clock speed! Ramp that A12 up to 5 GHz, if that is even possible, and show me the TDP then!
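The "show me the TDP" point rests on the standard first-order model of dynamic CMOS power, P ≈ C·V²·f: doubling the frequency usually also needs a higher supply voltage, so power rises much faster than clock speed. A minimal sketch; the voltages are made-up illustrative values, not measured A12 operating points, and leakage power is ignored.

```python
def dynamic_power_ratio(f_old_ghz: float, v_old: float,
                        f_new_ghz: float, v_new: float) -> float:
    """Ratio of dynamic power P ~ C * V^2 * f, with switched capacitance C held constant."""
    return (v_new / v_old) ** 2 * (f_new_ghz / f_old_ghz)


# Hypothetical operating points, NOT measured A12 values:
# ~2.5 GHz at 0.8 V, and 5 GHz needing 1.2 V.
power_ratio = dynamic_power_ratio(2.5, 0.8, 5.0, 1.2)
print(f"Dynamic power grows by about {power_ratio:.1f}x for 2x the clock")
```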

Now that x86 is fundamentally RISC internally, the A12 is RISC, and the decoder that cracks x86 instructions into micro-ops has an insignificant impact on performance, there is never really going to be a great difference between A-series and x86, given the same process and transistor count. Sure, one can optimize for this and the other for that, but there are compromises either way, whether going for efficiency or raw performance. It's not likely that one will have both... and even if one day ARM has 2x the overall performance of x86, what's to stop AMD from licensing ARM and bolting an x86 uop decoder onto it to get the same performance?

I suspect that AMD and Intel will go all out for the next 5 years; Zen 3 and Zen 4/5 guarantee that, because Intel is being majorly embarrassed, even if it isn't affecting its bottom line yet. Hopefully Apple can keep up. I hope they do; I detest the company, but ARM and the technology are cool.
 

Richie Rich

Senior member
Jul 28, 2019
But both the A12 and Renoir have 10 billion transistors and both are SoCs, so if the A12 uses 25% less die area when both are on 7nm, that only suggests Apple is packing the transistors more densely, which hurts clock speed and helps efficiency. That explains why A-series chips have a 2 GHz clock speed deficit versus x86, one that can't be overcome in this arrangement.
You are absolutely correct here. x86 gains performance via high frequency with a narrow core, just like the P4 did, while Apple's A12 gains performance via a very wide, advanced architecture using its six ALUs at low frequency (like K8 did with 3 ALUs vs. the P4's 2). It's clear which approach is the winning one.

OTOH, as you say... yes, more efficient, but at half the clock speed! Ramp that A12 up to 5 GHz, if that is even possible, and show me the TDP then!
By using a high-frequency cell library, the Apple A12/A13 could probably reach 3.8-4 GHz, and you can bet it would be more efficient than x86 cores. The efficiency of ARM cores comes from a lot of hard work on smart, efficient design for the mobile environment. If you look for power-hungry ARM designs, you will find the Nvidia Denver and Samsung Mongoose cores; those have low IPC and are power hungry. That is partly because the Samsung team was made up of ex-AMD engineers (responsible for the Bobcat and Jaguar cores). A much worse team, the one behind the terrible Bulldozer, later created Zen 1. IMHO Samsung Mongoose is a much better design than Zen 1, but in the much tougher mobile environment it was a big failure, while the much worse Zen 1 is celebrated as a great design in the x86 world. The clash of those two worlds will be really epic.

Now that x86 is fundamentally RISC internally, the A12 is RISC, and the decoder that cracks x86 instructions into micro-ops has an insignificant impact on performance, there is never really going to be a great difference between A-series and x86, given the same process and transistor count. Sure, one can optimize for this and the other for that, but there are compromises either way, whether going for efficiency or raw performance. It's not likely that one will have both... and even if one day ARM has 2x the overall performance of x86, what's to stop AMD from licensing ARM and bolting an x86 uop decoder onto it to get the same performance?
There are some license limits. I doubt you can take ARM's IP, modify it, and use it for another ISA. AMD made a GREAT MISTAKE by canceling its K12 ARM core, no doubt about that today. Back in 2015 I considered it a good move, but not today. Keller was right again. The thing is that Nvidia is licensing Cortex-A78 cores, so Nvidia can finally beat AMD with Cortex CPUs. Isn't that funny? I guess some hardcore AMD fans could have a stroke over that.


I suspect that AMD and Intel will go all out for the next 5 years; Zen 3 and Zen 4/5 guarantee that, because Intel is being majorly embarrassed, even if it isn't affecting its bottom line yet. Hopefully Apple can keep up. I hope they do; I detest the company, but ARM and the technology are cool.
No. Apple is the CPU tech leader. They have about a 5-year advantage over x86 now. They will offer double the performance per watt in laptops, and the new A14 will be the most powerful CPU in ST and will easily beat Zen 3 @ 4.6 GHz.

ARM is the ISA leader now, with ARMv9 and SVE2 SIMD supporting vectors up to 2048 bits coming. Thanks to SVE, the Fugaku supercomputer can destroy GPU-based supercomputers. And you will have smartphones with ARMv9 SVE2 in H2 2021.


Intel and AMD will have to fight very hard for survival. At the current pace of their development, they are already dead IMO.
 

marcUK2

Member
Sep 23, 2019
No. Apple is the CPU tech leader. They have about a 5-year advantage over x86 now. They will offer double the performance per watt in laptops, and the new A14 will be the most powerful CPU in ST and will easily beat Zen 3 @ 4.6 GHz.

ARM is the ISA leader now, with ARMv9 and SVE2 SIMD supporting vectors up to 2048 bits coming. Thanks to SVE, the Fugaku supercomputer can destroy GPU-based supercomputers. And you will have smartphones with ARMv9 SVE2 in H2 2021.


Intel and AMD will have to fight very hard for survival. At the current pace of their development, they are already dead IMO.
I know you're a bit of a fanboy, and that's OK; I don't have enough time to research this, so you can have the argument. My point is I don't really care if a laptop is super efficient or has 20 hours of battery life... because I don't really use laptops that much.

What I do care about is absolute performance, and a 64-core Zen 4 is obviously a given in the near future, and I don't care if it draws 250 W. If Apple is only going to produce high-efficiency chips with great PPC, that's great, but will there be equivalent A-series chips that stomp all over Zen 4 Threadrippers??
<edit> And cost less than $50,000 LOL
 

LightningZ71

Senior member
Mar 10, 2017
Even for this initial dev machine, it's not as bad as you seem to expect :)

Storage:
- You do realize that the internal SSDs on any Mac with a T2 (so MacBook Pro, iMac Pro, Mac mini, even Mac Pro) are fed by the Apple SSD controller on the T2 chip, right? And they hit around 2.8 GB/s read or write last time I checked, certainly fast enough that no one's complaining. And the T2 is based on an A10; who knows what they did to boost the flash controller in the A11 and A12?

Memory bandwidth:
- Well, let's look at the numbers. GB5 no longer bothers to test bandwidth because it's not an especially useful number, especially in isolation. But we can look at the GB4 numbers.

So that's 31 GB/s for the iPad Pro (using a single memory channel) vs 38.5 GB/s for the iMac. (The numbers are basically the same for the 2019 iMac: 40 rather than 38.5.)
Not a catastrophe even today, and Apple can likely hit the iMac numbers in the A14X/Z just by using LPDDR5, let alone by moving to two memory channels...
Oh, I didn't expect it to be dog slow or anything. I expect that it'll perform quite well, especially given what we've seen the iPad Pro is capable of. My concern is more with people seeing the performance of the dev platform and thinking that that's all the full-dress version of the platform will have! From what we know about the A12X/Z, they are balanced implementations given the I/O capabilities their intended market needs. With the targeted platforms' heavy focus on ST performance and managing light background tasks, I doubt that more I/O and memory bandwidth was needed than what's there. Given the ability to go 4+4 with higher power limits, and a greater emphasis on multithreading, I feel that Apple will outfit the A14X/Z, or whatever they call it, with a higher-bandwidth RAM solution and a faster storage subsystem. That's the point I was trying to make.
 

LightningZ71

Senior member
Mar 10, 2017
I know you're a bit of a fanboy, and that's OK; I don't have enough time to research this, so you can have the argument. My point is I don't really care if a laptop is super efficient or has 20 hours of battery life... because I don't really use laptops that much.

What I do care about is absolute performance, and a 64-core Zen 4 is obviously a given in the near future, and I don't care if it draws 250 W. If Apple is only going to produce high-efficiency chips with great PPC, that's great, but will there be equivalent A-series chips that stomp all over Zen 4 Threadrippers??
<edit> And cost less than $50,000 LOL
But is there really a NEED to make something like that? Precious few people actually purchase the big Xeon Mac Pros. What would the return on investment even look like for something like that implemented in the fashion of the A14? We know that the majority of the Mac Pro's target market uses it for A/V work, most of which can be offloaded to dedicated GPUs. The CPU cores don't often come into play for anything that's very heavy or time-sensitive. The biggest issue there is storage. Local storage shouldn't be a problem for them, as that's just I/O configuration and not overly complicated to integrate. For anything that does require massive amounts of CPU horsepower, what can't be shipped off to a cloud? Why can't a Mac Pro be a virtualization portal instead? What makes it Pro is the local I/O and storage capabilities, and Apple could sell CPU horsepower as a service via the cloud. Apple has said that Boot Camp is going away, so no more booting into Windows.

I really don't see Apple doing anything more potent for their desktop products than a 4+4 big.big chip based on their current, evolving architecture, but enhanced with a healthy PCIe implementation to support a bunch of external I/O, like Thunderbolt ports, USB 3.2/4 ports, and PCIe slots for cards and storage.
 

marcUK2

Member
Sep 23, 2019
But is there really a NEED to make something like that? Precious few people actually purchase the big Xeon Mac Pros. What would the return on investment even look like for something like that implemented in the fashion of the A14? We know that the majority of the Mac Pro's target market uses it for A/V work, most of which can be offloaded to dedicated GPUs. The CPU cores don't often come into play for anything that's very heavy or time-sensitive. The biggest issue there is storage. Local storage shouldn't be a problem for them, as that's just I/O configuration and not overly complicated to integrate. For anything that does require massive amounts of CPU horsepower, what can't be shipped off to a cloud? Why can't a Mac Pro be a virtualization portal instead? What makes it Pro is the local I/O and storage capabilities, and Apple could sell CPU horsepower as a service via the cloud. Apple has said that Boot Camp is going away, so no more booting into Windows.

I really don't see Apple doing anything more potent for their desktop products than a 4+4 big.big chip based on their current, evolving architecture, but enhanced with a healthy PCIe implementation to support a bunch of external I/O, like Thunderbolt ports, USB 3.2/4 ports, and PCIe slots for cards and storage.
So Apple is going to make great low- and mid-range machines with very efficient chips and acceptable performance for most average people. Well, OK, if that floats your boat, but if the A series cannot beat x86 in raw performance, regardless of how it is done, how does this prove that ARM is vastly superior beyond a limited set of applications?

15 W Intel machines already offer perfectly acceptable performance for most people, who really don't think much about it anyway. Should I care if Apple can do it with 8 W???
 

Richie Rich

Senior member
Jul 28, 2019
I know you're a bit of a fanboy, and that's OK; I don't have enough time to research this, so you can have the argument. My point is I don't really care if a laptop is super efficient or has 20 hours of battery life... because I don't really use laptops that much.

What I do care about is absolute performance, and a 64-core Zen 4 is obviously a given in the near future, and I don't care if it draws 250 W. If Apple is only going to produce high-efficiency chips with great PPC, that's great, but will there be equivalent A-series chips that stomp all over Zen 4 Threadrippers??
<edit> And cost less than $50,000 LOL
BTW, I'm an AMD fan, but I struggle to be a fan of something that is so technically outdated compared to the 6-ALU monster from the fruit company.

There are a few hints about an Apple server chip:
  • Apple needs huge amounts of server power to run all its cloud services.
  • Apple is in a lawsuit with the server startup Nuvia (founded by ex-Apple Gerard Williams), which doesn't make sense unless a Nuvia server CPU would clash with an Apple one.
  • Apple is going to move its whole lineup to ARM within two years, including the Mac Pros with Intel Xeons.

All this suggests that Apple is probably cooking up a big server-grade CPU, by 2022 at the latest, on the A16 uarch. NUMA, chiplet, or interposer/EMIB design? Who knows. Expensive? You bet. But probably the best performance you can get.


Regarding the Zen 4 core: the new Cortex-X1 has 40% higher PPC/IPC than Zen 2. Do you really think that Zen 3 and Zen 4 can close that 40% IPC deficit? Unless Zen 3 is a 6-ALU beast with SMT4 based on Keller's work, I really doubt it. Even then, there is still the huge power draw directly causing thermal throttling. However, if AMD shortened some critical paths in its pipeline-stage design, then Zen 3 could be the last AMD CPU able to get close to 5 GHz. Zen 4 on 5 nm will suffer from local overheating even more.
 

gdansk

Senior member
Feb 8, 2011
Regarding the Zen 4 core: the new Cortex-X1 has 40% higher PPC/IPC than Zen 2. Do you really think that Zen 3 and Zen 4 can close that 40% IPC deficit? Unless Zen 3 is a 6-ALU beast with SMT4 based on Keller's work, I really doubt it. Even then, there is still the huge power draw directly causing thermal throttling. However, if AMD shortened some critical paths in its pipeline-stage design, then Zen 3 could be the last AMD CPU able to get close to 5 GHz. Zen 4 on 5 nm will suffer from local overheating even more.
The X1 is advertised at "up to" 3.3 GHz. On 5 nm. Zen 2 is already hitting 4.7 GHz. On 7 nm. Even if the X1 had 40% higher IPC (and it won't, based on the decode width and number of execution units), it would still be behind current-generation Zen 2 chips (3.3 × 1.4 = 4.62, which is less than 4.7).
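The comparison above treats single-thread performance as IPC × clock. A minimal sketch of that first-order model; it ignores memory latency, turbo residency, and workload mix, and the 40% IPC figure is the claim being debated, not a measurement.

```python
def zen2_equivalent_ghz(clock_ghz: float, ipc_vs_zen2: float) -> float:
    """First-order model: performance ~ IPC x clock, expressed as an equivalent Zen 2 clock."""
    return clock_ghz * ipc_vs_zen2


ZEN2_BOOST_GHZ = 4.7
x1_equiv = zen2_equivalent_ghz(3.3, 1.4)   # X1 at its advertised 3.3 GHz with +40% IPC

print(f"X1 ~ {x1_equiv:.2f} GHz Zen 2 equivalent; behind {ZEN2_BOOST_GHZ} GHz: {x1_equiv < ZEN2_BOOST_GHZ}")
```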
 

Glo.

Diamond Member
Apr 25, 2015

800 pts single-core and 2600 pts multi-core in Geekbench 5 on macOS Big Sur. A14Z, the latest and greatest from Apple. 4C/4C design.

2020 MacBook Air with 2C/4T:
1005 pts single-threaded, 2000 pts multithreaded.

This essentially means that ARM still has a large disadvantage versus x86, even in Apple's designs.

Secondly, the scores on the iOS platform are heavily skewed by the platform itself. On macOS, the GB5 scores are lower than on iOS, which simply means that the iOS platform is extremely well optimized.

Right now there is no more level comparison between the two architectures than this one, on a single platform. So yeah, ARMv9 will at best tie with Intel's x86, and might still lose to AMD's designs.

I wonder what Richie Rich will say about those GB5 scores of the A14Z under macOS...
 

IvanKaramazov

Junior Member
Jun 29, 2020
Glo, unless I'm mistaken, those Geekbench results are under x86 emulation. So even if we're extremely generous to Apple and assume they're achieving 70% efficiency under emulation, that would suggest native ARM results of around 1100 and 3700. Wouldn't it?
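Backing a native score out of an emulated one is just division by the assumed emulation efficiency. A small sketch that repeats the 70% case and, for comparison, a couple of other hypothetical efficiencies, since Rosetta 2's real overhead is unknown at this point.

```python
def native_estimate(emulated_score: float, efficiency: float) -> float:
    """Native score implied if emulation retains `efficiency` of native performance."""
    return emulated_score / efficiency


EMULATED = {"single-core": 800, "multi-core": 2600}  # GB5 figures from Glo's post

for eff in (0.6, 0.7, 0.8):  # 0.6 and 0.8 are hypothetical alternatives to the 70% case
    estimates = {name: round(native_estimate(score, eff)) for name, score in EMULATED.items()}
    print(f"efficiency {eff:.0%}: {estimates}")
```

At 70% this gives about 1143 and 3714, matching the "around 1100 and 3700" above.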
 

Glo.

Diamond Member
Apr 25, 2015
Glo, unless I'm mistaken, those Geekbench results are under x86 emulation. So even if we're extremely generous to Apple and assume they're achieving 70% efficiency under emulation, that would suggest native ARM results of around 1100 and 3700. Wouldn't it?
That is IF it is under emulation, and if it is with only 70% efficiency.

What I have always said is that iOS, alongside its compiler, is extremely well optimized for those CPUs, and the compiler on iOS has had a tremendous effect on app performance.
 

IvanKaramazov

Junior Member
Jun 29, 2020
Right, we don't know the efficiency of Rosetta 2, which makes comparisons iffy right now. However, considering that those scores are emulated, use only the 4 big cores (the iPad in Geekbench engages the 4 little cores at the same time, unlike here), and come from a chip apparently slightly underclocked compared to the iPad version, these seem like reasonable results for a 2-year-old CPU.
 

Glo.

Diamond Member
Apr 25, 2015
3,877
1,809
136
Yep. The Shadow of the Tomb Raider demo was done on the A12Z, on that very Mac mini development kit.

The Vega 6 in Renoir averaged 14 FPS at 1080p on the medium preset. In Apple's presentation, on Apple Silicon, it was running at 30 FPS or more, judging by how smooth it was.

So I'd say Rosetta is doing a pretty efficient job...
 

IvanKaramazov

Junior Member
Jun 29, 2020
Yep. The Shadow of the Tomb Raider demo was done on the A14Z, on that very Mac mini development kit.
Small point, but it's the A12Z. It's the current iPad SoC but the CPU is unchanged from the 2018 A12X. The A14 line, some variant of which will power the first ARM Macs, doesn't release until the fall, and represents 2 years of architectural improvements and a die shrink over the A12Z in the DTK.
 

Glo.

Diamond Member
Apr 25, 2015
3,877
1,809
136
Small point, but it's the A12Z. It's the current iPad SoC but the CPU is unchanged from the 2018 A12X. The A14 line, some variant of which will power the first ARM Macs, doesn't release until the fall, and represents 2 years of architectural improvements and a die shrink over the A12Z in the DTK.
I'm having way too many brain farts today. Why do I even think about the A14...? o_O

Yes, A12Z. Crap :(
 

Richie Rich

Senior member
Jul 28, 2019
The X1 is advertised at "up to" 3.3 GHz. On 5 nm. Zen 2 is already hitting 4.7 GHz. On 7 nm. Even if the X1 had 40% higher IPC (and it won't, based on the decode width and number of execution units), it would still be behind current-generation Zen 2 chips (3.3 × 1.4 = 4.62, which is less than 4.7).
Up to 3.3 GHz is for mobile devices; they aim for 3 GHz in smartphones with the X1 on 5 nm. But the Ampere Altra server CPU comes at 3.0 GHz with a 3.3 GHz turbo on 7 nm, and there is a rumour of a Snapdragon 8cx at 3.1 GHz on 7 nm. I guess if they really want to push the max out of 5 nm, the 15% frequency uplift means at least 3.0 -> 3.45 GHz, or 3.1 -> 3.56 GHz (equivalent to a 3.56 × 1.4 ≈ 5 GHz Zen 2), or a maximum turbo of 3.3 -> 3.8 GHz. Together with the 40% IPC uplift, that's 3.8 × 1.4 = 5.32 GHz Zen 2 equivalent. A cheap Chinese HiSilicon ARM desktop coming this winter and beating Zen 2? AMD should pray for more bans on Huawei/HiSilicon.

Another thing: the all-core frequency for Zen 2 is around 4.2 GHz. That's far, far away from 4.7 GHz.
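The chain of multipliers above can be reproduced in a few lines. A sketch that applies the assumed 15% node uplift and 40% IPC advantage (both the poster's figures, not measured values) under the same IPC × clock model; note that 3.3 × 1.15 is 3.795 GHz, which the post rounds up to 3.8 before multiplying, so the unrounded result is about 5.31 rather than 5.32.

```python
NODE_UPLIFT = 1.15        # the post's assumed 5 nm frequency gain
IPC_VS_ZEN2 = 1.40        # the post's claimed X1 IPC advantage over Zen 2
ZEN2_BOOST_GHZ = 4.7
ZEN2_ALL_CORE_GHZ = 4.2


def on_5nm(base_ghz: float) -> float:
    """Clock after the assumed node uplift."""
    return base_ghz * NODE_UPLIFT


def zen2_equivalent(ghz: float) -> float:
    """Clock a Zen 2 core would need to match, assuming performance ~ IPC x clock."""
    return ghz * IPC_VS_ZEN2


for base in (3.0, 3.1, 3.3):
    f = on_5nm(base)
    print(f"{base:.1f} GHz -> {f:.3f} GHz on 5 nm ~ {zen2_equivalent(f):.2f} GHz Zen 2 equivalent")

print(f"Zen 2 all-core vs boost: {ZEN2_ALL_CORE_GHZ / ZEN2_BOOST_GHZ:.0%}")
```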


Glo, unless I'm mistaken, those Geekbench results are under x86 emulation. So even if we're extremely generous to Apple and assume they're achieving 70% efficiency under emulation, that would suggest native ARM results of around 1100 and 3700. Wouldn't it?
The A12 scores about 1100 pts in Geekbench 5 at 2.54 GHz, resulting in 441 pts/GHz.
This Rosetta 2 leak is 830 pts @ 2.4 GHz, resulting in 346 pts/GHz... which is 78% of native A12 performance.

Isn't that too much? Either the clock is misdetected or it was tested on an A14. The A13 gets 502 pts/GHz, so a projected A14 should get around 570 pts/GHz. That would mean about 60% of native A14 performance, which sounds reasonable.
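A quick check of that per-clock normalization, assuming the score scales linearly with clock (a simplification). The 441 and 570 pts/GHz figures are taken as stated in the post; for reference, the rounded 1100 pts at 2.54 GHz gives about 433 pts/GHz, so 441 pts/GHz corresponds to a score nearer 1120, and the A14 ratio comes out closer to 61%.

```python
def pts_per_ghz(score: float, clock_ghz: float) -> float:
    """Geekbench points per GHz, assuming score scales linearly with clock."""
    return score / clock_ghz


rosetta = pts_per_ghz(830, 2.4)   # the leaked Rosetta 2 result
A12_NATIVE = 441                  # pts/GHz as stated in the post
A14_PROJECTED = 570               # the post's projection (A13 is 502 pts/GHz)

print(f"Rosetta 2 leak: {rosetta:.1f} pts/GHz")
print(f"vs native A12: {rosetta / A12_NATIVE:.0%}")
print(f"vs projected A14: {rosetta / A14_PROJECTED:.0%}")
print(f"1100 pts / 2.54 GHz = {pts_per_ghz(1100, 2.54):.0f} pts/GHz")
```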
 
