The ILP wall is valid for in-order CPU microarchitectures only. In an OoO CPU there is massive reordering, parallel speculative execution, speculative loading, etc. The CPU is limited by the size of its OoO window and by how good its predictors are. So for an OoO CPU there is no hard IPC/ILP wall. That said, I'm not claiming it's easy to get more IPC.
More transistors doesn't mean more power. Apple is a good example of how, despite almost double the transistor count, an A13 Lightning core can be twice as efficient as Zen 2. Apple must be able to power gate a lot of the parts inside the core when they're not in use; there is no other way. It's something like a Cortex core that, when it detects a high misprediction rate, minimizes speculative execution to save energy. And consider that Apple is at least 4 years ahead of everybody else in development...
Regarding A13:
You are right that 2x A13 Lightning cores draw 5-6 watts, so an 8-core version would consume 20-24 watts. But Andrei measured system consumption, including dual-channel LPDDR4... so that 8-core A13 at 20-24 watts would include 8-channel LPDDR4 too. And PCIe link power consumption is not that much; in reality it's just a fraction of what the CPU cores draw.
And the best part: those 8x A13 Lightning cores at 2.6 GHz would deliver performance equal to 8x Zen 2 at 4.7 GHz... Show me a Ryzen CPU that can run all eight of its cores at 4.7 GHz simultaneously at 24 W (even if you found such a rare golden chip, it would consume 150 W). The Zen 2 Renoir APU shows definitely better efficiency, but again, AMD bins the best low-leakage dies for laptops and the rest go to desktop later this year. Every Apple A13 can reach this performance, which means there is some decent performance margin.
Regarding similarity between ARM and X86 when scaling up:
Did you see Andrei's test of Graviton2? I doubt it.
32c/64t Zen1@2.9GHz vs 64c/64t A76@2.5GHz
- higher performance per thread despite lower frequency
- higher MT throughput
- half the power consumption (90W vs. 180W)
- even 14nm vs. 7nm cannot explain such a difference in power consumption
- cheap to manufacture: an A76/N1 core is only 1.4mm2, compared to 3.4mm2 for a Zen 2 core
....
So, as a preface (because this is a long post): I'm not bashing ARM. I am slightly bashing the claim that the A13 is somehow faster than a general-purpose desktop processor (for reasons I'll get into below), and I'm absolutely bashing the idea that ARM CPUs are somehow much more efficient. Let's get into it:
First, a response to some of your comments:
- Regarding your comment about LPDDR4: LPDDR4 is a very different beast from DDR4. As a further aside, the Apple A13 can be (and very likely is) optimized around a fixed platform. This means that the memory controller is likely NOT a full general-purpose memory controller, but one tailored to work specifically with the iPhone platform at hand. Furthermore, we don't know what the addressable memory limit of this chip is. It's likely not very high, given the emphasis on power over performance.
- Regarding the Graviton 2, I read that article the day it came out: we don't have enough data to make a real comparison. That article was meant to compare cloud platforms on a single provider. For example, one can look at actual Zen 2/EPYC benchmarks and see that they are higher than the Graviton 2 numbers.
- You seem to think clock speed is a game changer here. Different platforms clock differently. Just because the A13 is clocked at 2.66 GHz does NOT mean it's more efficient than x86! Way back in the day, Intel and AMD frequently traded places on clock speed, perf/watt, IPC, and other metrics. For Apple to speed up their chip, they'd likely have to adjust the number of pipeline stages, among other changes. That can hurt IPC even though overall performance increases. In Apple's case, the A13 is a fixed platform, and their next chip is rumored to be on 5nm; Apple is using node shrinks for performance increases.
- Regarding the power consumption: AMD's 15 watt parts easily beat out the A13 by most performance metrics. The important takeaway here is that Apple's A13 is only faster for very specific workloads (source below). This says to me that the A13 has a vastly superior cache subsystem or maybe something else, but it in no way means that the A13 is a faster chip!
- Chips do not, and will not, scale up linearly. An 8 core, 8 thread A13 would have a 45 watt TDP. I've seen the A13 in my iPhone draw 7 watts of power and heat the phone up until it was uncomfortably hot, and that was in a GAME. While TDP isn't a direct measure of power draw, the two are usually pretty close. Keep this in mind with the data I present below.
- The A13 doesn't support AVX, SSE, or a myriad of other instruction-set extensions that current x86 processors accelerate in hardware.
- The iPhone platform is built for power savings, not performance. The macOS platform is a completely different beast. That is why Apple is doing what it is doing now: Macbooks, Power Macs, etc. are all segmented. iPads are taking over the lower end. (Ironically, iPads don't have the A13; even the newest iPad is using an A12 variant).
Now let's look at some data.
Here is an early benchmark on Geekbench 5 (my favorite benchmark) of the 15 watt Ryzen 4800U:
https://browser.geekbench.com/v5/cpu/1373084
Here is a random benchmark of the iPhone 11 Pro Max:
https://browser.geekbench.com/v5/cpu/1498904
Notice the areas where the A13 is winning:
- Text Compression
- Navigation
- HTML5
- SQLite
- PDF Rendering
- Text Rendering
- Clang
- N-Body Physics
- Face Detection
- Horizon Detection
- Image Inpainting
- Ray Tracing
- Speech Recognition
In most cases, it's a pretty narrow victory. However, that's not what is alarming here. What IS alarming is the pattern that emerges: it's almost like those very specific mobile-oriented workloads are being accelerated in some way! With the exception of SQLite, N-Body Physics, and Ray Tracing, all of the workloads above are common on a smartphone. Since the developers of Geekbench have little control over how their benchmark is compiled for each platform, it's very likely that Apple is using a number of accelerators in both the CPU and GPU to speed up these workloads, while the Ryzen 4800U is doing everything by brute force, with the exception of built-in instruction sets.
So you are probably wondering about SQLite, Clang, N-Body Physics, and Ray Tracing. Let's address those now.
- SQLite largely consists of taking plain-text SQL statements and parsing them into a machine-readable format. For a platform that accelerates text I/O, as hinted at above, it stands to reason that the A13 would perform well at this workload. Interestingly enough, it doesn't perform all that much better than the 4800U.
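To make "parsing text" concrete, the hot loop in this kind of workload looks roughly like the toy tokenizer below: byte-at-a-time, branchy integer code that leans on caches and branch predictors rather than the FPU. This is a sketch of the flavor of work, not what SQLite or Geekbench actually run:

```c
#include <ctype.h>

/* Toy SQL tokenizer: counts tokens in a statement.  The real SQLite
 * parser is far more involved, but the inner work has the same shape:
 * scan bytes, classify characters, branch. */
int count_tokens(const char *sql) {
    int n = 0;
    while (*sql) {
        while (*sql && isspace((unsigned char)*sql)) sql++;  /* skip blanks */
        if (!*sql) break;
        if (isalnum((unsigned char)*sql) || *sql == '_') {
            /* identifier, keyword, or number: consume the whole word */
            while (isalnum((unsigned char)*sql) || *sql == '_') sql++;
        } else {
            sql++;  /* single-character token: punctuation or operator */
        }
        n++;
    }
    return n;
}
```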
- Clang once again consists of parsing a bunch of text (1094 lines of it!). Want to know what would really accelerate this process? The ability to accelerate pattern recognition and text parsing! I don't see any evidence of Geekbench actually running a linker. Their own document says they build for the AArch64 architecture, but not how much of the compilation pipeline they run. Furthermore, the score is listed in klines/sec. Interesting indeed. That suggests to me that they aren't going as far as generating code, but are mostly measuring parsing of the source. We can speculate on this all day, but it's a very small win regardless.
- N-Body Physics is an interesting benchmark to have a win in. However, given the architecture of the A13, it doesn't really surprise me. Cache helps with this workload immensely, and it's likely the platform is accelerating things in some way, given the large "win" compared to other narrow wins.
- Ray tracing is another interesting one. Yet another workload that builds and uses a tree of data (N-Body Physics uses an octree; the ray tracing benchmark uses a k-d tree). After analyzing these results, I realize that I know exactly what is going on here. The exact same mechanism that accelerates text rendering and compression is at work here as above. The one thing the A13 absolutely excels at is analyzing and parsing data. Why is this? We'll look at that in a moment.
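The reason tree-walking workloads stress the memory subsystem is that every step of the walk is a load that depends on the previous one. Here's a minimal sketch of that access pattern using a plain 1-D binary search tree as a stand-in for the octree/k-d tree cases (the real benchmarks are much larger and more complex):

```c
#include <stdlib.h>

/* Each step of a tree lookup is a dependent pointer load: the CPU cannot
 * start fetching the child until the parent arrives, so cache and load
 * latency dominate.  Octree and k-d tree traversals behave the same way. */
typedef struct Node { int key; struct Node *lo, *hi; } Node;

Node *insert(Node *n, int key) {
    if (!n) {
        n = calloc(1, sizeof *n);  /* leaked in this sketch; fine for demo */
        n->key = key;
        return n;
    }
    if (key < n->key)      n->lo = insert(n->lo, key);
    else if (key > n->key) n->hi = insert(n->hi, key);
    return n;
}

int contains(const Node *n, int key) {
    while (n) {                    /* one dependent load per iteration */
        if (key == n->key) return 1;
        n = (key < n->key) ? n->lo : n->hi;
    }
    return 0;
}
```

A core with a large, low-latency cache hierarchy, like the A13's, keeps more of the tree close to the pipeline, which is consistent with the wins above.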
What is more alarming to me is where the A13 loses. The AES-XTS benchmark I can understand -- modern x86 processors all accelerate AES in some form or fashion, and Zen 2 has first-class support. However, some benchmarks the A13 should be good at, it loses. HDR is one example; Gaussian Blur is another; Image Compression is another. In each of those benchmarks, a clear pattern emerges. All of them throw computing power at image manipulation, and all of them would benefit immensely from both the FPU and traditional instruction sets such as SSE. Indeed, if we compare the numbers of the Ryzen 3900X (note that the 4800U will likely score somewhat lower), we discover that Ryzen wins most of the benchmarks involving floating point or multimedia (it wins most of the benchmarks, period -- links below).
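The shape of an image-filter inner loop makes the point. Below is one horizontal pass of a 3-tap blur with clamped edges; real Gaussian blurs use wider separable kernels, but the pattern is the same: streaming floating-point multiply-accumulates, exactly the loop SSE/AVX (or NEON) chews through several pixels at a time:

```c
/* One horizontal pass of a 3-tap blur (weights 0.25/0.5/0.25, edges
 * clamped).  Pure FP multiply-accumulate over a contiguous array -- the
 * kind of loop wide SIMD units turn into a throughput contest. */
void blur3(const float *src, float *dst, int n) {
    for (int i = 0; i < n; i++) {
        float l = src[i > 0 ? i - 1 : 0];
        float r = src[i < n - 1 ? i + 1 : n - 1];
        dst[i] = 0.25f * l + 0.5f * src[i] + 0.25f * r;
    }
}
```

Unlike the tree walks, there are no dependent loads here, so a big cache buys little; raw FP and SIMD width decide the score, which is where Zen 2 pulls ahead.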
So why is it that the A13 excels at those very specific workloads above?
- Cache is a big factor. Indeed, the A13 has awesome cache latency up to a queue depth of 4096. (coincidentally, all of the benchmarks in the GB5 suite are tiny enough that they are heavily influenced by cache, but that isn't the sole reason why the A13 performs well)
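Cache latency is usually measured with a pointer chase, which is worth sketching because it shows why latency, not bandwidth, is what these small benchmarks feel. Each load depends on the previous one, so the OoO engine cannot overlap them; with a small working set the chain stays in cache, and past the cache sizes the same loop runs at DRAM latency. Timing is omitted here; this just shows the access pattern:

```c
#include <stddef.h>

/* Pointer chase: next[] encodes a permutation, and every iteration's load
 * address depends on the previous load's result.  This serialization is
 * how memory-latency tests defeat prefetchers and the OoO window. */
size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t i = start;
    while (steps--) i = next[i];   /* fully serialized, dependent loads */
    return i;
}
```

Run over arrays sized to each cache level (and shuffled so the prefetcher can't guess the pattern), the per-step time of this loop traces out the latency curve Andrei publishes.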
- The toolchain for anything iPhone-related is Xcode. Because of this, applications can be optimized, like the processor, to run in a fixed environment, taking full advantage of the hardware. This is important for the next point.
- The NPU could be playing a very big factor here. The NPU would excel at parsing octrees, k-d trees, and other tree structures. Incidentally, that includes all four of the benchmarks I singled out earlier.
- Finally, AMX could very well have a hand in this as well. Unfortunately the iPhone is a (relatively) closed platform, so we don't know.
Don't get me wrong, ARM is getting there. However, anyone expecting magic overnight needs to calm their expectations.
ARM isn't magic, and x86 isn't done.
iPhone 11 Review/Deep Dive:
https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/
Ryzen 3900X Review/Deep Dive:
https://www.anandtech.com/show/14605/the-and-ryzen-3700x-3900x-review-raising-the-bar/
Geekbench 5 workload details:
https://www.geekbench.com/doc/geekbench5-cpu-workloads.pdf
Please let me know if you see any typos; I typed this up in a relative hurry and barely skimmed over it for proofreading.