
News [Anandtech] Arm Announces Mobile Armv9 CPU Microarchitectures: Cortex-X2, Cortex-A710 & Cortex-A510


Gideon

Golden Member
Nov 27, 2007
1,353
2,588
136
I just went back over those graphs from Anand's article you posted a few posts back.

Neither of them have A55 perf or energy consumption data on them?
I do agree that Andrei's graphs are needlessly complex. There are so many ways to present the same data in a much more readable way.

And no, the A55 isn't on the graphs directly (I wish it were), but there is a direct quote under the graph in the iPhone 12 review.

Andrei Frumusanu said:
I’ve included the efficiency cores in the chart here to showcase that they’re not weak at all. The performance showcased here roughly matches a 2.2GHz Cortex-A76 which is essentially 4x faster than the performance of any other mobile SoC today which relies on Cortex-A55 cores, all while using roughly the same amount of system power and having 3x the power efficiency.
And from his iPhone XS review from 2018:

Andrei Frumusanu said:
Apple’s small cores in general are a lot more performant that one would think. I’ve gathered some incomplete SPEC numbers on Arm’s A55 (it takes ages!) and in general the performance difference here is 2-3x depending on the benchmark. In recent years I’ve felt that Arm’s little core performance range has become insufficient in many workloads, and this may also be why we’re going to see a lot more three-tiered SoCs (such as the Kirin 980) in the coming future. As it stands, the gap between the maximum performance of the little cores and the most efficient low performance point of the big continues to grow into one direction. All of which makes me wonder whether it’s still worth it to stay with an in-order microarchitecture for Arm's efficiency cores.
Considering his past record I have no reason to doubt his words, but alas, no graphs that I'm aware of.
 

soresu

Golden Member
Dec 19, 2014
1,650
841
136
Gideon said:
Considering his past record I have no reason to doubt his words, but alas, no graphs that I'm aware of.

The fact that he includes Apple's efficiency cores but not those of relatively recent Snapdragon or Kirin SoCs (I can't find any on 7nm) is reason enough to doubt his math at the very least.

There's a reason that school examiners insist that you show your working on math tests: otherwise you cannot prove how you got from A to B, or whether someone else simply told you the answer.

The little cores determine a lot about the idle efficiency of any ARM mobile device post-big.LITTLE, so it is insanely dubious to leave them off any graph when he is going out of his way to talk about power consumption/efficiency in the first place.

For a start, the rated power window for the A55 was 100-250 mW back when it was first on the ARM roadmaps at 10nm (I'm not 100% sure this means 10nm, as it's not specified for the A55, while it is for the A75/Prometheus it was announced with).

[Image: arm_roadmap_cortex_futureict-cortex-a-roadmap-strategy-april-2015.png]

Meanwhile the A13 and A14 'little' cores are drawing 440 mW and 480 mW on average for fp workloads, and 380 mW and 440 mW for int workloads.

Considering that the average power delta is a minimum of +76% for int workloads, even measured against the highest end of that power envelope for the A55, I'd be extremely interested in the full power and perf figures for the A55.
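
For reference, the +76% figure can be reproduced from the numbers in this post. A minimal Python sketch, assuming the comparison is each chip's average int power against the 250 mW top of the A55 window (the variable and function names are illustrative only):

```python
# Figures quoted in the post, in milliwatts.
A55_WINDOW_TOP_MW = 250                          # upper end of the 100-250 mW roadmap window
APPLE_LITTLE_INT_MW = {"A13": 380, "A14": 440}   # average int-workload power

def percent_over(measured_mw, reference_mw):
    """How far measured power exceeds the reference, in percent."""
    return (measured_mw / reference_mw - 1) * 100

deltas = {chip: percent_over(mw, A55_WINDOW_TOP_MW)
          for chip, mw in APPLE_LITTLE_INT_MW.items()}

for chip, delta in deltas.items():
    print(f"{chip}: {delta:+.0f}% vs top of the A55 window")
```

On these inputs the +76% corresponds to the A14's 440 mW; the A13's 380 mW works out to roughly +52% against the same ceiling.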
 

Thala

Golden Member
Nov 12, 2014
1,247
557
136
soresu said:
Meanwhile the A13 and A14 'little' cores are drawing 440 mW and 480 mW on average for fp workloads, and 380 mW and 440 mW for int workloads.

Precisely. An A55 does not consume remotely this amount of power. In one of the architectures I was working on, we had an A55 consuming 60 mW at 1.1 GHz on TSMC 7nm under integer workloads. Now of course, A55s used as little cores are clocked higher and could indeed approach 200 mW.
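
As a rough sanity check on how 60 mW at 1.1 GHz could grow towards 200 mW at phone clocks, here is a minimal Python sketch using the textbook dynamic-power approximation P ≈ C·V²·f. It ignores leakage and assumes the same silicon throughout; the 1.8 GHz target is the Snapdragon 855 little-core clock cited in this thread, and the function names are illustrative only.

```python
import math

def scale_by_frequency_only(power_mw, f_from_ghz, f_to_ghz):
    """Dynamic power if only the clock changes and voltage is held constant."""
    return power_mw * (f_to_ghz / f_from_ghz)

def implied_voltage_ratio(p_from_mw, p_to_mw, f_from_ghz, f_to_ghz):
    """Voltage increase needed to explain the power change under P ~ C * V^2 * f."""
    return math.sqrt((p_to_mw / p_from_mw) / (f_to_ghz / f_from_ghz))

# 60 mW @ 1.1 GHz is the figure from the post; 200 mW @ 1.8 GHz is the target.
freq_only_mw = scale_by_frequency_only(60, 1.1, 1.8)
v_ratio = implied_voltage_ratio(60, 200, 1.1, 1.8)

print(f"Frequency scaling alone: {freq_only_mw:.0f} mW")
print(f"Voltage ratio implied by 200 mW: {v_ratio:.2f}x")
```

Under these assumptions, frequency alone accounts for roughly 98 mW, so reaching 200 mW would need the supply voltage to rise by about 1.4x as well.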

Gideon said:
Considering his past record I have no reason to doubt his words, but alas, no graphs that I'm aware of.

As I said, I doubt these statements; in fact any reasonable person should as well. I mean, be realistic: a 3x efficiency difference would make ARM engineers total dilettantes, given their particular efficiency focus for these cores.
In addition, Andrei's method has a weakness, because he is largely measuring system power and cannot easily isolate the cores.
 

insertcarehere

Senior member
Jan 17, 2013
382
239
116
Thala said:
As I said, I doubt these statements; in fact any reasonable person should as well. I mean, be realistic: a 3x efficiency difference would make ARM engineers total dilettantes, given their particular efficiency focus for these cores.
In addition, Andrei's method has a weakness, because he is largely measuring system power and cannot easily isolate the cores.

Here's a graph with SPEC performance of A55 cores; they certainly look miles off whatever Apple was doing with the A12 circa 2018.

With more improvements to the small A14 cores as well as the process boost (which SoCs with the A55 didn't get because Samsung's processes are worse), it's very plausible for the A14 efficiency cores to be 3x as energy efficient as the A55s in the S888. If the A510 cannot seriously close a lot of this gap, they will be left behind again by Apple's yearly iterations.
 

naukkis

Senior member
Jun 5, 2002
433
289
136
insertcarehere said:
Here's a graph with SPEC performance of A55 cores; they certainly look miles off whatever Apple was doing with the A12 circa 2018.

With more improvements to the small A14 cores as well as the process boost (which SoCs with the A55 didn't get because Samsung's processes are worse), it's very plausible for the A14 efficiency cores to be 3x as energy efficient as the A55s in the S888. If the A510 cannot seriously close a lot of this gap, they will be left behind again by Apple's yearly iterations.

In those tests the A55 cores are clocked way beyond their efficiency point, whereas Apple's little cores are right at theirs, at about 500 MHz lower clocks. Apple's little cores are very comparable to A76-A78 designs; the A55 and A510 are much lower performance and lower power cores, a class which Apple doesn't implement in its designs.

But I have to wonder why other SoC designers ruin their designs by clocking their SoCs too high. As seen with Apple's designs, similar efficiency can be had without little cores if the medium and big cores are kept at sane clock levels instead of being pushed far beyond their efficiency points. And actually they probably don't: for a performance review like that, the little cores have to be locked and clocked high, against their normal behavior.

But those high-performance tests are irrelevant. If efficiency is being compared, it should be done with a low-intensity test; heavily stressing low-power cores is insane, as the device shouldn't do that but should instead power on the higher performance cores when high stress is needed. Low-power cores should be utilized only when stress is low. Nobody has tested that yet, but I bet that in-order cores would be way more efficient in those cases than higher performing out-of-order cores, and that's why ARM provides them: not to increase performance but to lower power usage where possible. The target for those little cores should always be just the lowest possible power; whatever performance they offer is a secondary argument.
 

soresu

Golden Member
Dec 19, 2014
1,650
841
136
insertcarehere said:
Here's a graph with SPEC performance of A55 cores; they certainly look miles off whatever Apple was doing with the A12 circa 2018.

With more improvements to the small A14 cores as well as the process boost (which SoCs with the A55 didn't get because Samsung's processes are worse), it's very plausible for the A14 efficiency cores to be 3x as energy efficient as the A55s in the S888. If the A510 cannot seriously close a lot of this gap, they will be left behind again by Apple's yearly iterations.

There's something very odd going on there.

The average power for both implementations of the A55 doesn't even come close to matching the projected 100-250 mW power range for 10nm on either fp or int workloads.

Unless the lower 1800 MHz clock Qualcomm used for the SD 855 implementation is way off the optimal power/frequency curve for the A55, I would be inclined to question the accuracy of those power readings.

While the big core implementations might stray out of optimal territory, it seems fundamentally pointless to design a mobile SoC with suboptimally clocked efficiency cores, whose entire point is prolonging battery life for low and background load use cases, which are the majority for those not obsessively using their phone or tablet 24/7.

Also, though the A55 in the Exynos 9820 is clocked 8.33% higher than the A55 in the SD 855, it actually performs noticeably worse on both fp and int workloads.

I know that cache size can affect the performance of the big Axx cores, given how much people moan about it, but should it significantly affect A55 performance too?

I also wonder what platform these tests were done on for Exynos and SD.

I would assume Android, and given how often people say that it is inefficient for a mobile OS, one does have to wonder if this is even close to an apples-to-apples comparison, given how much control Apple has over the stack to optimize beyond the SoC itself for running benchmarks, especially given Apple's past willingness to pull the wool over their customers' eyes on power/battery issues.
 

soresu

Golden Member
Dec 19, 2014
1,650
841
136
I've gone back and done some quick calculations.



[Image: Icestorm power.png]

Assuming for now that the Anand A55 power data on the SD 855 is completely correct and that it uses the TSMC N7 process, I inferred a 30% power reduction going to N5 to normalize the process node to that of the A14's efficiency core.

The A55 is already at ISO frequency between the SD 855 and the SD 888.

For int workloads that turns 320 mW into 224 mW, about 50.9% of the average power of Icestorm in int workloads.

The int performance increase of Icestorm (IS) over the A55 is 20.03 from 5.42, not quite 3.7x, but we'll round it up for ease of computation.

That's 3.7 x 0.509, about 1.88x for the efficiency difference between Icestorm and A55 on ISO process, unless my own math is completely out of whack?

A wide gulf to be sure, but far short of the exaggerated claims of Anand.

Again, all assuming that the power data for the A55 is accurate.
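
The arithmetic above can be written out as a short Python sketch. All inputs are the figures quoted in this thread; the 30% N7-to-N5 power reduction is this post's own assumption, and the variable names are illustrative only.

```python
# Inputs quoted in the thread (int workloads).
a55_power_mw = 320          # A55 in the SD 855, average power
icestorm_power_mw = 440     # A14 Icestorm, average power
a55_score = 5.42            # A55 int performance
icestorm_score = 20.03      # Icestorm int performance
n7_to_n5_power_scale = 0.70 # assumed 30% power reduction from N7 to N5

# Normalize the A55 to the A14's process node.
a55_power_n5_mw = a55_power_mw * n7_to_n5_power_scale

# Efficiency here means performance per unit of average power.
perf_ratio = icestorm_score / a55_score
power_ratio = a55_power_n5_mw / icestorm_power_mw
efficiency_ratio = perf_ratio * power_ratio

print(f"A55 power at N5: {a55_power_n5_mw:.0f} mW")
print(f"Performance ratio: {perf_ratio:.2f}x")
print(f"Power ratio (A55 / Icestorm): {power_ratio:.3f}")
print(f"Icestorm efficiency advantage: {efficiency_ratio:.2f}x")
```

Without rounding the performance ratio up to 3.7x, the result still lands at about 1.88x.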

I'd like to see it measured using one of the S905X3+ SBCs on a pure Linux OS, minus the Android kludge.

It wouldn't be reflective of a battery constrained scenario, but at least it would be a pure A55 system to take readings from.

Fuchsia OS would also be an interesting power bench target now that it has finally launched - it will be interesting to see if Google have improved efficiency much over Android.

As an addendum I added the 20% efficiency gain ARM projected for A510 vs A55 at ISO process node.

Which gives us 1.505x efficiency for Icestorm over A510 on ISO process, though admittedly I'm not sure I calculated that correctly.
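
Since the A510 step is flagged as uncertain, here is a minimal Python sketch of the two ways a "20% efficiency gain" could be applied to the 1.88x figure. Which reading matches ARM's projection is an assumption either way, and the names are illustrative only.

```python
ICESTORM_VS_A55 = 1.88   # iso-process efficiency ratio from this post
A510_GAIN = 0.20         # projected A510 efficiency gain over A55

# Reading 1: the A510 delivers 20% more performance per watt than the A55.
ratio_if_perf_per_watt = ICESTORM_VS_A55 / (1 + A510_GAIN)

# Reading 2: the A510 uses 20% less energy than the A55 for the same work.
ratio_if_less_energy = ICESTORM_VS_A55 * (1 - A510_GAIN)

print(f"Reading 1 (perf/W +20%): {ratio_if_perf_per_watt:.3f}x")
print(f"Reading 2 (energy -20%): {ratio_if_less_energy:.3f}x")
```

The 1.505x in the post appears to match the second reading (using the unrounded 1.881x as input); under the first reading the gap would be closer to 1.57x.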

Edit: I used average power in mW rather than joules to calculate the int compute efficiency difference, so 1.88x becomes 1.93x - still well below 3x, let alone 4x efficiency.
 

insertcarehere

Senior member
Jan 17, 2013
382
239
116
naukkis said:
In those tests the A55 cores are clocked way beyond their efficiency point, whereas Apple's little cores are right at theirs, at about 500 MHz lower clocks. Apple's little cores are very comparable to A76-A78 designs; the A55 and A510 are much lower performance and lower power cores, a class which Apple doesn't implement in its designs.

Maybe because the A55 is such an inept core that Qualcomm, the licensee, figured that pushing it to 1.8 GHz is necessary to handle even background tasks.

And in Apple's case the "medium" cores easily beat the high performance cores in energy efficiency when doing the same task; the same cannot be said for the A55 vs its bigger brethren. That's a problem considering it delivers much lower performance than those bigger cores.

naukkis said:
But I have to wonder why other SoC designers ruin their designs by clocking their SoCs too high. As seen with Apple's designs, similar efficiency can be had without little cores if the medium and big cores are kept at sane clock levels instead of being pushed far beyond their efficiency points. And actually they probably don't: for a performance review like that, the little cores have to be locked and clocked high, against their normal behavior.

The real question is why even bother including A55s in the SoC at all. A single A77/A78 at ~2 GHz is just as fast as a 4x A55 cluster, probably takes less space on silicon, and would be more energy efficient as well.

naukkis said:
But those high-performance tests are irrelevant. If efficiency is being compared, it should be done with a low-intensity test; heavily stressing low-power cores is insane, as the device shouldn't do that but should instead power on the higher performance cores when high stress is needed. Low-power cores should be utilized only when stress is low. Nobody has tested that yet, but I bet that in-order cores would be way more efficient in those cases than higher performing out-of-order cores, and that's why ARM provides them: not to increase performance but to lower power usage where possible. The target for those little cores should always be just the lowest possible power; whatever performance they offer is a secondary argument.

Modern mobile SoCs (should) have very sophisticated power gating and DVFS mechanisms which favor race-to-idle scenarios, so that in effect every task is "high intensity" and the core can get back to an idle, power-gated state as soon as possible. The smaller cores' purpose would be to run lower priority tasks for which efficiency is more important than performance. The trouble with the A55 is that there aren't many tasks where it would be more efficient than the bigger cores (A77/A78/X1), so why even bother?

Case in point: the A14 has a much bigger, more sophisticated OoO small core in Icestorm, but as implemented in the iPhone 12, battery life does not suffer when compared to Android flagships with much bigger batteries. So "lower intensity" tasks and idle battery life are plenty fine without an in-order design.
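
To make the race-to-idle argument above concrete, here is a toy Python sketch that compares the energy two cores spend on the same task inside a fixed time window. Every number in it is hypothetical and chosen only for illustration; none of them is a measurement of an A55, an Icestorm core, or any other real design.

```python
def window_energy_uj(active_mw, busy_ms, idle_mw, window_ms):
    """Energy over a fixed window: active while busy, then idle (power-gated)."""
    if busy_ms > window_ms:
        raise ValueError("task does not finish inside the window")
    # mW * ms = microjoules
    return active_mw * busy_ms + idle_mw * (window_ms - busy_ms)

WINDOW_MS = 50  # hypothetical scheduling window

# Hypothetical fast core: high power, finishes quickly, then idles.
fast_uj = window_energy_uj(active_mw=1000, busy_ms=10, idle_mw=5, window_ms=WINDOW_MS)

# Hypothetical slow core: low power, but busy four times as long.
slow_uj = window_energy_uj(active_mw=300, busy_ms=40, idle_mw=5, window_ms=WINDOW_MS)

print(f"fast core: {fast_uj} uJ, slow core: {slow_uj} uJ")
```

With these made-up numbers the faster core uses less energy because it is 4x faster at only about 3.3x the power; flip that relationship and the slower core wins, which is the crux of the disagreement in this thread.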
 

Thala

Golden Member
Nov 12, 2014
1,247
557
136
insertcarehere said:
Maybe because the A55 is such an inept core that Qualcomm, the licensee, figured that pushing it to 1.8 GHz is necessary to handle even background tasks.

That's not what I observe. The small cores are barely loaded in typical low performance situations (for example when they are running Windows services).
The more likely reason for these high-clocked A55s is that under test conditions they are loaded with high performance tasks, for instance when running SPEC, like Andrei is doing. When running threads with unbounded load, the OS will scale frequency to maximum.

Below is a sample of background services running:
Core 0-3: A55 - Windows background services
Core 4: A76 - foreground (that's me working with the snipping tool)
Core 5-7: A76 - power gated

[Image: background_service_load.PNG]
 

soresu

Golden Member
Dec 19, 2014
1,650
841
136
Not sure if this is posted elsewhere, but I randomly noticed an article on WikiChip about ARM stacking logic chips that mentions a Project Trishul.

A bit of digging revealed a PowerPoint slide and an ARM blog entry:



Blog entry:

"3D stacking for next-generation high performance energy efficient systems"

The slide is probably talking about that early A72 stacked chip, but they are clearly not just stopping there and leaving it to partners to do what they will with it.
 
