News [Anandtech] Arm Announces Mobile Armv9 CPU Microarchitectures: Cortex-X2, Cortex-A710 & Cortex-A510



Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
I just went back over those graphs from Anand's article you posted a few posts back.

Neither of them has A55 perf or energy consumption data on it?
I do agree that Andrei's graphs are needlessly complex. There are so many ways to present the same data in a much more readable way.

And no, the A55 isn't on the graphs directly (I wish it were), but there is a direct quote under the graph from the iPhone 12 review.

Andrei Frumusanu said:
I’ve included the efficiency cores in the chart here to showcase that they’re not weak at all. The performance showcased here roughly matches a 2.2GHz Cortex-A76 which is essentially 4x faster than the performance of any other mobile SoC today which relies on Cortex-A55 cores, all while using roughly the same amount of system power and having 3x the power efficiency.

And from his iPhone XS review from 2018:
Apple’s small cores in general are a lot more performant than one would think. I’ve gathered some incomplete SPEC numbers on Arm’s A55 (it takes ages!) and in general the performance difference here is 2-3x depending on the benchmark. In recent years I’ve felt that Arm’s little core performance range has become insufficient in many workloads, and this may also be why we’re going to see a lot more three-tiered SoCs (such as the Kirin 980) in the coming future. As it stands, the gap between the maximum performance of the little cores and the most efficient low performance point of the big cores continues to grow in one direction. All of which makes me wonder whether it’s still worth it to stay with an in-order microarchitecture for Arm's efficiency cores.

Considering his past record I have no reason to doubt his words, but ALAS no graphs that I'm aware of
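
Those quoted figures do at least hang together arithmetically if "efficiency" is read as performance per unit of energy; here's a quick sanity check, a sketch using only the numbers from the iPhone 12 quote above:

```python
# Sanity check of the quoted figures, assuming "efficiency" means performance per unit energy.
perf_ratio = 4.0        # A14 efficiency core vs. a Cortex-A55-based SoC, per the quote above
efficiency_ratio = 3.0  # quoted power-efficiency advantage

# If efficiency = performance / power, the implied power ratio is:
power_ratio = perf_ratio / efficiency_ratio
print(f"Implied power ratio: {power_ratio:.2f}x")  # ~1.33x, i.e. "roughly the same" system power
```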
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
I do agree that Andrei's graphs are needlessly complex. There are so many ways to present the same data in a much more readable way.

And no, the A55 isn't on the graphs directly (I wish it were), but there is a direct quote under the graph from the iPhone 12 review.

And from his iPhone XS review from 2018:

Considering his past record I have no reason to doubt his words, but ALAS no graphs that I'm aware of
The fact that he includes Apple's efficiency cores but not those of any relatively recent Snapdragon or Kirin SoC (I can't find any measured on 7nm) is reason enough to doubt his math, at the very least.

There's a reason that school examiners insist you show your working on math tests; otherwise you cannot prove how you got from A to B, or whether someone else simply told you the answer.

The little cores determine a lot about the idle efficiency of any ARM mobile device post-big.LITTLE, so it is insanely dubious to leave them off any graph when he is going out of his way to talk about power consumption/efficiency in the first place.

For a start, the rated power window for the A55 was 100-250 mW back when it first appeared on the ARM roadmaps at 10nm (not 100% sure the A55 figure assumes 10nm, as it isn't specified, while the A75/Prometheus it was announced alongside is).

arm_roadmap_cortex_futureict-cortex-a-roadmap-strategy-april-2015.png

Meanwhile the A13 and A14 'little' cores are drawing 440 mW and 480 mW on average for fp workloads, and 380 mW and 440 mW for int workloads.

Considering the average power delta is a minimum of +76% for int workloads even against the highest end of that power envelope for the A55, I'd be extremely interested in the full power and perf figures for the A55.
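
A quick check of that delta, assuming the comparison is the A14 little-core int power against the very top of the A55's rated 100-250 mW window:

```python
# Rough check of the "+76%" power delta cited above (int workloads).
a14_little_int_mw = 440    # A14 "little" core average int power, per the figures above
a55_window_top_mw = 250    # top of the 100-250 mW window from the old ARM roadmap

delta = a14_little_int_mw / a55_window_top_mw - 1
print(f"Power delta: +{delta:.0%}")  # ~ +76%
```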
 
Last edited:
  • Like
Reactions: Tlh97 and Thala

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Meanwhile the A13 and A14 'little' cores are drawing 440 mW and 480 mW on average for fp workloads, and 380 mW and 440 mW for int workloads.

Precisely. An A55 does not consume remotely this amount of power. In one of the architectures I was working on, we had an A55 consuming 60 mW at 1.1 GHz on TSMC 7nm under integer workloads. Now of course, the A55, when used as a little core, is clocked higher and could indeed approach 200 mW.
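
For a rough feel of how 60 mW at 1.1 GHz could turn into roughly 200 mW at typical little-core clocks, here is a simple dynamic-power scaling sketch; the voltages are purely illustrative assumptions, not measured values:

```python
# Sketch: dynamic power scales roughly as P ~ C * V^2 * f (leakage ignored).
# The 60 mW @ 1.1 GHz point is from the post above; both voltages are hypothetical.
p_base_mw = 60.0
f_base_ghz, v_base = 1.1, 0.65   # assumed operating voltage at 1.1 GHz
f_high_ghz, v_high = 1.8, 0.95   # assumed voltage at a typical ~1.8 GHz little-core clock

p_high_mw = p_base_mw * (f_high_ghz / f_base_ghz) * (v_high / v_base) ** 2
print(f"Scaled estimate: {p_high_mw:.0f} mW")  # ~210 mW, in the ballpark of the ~200 mW figure
```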

Considering his past record I have no reason to doubt his words, but ALAS no graphs that I'm aware of

As I said, I doubt these statements; in fact, any reasonable person should as well. I mean, be realistic: a 3x efficiency difference would make ARM's engineers total dilettantes, given their particular efficiency focus for these cores.
In addition, Andrei's method has a weakness: he is largely measuring system power and cannot easily isolate the cores.
 
Last edited:

insertcarehere

Senior member
Jan 17, 2013
639
607
136
As I said, I doubt these statements; in fact, any reasonable person should as well. I mean, be realistic: a 3x efficiency difference would make ARM's engineers total dilettantes, given their particular efficiency focus for these cores.
In addition, Andrei's method has a weakness: he is largely measuring system power and cannot easily isolate the cores.
Here's a graph with SPEC performance of A55 cores; they certainly look miles off whatever Apple was doing with the A12 circa 2018.
SPEC2006eff-overview.png

With more improvements to the small A14 cores, as well as the process boost (which SoCs with the A55 didn't have because Samsung's processes are worse), it's very plausible for the A14 efficiency cores to be 3x as energy efficient as the A55s in the S888. If the A510 cannot seriously close a lot of this gap, they will be left behind again by Apple's yearly iterations.
 
Last edited:
  • Like
Reactions: Tlh97

naukkis

Senior member
Jun 5, 2002
702
571
136
Here's a graph with SPEC performance of A55 cores; they certainly look miles off whatever Apple was doing with the A12 circa 2018.
With more improvements to the small A14 cores, as well as the process boost (which SoCs with the A55 didn't have because Samsung's processes are worse), it's very plausible for the A14 efficiency cores to be 3x as energy efficient as the A55s in the S888. If the A510 cannot seriously close a lot of this gap, they will be left behind again by Apple's yearly iterations.

In those tests the A55 cores are clocked way beyond their efficiency point, whereas Apple's little cores are spot on at about 500 MHz lower clocks. Apple's little cores are very comparable to A76-A78 designs; the A55 and A510 are much lower-performance, lower-power cores of a kind Apple doesn't implement in its designs.

But one has to wonder why other SoC designers ruin their designs by clocking their SoCs too high. As seen with Apple's designs, similar efficiency can be had without little cores if the medium and big cores are kept at sane clock levels instead of being pushed far beyond their efficiency points. And in normal use the little cores probably won't behave like this anyway; for a performance review like that, they have to be locked and clocked high, against their normal behavior.

But those high-performance tests are irrelevant. If efficiency is being compared, it should be done with low-intensity tests; heavily stressing low-power cores makes no sense, as a device shouldn't do that, but should instead power on the higher-performance cores when high load is needed. Low-power cores should be utilized only when load is low. Nobody has tested that yet, but I bet in-order cores are way more efficient in those cases than higher-performing out-of-order cores, and that's why ARM provides them: not to increase performance but to lower power usage where possible. The little cores' performance target should always be the lowest possible power; whatever performance they offer is a second-order consideration.
 
  • Like
Reactions: Tlh97 and moinmoin

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Here's a graph with SPEC performance of A55 cores; they certainly look miles off whatever Apple was doing with the A12 circa 2018.
SPEC2006eff-overview.png

With more improvements to the small A14 cores, as well as the process boost (which SoCs with the A55 didn't have because Samsung's processes are worse), it's very plausible for the A14 efficiency cores to be 3x as energy efficient as the A55s in the S888. If the A510 cannot seriously close a lot of this gap, they will be left behind again by Apple's yearly iterations.
There's something very odd going on there.

The average power for both implementations of the A55 doesn't come close to matching the projected 100-250 mW power range for 10nm on either fp or int workloads.

Unless the lower 1800 MHz clock Qualcomm used for the SD 855 implementation is way outside the optimal power/frequency curve for the A55, I would be inclined to question the accuracy of those power readings.

While the big-core implementations might stray out of optimal territory, it seems fundamentally pointless to design a mobile SoC with sub-optimally clocked efficiency cores whose entire point is prolonging battery life for low and background load use cases, which are the majority for those not obsessively using their phone or tablet 24/7.

Also, though the A55 in the Exynos 9820 is clocked 8.33% higher than the A55 in the SD 855, it actually performs noticeably worse on both fp and int workloads.

I know that cache size can affect the big Axx cores' performance, given how much people moan about it, but should it significantly affect the A55's performance too?

I also wonder what platform these tests were done on for the Exynos and SD parts.

I would assume Android, and given how much people say it is inefficient for a mobile OS, one does have to wonder if this is even close to an apples-to-apples comparison, given how much control Apple has over the stack to optimize beyond the SoC itself for running benchmarks, especially given Apple's past willingness to pull the wool over their customers' eyes on power/battery issues.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
I've gone back and done some quick calculations.

SPEC2006eff-overview.png


Icestorm power.png

Assuming for now that the AnandTech A55 power data on the SD 855 is completely correct and that it uses the TSMC N7 process, I applied a 30% power reduction for N5 to normalize the process node to the A14's efficiency core.

The A55 is already at iso-frequency going from the SD 855 to the SD 888.

For int workloads that turns 320 mW into 224 mW, about 50.9% of the average power of Icestorm in int workloads.

The int performance increase of Icestorm over the A55 is 20.03 from 5.42, not quite 3.7x, but we'll round it up for ease of computation.

That's 3.7 x 0.509, about 1.88x for the efficiency difference between Icestorm and the A55 on an iso process, unless my own math is completely out of whack?

A wide gulf to be sure, but well short of the exaggerated claims quoted above.

Again, all assuming that the power data for the A55 is accurate.

I'd like to see it measured using one of the S905X3+ SBC's on a pure Linux OS, minus the Android kludge.

It wouldn't be reflective of a battery constrained scenario, but at least it would be a pure A55 system to take readings from.

Fuchsia OS would also be an interesting power bench target now that it has finally launched - it will be interesting to see if Google have improved efficiency much over Android.

As an addendum I added the 20% efficiency gain ARM projected for A510 vs A55 at ISO process node.

Which gives us 1.505x efficiency for Icestorm over A510 on ISO process, though admittedly I'm not sure I calculated that correctly.

Edit: I used average power in mW rather than joules to calculate the int compute efficiency difference, so 1.88x becomes 1.93x - still well below 3x, let alone 4x efficiency.
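
For reference, here is that arithmetic written out, using the same inputs and assumptions as above (the quoted SPEC scores and average power figures, a ~30% N7-to-N5 power reduction, and ARM's projected ~20% A510 gain applied as a straight multiplier, as in the post):

```python
# Reproduces the mW-based back-of-envelope comparison above.
a55_int_mw, a55_int_score = 320.0, 5.42     # Cortex-A55 (SD 855, TSMC N7), int workloads
ice_int_mw, ice_int_score = 440.0, 20.03    # A14 Icestorm (TSMC N5), int workloads

a55_on_n5_mw = a55_int_mw * 0.70            # assumed ~30% power reduction N7 -> N5: 320 -> 224 mW
power_ratio = a55_on_n5_mw / ice_int_mw     # ~0.509 (A55 power as a fraction of Icestorm power)
perf_ratio = ice_int_score / a55_int_score  # ~3.7x

eff_ratio = perf_ratio * power_ratio        # Icestorm vs. process-normalized A55: ~1.88x
eff_vs_a510 = eff_ratio * 0.80              # the post's handling of the projected +20% A510 gain: ~1.5x

print(f"perf {perf_ratio:.2f}x, power ratio {power_ratio:.3f}, "
      f"efficiency {eff_ratio:.2f}x, vs. A510 {eff_vs_a510:.2f}x")
```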
 
Last edited:
  • Like
Reactions: Tlh97

insertcarehere

Senior member
Jan 17, 2013
639
607
136
In those tests the A55 cores are clocked way beyond their efficiency point, whereas Apple's little cores are spot on at about 500 MHz lower clocks. Apple's little cores are very comparable to A76-A78 designs; the A55 and A510 are much lower-performance, lower-power cores of a kind Apple doesn't implement in its designs.

Maybe it's because the A55 is such an inept core that Qualcomm, the licensee, figured that pushing it to 1.8 GHz was necessary to handle even background tasks.

And in Apple's case their "medium" cores easily beat their high-performance cores in energy efficiency when doing the same task; the same cannot be said for the A55 vs. its bigger brethren. That's a problem considering it delivers much lower performance than those bigger cores.

But one has to wonder why other SoC designers ruin their designs by clocking their SoCs too high. As seen with Apple's designs, similar efficiency can be had without little cores if the medium and big cores are kept at sane clock levels instead of being pushed far beyond their efficiency points. And in normal use the little cores probably won't behave like this anyway; for a performance review like that, they have to be locked and clocked high, against their normal behavior.

The real question is why even bother including A55s in the SoC at all: a single A77/A78 at ~2 GHz is just as fast as a 4x A55 cluster, probably takes up less silicon area, and would be more energy efficient as well.

But those high-performance tests are irrelevant. If efficiency is being compared, it should be done with low-intensity tests; heavily stressing low-power cores makes no sense, as a device shouldn't do that, but should instead power on the higher-performance cores when high load is needed. Low-power cores should be utilized only when load is low. Nobody has tested that yet, but I bet in-order cores are way more efficient in those cases than higher-performing out-of-order cores, and that's why ARM provides them: not to increase performance but to lower power usage where possible. The little cores' performance target should always be the lowest possible power; whatever performance they offer is a second-order consideration.

Modern mobile SoCs (should) have very sophisticated power-gating and DVFS mechanisms that favor race-to-idle: in effect, every task is run as "high intensity" so that the core can go back to an idle, power-gated state as quickly as possible. The smaller cores' purpose would be to take lower-priority tasks for which efficiency is more important than performance. The trouble with the A55 is that there aren't many such tasks for which it would actually be more efficient than the bigger cores (A77/A78/X1), so why even bother.
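
A toy race-to-idle comparison may help illustrate that point. All numbers below are hypothetical placeholders chosen only to show the mechanism: a faster core at higher power can still use less energy overall if it finishes the task and power-gates sooner.

```python
# Toy race-to-idle energy model: energy = active_power * runtime + idle_power * remaining_time.
# All figures are hypothetical placeholders, not measurements of any real core.

def task_energy_mj(active_mw: float, runtime_s: float, idle_mw: float, window_s: float) -> float:
    """Energy in mJ to finish one task within a fixed scheduling window, then sit idle."""
    assert runtime_s <= window_s
    return active_mw * runtime_s + idle_mw * (window_s - runtime_s)

WINDOW_S = 1.0  # one scheduling window, in seconds

# Hypothetical little core: low power, but slow enough to stay busy the whole window.
little_mj = task_energy_mj(active_mw=250, runtime_s=1.0, idle_mw=5, window_s=WINDOW_S)

# Hypothetical big core: 4x the power but 5x the speed, so it finishes early and power-gates.
big_mj = task_energy_mj(active_mw=1000, runtime_s=0.2, idle_mw=5, window_s=WINDOW_S)

print(f"little: {little_mj:.0f} mJ, big: {big_mj:.0f} mJ")  # little: 250 mJ, big: 204 mJ
```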

Case in point: the A14 has a much bigger, more sophisticated out-of-order small core in Icestorm, yet when implemented in the iPhone 12, battery life does not suffer compared to Android flagships with much bigger batteries. So "lower intensity" tasks and idle battery life are evidently fine without an in-order design.
 
  • Like
Reactions: BorisTheBlade82

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Maybe it's because the A55 is such an inept core that Qualcomm, the licensee, figured that pushing it to 1.8 GHz was necessary to handle even background tasks.

That's not what I observe. The small cores are barely loaded in typical low-performance situations (for example, when they are running Windows services).
The more likely reason for these high-clocking A55s is that under test conditions they are loaded with high-performance tasks, for instance when running SPEC, as Andrei does. When running threads with unbounded load, the OS will scale frequency to the maximum.

Below is a sample with background services running: Cores 0-3: A55, Windows background services. Core 4: A76, foreground (that's me working with the Snipping Tool). Cores 5-7: A76, power-gated.
background_service_load.PNG
 
Last edited:
  • Like
Reactions: Tlh97 and moinmoin

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Not sure if this has been posted elsewhere, but I randomly noticed an article on WikiChip about ARM stacking logic chips that mentions a Project Trishul.

A bit of digging revealed a PowerPoint slide and an ARM blog entry:

arm-trishul-test-chip.png


Blog entry:

"3D stacking for next-generation high performance energy efficient systems"

The slide is probably talking about that early A72 stacked chip, but they are clearly not just stopping there and leaving it to partners to do what they will with it.
 
  • Like
Reactions: Tlh97 and Thala

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
So the first Server CPUs with Armv9 are announced (and being tested by clients) - Graviton 3:

  • ARM v9 and N2 most likely (based on supporting pointer authentication)
  • DDR5
  • 2x floating point perf (vs G2)
  • 3x ML perf (vs G2)
  • Support for bfloat16
  • Built on TSMC N5
  • Only about 100W per chip and BGA.
  • The die size is only around 300mm^2
And it's not just baseless claims. Several high-profile clients are quite excited after testing it out:


  • Twitter - 20%-80% higher perf, 35% lower latency
  • Epic Games - 40% faster
  • Honeycomb - 35% faster, 30% fewer instances and 30% lower latency
  • Mercado Libre - better price/perf

I'll copy the twitter quote in full:
1638310816600.png

The most impressive part is the simple, monolithic, power-efficient design: a single ~300 mm^2 die (only about a third bigger than, say, Alder Lake-S).
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,482
14,434
136
So the first Server CPUs with Armv9 are announced (and being tested by clients) - Graviton 3:

  • ARM v9 and N2 most likely (based on supporting pointer authentication)
  • DDR5
  • 2x floating point perf (vs G2)
  • 3x ML perf (vs G2)
  • Support for bfloat16
  • Built on TSMC N5
  • Only about 100W per chip and BGA.
  • The die size is only around 300mm^2
And it's not just baseless claims. Several high-profile clients are quite excited after testing it out:




I'll copy the twitter quote in full:

The most impressive part is the simple, monolithic, power-efficient design: a single ~300 mm^2 die (only about a third bigger than, say, Alder Lake-S).
Any comparison to AMD's EPYC? How many cores?
 
  • Like
Reactions: Drazick

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
The most impressive part is the simple, monolithic, power-efficient design: a single ~300 mm^2 die (only about a third bigger than, say, Alder Lake-S).
I may be the only one who doesn't find this exciting at all.
300 mm^2 on N5 is a lot of transistors, more than EPYC Rome, when tailored for lower clocks and high density.
The efficiency gains are again contributed in a big way by N5.
25% more perf than Graviton 2 is not at all impressive in 2022. Graviton 2 is so far behind even the 2nd-gen EPYC 7742 in almost all workloads.

The ML perf and FP perf claims remain to be seen, but OK.

What I find odd is that this processor is marketed as an ML inferencing processor, a job an accelerator would do much better, and yet only bfloat16 is mentioned as a headline feature.


Google Cloud and Azure will be mighty pleased
AWS will use it anyway, but GCP and Azure will surely have an easier time marketing against AWS.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,482
14,434
136
I may be the only one who doesn't find this exciting at all.
300 mm^2 on N5 is a lot of transistors, more than EPYC Rome, when tailored for lower clocks and high density.
The efficiency gains are again contributed in a big way by N5.
25% more perf than Graviton 2 is not at all impressive in 2022. Graviton 2 is so far behind even the 2nd-gen EPYC 7742 in almost all workloads.

The ML perf and FP perf claims remain to be seen, but OK.

What I find odd is that this processor is marketed as an ML inferencing processor, a job an accelerator would do much better, and yet only bfloat16 is mentioned as a headline feature.


Google Cloud and Azure will be mighty pleased
AWS will use it anyway, but GCP and Azure will surely have an easier time marketing against AWS.
Thanks for that... The EPYC 7742 is 51% faster than Graviton 2, so this new processor should still be slower than the 7742, let alone Milan or Milan-X, which will be out in the same timeframe (or are already out). It will be a massacre.
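
A rough ratio check of that, using the figures mentioned in this thread (the 7742 at roughly +51% over Graviton 2 and Graviton 3 at roughly +25% over Graviton 2); real gaps will of course vary by workload:

```python
# Relative performance with Graviton 2 normalized to 1.0, using the figures cited above.
epyc_7742 = 1.51   # ~51% faster than Graviton 2
graviton3 = 1.25   # ~25% faster than Graviton 2

print(f"EPYC 7742 vs. Graviton 3: {epyc_7742 / graviton3:.2f}x")  # ~1.21x
```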

Nothing to see here, move along.....
 
  • Like
Reactions: Drazick

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Thanks for that... The EPYC 7742 is 51% faster than Graviton 2, so this new processor should still be slower than the 7742, let alone Milan or Milan-X, which will be out in the same timeframe (or are already out). It will be a massacre.
For general-purpose compute, 2nd-gen EPYC looks like it would still be faster, but for API gateways and the like this thing would be much more efficient, hence the reason AMD is creating Zen 4c.
The AI perf claims come from bfloat16, and the fp throughput remains to be seen.

Anyway I am very biased since I am mainly Azure user.
 

jpiniero

Lifer
Oct 1, 2010
14,510
5,159
136
Seeing that Qualcomm's Snapdragon 8 has worse specs than the MediaTek chip pretty much confirms that Samsung 4nm is still quite a bit worse than TSMC N5. They are also using dual 2-core A510 clusters, versus the Dimensity's four separate A510 cores.
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
What a disappointing refresh. IMO Qualcomm is holding the entire industry back with their draconian patents/fees/licensing agreements. Of course, using Samsung 4nm (which is closer to TSMC 7nm than TSMC 4nm) doesn't help.

I am checking this slide.

CPU_17_575px.png


And they compare the small A510 core to the A73... why didn't they compare it to the A57 or A72 instead?

Meanwhile, if you look at any CPU from any other major chip maker you'll find much bigger performance gains. Hopefully Google will help spur some innovation. Their current chips aren't particularly exciting, but I have to hope there is more to come.

Meanwhile, Intel/AMD are leapfrogging each other with 10-30% gains in the desktop and laptop segments. Sure, Qualcomm isn't in that space, but if they keep sleeping, even the extremely low power designs from AMD will be faster than the chips they are churning out. AMD DOES have a 7W SKU after all.
 
  • Like
Reactions: Tlh97

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,482
14,434
136
What a disappointing refresh. IMO Qualcomm is holding the entire industry back with their draconian patents/fees/licensing agreements. Of course, using Samsung 4nm (which is closer to TSMC 7nm than TSMC 4nm) doesn't help.



Meanwhile, if you look at any CPU from any other major chip maker you'll find much bigger performance gains. Hopefully Google will help spur some innovation. Their current chips aren't particularly exciting, but I have to hope there is more to come.

Meanwhile, Intel/AMD are leapfrogging each other with 10-30% gains in the desktop and laptop segments. Sure, Qualcomm isn't in that space, but if they keep sleeping, even the extremely low power designs from AMD will be faster than the chips they are churning out. AMD DOES have a 7W SKU after all.
Correct me if I am wrong, but I think AMD has server chips that nobody can touch, and I don't see that changing anytime soon. You mentioned the other areas, but servers are (IMO) where respect for a CPU company begins. And what I have heard about Milan and Milan-X makes me even more confident. I own a few Rome chips, and they are outstanding, but the newer ones are incredible.
 
  • Like
Reactions: Drazick and Tlh97

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
Correct me if I am wrong, but I think AMD has server chips that nobody can touch, and I don't see that changing anytime soon. You mentioned the other areas, but servers are (IMO) where respect for a CPU company begins. And what I have heard about Milan and Milan-X makes me even more confident. I own a few Rome chips, and they are outstanding, but the newer ones are incredible.

Server chips that scale down to 7W or less. Therein lies the rub. Qualcomm does not operate in a vacuum. Intel and AMD don't currently target the mobile market because they aren't in that business (and because Qualcomm owns a gazillion patents surrounding their modems and can lock folks in). However, what happens when AMD or Intel can produce a chip that performs twice as fast as the flagship Qualcomm chips while maintaining a similar power envelope? Android does not have to run on ARM.

Qualcomm needs to step it up. Apple is ahead of them, at minimum. Look at what happened with Wear OS and its associated SoCs: Qualcomm's inability to innovate in that area effectively killed the platform. Samsung is the one now in the process of reviving it.
 
  • Like
Reactions: Tlh97

NTMBK

Lifer
Nov 14, 2011
10,208
4,940
136
What a disappointing refresh. IMO Qualcomm is holding the entire industry back with their draconian patents/fees/licensing agreements. Of course, using Samsung 4nm (which is closer to TSMC 7nm than TSMC 4nm) doesn't help.



Meanwhile, if you look at any CPU from any other major chip maker you'll find much bigger performance gains. Hopefully Google will help spur some innovation. Their current chips aren't particularly exciting, but I have to hope there is more to come.

Meanwhile, Intel/AMD are leapfrogging each other with 10-30% gains in the desktop and laptop segments. Sure, Qualcomm isn't in that space, but if they keep sleeping, even the extremely low power designs from AMD will be faster than the chips they are churning out. AMD DOES have a 7W SKU after all.

There's a reason why Qualcomm spent billions to buy Nuvia. Clearly they felt that they needed a more competitive CPU than what ARM is offering.

EDIT: Derp
 
Last edited:
  • Like
Reactions: Tlh97

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
Somewhat off-topic but relevant enough to discuss: Andrei is leaving AnandTech :oops:

He was the main guy covering mobile, ARM, and often the nitty-gritty parts of processors in general (such as cache-latency graphs). While his coverage certainly wasn't perfect and his graphs were peculiar (and we've complained about that in this very thread), it was at least very different from other outlets' and almost always highly informative.

I just can't help but wonder what happens to AnandTech in general now that Ian is the only one left really doing reviews or benches, and he seems to have his own stint with TechTechPotato ...

Ryan hasn't updated his GPU test-suite since 2019 and hasn't done anything worth mentioning after the RTX 2xxx series.
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
I just can't help but wonder what happens to AnandTech in general now that Ian is the only one left really doing reviews or benches, and he seems to have his own stint with TechTechPotato ...
It is just a matter of time.
There is not much revenue in this business anymore; it's difficult to retain people with decent technical know-how.
Clickbait sites, YouTube/Twitter chip experts and semiconductor prophets are multiplying like rats and, worst of all, seem to attract bigger audiences than legit outlets.
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
It is just a matter of time.
There is not much revenue in this business anymore; it's difficult to retain people with decent technical know-how.
Clickbait sites, YouTube/Twitter chip experts and semiconductor prophets are multiplying like rats and, worst of all, seem to attract bigger audiences than legit outlets.
There is plenty of revenue to be had, but only with proper investment. Gamers Nexus is growing, Linus Tech Tips is growing, etc.
 
  • Like
Reactions: Tlh97 and coercitiv