News [Anandtech] Arm Announces Mobile Armv9 CPU Microarchitectures: Cortex-X2, Cortex-A710 & Cortex-A510


Gideon

Golden Member
Nov 27, 2007
1,644
3,705
136

Thala

Golden Member
Nov 12, 2014
1,355
653
136
It's simple logic: even if the scheduler works perfectly (I want to see ADP first), do you really want someone to make a notebook SoC with quad X2 + quad A510?

I was only commenting on the scheduler - not on the other points of your post.

Regarding your question, it is not clear to me what TDP you are aiming for when talking about a notebook SoC. Let's assume 15W for a moment (that would be for a slim notebook, but too much for a slim tablet) - then a quad X2 + quad A510 would be much too low power to even remotely touch 15W. For 15W they need to go to at least 8x X2 + 4x A510 or more. For reference, with X2 + 3x A710 + 4x A510 they are most likely aiming at 3W TDP SoCs, like the next-generation Exynos/Snapdragon. You are asking for an SoC with roughly 5x the thermal capacity.
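The "roughly 5x" figure follows directly from the two TDP numbers in the post. A minimal sketch of that arithmetic, where both wattages are the post's own estimates rather than measured values and the variable names are purely illustrative:

```python
# Rough TDP comparison using the figures estimated in the post above.
# Both numbers are the poster's assumptions, not vendor specifications.
phone_soc_tdp_w = 3.0      # X2 + 3x A710 + 4x A510 class phone SoC
notebook_soc_tdp_w = 15.0  # slim-notebook power budget

# How many times larger the notebook thermal budget is.
thermal_ratio = notebook_soc_tdp_w / phone_soc_tdp_w
print(f"Notebook budget is {thermal_ratio:.0f}x the phone SoC budget")
```

This only compares power budgets; it says nothing about how many cores of each type would actually fit in 15W.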
 

Gideon

Golden Member
Nov 27, 2007
1,644
3,705
136
At least we have 64-bit and ARMv9 as standard. My Google-fu failed me a bit, but I remember seeing slides showing large gains from auto-vectorization in LLVM across a wide assortment of applications just from going from NEON to SVE2. There should be quite large gains for binaries compiled with ARMv9 as the exclusive target.

I still didn't find the graphs, but here is an excellent talk that shows the avenues LLVM can use with SVE2:

https://hps.vi4io.org/_media/events/2020/llvm-cth20_lovett.pdf [PDF]

Pages 14 and onwards are particularly of interest.

So all in all, code compiled for an Armv9 target could very well get unexpected speedups in applications that play well with auto-vectorization. There are at least a huge number of new avenues for that. For client it probably isn't a big deal for a while, as backwards compatibility is a must, but this could totally flip some close results in some server workloads. I wouldn't be the least bit surprised if SPEC benefits from this.

This talk is already nearly a year old, considering a lot of SVE related commits are being merged currently I wonder what's the status now:

EDIT:

I missed the target date. LLVM 13 (September 2021) is supposed to have full support for vector-length-agnostic SVE vectorization. I wouldn't mind seeing benchmarks of Ampere Altra vs Zen 3 then with the latest compiler (which should also add optimizations for the latter).
 

Shivansps

Diamond Member
Sep 11, 2013
3,855
1,518
136
I was only commenting on the scheduler - not on the other points of your post.

Regarding your question, it is not clear to me what TDP you are aiming for when talking about a notebook SoC. Let's assume 15W for a moment (that would be for a slim notebook, but too much for a slim tablet) - then a quad X2 + quad A510 would be much too low power to even remotely touch 15W. For 15W they need to go to at least 8x X2 + 4x A510 or more. For reference, with X2 + 3x A710 + 4x A510 they are most likely aiming at 3W TDP SoCs, like the next-generation Exynos/Snapdragon. You are asking for an SoC with roughly 5x the thermal capacity.

A notebook SoC is 10 to 15W, but remember that it can't be the same as a phone SoC; you need a lot more I/O.
An 8x X2 SoC and a 4x X2 + 4x A710 SoC seem like the right configurations to me.

Too bad that ARMv9 is too late for the RPI5 :(
 

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,855
136
Too bad that ARMv9 is too late for the RPI5 :(

Not a huge loss. Bigger cores than A72 on smaller present/future nodes could yield isopower performance gains. A73, A75, A76, A77, and A78 are all available. I don't know that Pi would gain all that much from ARMv9 regardless.

Going back to A55 and A510, I will agree that +35% performance after all this time is . . . underwhelming? A55 was new in 2017. It should have been updated multiple times by now.
 

Geranium

Member
Apr 22, 2020
83
101
61
We’ll have to wait for the new generation SoCs to actually hit the market for us to test the new A510 cores, but if indeed they come with larger power consumption operating points to achieve higher performance, then Arm won't be much nearer in catching up to what Apple has been doing with their efficiency cores. As of the latest generation of SoCs, Apple’s efficiency cores were around 4x faster than any Cortex-A55 based SoC. Which, running at roughly the same system active power, also made them 3-4x more efficient in the traditional benchmarks. As presented, a theoretical A510 SoC won't be able to close that efficiency gap at all.
Bit of fanboyism on Andrei's part. A 35% improvement is quite good for those low-power cores. An efficiency calculation based on one canned benchmark is not optimal.

Also, in-order designs don't have the Spectre/Meltdown vulnerabilities that every other OoO CPU has.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I already said, "it should have been updated multiple times by now". The big cores were.

I was asking if you consider the performance an issue. Updating a CPU several times does not give different answers to the question where the power/performance optimum is - and clearly performance is not the driving factor when developing these small cores.
 

Gideon

Golden Member
Nov 27, 2007
1,644
3,705
136
I was asking if you consider the performance an issue. Updating a CPU several times does not give different answers to the question where the power/performance optimum is - and clearly performance is not the driving factor when developing these small cores.
But performance per watt certainly should be, right? The fact that these cores use about as much energy as Apple's small cores is what dooms them.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
But performance per watt certainly should be, right? The fact that these cores use about as much energy as Apple's small cores is what dooms them.

Of course performance per watt at a certain low performance point is the most significant metric. Precisely this was the reason for my question, as DrMrLordX was apparently only looking at performance when judging the core - while performance is the least important metric if it is above a certain threshold.
Regarding your reference to Apple cores, could you please show the particular power/energy numbers you were referring to when comparing both cores? Keep in mind we are comparing low utilization use-cases.
 

Gideon

Golden Member
Nov 27, 2007
1,644
3,705
136
Of course performance per watt at a certain low performance point is the most significant metric. Precisely this was the reason for my question, as DrMrLordX was apparently only looking at performance when judging the core - while performance is the least important metric if it is above a certain threshold.
Regarding your reference to Apple cores, could you please show the particular power/energy numbers you were referring to when comparing both cores? Keep in mind we are comparing low utilization use-cases.
Well, one is 4x faster in SPEC with similar power draw. If it used significantly more power in low-utilization cases, it would show in battery life.

And why should it be any different? A55 is a tiny dumb core, it doesn't have anything special on board to be vastly more power efficient at lower utilization.

Besides all that, absolute performance isn't meaningless, as small cores are used for all-core loads too. It's less important in phones, but it certainly matters in tablets and laptops (where ARM is pressing too, not only Apple).
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
plenty of budget phone SoCs use small core clusters only
I'd say that the mainstream segment below high end is mostly SoC's using older generation 'big' Axx cores, so basically SoC's with 2-3 gen old cores on a newer node.

Below that pretty much all truly budget phone SoC's use only little cores.

Though even in mainstream and high end/flagship phones the little cores are still very important.

Anything that doesn't require a heavy compute load and/or background apps running while the phone display is off are going to be running on those cores.

So they actually play a pretty big part in the general efficiency of a SoC for a battery powered mobile device, and next to radio power draw they will probably play the biggest part in battery life.

While radios currently eat so much battery that this is less significant, the coming advent of 'passive' wireless tech will put the efficiency ball much more in the court of the little cores.

Not to mention they are used in basically 90+% of all streaming devices like Fire TV, Chromecast, Roku etc etc which is no small market by itself, and likely way more than half of all STB and TV SoC's too.

Sadly streamers seem to be very slow to get new upgrades, I think the first to get A55 was the latest Chromecast 4 that came last October, nearly three and a half years after the core IP was announced in 2017 along with A75.

Going by that we'll probably see A510 in streamers late 2024 😅🤣
 

Lodix

Senior member
Jun 24, 2016
340
116
116
Here in the review of the iPhone 12, Andrei analyzed the "little" cores from the A14 and said they have 4x the performance of the Cortex A55 while consuming similar amounts of power, resulting in 3x the efficiency.
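To make the relationship between those two figures explicit, here is a small sketch of the performance-per-watt arithmetic. The 4x and 3x values are the ones quoted above; the function name is made up for illustration, and the "implied" power ratio is simply derived from those two numbers, not a measured value:

```python
def efficiency_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt of core A over core B.

    perf_ratio  = perf_A / perf_B
    power_ratio = power_A / power_B
    """
    return perf_ratio / power_ratio

# If power were exactly equal, 4x performance would mean 4x efficiency.
equal_power = efficiency_ratio(4.0, 1.0)

# For 4x performance to yield only 3x efficiency, the faster core
# would have to draw about 4/3 (~1.33x) the power of the slower one.
implied_power_ratio = 4.0 / 3.0
three_x = efficiency_ratio(4.0, implied_power_ratio)

print(equal_power, round(implied_power_ratio, 2), round(three_x, 2))
```

So "similar power" here would have to mean somewhere between equal and roughly a third higher for both quoted figures to hold at once.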

I think Apple's "little" cores are more similar to ARM's middle cores. If you take the Cortex A77 @ 2.4GHz from the Snapdragon 865, they have similar IPC. If you keep in mind the difference in frequency and process node, they end up with similar power consumption too. So the new Cortex A710 is probably a better alternative to Apple's little cores.

The advantage of the A55/A510 could be in ultra-low-power scenarios, like when the screen is off for standby battery life, or running some of the extra "bloat" Android has in the background. Since iOS is more "efficient" and has fewer things running in the background, they don't need specialized little cores for that. And since most of ARM's clients are making chips for Android, it makes sense.

This is from the AnandTech article: "Arm is still adamant that for the kind of general use-cases in which the little cores are used in mobile phones – such as very light UI workloads – that their little core approach is still the most power-efficient way to achieve the best “DoU” or days of use figures. This is based in part on their internal projections as well as their partners', all of which indicate that the triple issue in-order design they've developed is the most efficient option."
[Two images attached]


 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
where ARM is pressing too, not only Apple
*where ARM vendors like Qualcomm and Samsung are pressing.

ARM are just working with anticipated supply and demand for core IP among those vendors based on market trends like the laptop push.

Which is part of what worries me about QC's Nuvia acquisition.

It will put something of a dent in ARM's X1/2+ licensing revenue and possibly even 'big' Axxx too unless other SoC vendors pick up some slack there making the development of larger cores less attractive to ARM.

Though I guess Samsung do still crank out a lot of high end and mainstream phones and only use QC SoC's in limited areas now, so there is still that.

Hopefully restrictions on Huawei will be lifted soon as they were no small licensee at that.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Going back to A55 and A510, I will agree that +35% performance after all this time is . . . underwhelming? A55 was new in 2017. It should have been updated multiple times by now.
One of the slides on A510 implies that it is merely the first of a whole new generation of little cores from Cambridge rather than a once off every 3-4 years.

I think Apple's YoY 'little' (not so little now) core cadence finally got through to ARM and caused them to rethink their current slow upgrade cycle.
Not a huge loss. Bigger cores than A72 on smaller present/future nodes could yield isopower performance gains. A73, A75, A76, A77, and A78 are all available. I don't know that Pi would gain all that much from ARMv9 regardless.
It's essentially a cheap dev board, so following an up-to-date ISA core seems fairly smart, even if that means a dip in raw performance over the last generation of the board.

Given A510 is supposedly within 10% of A73 integer IPC at 35% less power I would not be surprised to see a 2nd or 3rd generation of that family end up in Pi5.

Even just A510 would be close to A72 performance at much less power given A73 was already more power efficient than A72.
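As a back-of-the-envelope check on that claim, the sketch below turns "within 10% of A73 integer IPC at 35% less power" into a perf-per-watt ratio. It assumes both cores run at the same clock (so the IPC ratio equals the performance ratio) and takes the worst case of exactly 10% lower IPC; both are assumptions for illustration, not figures from Arm:

```python
# Figures as stated in the post; "within 10%" is taken at its worst case.
a510_vs_a73_perf = 0.90   # A510 integer IPC relative to A73 (same clock assumed)
a510_vs_a73_power = 0.65  # 35% less power than A73

# Relative performance per watt of A510 over A73 under these assumptions.
perf_per_watt_gain = a510_vs_a73_perf / a510_vs_a73_power
print(f"A510 perf/W vs A73: {perf_per_watt_gain:.2f}x")
```

Under those assumptions the A510 would land at roughly 1.4x the A73's performance per watt; different clocks or process nodes would shift that number.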
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Here in the review of the iPhone 12, Andrei analyzed the "little" cores from the A14 and said they have 4x the performance of the Cortex A55 while consuming similar amounts of power, resulting in 3x the efficiency.
Did that account for the power efficiency difference between TSMC N5 and the Samsung process the Android SoC vendors are using?
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Well, one is 4x faster in SPEC with similar power draw. If it used significantly more power in low-utilization cases, it would show in battery life.

That's nice, but where did you get the numbers from? What was your setup and method when measuring the power and performance of these cores? How did you even isolate the power of a single core?
4x performance at similar power is totally unrealistic, and you should question your method of measurement.

And why should it be any different? A55 is a tiny dumb core, it doesn't have anything special on board to be vastly more power efficient at lower utilization

There is nothing in particular, except it is much smaller and has lower leakage. At low utilization leakage becomes a dominant factor.
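The leakage point can be illustrated with a toy power model. This is only a sketch: it treats average power as constant leakage plus dynamic power scaled by utilization, ignores power gating and DVFS, and the milliwatt values are entirely hypothetical, chosen just to show the trend:

```python
def average_power_mw(leakage_mw: float, dynamic_mw: float, utilization: float) -> float:
    """Toy model: leakage is always paid, dynamic power scales with utilization."""
    return leakage_mw + dynamic_mw * utilization

def leakage_share(leakage_mw: float, dynamic_mw: float, utilization: float) -> float:
    """Fraction of average power that is leakage."""
    return leakage_mw / average_power_mw(leakage_mw, dynamic_mw, utilization)

# Hypothetical core: 5 mW leakage, 200 mW dynamic power at full load.
LEAK_MW, DYN_MW = 5.0, 200.0

for util in (1.0, 0.10, 0.02):
    share = leakage_share(LEAK_MW, DYN_MW, util)
    print(f"utilization {util:>4.0%}: leakage is {share:.0%} of average power")
```

With these made-up numbers leakage is a couple of percent of the total at full load but more than half at 2% utilization, which is the sense in which a smaller, lower-leakage core can win in mostly idle use cases.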

Besides all that, absolute performance isn't meaningless, as small cores are used for all-core loads too. It's less important in phones, but it certainly matters in tablets and laptops (where ARM is pressing too, not only Apple).

That's a silly argument. If you need more performance, just use more of the larger cores. It's not as if we are talking about Intel/Alder Lake here, where Intel has to use smaller cores for performance reasons to compensate for the inefficiencies of their larger cores. You won't see AMD (or ARM) pulling this stupid stunt.
If the cores' main purpose is to increase system efficiency under low-utilization scenarios - and even today the A55s are largely under-utilized - then it is a very valid question how far you need to increase performance and potentially sacrifice efficiency along the way. I would even go as far as to say that for devices like the Surface Pro X, using 4 small cores is already too much - and increasing their performance to get better MT scores under high load is inherently the wrong path.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
The numbers aren't that different when comparing to A13 little cores on 7nm

The performance surely is. Are you really implying, that you took the performance of the A14 cores but the power of the A13 cores and then mix both together to conclude the efficiency?
You should either take the performance and power of A14 and then compensate for process, or optionally just use A13 as reference.
 

Gideon

Golden Member
Nov 27, 2007
1,644
3,705
136
The performance surely is. Are you really implying, that you took the performance of the A14 cores but the power of the A13 cores and then mix both together to conclude the efficiency?
You should either take the performance and power of A14 and then compensate for process, or optionally just use A13 as reference.
It's a 35% uplift in integer workloads with the same power-draw. Still a far cry from the 3-4x difference compared to A55, and now you've normalized battery life and process.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
It's a 35% uplift in integer workloads with the same power-draw. Still a far cry from the 3-4x difference compared to A55, and now you've normalized battery life and process.
I'm starting to feel like Anand has purposely made that graph to be confusing given it lacks process node context and in places even the name of the CPU core that certain bars represent.

I certainly don't see any normalization - I mean is it so hard to make a graph that directly compares only big/performance core to big/performance core and a separate graph to compare little/efficiency core to the same?

It also lacks any comparison of the different cores inside the Kirin SoCs, despite the fact that we know different SoC vendors can get some drift in performance/power even on the same process.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
It's a 35% uplift in integer workloads with the same power-draw. Still a far cry from the 3-4x difference compared to A55, and now you've normalized battery life and process.
I just went back over those graphs from Anand's article you posted a few posts back.

Neither of them have A55 perf or energy consumption data on them?