News [Anandtech] Arm Announces Mobile Armv9 CPU Microarchitectures: Cortex-X2, Cortex-A710 & Cortex-A510

Gideon


uzzi38

Update to ARMv9 is nice, but the performance uplifts are meh. For the X2 they compared an X1 with 4MB of L3 against an X2 with 8MB, despite using 8MB L3 on last year's slides. While they did claim they expect 8MB L3 to be the standard this year... they claimed the same last year, yet neither Qualcomm nor Samsung went with 8MB L3 for their X1 designs.

The little core is a tad more disappointing, though. +35% perf there is pretty lacklustre considering how weak these cores are, and going by ARM's own slides they're not even particularly more efficient; they just scale up better in power consumption and performance. Compared to Apple's little cores, they fall well short.

Also a mild lol at ARM going with a Bulldozer-style design, but hey, if they feel it nets them more efficiency, then by all means, I guess.
 

Gideon

Update to ARMv9 is nice, but the performance uplifts are meh. For the X2 they compared an X1 with 4MB of L3 against an X2 with 8MB, despite using 8MB L3 on last year's slides.
That's true, and they did the same 8MB vs 4MB comparison for A710 as well, which is even more disappointing.

The little core is a tad more disappointing, though. +35% perf there is pretty lacklustre considering how weak these cores are, and going by ARM's own slides they're not even particularly more efficient; they just scale up better in power consumption and performance. Compared to Apple's little cores, they fall well short.
That's true, it seems their Cambridge design team is behind the curve with its in-order designs :(

But at least the upside is that they did finally update the little-core microarchitecture, and they seem to plan to do it more often from now on (at least, that's how it looks judging by this slide).

At least we have 64-bit and ARMv9 as standard. My Google-fu failed me a bit, but I remember seeing slides showing large gains from auto-vectorization in LLVM across a wide assortment of applications, just from going from NEON to SVE2. There should be quite large gains for binaries compiled with ARMv9 as the exclusive target.
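To illustrate what "auto-vectorization" means here: a plain scalar C loop like the one below contains no NEON or SVE2 intrinsics, yet an optimizing compiler can turn it into vector code on its own. This is a minimal sketch, not taken from the slides mentioned above; `saxpy` is just the conventional name for this kernel. With recent GCC, building at `-O3` with `-march=armv8-a` targets fixed-width 128-bit NEON, while `-march=armv9-a` also enables SVE2, which lets the compiler emit vector-length-agnostic code with a predicated loop tail; `-fopt-info-vec` reports which loops were vectorized. Exact flag support depends on the compiler version, so treat the flags as assumptions to check against your own toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i]  (SAXPY)
 * No iteration depends on another, so at -O3 the compiler is free to
 * process several elements per instruction: fixed 128-bit NEON vectors
 * plus a scalar remainder loop on Armv8-A, or length-agnostic SVE/SVE2
 * code with a predicated final iteration when SVE2 is enabled.
 * `restrict` promises x and y do not overlap, so no runtime alias
 * checks are needed before the vector loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays the same for every target; only the generated instructions change, which is why existing code can benefit just from being recompiled for an ARMv9-only target.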
 

NostaSeronx

Also a mild lol at ARM going with a Bulldozer-style design, but hey, if they feel it nets them more efficiency, then by all means, I guess.
Two processors sharing a co-processor datapath is definitely not a Bulldozer design.

CMT designs are a single processor, which a dual-processor A510 complex is not.

It doesn't have any advantage of a CMT-processor so I doubt this design actually netted them energy.

Potential CMT successor to E1 > Neoverse E1 > Cortex A510
Best to worst in energy efficiency.
 

Gideon

From the 4th page of the article:
We’ll have to wait for the new generation SoCs to actually hit the market for us to test the new A510 cores, but if indeed they come with larger power consumption operating points to achieve higher performance, then Arm won't be much nearer in catching up to what Apple has been doing with their efficiency cores. As of the latest generation of SoCs, Apple’s efficiency cores were around 4x faster than any Cortex-A55 based SoC. Which, running at roughly the same system active power, also made them 3-4x more efficient in the traditional benchmarks. As presented, a theoretical A510 SoC won't be able to close that efficiency gap at all.
OUCH.
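The article's reasoning can be made concrete with a little arithmetic. Efficiency here is performance per watt, and the energy to finish a fixed task is power × time, with time = work / performance. So a core that is about 4x faster at roughly the same power finishes in a quarter of the time and uses about a quarter of the energy. The sketch below is illustrative only: the function names are made up, and the numbers in the usage note (0.5 W for both cores, relative performance 1.0 vs 4.0) are hypothetical placeholders, not measurements.

```c
#include <assert.h>

/* Energy (J) to finish a fixed amount of work:
 *   time   = work / perf      (s)
 *   energy = power * time     (J)
 * perf is in arbitrary "work units per second"; only ratios matter. */
double energy_joules(double power_watts, double work_units, double perf_units_per_s)
{
    assert(power_watts >= 0.0 && perf_units_per_s > 0.0);
    return power_watts * (work_units / perf_units_per_s);
}

/* Perf-per-watt of core A divided by perf-per-watt of core B.
 * At equal power this reduces to the plain performance ratio. */
double efficiency_ratio(double perf_a, double power_a,
                        double perf_b, double power_b)
{
    assert(power_a > 0.0 && power_b > 0.0 && perf_b > 0.0);
    return (perf_a / power_a) / (perf_b / power_b);
}
```

For example, `efficiency_ratio(4.0, 0.5, 1.0, 0.5)` evaluates to 4.0, and `energy_joules(0.5, 100.0, 4.0)` is 12.5 J versus 50 J for `energy_joules(0.5, 100.0, 1.0)`. If the faster core drew somewhat more power than the slower one, the ratio would fall below 4x, which fits the article's 3-4x range.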

I still dislike most of what Apple does as a company (vendor lock-in, products that can't be repaired, total lockdown from an app developer's perspective), and macOS is a true disaster GUI-wise, but I have to give it to them: they certainly know how to design SoCs.
 

dark zero

Since this is the transitional ARMv9 chip, it's decent. But if the next core is 64-bit only, it's worth pointing out that we could very well see new little, "mid" and big cores next year, with a big performance uplift.

GPU-wise, the improvements are interesting. Still... I think NVIDIA would make game-changing GPUs if they entered that market.
 

Gideon

The L3 Cache with 5x the bandwidth is a huge (and much needed) change. It looks to be a Ring-bus design similar to what Intel does.

A higher-end laptop SoC with 8x X2 cores (or even just 4x X2, 2x A710 and 2x A510) and 16MB of L3 would actually be very interesting. Too bad the only company really doing Windows on ARM SoCs is hell-bent on releasing 4-year-old designs and letting this market die. The Snapdragon 7c is a disgusting release for anything but garbage-bin Chromebooks.

... And to think that's the company that bought Nuvia
 

soresu

The L3 Cache with 5x the bandwidth is a huge (and much needed) change. It looks to be a Ring-bus design similar to what Intel does.

A higher-end laptop SoC with 8x X2 cores (or even just 4x X2, 2x A710 and 2x A510) and 16MB of L3 would actually be very interesting. Too bad the only company really doing Windows on ARM SoCs is hell-bent on releasing 4-year-old designs and letting this market die. The Snapdragon 7c is a disgusting release for anything but garbage-bin Chromebooks.

... And to think that's the company that bought Nuvia
It's because they bought Nuvia that they are releasing old designs, all the better to make the Nuvia designs look good when they come out.

Just as Apple is doing by not putting Threadripper/EPYC into the Mac Pro, so that the upcoming higher-end Mx designs look all the better for it.

Samsung, though, will be pushing ARM's cores for laptop Exynos chips, having abandoned their own custom core project and with no Nuvia equivalent left to buy on the market.
 

Panino Manino

The performance increase of the little core was expected, but as everyone is saying, it's still disappointing. They really don't seem to care; that's why they went the Bulldozer way (albeit a better version of it) to dedicate the maximum number of transistors to the big cores... which are still disappointing.
It's hopeless; Apple has no competition. (SAVE US, AMD!)
 

beginner99

Yeah, the little cores are disappointing. That explains why they didn't release something earlier, if this is the best they can do after all these years.

I wonder if it wouldn't have been better to make them more efficient rather than more powerful. Maybe I'm misunderstanding how ARM SoCs work, but isn't user interaction basically always handled on the big cores? Then little-core performance only matters for background work, and efficiency seems much more important. Or what am I missing?
 

NostaSeronx

They really don't seem to care; that's why they went the Bulldozer way (albeit a better version of it) to dedicate the maximum number of transistors to the big cores... which are still disappointing.
Again, the Cortex-A510 dual-processor complex isn't Clustered Multithreading/Multicluster Multithreading.

Also, if they did go the CMT/MCMT route it would have been better than the A510.

CMT = 30% of the total energy of two processors.
SMT = 40% of the total energy of two processors.
CMP = 100% of the total energy of two processors. (Baseline)

The two-processor Cortex-A510 complex only shares the FPU datapath (plus the L2 cache and L2 TLB), which is probably only because of the sheer size of the vector floating-point datapath relative to a single A510 processor.

There are no CMT/MCMT architectural design choices in the A510, so there is no actual energy-efficiency gain similar to that of an SMT architecture.
 

dark zero

Yeah, the little cores are disappointing. That explains why they didn't release something earlier, if this is the best they can do after all these years.

I wonder if it wouldn't have been better to make them more efficient rather than more powerful. Maybe I'm misunderstanding how ARM SoCs work, but isn't user interaction basically always handled on the big cores? Then little-core performance only matters for background work, and efficiency seems much more important. Or what am I missing?
I was looking at this slide:

[Image: Arm slide comparing the Cortex-A510 against the Cortex-A73]


And they compare the small A510 core to the A73... why didn't they compare it to the A57 or A72 instead?
 

LightningZ71

I'm going to support Nosta here. The existence of the A510 is SOLELY related to ARM's decision to go strictly AArch64, ARMv9 going forward. The A55 wasn't going to handle that shift in any sort of reasonable way. They HAD to have a vector unit in there that didn't make the core even more power hungry than it already was. The only way to keep the transistor budget and floor space requirements of the little cores within sane limits was to share the vector unit between pairs of cores. It doesn't have to be that way. There is a design option to have dedicated vector units for single A510 processors if the chip maker really wants it.

I expect, in the future, that ARM will be able to increase the efficiency of the A510 series reasonably well. I don't think that they are going to extract a whole lot more performance out of it due to it being in-order. That may not matter much, given the stated purpose of those cores. As long as they can keep their voltages low and their power draw in check, they will do an adequate job hosting background tasks and extending battery life.
 

Shivansps

Good to see the A55 finally retire. There are no big performance updates for either the big or the little core, but at least there's something. I still think you can't use that little core at all if you're planning any kind of Windows SoC, because it's going to kill MT performance.
 

Shivansps

The L3 Cache with 5x the bandwidth is a huge (and much needed) change. It looks to be a Ring-bus design similar to what Intel does.

A higher-end laptop SoC with 8x X2 cores (or even just 4x X2, 2x A710 and 2x A510) and 16MB of L3 would actually be very interesting. Too bad the only company really doing Windows on ARM SoCs is hell-bent on releasing 4-year-old designs and letting this market die. The Snapdragon 7c is a disgusting release for anything but garbage-bin Chromebooks.

... And to think that's the company that bought Nuvia

I would forget about the A510; the Windows scheduler alone is likely to kill performance there... I could see a Windows SoC with 8x X2 as the high end, 4x X2 + 4x A710 for mainstream, and 2x X2 + 4x A710 for entry level. But forget about the A510 and leave it for phones.

It remains to be seen how power-efficient they are once you add the needed I/O, and how they perform against x86 big cores that have HT/SMT.
 

AkulaMD

ARM consumer stack updates:

  • Finally a decent "little" core update (A55 -> A510) with 35% performance gain
  • Big core is less ambitious (A78 -> A710) with 10% uplift mentioned
  • X2 is supposedly 16% faster than X1
  • Lots of other changes: the Armv9 ISA (finally with decent vector ops), new interconnects, and more L3 cluster designs

It seems they want to move the A7xx big core, once and for all, to a more clearly mid-level position instead of the top spot it held in the past.
 

AkulaMD

Good to see the A55 finally retire. There are no big performance updates for either the big or the little core, but at least there's something. I still think you can't use that little core at all if you're planning any kind of Windows SoC, because it's going to kill MT performance.
And it will definitely kill single-threaded performance.
 

krumme

There is no free lunch.
We need some area measures to discuss this.
What are the current sizes of the A55, A78 and X1?
And of Apple's big and little cores?

You get dirt-cheap phones with the A73 or even the A75. Those can't be much bigger than Apple's little cores.
 

Thala

I would forget about the A510; the Windows scheduler alone is likely to kill performance there.

The Windows scheduler is not killing anything. In the past, the A55 cores contributed to overall MT performance linearly, in proportion to their relative capabilities, so there is nothing the Windows scheduler is doing wrong here. Stop speculating...
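A simple way to picture "contributing linearly": in an idealized multithreaded workload with enough independent threads and no contention for shared resources, total throughput is just the sum of each core's relative throughput. The sketch below is a hypothetical model; the relative figures in the usage note (big core = 1.0, little core = 0.25) are illustrative placeholders, not measured A510 or X2 numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Idealized MT throughput: every core runs one independent thread with
 * no shared-resource contention, so each core adds its own relative
 * throughput and the total is a plain sum. Real workloads scale worse
 * (memory bandwidth, shared caches, synchronization). */
double aggregate_mt(const double *per_core_perf, size_t n_cores)
{
    assert(n_cores == 0 || per_core_perf != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n_cores; i++)
        total += per_core_perf[i];
    return total;
}
```

With those placeholder values, eight big cores give 8.0, while four big plus four little cores give 4 × 1.0 + 4 × 0.25 = 5.0. So both points can hold at once: the little cores add throughput in proportion to what they can do, and a 4+4 chip still lands well below an all-big 8-core one.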
 

LightningZ71

Keep in mind, with Intel investing in big.LITTLE, there will certainly be a level of awareness of each core's capabilities in future scheduling algorithms in Windows. While the "Wintel" ecosystem may not have quite the hold on the market that it used to, it is still a mighty strong force to be reckoned with. It's already been taught to load physical cores first, and to be NUMA aware, all the way down to the home edition.
 

Thala

Keep in mind, with Intel investing in big.LITTLE, there will certainly be a level of awareness of each core's capabilities in future scheduling algorithms in Windows.

The Windows scheduler sees the little cores as just slower cores with the same capabilities. I believe the same can be assumed for Alder Lake & Co. In fact, Microsoft already stated a few years ago that they had enhanced their scheduler with big.LITTLE awareness, and it works as you would expect (given that the scheduler cannot look into the future).
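As a rough illustration of treating little cores as "just slower cores with the same capabilities": if every core runs the same ISA and differs only in speed, placement reduces to matching the most demanding threads with the fastest cores. The greedy sketch below is hypothetical and is not how the Windows scheduler is actually implemented; the capacity and demand values are made-up relative numbers, and a real scheduler also weighs power, idle states, migration cost and how a thread's load changes over time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int id; double capacity; } CoreInfo;   /* relative speed    */
typedef struct { int id; double demand;   } ThreadInfo; /* recent CPU demand */

/* qsort comparators: both sort in descending order */
static int by_capacity_desc(const void *a, const void *b)
{
    double ca = ((const CoreInfo *)a)->capacity;
    double cb = ((const CoreInfo *)b)->capacity;
    return (ca < cb) - (ca > cb);
}

static int by_demand_desc(const void *a, const void *b)
{
    double da = ((const ThreadInfo *)a)->demand;
    double db = ((const ThreadInfo *)b)->demand;
    return (da < db) - (da > db);
}

/* Greedy placement: heaviest thread -> fastest core, next heaviest ->
 * next fastest, and so on. Thread ids must be 0..n_threads-1;
 * placement[thread_id] receives a core id, or -1 if the thread has to
 * wait because there are more threads than cores.
 * Note: sorts the caller's cores[] and threads[] arrays in place. */
void place_threads(CoreInfo *cores, size_t n_cores,
                   ThreadInfo *threads, size_t n_threads, int *placement)
{
    qsort(cores, n_cores, sizeof *cores, by_capacity_desc);
    qsort(threads, n_threads, sizeof *threads, by_demand_desc);
    for (size_t i = 0; i < n_threads; i++) {
        int tid = threads[i].id;
        assert(tid >= 0 && (size_t)tid < n_threads);
        placement[tid] = (i < n_cores) ? cores[i].id : -1;
    }
}
```

With two fast cores (capacity 1.0), two slow ones (0.3) and five runnable threads, the two busiest threads land on the fast cores, the next two on the slow cores, and the lightest thread waits.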
 

insertcarehere

ARM baffles me, they already HAD a small OOO core design in the Neoverse E1/A65. Why not just tart that thing up for AArch64 and get something that would work for Neoverse E2 as well? Sticking with in-order for the small core just doesn't seem to make much sense.
 

Shivansps

The Windows scheduler is not killing anything. In the past, the A55 cores contributed to overall MT performance linearly, in proportion to their relative capabilities, so there is nothing the Windows scheduler is doing wrong here. Stop speculating...

It's simple logic: even if the scheduler works perfectly (I want to see ADP first), do you really want someone to make a notebook SoC with quad X2 + quad A510? We're talking about half the CPU having roughly Braswell-level performance, with the added handicap of being in-order. Even today, that thing would have a really hard time against something like the 5600U.

Maybe it's fine for some niche product like a premium Windows tablet where there is little competition, but not for the mainstream notebook market, except maybe the low end that's overrun by dual- and quad-core Atom-based stuff.
 

moinmoin

My biggest problem is the sudden shift to three digits for the model numbers. A510 and A710 make me think of (non-existing) cheap motherboards, not of ARM cores...