Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

511 · Sep 10, 2025

branch_suggestion said:
Here I was thinking SME2 would be core level.
Pathetic, cluster level accel is a benchmark hack, nothing more.

Finally someone agrees with me

adroc_thurston · Sep 10, 2025

poke01 said:
Unleashing Leading On-Device AI Performance and Efficiency with New Arm C1 CPU Cluster

The new Arm C1 CPU cluster, delivering unmatched on-device AI performance, power efficiency, and scalability for next-gen mobile devices.

newsroom.arm.com

here

Dear god the naming.
People should be shot for that.

soresu · Sep 10, 2025

poke01 said:
Unleashing Leading On-Device AI Performance and Efficiency with New Arm C1 CPU Cluster

The new Arm C1 CPU cluster, delivering unmatched on-device AI performance, power efficiency, and scalability for next-gen mobile devices.

newsroom.arm.com

here

Ahhh, they didn't replace Cortex branding so much as shorten it.

Interesting that they only increased it to v9.3-A tho.

soresu · Sep 10, 2025

This is a bit more detail...

Arm Lumex Compute Subsystem Platform

Redefining mobile AI with Arm Lumex Compute Subsystem Platform: On-device performance, immersive graphics, and developer-first tools.

www.arm.com

soresu · Sep 10, 2025

C1-Ultra (Travis/X930):

C1-Ultra

C1-Ultra is Arm’s flagship Armv9.3 CPU, delivering best-in-class IPC, expanded caches, advanced prefetching, and low-latency pipelines.

developer.arm.com

C1-Premium (Alto?):

https://developer.arm.com/Processors/C1-Premium

C1-Pro (Gelas/A730):

C1-Pro

C1-Pro is an Armv9.3 CPU delivering sustained performance in flagship CPU clusters when combined with C1-Ultra and C1-Premium cores, and serves as a high-performance big CPU in big.LITTLE clusters alongside the C1-Nano.

developer.arm.com

C1-Nano (Nevis/A530):

https://developer.arm.com/Processors/C1-Nano

G1-Ultra (Drage):

Mali G1-Ultra

Mali G1-Ultra is Arm’s highest-performance GPU, featuring up to 16 shader cores and second-generation ray tracing (RTUv2) integrated per core.

developer.arm.com

G1-Premium:

Mali G1-Premium

High-performance GPU with optional RTUv2 and faster AI, bringing immersive gaming to premium smartphones with power efficiency.

developer.arm.com

G1-Pro:

https://developer.arm.com/Processors/Mali%20G1-Pro

soresu · Sep 10, 2025

PR blurb version...

Smarter, Faster, More Personal AI Delivered on Consumer Devices with Arm’s New Lumex CSS Platform, Driving Double-Digit Performance Gains

Arm introduces Lumex, its most advanced compute subsystem (CSS) platform for consumer devices, powering faster on-device AI, gaming, and real-time intelligence.

newsroom.arm.com

CPU	Key benefit	Performance and efficiency gains	Ideal use cases
C1-Ultra	Flagship peak performance	+25% single-thread performance; double-digit IPC gain year-on-year	Large-model inference, computational photography, content creation, generative AI
C1-Premium	C1-Ultra performance with greater area efficiency	35% smaller area than C1-Ultra	Sub-flagship mobile segments, voice assistants, multitasking
C1-Pro	Sustained efficiency	+16% sustained performance	Video playback, streaming inference
C1-Nano	Extremely power-efficient	+26% efficiency, using less area	Wearables, smallest form factors

soresu · Sep 10, 2025

Seems I was right about Alto/C1-Premium being a Zen c-like core: reduced area, nearly the same flagship IPC.

I wonder what devices it will end up in, and how that will play out in the Neoverse SKUs.

gdansk · Sep 10, 2025

Amazing to see that they updated the naming in the documentation 1.5 years ago. And in that time no one rethought the decision.
But they have so many core types it's a lost battle. Most consumers aren't going to care or know. They may stick to "MediaTek bad, Exynos bad, Qualcomm good".

soresu · Sep 10, 2025

The naming as Adroc said is terrible.

It should at least be consistent, which it really isn't, as to me Pro sounds better than Premium.

Call the Nano core Micro instead, or perhaps even Kilo, saving the smol scale names like Micro and Nano for embedded or real time only.

Then Pro, Premium and Ultra can be Mega, Giga and Tera 🤘

My naming scheme probably isn't any better for the academic peasants out there tho 😂😆

511 · Sep 10, 2025

soresu said:
The naming as Adroc said is terrible.

It should at least be consistent, which it really isn't, as to me Pro sounds better than Premium.

Call the Nano core Micro instead, or perhaps even Kilo, saving the smol scale names like Micro and Nano for embedded or real time only.

Then Pro, Premium and Ultra can be Mega, Giga and Tera 🤘

My naming scheme probably isn't any better for the academic peasants out there tho 😂😆

Core name should be
Core - A520
Ultra Core - A725
Ultron pro Max Core - X930

Doug S · Sep 10, 2025

branch_suggestion said:
Here I was thinking SME2 would be core level.
Pathetic, cluster level accel is a benchmark hack, nothing more.

No, you guys simply don't understand. Look at M4 annotated die photos: the SME unit is HALF the size of a P core! It wouldn't be practical to include one that size in every core. So let's say you split it up and each P core got 1/4 of the SME capability. Your ST capability for SME suffers, and your MT capability is roughly the same. I guess you'd call that a win? If so, you're ignoring the cost: to fully exploit it, ALL FOUR CORES have to be busy!

With the separate unit you can get the entire cluster's worth of SME performance with just one core, leaving the other three cores free to do other stuff. That's a clear and undeniable win.

These matmul instructions aren't something where you might slip in just a few instructions' worth here and there like NEON/SSE/AVX512. They are only going to be used for longer sequences: the kind of stuff that previously was kicked off to a GPGPU (if you needed enough of it and were willing to deal with the hassle) or, more often, run slower using existing FP capability. There is no benefit to having SME in each core if the cost is that each core has less of it.
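
To put rough numbers on that trade-off, here is a toy Python sketch. It is not from Arm or Apple: the function name `sme_throughput`, the 4-core cluster, and the idea that the same total SME capability is either kept as one shared unit or split evenly per core are all assumptions made for illustration.

```python
def sme_throughput(busy_cores, cluster_cores=4, shared_unit=True):
    """Return (relative SME throughput, cores left free) for a toy cluster.

    Assumption: the cluster's total SME capability is normalised to 1.0
    in both layouts, i.e. the same silicon budget is either kept as one
    shared unit or split evenly across the cores.
    """
    if not 1 <= busy_cores <= cluster_cores:
        raise ValueError("busy_cores must be between 1 and cluster_cores")
    if shared_unit:
        # One cluster-level unit: a single core can drive all of it.
        throughput = 1.0
    else:
        # Per-core slices: each busy core only reaches its own 1/N share.
        throughput = busy_cores / cluster_cores
    return throughput, cluster_cores - busy_cores


for label, shared in (("shared unit", True), ("per-core slices", False)):
    for busy in (1, 4):
        tput, free = sme_throughput(busy, shared_unit=shared)
        print(f"{label:15s} busy={busy} throughput={tput:.2f} free cores={free}")
```

Under these assumptions the split layout only matches the shared unit once all four cores are tied up. The model deliberately ignores contention when several cores feed the shared unit at once, as well as dispatch latency, so treat it as an illustration of the argument rather than a measurement.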

The only people calling it a "benchmark hack" are people reaching and grasping at straws for excuses why AVX512 acceleration is A-OK but SME is not. They're butthurt because ARM is finally benefitting from special instructions which used to be a great way to make x86 score better on benchmarks without programs using the traditional instructions that have existed for many years benefiting nearly as much.

511 · Sep 10, 2025

Doug S said:
The only people calling it a "benchmark hack" are people reaching and grasping at straws for excuses why AVX512 acceleration is A-OK but SME is not. They're butthurt because ARM is finally benefitting from special instructions which used to be a great way to make x86 score better on benchmarks without programs using the traditional instructions that have existed for many years benefiting nearly as much.

Intel AMX exists and it is per core. SME is a benchmark hack for single-core scores if it's not part of the core. Why don't they show us the SPEC score instead of memebench, built with a generic GCC -O2 and the same compiler?

Doug S · Sep 10, 2025

511 said:
Intel AMX exists and it is per core. SME is a benchmark hack for single-core scores if it's not part of the core. Why don't they show us the SPEC score instead of memebench, built with a generic GCC -O2 and the same compiler?

So if instead of making SME per cluster they had one core that was SME enabled in each cluster you'd be fine with it? Or would you say "it only counts if ALL the cores have it", because you're just going to look for any reason to not count something that helps ARM be faster than your beloved x86?

This is the reason I only care about comparisons on the clang subtest, or gcc on SPEC. Then none of the SIMD bs counts, only the same kind of real instructions that a compiler generates, not hand-coded sequences. If a compiler can generate AVX512 code in the clang subtest, more power to it!

511 · Sep 10, 2025

Doug S said:
So if instead of making SME per cluster they had one core that was SME enabled in each cluster you'd be fine with it? Or would you say "it only counts if ALL the cores have it", because you're just going to look for any reason to not count something that helps ARM be faster than your beloved x86?

This is the reason I only care about comparisons on the clang subtest, or gcc on SPEC. Then none of the SIMD bs counts, only the same kind of real instructions that a compiler generates, not hand-coded sequences. If a compiler can generate AVX512 code in the clang subtest, more power to it!

You have to pass an AVX-512 flag or -march=x86-64-v4 for Clang to generate AVX-512; if you don't pass that, it won't generate AVX-512. That's why I said generic -O2 compile.
And I meant either all cores should support it, or it shouldn't count as ST.

hemedans · Sep 10, 2025

gdansk said:
Amazing to see that they updated the naming in the documentation 1.5 years ago. And in that time no one rethought the decision.
But they have so many core types it's a lost battle. Most consumers aren't going to care or know. They may stick to "MediaTek bad, Exynos bad, Qualcomm good".

Nowadays it's "MediaTek offers better value", "Qualcomm overpriced", "Exynos overheats"

soresu · Sep 10, 2025

511 said:
Intel AMX exists and it is per core. SME is a benchmark hack for single-core scores if it's not part of the core. Why don't they show us the SPEC score instead of memebench, built with a generic GCC -O2 and the same compiler?

I'd rather it's per cluster and not wasted space on the cores when it's not in use.

It's only a benchmark hack because a certain benchmark used it as part of its generic score and doesn't include the score without it.

Now that we all know about it the benchmark is exposed as pandering at best.

511 · Sep 10, 2025

soresu said:
I'd rather it's per cluster and not wasted space on the cores when it's not in use

Then don't portray it as part of single-core performance, which is what GB is doing.

DZero · Sep 10, 2025

So, A5XX is Nano; where is A3XX then? Pico?

ToTTenTranz · Sep 10, 2025

DZero said:
So, A5XX is Nano; where is A3XX then? Pico?

Looks like C1-Nano is the only LITTLE core they'll release, so it replaces everything from A520 to A320 and all the A53s being used in smartwatch SoCs.

soresu · Sep 10, 2025

511 said:
Then don't portray it as part of single-core performance, which is what GB is doing.

tbh I'm not that bothered by it.

It's not like the average buyer is ever going to look that closely (or even needs as much CPU perf as they can get today), and the more discerning crowd like us know the problem exists anyway.

I'm more interested in a deep dive IPC and perf/watt testing round to see where the new cores stand, as the ARM PR slides are utterly terrible and uninformative.

511 · Sep 10, 2025

soresu said:
I'm more interested in a deep dive IPC and perf/watt testing round to see where the new cores stand, as the ARM PR slides are utterly terrible and uninformative.

We got a meme slide in the name of IPC and performance per watt

soresu · Sep 10, 2025

hemedans said:
Nowadays it's "MediaTek offers better value", "Qualcomm overpriced", "Exynos overheats"

The Exynos thing seems more a result of Samsung's never-ending fab node woes.

The semicon design team tried to make a deal with TSMC to fab Exynos, but TSMC would rather just give Sammy the middle finger, even though they are fabbing for Intel, who are threatening to be a better fab competitor (at least in tech, if not partnership competency) than Samsung.

soresu · Sep 10, 2025

511 said:
We got a meme slide in the name of IPC and performance per watt

Precisely, sick of this from ARM.

Eventually someone will measure it properly with benchmarks and voltmeters so there's no point in trying to obfuscate the facts with PR that doesn't functionally matter to people that actually buy ARM based products.

DZero · Sep 10, 2025

ToTTenTranz said:
Looks like C1-Nano is the only LITTLE core they'll release, so it replaces everything from A520 to A320 and all the A53s being used in smartwatch SoCs.

Interesting, so Nano is being pushed out of the mainstream?
Oh boy... so in some years, all the phones might end up with processors using only out-of-order cores, like the Dimensity 9300?

mikegg · Sep 10, 2025

ARM trying its best to insert GenAI into every CPU media release. It's kind of annoying. No, no one is using an ARM CPU to do inference. At best, ARM is just a sidekick in an Nvidia system. Everyone is using GPUs to do inference.
