ARM Cortex/Neoverse IP + SoCs (no custom cores)

Page 58 - AnandTech community

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
C1-Ultra (Travis/X930):

C1-Premium (Alto?):

C1-Pro (Gelas/A730):

C1-Nano (Nevis/A530):

G1-Ultra (Drage):

G1-Premium:

G1-Pro:
 

soresu

PR blurb version...


CPU | Key benefit | Performance and efficiency gains | Ideal use cases
C1-Ultra | Flagship peak performance | +25% single-thread performance; double-digit IPC gain year-on-year | Large-model inference, computational photography, content creation, generative AI
C1-Premium | C1-Ultra performance with greater area efficiency | 35% smaller area than C1-Ultra | Sub-flagship mobile segments, voice assistants, multitasking
C1-Pro | Sustained efficiency | +16% sustained performance | Video playback, streaming inference
C1-Nano | Extremely power-efficient | +26% efficiency, using less area | Wearables, smallest form factors
 

soresu

Seems I was right about Alto/C1-Premium being a Zen-c-like core: reduced area, with nearly the same IPC as the flagship.

I wonder what devices it will end up in, and how that will play out in the Neoverse SKUs.
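
For a rough sense of what the PR numbers imply, here is a minimal sketch in Python. It assumes the PR claim holds exactly as worded (C1-Ultra performance at 35% smaller area); the function name perf_per_area_gain is hypothetical and only used for illustration.

```python
def perf_per_area_gain(relative_perf, area_reduction):
    """Performance-per-area ratio versus a baseline core.

    relative_perf: performance relative to the baseline (1.0 = equal).
    area_reduction: fractional area saving versus the baseline (0.35 = 35% smaller).
    """
    if not 0.0 <= area_reduction < 1.0:
        raise ValueError("area_reduction must be in [0, 1)")
    # Area relative to the baseline is (1 - area_reduction).
    return relative_perf / (1.0 - area_reduction)


# Assumption taken from the PR table: equal performance, 35% smaller area.
gain = perf_per_area_gain(1.0, 0.35)
print(f"C1-Premium vs C1-Ultra perf/area: {gain:.2f}x")
```

If the real core gives up a few percent of performance, the ratio shrinks accordingly, so treat this as an upper bound on the marketing claim rather than a measurement.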
 

gdansk

Diamond Member
Feb 8, 2011
4,568
7,682
136
Amazing to see that they updated the naming in the documentation 1.5 years ago. And in that time no one rethought the decision.
But they have so many core types it's a lost battle. Most consumers aren't going to care or know. They may stick to "MediaTek bad, Exynos bad, Qualcomm good".
 

soresu

The naming as Adroc said is terrible.

It should at least be consistent, which it really isn't as to me Pro sounds better than Premium.

Call the Nano core Micro instead, or perhaps even Kilo, saving the smol scale names like Micro and Nano for embedded or real time only.

Then Pro, Premium and Ultra can be Mega, Giga and Tera 🤘

My naming scheme probably isn't any better for the academic peasants out there tho 😂😆
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
The naming as Adroc said is terrible.

It should at least be consistent, which it really isn't as to me Pro sounds better than Premium.

Call the Nano core Micro instead, or perhaps even Kilo, saving the smol scale names like Micro and Nano for embedded or real time only.

Then Pro, Premium and Ultra can be Mega, Giga and Tera 🤘

My naming scheme probably isn't any better for the academic peasants out there tho 😂😆
Core name should be
Core - A520
Ultra Core - A725
Ultron pro Max Core - X930
 

Doug S

Diamond Member
Feb 8, 2020
3,574
6,311
136
Here I was thinking SME2 would be core level.
Pathetic, cluster level accel is a benchmark hack, nothing more.

No, you guys simply don't understand. Look at M4 annotated die photos, the SME unit is HALF the size of a P core! It wouldn't be practical to include one that size in every core. So let's say you split it up and each P core got 1/4 of the SME capability. Your ST capability for SME suffers, and your MT capability is roughly the same. I guess you'd call that a win? If so you ignore that the cost is that to fully exploit it ALL FOUR CORES are busy!

With the separate unit you can get the entire cluster's worth of SME performance with just one core, leaving the other three cores free to do other stuff. That's a clear and undeniable win.
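
To put rough numbers on that, here is a minimal toy model in Python. The assumptions are illustrative, not taken from any ARM or Apple documentation: a 4-core cluster, total SME capacity normalized to 1.0, a single thread able to saturate a shared unit, and a split design that gives each core exactly 1/cores of the capacity. The function name sme_throughput is hypothetical.

```python
def sme_throughput(threads, cores=4, total_capacity=1.0, shared=True):
    """Toy estimate of SME throughput available to `threads` busy cores.

    Assumptions (illustrative only): a shared unit can be saturated by a
    single thread; a split design gives each core total_capacity / cores.
    """
    if not 1 <= threads <= cores:
        raise ValueError("threads must be between 1 and cores")
    if shared:
        # One cluster-level unit: any single core can drive all of it.
        return total_capacity
    # Per-core slices: throughput scales with the number of busy cores.
    return total_capacity * threads / cores


for shared in (True, False):
    label = "shared unit" if shared else "split per core"
    for threads in (1, 4):
        tput = sme_throughput(threads, shared=shared)
        print(f"{label:>14}, {threads} thread(s): throughput {tput:.2f}, "
              f"cores left free: {4 - threads}")
```

Under these assumptions the shared design gives one thread the full 1.0 while leaving three cores free, and the split design only reaches 1.0 with all four cores busy.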

These matmul instructions aren't something where you might slip in just a few instructions' worth here and there, like NEON/SSE/AVX512. They are only going to be used for longer sequences: the kind of stuff that previously was kicked off to a GPGPU (if you needed enough of it and were willing to deal with the hassle) or, more often, run slower using existing FP capability. There is no benefit to having SME in each core if the cost is that each core has less of it.

The only people calling it a "benchmark hack" are people reaching and grasping at straws for excuses why AVX512 acceleration is A-OK but SME is not. They're butthurt because ARM is finally benefitting from special instructions which used to be a great way to make x86 score better on benchmarks without programs using the traditional instructions that have existed for many years benefiting nearly as much.
 

511

The only people calling it a "benchmark hack" are people reaching and grasping at straws for excuses why AVX512 acceleration is A-OK but SME is not. They're butthurt because ARM is finally benefitting from special instructions which used to be a great way to make x86 score better on benchmarks without programs using the traditional instructions that have existed for many years benefiting nearly as much.
Intel AMX exists and it is per core. SME is a benchmark hack for single-core scores if it's not part of the core. Why don't they show us the SPEC score, built with a generic GCC -O2 on the same compiler, instead of memebench?
 

Doug S

Intel AMX exists and it is per core. SME is a benchmark hack for single-core scores if it's not part of the core. Why don't they show us the SPEC score, built with a generic GCC -O2 on the same compiler, instead of memebench?

So if instead of making SME per cluster they had one core that was SME enabled in each cluster you'd be fine with it? Or would you say "it only counts if ALL the cores have it", because you're just going to look for any reason to not count something that helps ARM be faster than your beloved x86?

This is the reason I only care about comparisons on the clang subtest, or gcc on SPEC. Then none of the SIMD bs counts, only the same type of real instructions that a compiler is generating not hand coded sequences. If a compiler can generate AVX512 code in the clang subtest more power to it!
 

511

So if instead of making SME per cluster they had one core that was SME enabled in each cluster you'd be fine with it? Or would you say "it only counts if ALL the cores have it", because you're just going to look for any reason to not count something that helps ARM be faster than your beloved x86?

This is the reason I only care about comparisons on the clang subtest, or gcc on SPEC. Then none of the SIMD bs counts, only the same type of real instructions that a compiler is generating not hand coded sequences. If a compiler can generate AVX512 code in the clang subtest more power to it!
You have to pass an AVX-512 or x86-64-v4 flag (e.g. -mavx512f or -march=x86-64-v4) for Clang to generate AVX-512; if you don't pass one, it won't generate AVX-512. That's why I said generic -O2 compile.
And I meant either all cores should support it or it shouldn't count as ST.
 

hemedans

Senior member
Jan 31, 2015
276
154
116
Amazing to see that they updated the naming in the documentation 1.5 years ago. And in that time no one rethought the decision.
But they have so many core types it's a lost battle. Most consumers aren't going to care or know. They may stick to "MediaTek bad, Exynos bad, Qualcomm good".
Nowadays it's "MediaTek offers better value", "Qualcomm is overpriced", "Exynos overheats".
 

soresu

Intel AMX exists and it is per core. SME is a benchmark hack for single-core scores if it's not part of the core. Why don't they show us the SPEC score, built with a generic GCC -O2 on the same compiler, instead of memebench?
I'd rather it's per cluster and not wasted space on the cores when it's not in use.

It's only a benchmark hack because a certain benchmark used it as part of its generic score and doesn't include the score without it.

Now that we all know about it the benchmark is exposed as pandering at best.
 

soresu

Then don't portray it as a part of single-core performance, which GB is doing.
tbh I'm not that bothered by it.

It's not like the average buyer is ever going to look that closely (or even needs as much CPU perf as they can get today), and the more discerning crowd like us know the problem exists anyway.

I'm more interested in a deep dive IPC and perf/watt testing round to see where the new cores stand, as the ARM PR slides are utterly terrible and uninformative.
 

511

I'm more interested in a deep dive IPC and perf/watt testing round to see where the new cores stand, as the ARM PR slides are utterly terrible and uninformative.
We got a meme slide in the name of IPC and performance per watt.
 

soresu

Nowadays it's "MediaTek offers better value", "Qualcomm is overpriced", "Exynos overheats".
The Exynos thing seems more a result of Samsung's never ending fab node woes.

The semicon design team tried to make a deal with TSMC to fab Exynos, but TSMC would rather just give Sammy the middle finger, even though they are fabbing for Intel, who are threatening to be a better fab competitor (at least in tech, if not partnership competency) than Samsung.
 

soresu

We got a meme slide in the name of IPC and performance per watt.
Precisely, sick of this from ARM.

Eventually someone will measure it properly with benchmarks and voltmeters so there's no point in trying to obfuscate the facts with PR that doesn't functionally matter to people that actually buy ARM based products.
 

DZero

Golden Member
Jun 20, 2024
1,625
629
96
Looks like C1-Nano is the only LITTLE core they'll release, so it replaces everything from A520 to A320 and all the A53 being used in smartwatch SoCs.
Interesting, so Nano is being pushed out of the mainstream?
Oh boy... so in a few years, all phones might end up with processors made entirely of out-of-order cores, like the Dimensity 9300?
 

mikegg

Golden Member
Jan 30, 2010
1,975
577
136
ARM trying its best to insert GenAI into every CPU media release. It's kind of annoying. No, no one is using an ARM CPU to do inference. At best, ARM is just a sidekick in an Nvidia system. Everyone is using GPUs to do inference.