Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores)


soresu

Diamond Member
Dec 19, 2014
4,037
3,500
136
Everyone is using GPUs to do inference
On mobile it's more likely to be random NPU xyz.

GPUs might be decent for inference at large/datacenter scale, but for edge/mobile use it's probably going to be domain specific accelerators.

Hexagon has kinda evolved into that for Snapdragon.
 

DZero

Golden Member
Jun 20, 2024
1,539
579
96
The GPU rebranding is screwed up...
The first Mali G1 family has Ultra (10+ cores), Premium (6–9 cores), and Pro (1–5 cores) versions.

The 1-2 core version should be named Nano, since Pro is too much for such a small number of cores.
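
For reference, the tiering quoted in this post can be written down as a small lookup. This is only a sketch of the core-count ranges given above; the function name `mali_g1_tier` is made up for illustration and is not anything published by Arm.

```python
def mali_g1_tier(cores: int) -> str:
    """Map a shader-core count to the Mali G1 tier name quoted in the post.

    Ranges as stated in the thread: Ultra (10+), Premium (6-9), Pro (1-5).
    The function itself is hypothetical, for illustration only.
    """
    if cores < 1:
        raise ValueError("core count must be at least 1")
    if cores >= 10:
        return "Ultra"
    if cores >= 6:
        return "Premium"
    return "Pro"


# Print the tier for a few sample core counts.
for n in (1, 2, 5, 6, 9, 10, 16):
    print(n, mali_g1_tier(n))
```

Note that a 1-core and a 5-core configuration both come out as "Pro" under this scheme, which is the naming complaint being made here.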
 

Doug S

Diamond Member
Feb 8, 2020
3,477
6,178
136
You have to pass an AVX-512 flag (such as -mavx512f) or -march=x86-64-v4 for Clang to generate AVX-512; if you don't pass that, it won't generate AVX-512. That's why I said generic O2 compile.
And I meant either all cores should support it or it shouldn't count as ST.

I notice you never responded to my point about splitting up the SME block into 1/4s so each core gets some at the cost of all four cores being busy if you want to max out your SME throughput. Leaving three cores free for other work is a clear win for ACTUAL performance. Isn't that the ultimate goal here, not pretty numbers in GB?
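
To put rough numbers on that trade-off, here is a toy model, not a measurement. It assumes a 4-core cluster and takes the premise above at face value: a single core can drive one shared SME block to peak, while an even split gives each core a quarter of the peak. The function names are hypothetical.

```python
def sme_fraction(busy_cores: int, cores: int = 4, split: bool = False) -> float:
    """Fraction of the cluster's peak SME throughput reached by `busy_cores` threads.

    Assumption (the post's premise, not measured data): with one shared SME
    block a single core can drive it to peak; with the block split evenly,
    each core only reaches 1/cores of the peak.
    """
    if not 1 <= busy_cores <= cores:
        raise ValueError("busy_cores must be between 1 and cores")
    return busy_cores / cores if split else 1.0


def free_cores_at_peak(cores: int = 4, split: bool = False) -> int:
    """Cores left over for other work while SME runs at peak throughput."""
    cores_needed = cores if split else 1
    return cores - cores_needed


# Compare the two layouts for a single SME-heavy thread.
for split in (False, True):
    label = "split into quarters" if split else "one shared block"
    print(f"{label}: 1 busy core reaches {sme_fraction(1, split=split):.0%} of peak, "
          f"{free_cores_at_peak(split=split)} cores free at peak")
```

Under these assumptions the shared block leaves three cores free at peak SME throughput, while the split layout leaves none.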

What would you think if AMD came out with a CPU that had some big cores that could handle four AVX512 instructions per cycle, and then some smaller cores that could handle only one AVX512 instruction per cycle? Is using the big core's better AVX512 support for ST benchmarks fair, when other cores have fewer resources? What if they had some cores that didn't have AVX512 units, but the CPU could automatically move threads from a little core to a big core if AVX512 instructions are encountered, so it would be transparent to software? I'm trying to figure out exactly where your line in the sand is, in case Intel or AMD produce something in the future that crosses it, so we can determine whether this is all hypocritical BS that doesn't apply when x86 does it.
 

MS_AT

Senior member
Jul 15, 2024
843
1,703
96
Isn't this
The only people calling it a "benchmark hack" are people reaching and grasping at straws for excuses why AVX512 acceleration is A-OK but SME is not. They're butthurt because ARM is finally benefitting from special instructions which used to be a great way to make x86 score better on benchmarks without programs using the traditional instructions that have existed for many years benefiting nearly as much.
somewhat contradicted by this
These matmul instructions aren't something where you might slip in just a few instructions worth here and there like NEON/SSE/AVX512.
I mean, you note that AVX/SVE2/NEON have somewhat better utility. Anyway, I don't mind that Apple has SME or whatever; they pay for it with silicon and their customers pay in turn. It would be nice if we did not need extensions to Geekbench to see which subtest is using which instruction sets.
Then none of the SIMD bs counts, only the same type of real instructions that a compiler is generating not hand coded sequences. If a compiler can generate AVX512 code in the clang subtest more power to it!
So as long as the compiler generates SIMD it's OK? :) The problem is that whether the compiler will or will not emit SIMD code depends on its code-gen quality and on the programmer's understanding of the language they are writing in. For example, Clang and GCC will do so more often than MSVC. What kind of ops get emitted then depends on compiler options. That is why SPEC is as much a compiler benchmark as a hardware benchmark.

While compilers themselves seldom use SIMD in their own code, this is slowly changing too, so in a few years they might be "infected" with SIMD as well.
 

poke01

Diamond Member
Mar 8, 2022
4,077
5,399
106
Can we just agree to not compare SME-cluster-enabled SoCs to CPUs/SoCs that don't have them?

It's not that hard. There doesn't even need to be a deep discussion about this.

Want to compare with x86? Use GB5 if the total GB6.3 score upsets people.
 

gdansk

Diamond Member
Feb 8, 2011
4,463
7,528
136
Can we just agree to not compare SME-cluster-enabled SoCs to CPUs/SoCs that don't have them?

It's not that hard. There doesn't even need to be a deep discussion about this.

Want to compare with x86? Use GB5 if the total GB6.3 score upsets people.
It doesn't matter. Every video or article will have GB6 aggregates, from YouTube and Bilibili to Tom's Hardware. Most will not note the caveats, and those that do will be taken out of context and extrapolated from anyway.

It's safe to say at this point that GB6 has poisoned cross-platform scores. If it wasn't popular it wouldn't matter. I was initially pretty happy with its multi-thread score because it would put phone-to-phone comparisons on a better basis. But then I saw how people used the charts in their online arguments and it was not as I imagined.
 

hemedans

Senior member
Jan 31, 2015
261
148
116
The Exynos thing seems more a result of Samsung's never ending fab node woes.

The semicon design team tried to make a deal with TSMC to fab Exynos, but they would rather just give Sammy the middle finger, even though they are fabbing for Intel, who are threatening to be a better fab competitor (at least in tech if not partnership competency) than Samsung.
4LPP+ is fine; SoCs like the Exynos 2400, 1480 and 1580 are as good as any SoC made on TSMC 4nm.
 

jpiniero

Lifer
Oct 1, 2010
16,720
7,181
136
ARM trying its best to insert GenAI into every CPU media release. It's kind of annoying. No, no one is using an ARM CPU to do inference. At best, ARM is just a sidekick in an Nvidia system. Everyone is using GPUs to do inference.

I mean every company is trying to insert AI into every PR.

Well, until the hype ends.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,289
136
Did anyone notice?
https://developer.arm.com/Processors/C1-Nano
Pipeline: Out of Order
Pipeline: In-order

Or is this another mistake again?

---
The key features of C1-Nano Core are:
In-order pipeline with direct and indirect branch prediction.
--- Arm® C1-Nano Core Software Optimization Guide

Someone get the ARM guys, they are doing the thing like last time again!
 

Panino Manino

Golden Member
Jan 28, 2017
1,136
1,377
136
Mistake most likely. The other PDF docs besides the optimization guide also speak of an in-order core.

Oh no, please no, why ARM?
Isn't even Huawei already making out-of-order little cores? How do they justify this?
Are we cursed to another 5 years of affordable phones with in-order cores?


 

soresu

Diamond Member
Dec 19, 2014
4,037
3,500
136
Skymont is a huge core. It's only called small because it's smaller than Lion Cove;)
Waiting patiently for the first slice or reduced out of order µArch CPU cores 🙏

I wonder if they have already been used by a big name and simply not declared 🤔
 

Covfefe

Member
Jul 23, 2025
36
49
46
Oh no, please no, why ARM?
Isn't even Huawei already making out-of-order little cores? How do they justify this?
Are we cursed to another 5 years of affordable phones with in-order cores?
Entry-level phone SoCs have middle cores nowadays.

I don't see in-order as a problem for the C1-Nano. It's designed to be smaller and have lower power draw than is possible with an out-of-order design.
 

Doug S

Diamond Member
Feb 8, 2020
3,477
6,178
136
Entry-level phone SoCs have middle cores nowadays.

I don't see in-order as a problem for the C1-Nano. It's designed to be smaller and have lower power draw than is possible with an out-of-order design.

Yeah I don't know why people are bugging over this. ARM has a broad line of C1 cores, it makes sense to have one at the bottom that prioritizes size and power over performance, for use cases where you have the same priorities. If Android OEMs choose to use that as their "little core", your beef is with that OEM not with ARM!

Apple has a bunch of tiny cores embedded in its SoCs to perform various duties like running the Secure Enclave's L4 microkernel, various tasks related to I/O and image/video processing and so forth. There would be no need to increase their size/power/complexity by making them OOO if the extra performance delivered by OOO isn't useful for the task. I don't know if Apple designed its own cores for these or uses ARM-designed cores, but if it's the latter they might have a C1 Nano somewhere in a future SoC. If they do design their own I have no doubt they have at least one, perhaps more than one, that's in-order. The advantage of designing your own would be that you could strip out stuff that a conforming ARM CPU requires that you don't need in a particular core, like NEON or floating point.
 

DZero

Golden Member
Jun 20, 2024
1,539
579
96
Yeah I don't know why people are bugging over this. ARM has a broad line of C1 cores, it makes sense to have one at the bottom that prioritizes size and power over performance, for use cases where you have the same priorities. If Android OEMs choose to use that as their "little core", your beef is with that OEM not with ARM!

Apple has a bunch of tiny cores embedded in its SoCs to perform various duties like running the Secure Enclave's L4 microkernel, various tasks related to I/O and image/video processing and so forth. There would be no need to increase their size/power/complexity by making them OOO if the extra performance delivered by OOO isn't useful for the task. I don't know if Apple designed its own cores for these or uses ARM-designed cores, but if it's the latter they might have a C1 Nano somewhere in a future SoC. If they do design their own I have no doubt they have at least one, perhaps more than one, that's in-order. The advantage of designing your own would be that you could strip out stuff that a conforming ARM CPU requires that you don't need in a particular core, like NEON or floating point.
Apple has OoO cores in its custom ARM processors, even the Watch ones.
Only the ones used for the AirPods and small accessories are licensed from ARM and others. But those are secondary cores that each have a single duty; they are not multitaskers.

Oh no, please no, why ARM?
Isn't even Huawei already making out-of-order little cores? How do they justify this?
Are we cursed to another 5 years of affordable phones with in-order cores?


The issue is not the in-order cores used for just one job (like the security ones); it's the ones used for multitasking. Current programs are not friendly to in-order cores (you notice the INSANE lag with the Helio G35 processors). That's why there are Lite versions, and even the chips that use 2 mid cores and 6 small ones (like the UNISOCs) are going with Android Go instead.
 