Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

soresu · Sep 10, 2025

mikegg said:
Everyone is using GPUs to do inference

On mobile it's more likely to be random NPU xyz.

GPUs might be decent for inference at large/datacenter scale, but for edge/mobile use it's probably going to be domain specific accelerators.

Hexagon has kinda evolved into that for Snapdragon.

DZero · Sep 10, 2025

The GPU rebranding is screwed up...
The first Mali G1 family has Ultra (10+ cores), Premium (6–9 cores), and Pro (1–5 cores) versions.

The 1-2 core versions should be renamed Nano, since Pro is too much for such a small number of cores.
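
As a rough sketch, the tier split described above can be written as a simple lookup. The function name `mali_g1_tier` and the `use_nano` switch are made up for illustration; the "Nano" tier is only the renaming suggested in this post, not an Arm product name, and the core-count ranges are taken as quoted here.

```python
def mali_g1_tier(cores: int, use_nano: bool = False) -> str:
    """Map a GPU core count to a Mali G1 tier name, using the ranges quoted in the post."""
    if cores < 1:
        raise ValueError("core count must be at least 1")
    if cores >= 10:
        return "Ultra"
    if cores >= 6:
        return "Premium"
    # Hypothetical: "Nano" for 1-2 cores is the poster's suggestion only.
    if use_nano and cores <= 2:
        return "Nano"
    return "Pro"


if __name__ == "__main__":
    for n in (1, 2, 5, 6, 9, 10, 16):
        print(n, mali_g1_tier(n), mali_g1_tier(n, use_nano=True))
```

With `use_nano=True`, only the 1-2 core configurations change name; 3-5 cores stay Pro.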

Doug S · Sep 10, 2025

511 said:
You have to pass an AVX512 or x86_64V4 flag for Clang to generate AVX-512; if you don't pass that, it won't generate AVX-512. That's why I said generic O2 compile.
And I meant either all cores should support it or it shouldn't count as ST.

I notice you never responded to my point about splitting up the SME block into 1/4s so each core gets some at the cost of all four cores being busy if you want to max out your SME throughput. Leaving three cores free for other work is a clear win for ACTUAL performance. Isn't that the ultimate goal here, not pretty numbers in GB?
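
A toy model of that tradeoff, assuming an idealized four-core cluster where a single thread can saturate a shared SME block and where throughput in the split design scales linearly with the number of busy cores. The function names and the normalized peak of 1.0 are illustrative assumptions, not measurements of any real chip.

```python
CORES = 4  # cluster size implied by "1/4s" and "all four cores"


def sme_fraction(busy_cores: int, shared_block: bool) -> float:
    """Fraction of peak SME throughput reached with `busy_cores` running SME code."""
    if not 0 <= busy_cores <= CORES:
        raise ValueError("busy_cores must be between 0 and CORES")
    if busy_cores == 0:
        return 0.0
    if shared_block:
        # Assumption: one thread can drive the whole shared block.
        return 1.0
    # Assumption: each core owns 1/CORES of the SME resources.
    return busy_cores / CORES


def cores_free_at_peak(shared_block: bool) -> int:
    """Cores left for other work once SME throughput is maxed out."""
    cores_needed = 1 if shared_block else CORES
    return CORES - cores_needed


if __name__ == "__main__":
    for shared in (True, False):
        label = "shared block" if shared else "split in quarters"
        print(label, [sme_fraction(n, shared) for n in range(CORES + 1)],
              "free at peak:", cores_free_at_peak(shared))
```

Under these assumptions, both layouts reach the same peak, but the shared block gets there with three cores still free while the split layout needs all four.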

What would you think if AMD came out with a CPU that had some big cores that could handle four AVX512 instructions per cycle, and then some smaller cores that could handle only one AVX512 instruction per cycle? Is using the big core's better AVX512 support for ST benchmarks fair, when other cores have fewer resources? What if they had some cores that didn't have AVX512 units, but the CPU could automatically move threads from a little core to a big core if AVX512 instructions are encountered so it would be transparent to software? Trying to figure out exactly where your line in the sand is, just in case we see Intel or AMD produce something in the future that violates it, to determine whether this is all hypocritical BS that doesn't apply when x86 does it.

adroc_thurston · Sep 10, 2025

Doug S said:
What would you think if AMD came out with a CPU that had some big cores that could handle four AVX512 instructions per cycle, and then some smaller cores that could handle only one AVX512 instruction per cycle

well they did, mobile Z5 has half-rate SIMD.

MS_AT · Sep 10, 2025

Isn't this

Doug S said:
The only people calling it a "benchmark hack" are people reaching and grasping at straws for excuses why AVX512 acceleration is A-OK but SME is not. They're butthurt because ARM is finally benefitting from special instructions which used to be a great way to make x86 score better on benchmarks without programs using the traditional instructions that have existed for many years benefiting nearly as much.

somewhat contradicted by this

Doug S said:
These matmul instructions aren't something where you might slip in just a few instructions worth here and there like NEON/SSE/AVX512.

I mean, you note that AVX/SVE2/Neon has somewhat better utility. Anyway, I don't mind that Apple has SME or whatever; they pay for it with silicon and their customers pay in turn. It would be nice if we did not need extensions to Geekbench to see which subtest is using which instruction sets.

Doug S said:
Then none of the SIMD bs counts, only the same type of real instructions that a compiler is generating not hand coded sequences. If a compiler can generate AVX512 code in the clang subtest more power to it!

So as long as the compiler generates SIMD it's ok?

The problem is that whether the compiler will or will not emit SIMD code depends on the compiler's code gen quality and on the programmer's understanding of the language they are writing in. For example, Clang and GCC will do so more often than MSVC. What kind of ops get emitted then depends on compiler options. That is why SPEC is as much a compiler benchmark as a hardware benchmark.

While compilers themselves seldom use SIMD on their own, this is slowly changing too, so in a few years they might be "infected" with SIMD as well.

poke01 · Sep 10, 2025

Can we just agree not to compare SME-cluster-enabled SoCs to CPUs/SoCs that don't have them?

It's not that hard. There doesn't even need to be a deep discussion about this.

Want to compare with x86? Use GB5 if the total GB6.3 score upsets people.

gdansk · Sep 10, 2025

poke01 said:
Can we just agree not to compare SME-cluster-enabled SoCs to CPUs/SoCs that don't have them?

It's not that hard. There doesn't even need to be a deep discussion about this.

Want to compare with x86? Use GB5 if the total GB6.3 score upsets people.

It doesn't matter. Every video or article will have GB6 aggregates, from YouTube and Bilibili to Tom's Hardware. Most will not note the caveats, and those that do will be taken out of context and extrapolated from anyway.

It's safe to say at this point GB6 poisoned cross platform scores. If it wasn't popular it wouldn't matter. I was initially pretty happy with its multi-thread score because it would put phone to phone comparisons on a better basis. But then I saw how people used the charts in their online arguments and it was not as I imagined.

hemedans · Sep 10, 2025

soresu said:
The Exynos thing seems more a result of Samsung's never ending fab node woes.

The semicon design team tried to make a deal with TSMC to fab Exynos but they would rather just give Sammy the middle finger even though they are fabbing Intel who are threatening to be a better fab competitor (at least in tech if not partnership competency) than Samsung.

4LPP+ is fine; SoCs like Exynos E2400, 1480 and 1580 are as good as any SoC made on TSMC 4nm.

adroc_thurston · Sep 10, 2025

hemedans said:
4LPP+ is fine; SoCs like Exynos E2400, 1480 and 1580 are as good as any SoC made on TSMC 4nm.

They're worse but not that much so.

jpiniero · Sep 10, 2025

mikegg said:
ARM trying its best to insert GenAI into every CPU media release. It's kind of annoying. No, no one is using an ARM CPU to do inference. At best, ARM is just a sidekick in an Nvidia system. Everyone is using GPUs to do inference.

I mean every company is trying to insert AI into every PR.

Well, until the Hype ends.

NostaSeronx · Sep 10, 2025

Did anyone notice?
https://developer.arm.com/Processors/C1-Nano

Pipeline: Out of Order

Cortex-A520 Product Support (developer.arm.com)

Pipeline: In-order

Or is this another mistake again?

---
The key features of C1-Nano Core are:
In-order pipeline with direct and indirect branch prediction.
--- Arm® C1-Nano Core Software Optimization Guide

Someone get the ARM guys, they are doing the thing like last time again!

SteinFG · Sep 11, 2025

NostaSeronx said:
Did anyone notice?

Pipeline Out of Order

Oh, their tiny cores are OOO now! Took them long enough.

511 · Sep 11, 2025

Finally, it's time to kill in-order cores

MS_AT · Sep 11, 2025

NostaSeronx said:
Or is this another mistake again?

Mistake most likely. Other PDF docs besides the optimization guide speak of an in-order core.

Panino Manino · Sep 11, 2025

MS_AT said:
Mistake most likely. Other PDF docs besides the optimization guide speak of an in-order core.

Oh no, please no, why ARM?
Isn't even Huawei already making out-of-order little cores? How do they justify this?
Are we cursed to another 5 years of affordable phones with in-order cores?

511 · Sep 11, 2025

Even Atoms are not in-order

MS_AT · Sep 11, 2025

511 said:
Even Atoms are not in-order

Skymont is a huge core. It's only called small because it's smaller than Lion Cove

soresu · Sep 11, 2025

MS_AT said:
Skymont is a huge core. It's only called small because it's smaller than Lion Cove

Waiting patiently for the first slice or reduced out of order µArch CPU cores 🙏

I wonder if they have already been used by a big name and simply not declared 🤔

511 · Sep 11, 2025

MS_AT said:
Skymont is a huge core. It's only called small because it's smaller than Lion Cove

It's 1.1 mm² on N3B, similar in size to A720/A725

Covfefe · Sep 11, 2025

Panino Manino said:
Oh no, please no, why ARM?
Isn't even Huawei already making out-of-order little cores? How do they justify this?
Are we cursed to another 5 years of affordable phones with in-order cores?

Entry level phone SOCs have middle cores nowadays.

I don't see in-order as a problem for the C1-nano. It's designed to be smaller and have lower power draw than is possible with an out-of-order design.

Doug S · Sep 11, 2025

Covfefe said:
Entry level phone SOCs have middle cores nowadays.

I don't see in-order as a problem for the C1-nano. It's designed to be smaller and have lower power draw than is possible with an out-of-order design.

Yeah I don't know why people are bugging over this. ARM has a broad line of C1 cores, it makes sense to have one at the bottom that prioritizes size and power over performance, for use cases where you have the same priorities. If Android OEMs choose to use that as their "little core", your beef is with that OEM not with ARM!

Apple has a bunch of tiny cores embedded in its SoCs to perform various duties like running the Secure Element's L4 microkernel, various tasks related to I/O and image/video processing and so forth. There would be no need to increase their size/power/complexity by making them OOO if the extra performance delivered by OOO isn't useful for the task. I don't know if Apple designed their own cores for these or uses ARM-designed cores, but if it's the latter they might have C1 Nano somewhere in a future SoC. If they do design their own I have no doubt they have at least one, perhaps more than one, that's in-order. The advantage of designing your own would be that you could strip out stuff that a conforming ARM CPU requires that you don't need in a particular core, like NEON or floating point.

DZero · Sep 11, 2025

Doug S said:
Yeah I don't know why people are bugging over this. ARM has a broad line of C1 cores, it makes sense to have one at the bottom that prioritizes size and power over performance, for use cases where you have the same priorities. If Android OEMs choose to use that as their "little core", your beef is with that OEM not with ARM!

Apple has a bunch of tiny cores embedded in its SoCs to perform various duties like running the Secure Element's L4 microkernel, various tasks related to I/O and image/video processing and so forth. There would be no need to increase their size/power/complexity by making them OOO if the extra performance delivered by OOO isn't useful for the task. I don't know if Apple designed their own cores for these or uses ARM-designed cores, but if it's the latter they might have C1 Nano somewhere in a future SoC. If they do design their own I have no doubt they have at least one, perhaps more than one, that's in-order. The advantage of designing your own would be that you could strip out stuff that a conforming ARM CPU requires that you don't need in a particular core, like NEON or floating point.

Apple had OoO cores on their custom ARM processors, even the Watch ones.
Only the ones used for the AirPods and small accessories are licensed from ARM and others. But those are secondary cores that each have a single duty; they are not multitaskers.

Panino Manino said:
Oh no, please no, why ARM?
Isn't even Huawei already making out-of-order little cores? How do they justify this?
Are we cursed to another 5 years of affordable phones with in-order cores?

The issue is not the in-order cores used for just one job (like the security ones); it's the ones used for multitasking. Current programs are not friendly to in-order cores (you notice the INSANE lag with the Helio G35 processors). That's why there are Lite versions, and even the chips that use 2 mid cores and 6 small ones (like the UNISOCs) are going with Android Go instead.

DZero · Sep 12, 2025

MS_AT said:
Skymont is a huge core. It's only called small because it's smaller than Lion Cove

Also... Skymont and all the Atoms since Silvermont are ALL out of Order

Geddagod · Sep 15, 2025

ARM my perf/mm2 GOATs lol

jdubs03 · Sep 16, 2025

Geddagod said:
View attachment 130330
ARM my perf/mm2 GOATs lol

Not sure why the X925 bar isn’t just y=0, and then just have normal scaling for the competitor.
Would be curious how it’ll be this year.
