Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

Nothingness · May 29, 2024

adroc_thurston said:
Apple doesn't have SVE2.

That's not what he wrote

He said 925 has 4 NEON units (as Apple does), and 925 has 2 extra SVE2 units.

SarahKerrigan · May 29, 2024

Nothingness said:
That's not what he wrote

He said 925 has 4 NEON units (as Apple does), and 925 has 2 extra SVE2 units.

Indeed. Not sure what adroc_thurston read.

That being said - I'd expect SVE and NEON to use the same backend resources. I believe this is six full 128b pipes.

eek2121 · May 29, 2024

Uh, congrats to ARM, I guess, for almost catching up to AMD/Intel. They only needed a large process advantage to do it.

poke01 · May 29, 2024

these chips will make for a nice comparison against A18 and 8 Gen 4.

poke01 · May 29, 2024

This also confirms that Apple’s high clocks are not just due to N3E; their design permits higher clocks. If ARM really had a core that could reach 4.4GHz, they would be shouting it from the rooftops.

soresu · May 29, 2024

SarahKerrigan said:
I'd expect SVE and NEON to use the same backend resources

Yes - V1 has 2x 256b SVE or 4x 128b NEON.

Be curious to see if the SVE2 unit config changes back to 256b for V4.

poke01 · May 29, 2024

Mmm.

Shivansps · May 29, 2024

Nothingness said:
I think Sarah answered this. I will only add that NEON has twice the number of registers, enough to hold the 16 AVX2 registers. That leaves you with no room for temporaries, which might create complications, but nothing really blocking.

SVE2 is already there. It's unrelated to vector length.

What matters here is that in Windows, they only emulate x64 SSE... I don't remember which version, but it doesn't expose AVX at all, not even the 128-bit one.
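
The register-count point in the quote above can be checked with quick arithmetic. This is only a rough sketch: it assumes a naive static mapping where each 256-bit AVX2 register is split across 128-bit NEON registers, which is not necessarily how any real emulator allocates registers. The function name is hypothetical.

```python
NEON_REGS, NEON_BITS = 32, 128   # AArch64 vector registers V0-V31
AVX2_REGS, AVX2_BITS = 16, 256   # x86-64 YMM0-YMM15

def neon_regs_needed(guest_regs, guest_bits, host_bits=NEON_BITS):
    """Host registers needed if every guest register is split into host-width pieces.

    Assumption: a fixed one-to-one mapping with no spilling to memory.
    """
    pieces_per_reg = -(-guest_bits // host_bits)  # ceiling division
    return guest_regs * pieces_per_reg

needed = neon_regs_needed(AVX2_REGS, AVX2_BITS)
spare = NEON_REGS - needed
print(f"NEON registers needed: {needed}, left for temporaries: {spare}")
```

Under that assumption the whole NEON register file is consumed by guest state, which is the "no room for temporaries" complication mentioned in the quote.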

SarahKerrigan · May 29, 2024

Shivansps said:
What matters here is that in Windows, they only emulate x64 SSE... I don't remember which version, but it doesn't expose AVX at all, not even the 128-bit one.

I would not be surprised to learn that the obstacle is patent-related.

Doug S · May 29, 2024

SarahKerrigan said:
Indeed. Not sure what adroc_thurston read.

That being said - I'd expect SVE and NEON to use the same backend resources. I believe this is six full 128b pipes.

I believe the six 128b pipes, but I don't buy that they are the same backend resources. I doubt it can schedule 6 NEON instructions per cycle; more likely it can issue up to six 128b operations in a combination of at most 4 NEON and 2 SVE128 (no SVE256 support).
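
To make the two readings concrete, here is a toy comparison of peak vector issue per cycle under each one. It is a sketch of the two hypotheses from this thread only, not a model of the real X925 scheduler; the function names and the pipe counts passed as defaults are assumptions taken from the posts above.

```python
def issued_shared(neon_ops, sve_ops, pipes=6):
    """Shared-backend reading: any of the six 128b pipes takes NEON or SVE."""
    return min(neon_ops + sve_ops, pipes)

def issued_split(neon_ops, sve_ops, neon_pipes=4, sve_pipes=2):
    """Split reading: at most 4 NEON plus at most 2 SVE128 ops per cycle."""
    return min(neon_ops, neon_pipes) + min(sve_ops, sve_pipes)

# (ready NEON ops, ready SVE ops) in a given cycle
for mix in [(6, 0), (4, 2), (0, 6)]:
    shared = issued_shared(*mix)
    split = issued_split(*mix)
    print(f"NEON/SVE ready {mix}: shared={shared}, split={split}")
```

The two readings only agree on the 4 NEON + 2 SVE mix; pure-NEON or pure-SVE code is where they would differ.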

soresu · May 29, 2024

Doug S said:
I believe the six 128b pipes, but I don't buy that they are the same backend resources. I doubt it can schedule 6 NEON instructions per cycle; more likely it can issue up to six 128b operations in a combination of at most 4 NEON and 2 SVE128 (no SVE256 support).

This says 2x increase in SIMD queues - dunno what it was before but I'd wager it fits the bill of the new backend resources.

soresu · May 29, 2024

poke01 said:
View attachment 99928

Mmm.

As I expected.

It doesn't exclude the possibility of a V4 based on X925 having those features, but I wouldn't wager so much as £1 on it.

soresu · May 29, 2024

Shivansps said:
What matters here is that in Windows, they only emulate x64 SSE... I don't remember which version, but it doesn't expose AVX at all, not even the 128-bit one.

From the way the FEX-emu devs talk about AVX on their discord I'm pretty sure that they don't emulate it either.

soresu · May 29, 2024

On further investigation of FEX's github repo it looks like they have implemented some of it and are in the process of more:

Documenting difference between AVX2 gather loadstores before they're implemented · Issue #3659 · FEX-Emu/FEX

Some prototyping of AVX2 gather loadstores in CPP just to show how ugly it is between SVE-256 and SVE-128 before FEX-Emu has implemented it. CPP #include <cstddef> #include <cstdint> #include <cstr...

github.com

trivik12 · May 29, 2024

Apple should call their next core M500 to make it bigger and AMD should call their next Zen 500 as well. /s

soresu · May 29, 2024

trivik12 said:
Apple should call their next core M500 to make it bigger and AMD should call their next Zen 500 as well. /s

If that is a zing on ARM for naming X5 as X925 then it is unwarranted.

They are basically just realigning branding, something they had already done in the past when they aligned Cortex and Mali branding with A76 and G76 back in 2018.

The Immortalis branding has likewise been changed to match Cortex X so that the top most IP is named Immortalis G925.

Contrasted to AMD's recent branding changes it's positively intelligible 🤣

FlameTail · May 29, 2024

poke01 said:
these chips will make for a nice comparison against A18 and 8 Gen 4.

You mean Dimensity 9400 and Exynos 2500.

FlameTail · May 29, 2024

poke01 said:
View attachment 99928

Mmm.

So Cortex X925 has SVE but no SME.

Apple M4 has SME but no SVE.

Funny situation.

adroc_thurston · May 29, 2024

FlameTail said:
Apple M4 has SME but no SVE.

Donan-P has sSVE which kinda can be a slow-ish substitute for SVE actual.

FlameTail · May 29, 2024

Why are new 2024 smartphone SoCs using the 4-year-old Cortex A78?

https://www.mediatek.com/products/smartphones-2/mediatek-dimensity-7300x

ikjadoon · May 29, 2024

FlameTail said:
Why are new 2024 smartphone SoCs using the 4-year-old Cortex A78?

https://www.mediatek.com/products/smartphones-2/mediatek-dimensity-7300x

Some combination of price, performance, area, efficiency. Arm Ltd. confirmed ARMv9 royalties are pricier vs ARMv8 & the A78 is the fastest ARMv8 core excluding the notably larger & pricier Cortex-X1.

Unfortunate for 1T perf, and this trend is staying alive.

That would likely explain why MediaTek aren't using a faster / newer A7xx core (all ARMv9). Yet, I still wish Arm Ltd. (and licensees) would be interested in cheaper X1, X2, X3, etc cores for mid-range SoCs.

// the bigger picture

Smartphones using the Dimensity 7xxx SoCs definitely do not have fat margins; smartphone manufacturers likely want to spend the extra $$ from a faster 1T CPU on better screens, better cameras, more features, etc.

Versus "this website loaded somewhat faster when I have great internet" or "these apps are super snappy".

A lot of negotiations, middlemen / margins, guesses about the market, etc.

Smartphone consumers -> smartphone manufacturers <-> SoC OEMs <-> CPU architects

//

Unrelatedly: I can't pretend to understand MediaTek's model numbering: the 7200 uses 2x A715 @ 2.8 GHz + 6x A510, meanwhile the 7300 uses 4x A78 @ 2.5 GHz + 4x A55. But, right, these are only the CPUs: I've not examined the complete SoC differences.

FlameTail · May 30, 2024

Mediatek isn't the only culprit. Samsung is still using A78 cores for their midrange SoCs.

Exynos 1280 : 2×A78 + 6×A55
Exynos 1380 : 4×A78 + 4×A55
Exynos 1480 : 4×A78 + 4×A55

3 generations of SoCs using the same cores. I am beyond outraged.

I would be somewhat sated if they had put an X1 in the 1480. But they didn't. The poor ST performance of the A78 really hurts the experience on midrange Galaxy phones, because Samsung's OneUI Android skin is really heavy and needs strong ST performance to feel smooth.

FlameTail · May 30, 2024

Are there any Geekbench 6 subtests that benefit from SVE2, like object detection does with SME?

SpudLobby · May 30, 2024

SarahKerrigan said:
So I'm thinking that the Client CSS stuff we're seeing now is basically what Qualcomm was talking about in their lawsuit a couple of years ago with the whole "you'll have to bundle Cortex with Mali" complaint. Going to be interesting to see what that ends up looking like in practice - I have a hard time believing that Nvidia and Samsung (and, for that matter, Renesas) will just be forced onto Mali for their lineups.

Yes, I agree. I think it was an exaggeration. But it is kind of interesting, where they’re going with CSS.

SarahKerrigan said:
Indeed. Not sure what adroc_thurston read.

That being said - I'd expect SVE and NEON to use the same backend resources. I believe this is six full 128b pipes.

Yes exactly. I don’t see why it would be SVE or NEON only. Should just be SVE x 6.

ikjadoon · May 30, 2024

On a different topic, Arm's hesitancy to put absolute numbers on its charts is not inspiring, but if we want to do some pixel peeping:

With the helpful note from @SarahKerrigan that "2023 Best-in-Class @ 3.8GHz" is likely referring to the late 2023 Apple A17 Pro (P-cores @ 3.78 GHz), as it lines up quite closely:

Each tick mark appears to be 10%. And it seems to line up: the A17 Pro is ~28% faster in 1T than the 8G3 for Galaxy, which is close to the chart's ~26% faster via the rough bar width.

I've also added the figure from @uzzi38's earlier image as 1.15x multipliers, assuming "ISO" means iso-frequency only.

Predicted 1T GB6.2 scores & "IPC" are in bold and were calculated from the bar widths. Thus, for the "Cortex-X925 @ 3.8 GHz" score, it's the X4 2287 base score * 1.33 bar width = 3041.71, rounded to 3042.

Arm Marketing Name	Rough Bar Width	Possible SoC	Clock	GB6.2 1T	GB6.2 1T Pts / GHz "IPC"	"IPC" Relative
2023 Premium Android	1.0	QC SD8G3 for Galaxy	3.39 GHz	2287	674.6	100.0%
2023 Best-in-Class @ 3.8 GHz	1.26x	Apple A17 Pro	3.78 GHz	2930	775.1	114.9%
Cortex-X925 @ 3.8 GHz	1.33x	??	3.80 GHz	3042	800.5	118.7%
Cortex-X925 @ 3.8 GHz (+ sw & sys "optimizations")	1.36x	??	3.80 GHz	3110	818.5	121.3%
X925 vs X4 "ISO"	1.15x	??	3.60 GHz	2793	775.8	115.0%
X925 vs X4 "ISO"	1.15x	??	3.80 GHz	2948	775.8	115.0%
n/a	n/a	Apple M4	4.38 GHz	3715	848.2	125.7%
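
For anyone who wants to check the two derived columns, here is a minimal sketch of the arithmetic, using only the measured scores and clocks from the table above. The function and variable names are made up for illustration, and points per GHz is only the rough "IPC" proxy used in this post.

```python
# (GB6.2 1T score, clock in GHz), as quoted in the table above
rows = {
    "SD8G3 for Galaxy (X4)": (2287, 3.39),
    "Apple A17 Pro": (2930, 3.78),
    "Apple M4": (3715, 4.38),
}

def pts_per_ghz(score, clock_ghz):
    """GB6.2 1T points per GHz, the table's rough "IPC" proxy."""
    return score / clock_ghz

# The X4 in the SD8G3 for Galaxy is the 100% baseline
baseline = pts_per_ghz(*rows["SD8G3 for Galaxy (X4)"])

for name, (score, clock) in rows.items():
    ipc = pts_per_ghz(score, clock)
    print(f"{name}: {ipc:.1f} pts/GHz, {ipc / baseline:.1%} of baseline")
```

This should land on the table's values for those three rows, give or take rounding in the last digit.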

Sources:

2023 Premium Android: Snapdragon 8G3 for Galaxy (X4 @ 3.39 GHz) = 1.0x
2023 Best-in-Class @ 3.8 GHz: Apple iPhone 15 Pro Max (P-core at 3.78 GHz)
Apple M4 GB6.2 run

//

So is X925 a GB6.2 1T Perf / GHz gain of +15% or +19% over X4? I can't claim that either my pixel counting or my rounding is very precise.

That gives a range of GB6.2 1T scores, assuming +15% Pts / GHz (2nd chart estimate from "ISO" comparisons) to +18.7% Pts / GHz (1st chart estimate from "1.33x"):

Hypothetical 3.6 GHz X925 core: ~2793 → ~2883
Hypothetical 3.8 GHz X925 core: ~2948 → ~3042
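
The range above is just the X4 baseline scaled by an assumed uplift and a target clock. A rough sketch follows, where the 1.15 and 1.187 uplifts are the two pixel-peeped estimates from this post rather than measured values, and the names are hypothetical:

```python
BASE_SCORE = 2287       # X4 in the SD8G3 for Galaxy, GB6.2 1T
BASE_CLOCK_GHZ = 3.39

def projected_score(uplift, clock_ghz):
    """Scale the X4's points per GHz by an assumed uplift, then by the target clock."""
    base_pts_per_ghz = BASE_SCORE / BASE_CLOCK_GHZ
    return base_pts_per_ghz * uplift * clock_ghz

for clock in (3.6, 3.8):
    low = projected_score(1.15, clock)     # "ISO" chart estimate
    high = projected_score(1.187, clock)   # bar-width chart estimate
    print(f"{clock} GHz X925: ~{low:.0f} to ~{high:.0f}")
```

The high end at 3.8 GHz can differ by a point from the table, because the table value came from the 1.33x bar width directly rather than from the rounded +18.7% figure.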

I'd rather not consider the "optimized" estimate (+21.3% IPC gain) as it's unclear what those optimizations actually are and how they would affect the Cortex-X4.

All this, when Arm could just label their silly charts. I might have some typos to fix tomorrow, too.
