Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

gdansk · Sep 25, 2024

Barring one exception (Apple), it seems the x86 and ARM SoCs end up pretty similar to each other overall.

Qualcomm has a notable advantage in 1T power draw. But in MT that advantage disappears, even though more of the "terribly inefficient" x86 decoders are active, with SMT (Strix) or without SMT (LNL). So how much of the 1T efficiency advantage is due to Qualcomm's mobile phone expertise and power-efficient cache hierarchy rather than the ISA? I have no way of knowing, but the fact that the efficiency advantage disappears in MT suggests the ISA itself isn't at fault; something else is.

GTracing · Sep 25, 2024

FlameTail said:
Does ISA matter after all?

An interesting trail of tweets;

I'm not sure this is the best thread for this discussion, but the way I understand it, practically speaking, ISA didn't matter until recently. As decoders get wider, the problem of decoding variable-length instructions matters more and more. Nowadays it's not a question of "if", but of "how much". Designs with multiple decode clusters, like Zen 5 and Skymont, might solve this problem.
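To illustrate why variable-length encoding makes wide decode harder, here is a minimal sketch using a made-up toy encoding (not real x86): the low two bits of an instruction's first byte give its length, 1 to 4 bytes. With a fixed 4-byte encoding every instruction start is known up front, so several decoders can begin in parallel; with the toy variable-length encoding each start depends on the previous instruction's length, so the walk is inherently serial unless the hardware spends extra area on predecode or speculative length decoding.

```python
def fixed_length_boundaries(code: bytes, width: int = 4) -> list[int]:
    """Fixed-length ISA: every start offset is known without inspecting the bytes."""
    return list(range(0, len(code), width))


def toy_length(first_byte: int) -> int:
    """Hypothetical toy encoding: the low 2 bits of the first byte give a length of 1..4."""
    return (first_byte & 0b11) + 1


def variable_length_boundaries(code: bytes) -> list[int]:
    """Variable-length ISA: each start depends on the previous instruction's length,
    so boundaries must be found one after another."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += toy_length(code[pc])
    return starts


# Toy stream: a 4-byte, a 1-byte, a 2-byte and a 3-byte instruction.
stream = bytes([0x03, 0, 0, 0, 0x00, 0x01, 0, 0x02, 0, 0])
print(variable_length_boundaries(stream))   # starts found serially
print(fixed_length_boundaries(bytes(12)))   # starts known up front
```

The fixed-length case needs no information from the instruction bytes at all, which is the whole appeal of a fixed encoding for wide decoders.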

Nothingness · Sep 25, 2024

My take on this: at the lower end, where performance is not the primary target, ISA matters because it helps use less silicon. At the higher end, ISA still has an impact, but there are many techniques, some potentially costly in silicon area, to circumvent ISA shortcomings.

Doug S · Sep 25, 2024

Nothingness said:
My take on this: at the lower end, where performance is not the primary target, ISA matters because it helps use less silicon. At the higher end, ISA still has an impact, but there are many techniques, some potentially costly in silicon area, to circumvent ISA shortcomings.

I'm not even sure it's true at the lower end (speaking of E-cores as the lower end, not embedded stuff).

The only real difference is an extra pipeline stage or two because of the more complex decode, but there are plenty of ways to hide those delays; everyone has a pretty long pipeline these days because everyone is clocking pretty high.
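A back-of-the-envelope way to see why an extra decode stage or two can be hidden: in a simple model, a deeper front end mostly costs cycles when the pipeline has to be refilled, e.g. after a branch mispredict. The sketch below assumes that simplification (it ignores things like uop caches that can bypass decode entirely), and the CPI and mispredict figures are hypothetical placeholders, not measurements of any core.

```python
def slowdown_from_extra_stages(base_cpi: float, mispredicts_per_kilo_instr: float,
                               extra_stages: int) -> float:
    """Fractional slowdown if each branch mispredict costs `extra_stages` more cycles.

    Simplified model: extra decode stages only add latency on a pipeline refill,
    so the added cycles per instruction are (mispredicts per instruction) * (extra stages).
    """
    added_cpi = (mispredicts_per_kilo_instr / 1000.0) * extra_stages
    return added_cpi / base_cpi


# Hypothetical inputs: IPC of 2 (CPI 0.5) and 5 mispredicts per 1000 instructions.
for stages in (1, 2):
    pct = 100 * slowdown_from_extra_stages(0.5, 5.0, stages)
    print(f"{stages} extra stage(s): ~{pct:.1f}% slower")
```

With these placeholder inputs the model gives roughly 1% per extra stage; a better branch predictor shrinks it further.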

I've long said that people who think ARM's fixed instruction length confers a big advantage are wrong. People were blinded by Apple's IPC, but x86 designs have always been more focused on clock rate. Now that Apple and the rest of the ARM crowd have pushed their clock rates closer and closer to x86 territory, the IPC advantage is going to shrink, because the higher your clock rate, the farther away every level of cache and main memory is from your CPU in clock cycles. That's the latency that really hurts, not the extra cycles to decode x86's ridiculous instruction encoding.
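The memory-latency point is simple arithmetic: a latency that is fixed in nanoseconds costs more cycles at a higher clock (cycles = ns × GHz). A minimal sketch; the latencies below are illustrative round numbers, not measurements of any specific chip.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles spent waiting on a fixed wall-clock latency (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


# Illustrative round numbers only (assumed, not measured).
latencies_ns = {"L3 hit": 10.0, "DRAM": 100.0}
for clock_ghz in (3.0, 4.5):
    for level, ns in latencies_ns.items():
        print(f"{clock_ghz} GHz, {level}: {latency_in_cycles(ns, clock_ghz):.0f} cycles")
```

Going from 3 GHz to 4.5 GHz turns an assumed 100 ns DRAM access from 300 into 450 cycles of stall, which is why IPC tends to drop as clocks rise unless the memory hierarchy improves too.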

LightningZ71 · Sep 25, 2024

One other thing I've seen profiled in the past when comparing ISAs is the number of "ops" per instruction. From what I've seen, x86 (and its various extensions) TENDS to generate notably more "ops" per instruction than the latest versions of ARM. So, while there is often a cycle or more of extra delay when decoding x86 instructions compared to ARM, x86 tends to get more ops out of them on balance. The two ISAs tend to be roughly equivalent in actual operation throughput, with x86 paying a slight penalty in circuitry (including the related bit of heat and power draw) to handle decoding variable-length instructions.
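The argument here is that instruction count alone can mislead: what matters is work done per cycle, roughly IPC × ops per instruction. A minimal sketch with entirely hypothetical IPC and ops-per-instruction figures, chosen only to show how a lower IPC can be offset by denser instructions:

```python
def ops_per_cycle(ipc: float, ops_per_instruction: float) -> float:
    """Operation throughput = instructions per cycle * operations per instruction."""
    return ipc * ops_per_instruction


# Hypothetical figures, not measurements of any real core.
core_a = ops_per_cycle(ipc=4.0, ops_per_instruction=1.25)  # fewer, "denser" instructions
core_b = ops_per_cycle(ipc=5.0, ops_per_instruction=1.0)   # more, simpler instructions
print(core_a, core_b)
```

In this made-up case both cores do 5.0 ops per cycle despite a 25% IPC difference, which is the kind of equivalence described above.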

Doug S · Sep 25, 2024

LightningZ71 said:
One other thing I've seen profiled in the past when comparing ISAs is the number of "ops" per instruction. From what I've seen, x86 (and its various extensions) TENDS to generate notably more "ops" per instruction than the latest versions of ARM. So, while there is often a cycle or more of extra delay when decoding x86 instructions compared to ARM, x86 tends to get more ops out of them on balance. The two ISAs tend to be roughly equivalent in actual operation throughput, with x86 paying a slight penalty in circuitry (including the related bit of heat and power draw) to handle decoding variable-length instructions.

"ops" per instruction is just a made-up thing on top of the made-up thing that is IPC.

If you want to measure two ISAs against each other, start with the same source code, compile it, and see which finishes faster. That's the only test that matters.
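One way to run the "same source, compile it, see which finishes faster" test is to time the same benchmark, built from identical source on each machine, several times and compare medians. A minimal sketch; `./bench` is a hypothetical binary name and the run count is arbitrary. Note that this measures the whole platform (core, compiler, memory system), not the ISA in isolation.

```python
import statistics
import subprocess
import sys
import time


def median_runtime(cmd: list[str], runs: int = 5) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in seconds.

    The median is less sensitive than the mean to one-off hiccups
    such as background tasks or thermal ramp-up.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raise if the benchmark fails
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Example: time a trivial no-op; on each machine you would pass e.g. ["./bench"] instead.
print(f"median: {median_runtime([sys.executable, '-c', 'pass'], runs=3):.3f} s")
```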

jdubs03 · Sep 25, 2024

Doug S said:
…on top of the made-up thing that is IPC.

And yet every vendor emphasizes IPC! It must mean something, right?

MS_AT · Sep 25, 2024

Doug S said:
"ops" per instruction is just a made-up thing on top of the made-up thing that is IPC.

If you want to measure two ISAs against each other, start with the same source code, compile it, and see which finishes faster. That's the only test that matters.

Your proposal will determine which microarchitecture is better, not which ISA is better. To verify the latter, you would need two otherwise identical CPUs that differ only in their ISA-dependent parts. No such CPU exists yet.

Doug S · Sep 25, 2024

MS_AT said:
Your proposal will determine which microarchitecture is better, not which ISA is better. To verify the latter, you would need two otherwise identical CPUs that differ only in their ISA-dependent parts. No such CPU exists yet.

Because the question of "which ISA is better" is ridiculous. Implementation trumps ISA and always has.

oba_rw · Sep 25, 2024

Doug S said:
Because the question of "which ISA is better" is ridiculous. Implementation trumps ISA and always has.

Then how do you explain these graphs, especially the M3? It's juiced that hard and is still more than twice as efficient as the most efficient x86 core.

soresu · Sep 26, 2024

Guys, this thread is for discussions about ARM Ltd CPU (Cortex/Neoverse) IP.

Not Apple or Qualcomm - there are threads already for both of those.

Mahboi · Sep 26, 2024

FlameTail said:
Does ISA matter after all?

Everything "matters". What got this debate marked red is people yapping endlessly about one piece of the stack that isn't that significant.

ikjadoon · Sep 27, 2024

A PR release about X925 this week.

Mediatek / Samsung launch perhaps soon?

The Ultimate CPU: Arm Cortex-X925’s Breakthrough with a 15 Percent IPC Improvement

The Arm Cortex-X925 CPU delivers a 15% IPC improvement and enhanced efficiency, ideal for a wide range of various applications.

newsroom.arm.com

FlameTail · Sep 27, 2024

ikjadoon said:
A PR release about X925 this week.

Mediatek / Samsung launch perhaps soon?

The Ultimate CPU: Arm Cortex-X925’s Breakthrough with a 15 Percent IPC Improvement

The Arm Cortex-X925 CPU delivers a 15% IPC improvement and enhanced efficiency, ideal for a wide range of various applications.

newsroom.arm.com

MediaTek Dimensity 9400 with X925 is slated to be announced on October 9th.

soresu · Sep 27, 2024

ikjadoon said:
A PR release about X925 this week.

Mediatek / Samsung launch perhaps soon?

The Ultimate CPU: Arm Cortex-X925’s Breakthrough with a 15 Percent IPC Improvement

The Arm Cortex-X925 CPU delivers a 15% IPC improvement and enhanced efficiency, ideal for a wide range of various applications.

newsroom.arm.com

I'd assume something is incoming as it seems rather strange to go on about it 4 months after the initial 2024 IP announcements.

soresu · Sep 27, 2024

There's a linked article about media compute with SVE2...

Accelerating Video Decode And Image Processing With Armv9 CPUs And SVE2

This blog post explores three video and image use cases demonstrating the proven impact of the Armv9 CPU architectural features.

community.arm.com

FlameTail · Sep 27, 2024

If ARM sticks to this +15% IPC CAGR for the next few years, they will soon outstrip Intel/AMD in absolute performance and may even catch up to Apple.
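For a sense of what a sustained +15% per year compounds to, a minimal sketch (it assumes the same gain every single year, which nobody guarantees):

```python
def compounded_gain(annual_rate: float, years: int) -> float:
    """Cumulative multiplier after `years` of constant `annual_rate` growth."""
    return (1.0 + annual_rate) ** years


for years in (1, 2, 3, 5):
    print(f"{years} yr: {compounded_gain(0.15, years):.2f}x")
```

At a constant 15% a year, IPC would roughly double in five years (about 2.01x).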

gdansk · Sep 27, 2024

They have been consistent, and it's a good trajectory.
I do worry about the pricing rumors, though.

MS_AT · Sep 27, 2024

It is interesting how much ILP they can keep extracting. I guess at some point dependency chains will start limiting them, and they will need to push clocks. I wonder if there are any studies on the subject.
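A classic way to reason about this limit is the dataflow bound: with unlimited execution resources, a block of instructions can't finish faster than its longest dependency chain, so available ILP is at most the instruction count divided by the critical-path length. A minimal sketch over a hypothetical toy instruction list, assuming every instruction has a one-cycle latency:

```python
def dataflow_ilp(deps: list[list[int]]) -> float:
    """Upper bound on ILP for a straight-line block.

    deps[i] lists the indices (all < i) of earlier instructions that
    instruction i depends on. Assumes unit latency and unlimited resources,
    so the only limit is the longest dependency chain (the critical path).
    """
    depth = []  # depth[i] = length of the longest chain ending at instruction i
    for producers in deps:
        depth.append(1 + max((depth[p] for p in producers), default=0))
    critical_path = max(depth, default=0)
    return len(deps) / critical_path if critical_path else 0.0


# Toy block: i0 and i1 are independent, i2 needs both, i3 needs i2, i4 is independent.
block = [[], [], [0, 1], [2], []]
print(dataflow_ilp(block))  # 5 instructions over a 3-long critical path
```

Real limit studies add finite windows, branch prediction, and memory dependences, but the bound shows why long dependency chains eventually cap IPC however wide the core gets, leaving clock speed as the other lever.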

soresu · Sep 27, 2024

FlameTail said:
If ARM sticks to this +15% IPC CAGR for the next few years, they will soon outstrip Intel/AMD in absolute performance and may even catch up to Apple.

Would be nice, but I doubt they can keep up that kind of YOY improvement, unless their roadmap is very aggressive.

Either way, X925 already beats Apple in SIMD perf with its 6x NEON units.
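Why more NEON pipes matter for throughput: a 128-bit NEON vector holds four FP32 lanes, and a fused multiply-add counts as two FLOPs, so peak FP32 FLOPs per cycle is pipes × 4 × 2. A minimal sketch; it assumes every pipe can issue an FMA every cycle, and the 4-pipe comparison core and the 3.6 GHz clock are assumptions for illustration only.

```python
NEON_BITS = 128
FP32_BITS = 32
FLOPS_PER_FMA = 2  # a fused multiply-add counts as one multiply plus one add


def peak_fp32_flops_per_cycle(simd_pipes: int) -> int:
    """Peak FP32 FLOPs/cycle if every 128-bit pipe issues one FMA per cycle (assumed)."""
    lanes = NEON_BITS // FP32_BITS  # 4 FP32 lanes per 128-bit vector
    return simd_pipes * lanes * FLOPS_PER_FMA


ASSUMED_CLOCK_GHZ = 3.6  # hypothetical, for illustration only
for name, pipes in (("6-pipe core", 6), ("4-pipe core (assumed comparison)", 4)):
    per_cycle = peak_fp32_flops_per_cycle(pipes)
    print(f"{name}: {per_cycle} FLOPs/cycle, {per_cycle * ASSUMED_CLOCK_GHZ:.1f} GFLOPS")
```

Peak numbers like these are only reachable in tight, well-vectorized loops; real code is usually limited by memory bandwidth or dependencies first.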

With SME not yet having much implemented code out there, it won't matter for a while, so NEON perf will remain the yardstick for everything but Geekbench until the software market catches up.

Even in scalar perf, X925 is still pretty competitive in raw int IPC, if not in perf/watt.

I doubt that ARM Ltd cores will ever match Intel/AMD in SIMD perf, though; I just don't think that is their aim. I could see some future Fujitsu ARM core punching at that target, however.

DZero · Sep 27, 2024

If they can catch Intel and AMD in brute force, x86 would be royally screwed.

gdansk · Sep 27, 2024

soresu said:
Would be nice, but I doubt they can keep up that kind of YOY improvement

Even 10% would be enough. Intel abandoned their aggressive plans and AMD hasn't been on schedule for 4 years now.
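Whether "even 10%" is enough depends on how fast the competition moves. A minimal sketch of the catch-up arithmetic, solving ((1 + g_own) / (1 + g_rival))^n = gap for n; the 20% starting gap and the rival's 5% yearly gain are hypothetical placeholders.

```python
import math


def years_to_close_gap(gap: float, own_rate: float, rival_rate: float) -> float:
    """Years until a challenger growing at `own_rate` per year catches a rival
    that is `gap` times faster today and grows at `rival_rate` per year."""
    if own_rate <= rival_rate:
        return math.inf  # never catches up at these rates
    return math.log(gap) / math.log((1 + own_rate) / (1 + rival_rate))


# Hypothetical: the rival is 20% ahead (gap = 1.2) and improves 5% a year.
for rate in (0.10, 0.15):
    print(f"{rate:.0%} a year: {years_to_close_gap(1.2, rate, 0.05):.1f} years")
```

Under these made-up inputs, 10% a year closes the gap in about four years and 15% in about two; if the rival stalls, it goes faster still.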

FlameTail · Sep 27, 2024

soresu said:
Would be nice, but I doubt they can keep up that kind of YOY improvement, unless their roadmap is very aggressive

I have thought about that. Remember when ARM CEO Rene Haas claimed that ARM would hit 50% market share in PCs within five years?

https://www.reuters.com/technology/arm-aims-capture-50-pc-market-five-years-ceo-says-2024-06-03/

It's an unbelievable statement. Why did he say that? Two possibilities:

(1) For the laughs

(2) He knows ARM's Cortex cores have an aggressive roadmap for the next 5 years, which means 50% could at least be in the realm of possibility.

soresu · Sep 27, 2024

FlameTail said:
(2) He knows ARM's Cortex cores have an aggressive roadmap for the next 5 years, which means 50% could at least be in the realm of possibility.

Realistically that improved IPC isn't even necessary for that.

Current X925 IPC plus some serious improvements to perf/watt for longer battery life would be plenty for consumers if WoA drastically improves its native software library.

To that end, I absolutely think getting native ARM game binaries needs to be aggressively addressed to increase market share, on top of the important proprietary DCC packages.

DZero · Sep 27, 2024

soresu said:
Realistically that improved IPC isn't even necessary for that.

Current X925 IPC plus some serious improvements to perf/watt for longer battery life would be plenty for consumers if WoA drastically improves its native software library.

To that end, I absolutely think getting native ARM game binaries needs to be aggressively addressed to increase market share, on top of the important proprietary DCC packages.

And here comes HoYoverse to start the trend.
