Discussion: ARM Cortex/Neoverse IP + SoCs (no custom cores)


gdansk

Diamond Member
Feb 8, 2011
4,209
7,068
136
Barring one exception (Apple), it seems the x86/ARM SoCs end up pretty similar to each other overall.

Qualcomm has a notable advantage in 1T power draw. But in MT that disappears, when even more of the "terribly inefficient" x86 decoders are active, with SMT (Strix) or without SMT (LNL). So how much of that 1T efficiency advantage is due to their mobile phone expertise and power-efficient cache hierarchy rather than the ISA? I have no way of knowing, but the fact that the efficiency advantage disappears in MT suggests it isn't the ISA itself at fault but something else.
 
  • Like
Reactions: Tlh97

GTracing

Senior member
Aug 6, 2021
478
1,112
106
Does ISA matter after all?

An interesting trail of tweets;
View attachment 108206
I'm not sure if this is the best thread for this discussion, but the way I understand it, practically speaking, ISA didn't matter up until recently. With the new problem of decoding variable-length instructions, it matters more and more. Nowadays it's not a question of "if", but of "how much". Multiple-decoder designs like Zen 5 and Skymont might solve this problem.
 

Nothingness

Diamond Member
Jul 3, 2013
3,292
2,360
136
My take on this: at the lower end, where performance is not a primary target, ISA matters because it helps use less silicon. At the higher end, ISA still has an impact, but there are many techniques, potentially costing a lot of silicon area, to circumvent ISA shortcomings.
 

Doug S

Diamond Member
Feb 8, 2020
3,298
5,737
136
My take on this: at the lower end, where performance is not a primary target, ISA matters because it helps use less silicon. At the higher end, ISA still has an impact, but there are many techniques, potentially costing a lot of silicon area, to circumvent ISA shortcomings.

I'm not even sure it's true at the lower end (speaking in terms of E cores as the lower end, not embedded stuff).

The only real difference is an extra pipeline stage or two because of the more complex decode, but there are plenty of ways to hide those delays - everyone has a pretty long pipeline these days because everyone is clocking pretty high.

I've long said the people who think ARM's fixed instruction length confers a big advantage were wrong. People were blinded by Apple's IPC, but x86 designs have always been more focused on clock rate. Now that Apple and the rest of the ARM crowd have taken their clock rates closer and closer to x86 territory, the IPC advantage is going to shrink, because the higher your clock rate, the further away all levels of cache and main memory are from your CPU in clock cycles. That's the latency that really hurts, not the extra cycles to decode x86's ridiculous instruction encoding.
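
To put rough numbers on the clock-rate point, here is a minimal sketch. The 100 ns memory latency and the three clock speeds are made-up illustrative values, not measurements of any real chip, and `latency_in_cycles` is just a name picked for the example. It only shows that a latency that is fixed in nanoseconds costs more cycles as the clock goes up.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a fixed wall-clock latency into core clock cycles."""
    # 1 GHz means 1 cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz


# Hypothetical figure for illustration only, not a measurement.
DRAM_LATENCY_NS = 100.0

for clock_ghz in (3.0, 4.5, 5.5):
    cycles = latency_in_cycles(DRAM_LATENCY_NS, clock_ghz)
    print(f"{clock_ghz:.1f} GHz: {cycles:.0f} cycles per trip to memory")
```

With those assumed values the same trip to memory costs 300 cycles at 3.0 GHz and 550 cycles at 5.5 GHz, so every miss stalls the faster-clocked core for more cycles and its measured IPC drops even if nothing else changes.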
 

LightningZ71

Platinum Member
Mar 10, 2017
2,315
2,906
136
One other thing that I've seen profiled in the past when comparing the different ISAs is the number of "ops" per instruction. From what I've seen, x86 (and its various extensions) TENDS to generate a notable percentage more "ops" per instruction than the latest versions of ARM. So, while there is often a cycle or more of delay when decoding x86 instructions as compared to ARM, it tends to generate more ops in the balance. The two ISAs tend to be relatively equivalent in actual operation throughput, with x86 paying a slight penalty in circuitry (including the related bit of heat and power draw) to handle decoding variable-length instructions.
 
  • Like
Reactions: Nothingness

Doug S

Diamond Member
Feb 8, 2020
3,298
5,737
136
One other thing that I've seen profiled in the past when comparing the different ISAs is the number of "ops" per instruction. From what I've seen, x86 (and its various extensions) TENDS to generate a notable percentage more "ops" per instruction than the latest versions of ARM. So, while there is often a cycle or more of delay when decoding x86 instructions as compared to ARM, it tends to generate more ops in the balance. The two ISAs tend to be relatively equivalent in actual operation throughput, with x86 paying a slight penalty in circuitry (including the related bit of heat and power draw) to handle decoding variable-length instructions.

"ops" per instruction is just a made up thing on top of the made up thing that is IPC.

If you want to measure two ISAs against each other start with the same source code, compile it, and see which finishes faster. That's the only test that matters.
 

MS_AT

Senior member
Jul 15, 2024
739
1,492
96
"ops" per instruction is just a made up thing on top of the made up thing that is IPC.

If you want to measure two ISAs against each other start with the same source code, compile it, and see which finishes faster. That's the only test that matters.
Your proposal will determine which microarchitecture is better, not which ISA is better. To verify the latter you would need two identical CPUs that differ only in the ISA-dependent parts. Such a CPU doesn't exist yet.
 

Doug S

Diamond Member
Feb 8, 2020
3,298
5,737
136
Your proposal will determine which microarchitecture is better, not which ISA is better. To verify the latter you would need two identical CPUs that differ only in the ISA-dependent parts. Such a CPU doesn't exist yet.

Because the question of "which ISA is better" is ridiculous. Implementation trumps ISA and always has.
 
  • Like
Reactions: Nothingness

oba_rw

Junior Member
May 9, 2024
1
0
11
Because the question of "which ISA is better" is ridiculous. Implementation trumps ISA and always has.
Then how do you explain these graphs, especially the M3? It is juiced that much and is still more than twice as efficient as the most efficient x86 core.
[Attachments: Screenshot 2024-09-26 101630.png, Screenshot 2024-09-26 101652.png]
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,761
106
  • Like
Reactions: ikjadoon

soresu

Diamond Member
Dec 19, 2014
3,895
3,331
136
  • Like
Reactions: ikjadoon

gdansk

Diamond Member
Feb 8, 2011
4,209
7,068
136
They have been consistent and it's a good trajectory.
I do worry about the pricing rumors, though.
 

MS_AT

Senior member
Jul 15, 2024
739
1,492
96
It is interesting how much ILP they can continue to extract. I guess at some point dependency chains should start limiting them and they will need to push clocks. I wonder if there are any studies on the subject.
 

soresu

Diamond Member
Dec 19, 2014
3,895
3,331
136
If ARM sticks to this +15% IPC CAGR for the next few years, they will soon outstrip Intel/AMD in absolute performance and may even catch up to Apple.
Would be nice, but I doubt they can keep up that kind of YOY improvement, unless their roadmap is very aggressive.
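
For a sense of scale, a minimal sketch of what a compounding +15% per year would mean. The starting IPC of 1.0 and the gap ratios are placeholders I made up, the competitor is assumed to stand still (which it won't), and `project_ipc` and `years_to_close_gap` are names invented for the example.

```python
def project_ipc(base_ipc: float, annual_gain: float, years: int) -> float:
    """IPC after compounding a fixed yearly gain for a number of years."""
    return base_ipc * (1.0 + annual_gain) ** years


def years_to_close_gap(gap_ratio: float, annual_gain: float) -> int:
    """Whole years of compounding needed to reach gap_ratio times today's IPC.

    Assumes the target stays fixed, which is an illustrative simplification.
    """
    years = 0
    relative = 1.0
    while relative < gap_ratio:
        relative *= 1.0 + annual_gain
        years += 1
    return years


# Placeholder inputs, not measured data.
for n in range(1, 6):
    print(f"after {n} year(s): {project_ipc(1.0, 0.15, n):.2f}x")
print("years to reach 1.3x:", years_to_close_gap(1.3, 0.15))
```

At that rate IPC roughly doubles in five years, which is why sustaining it looks like a tall order.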

Either way, X925 is already better than Apple in SIMD perf with its 6x NEON units.

Since SME doesn't yet have much implemented code out there, it won't matter for a while, so NEON perf will continue to be the yardstick for everything but GeekBench until the software market catches up.

Even in scalar perf X925 is still pretty competitive in raw int IPC, if not in perf/watt.

I doubt that ARM Ltd cores will ever match Intel/AMD for SIMD perf tho - I just don't think that is their aim, though I could see some future Fujitsu ARM core punching at that target.
 

DZero

Golden Member
Jun 20, 2024
1,276
462
96
As long as they can catch Intel and AMD in brute force, x86 will be royally screwed.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,761
106
Would be nice, but I doubt they can keep up that kind of YOY improvement, unless their roadmap is very aggressive
I have thought about that. Remember when ARM CEO Rene Haas claimed that ARM will hit 50% market share in PCs 5 years from now?


It's an unbelievable statement. Why did he say that? Two possibilities:

(1) For the laughs

(2) He knows ARM's Cortex cores have an aggressive roadmap for the next 5 years, which means 50% could at least be in the realm of possibility.
 

soresu

Diamond Member
Dec 19, 2014
3,895
3,331
136
(2) He knows ARM's Cortex cores have an aggressive roadmap for the next 5 years, which means 50% could at least be in the realm of possibility.
Realistically that improved IPC isn't even necessary for that.

Current X925 IPC plus some serious improvements to perf/watt for longer battery life would be plenty for consumers if WoA drastically improves its native software library.

By that I mean I absolutely think that getting native ARM binary games needs to be aggressively addressed to increase market share, on top of the important proprietary DCC packages.
 
  • Like
Reactions: Io Magnesso

DZero

Golden Member
Jun 20, 2024
1,276
462
96
Realistically that improved IPC isn't even necessary for that.

Current X925 IPC plus some serious improvements to perf/watt for longer battery life would be plenty for consumers if WoA drastically improves its native software library.

By that I mean I absolutely think that getting native ARM binary games needs to be aggressively addressed to increase market share, on top of the important proprietary DCC packages.
And here comes Hoyoverse to start the trend.