Question x86 and ARM architectures comparison thread.


DavidC1

Golden Member
Dec 29, 2023
1,742
2,823
96
They aren't clock normalized by the way. You have to lop off 20% on Zen 4.

Also, the comparison to Gracemont is important because the doubled FP units are the single biggest boost to FP performance: they account for the 60-70% gain versus the roughly 30% it would have been otherwise. FP would have stayed at ~30% without them!

If you take the existing execution unit config (whatever it is) and double the unit counts, performance in all existing applications increases significantly, without recompiling any code.

There are many software talks about how technical complexity demands extra resources and adds development burden. Hence, every time you introduce a new feature that requires that kind of effort, adoption drops further. AVX was much less of an advantage over SSE, AVX2 much less over AVX, and AVX512 much less over AVX2. And a new one roughly every two years at that!

Note that Intel was hinting at even wider 1024-bit extensions. You know what stopped them? Losing their monopoly and coming close to bankruptcy! If wider vectors are such a magic thing, why don't they keep expanding them?

This is stupid. Game developers especially keep talking about how much time and resources it takes just to address the technical side. More complexity means more resources and more compute power, which results in yet more complexity. AI too, with a recent article talking about how it has driven up electricity costs for residents in many parts of the US. How is that a benefit? It's not far from being a mass Ponzi scheme.
 

Geddagod

Golden Member
Dec 28, 2021
1,493
1,588
106
Does anyone know, or have any speculation on, why the X925 has such a small L1D cache compared to the ARM competition, despite running at lower clocks and having the same latency in cycles?
 

johnsonwax

Senior member
Jun 27, 2024
308
472
96
This is stupid. Game developers especially keep talking about how much time and resources it takes just to address the technical side. More complexity means more resources and more compute power, which results in yet more complexity. AI too, with a recent article talking about how it has driven up electricity costs for residents in many parts of the US. How is that a benefit? It's not far from being a mass Ponzi scheme.
The other problem is that you can't design a game for a subset of an architecture unless it's on a major console, because that makes it unportable to other architectures, and at least for AAA games you usually need all the sales you can get. Apart from platform exclusives, you have to design more or less to a lowest common denominator.
 
Jul 27, 2020
26,824
18,465
146
Apart from platform exclusives, you have to design more or less to a lowest common denominator.
It's called being lazy. There's nothing stopping game developers from detecting the ISA extensions available on the CPU and then using the appropriately optimized DLL (which may sometimes be as simple as just targeting the desired ISA extensions during compilation) to take full advantage of the CPU's capabilities.
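For example, with GCC or Clang on x86 the detection side is just a couple of builtins. A rough sketch (the kernel names are made up; only the builtins and the target attribute are real):

Code:
// Rough sketch of runtime ISA dispatch with GCC/Clang builtins on x86.
// The kernels are placeholders; in a game they'd be the hot SIMD loops,
// each compiled separately (or shipped as separate DLLs) for their target ISA.
#include <cstdio>

__attribute__((target("avx2"))) static void physics_kernel_avx2() { std::puts("AVX2 path"); }
__attribute__((target("sse2"))) static void physics_kernel_sse2() { std::puts("SSE2 baseline path"); }

int main() {
    __builtin_cpu_init();                   // populate the CPU feature table
    if (__builtin_cpu_supports("avx2"))     // query the CPU we're actually running on
        physics_kernel_avx2();
    else
        physics_kernel_sse2();
}

MSVC has its own equivalent via __cpuid, but the idea is the same: check once at startup, then call the best kernel.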
 

511

Diamond Member
Jul 12, 2024
3,577
3,394
106
Games are already complicated, and if you have to support three different versions it's going to be a pain for developers.
 

MS_AT

Senior member
Jul 15, 2024
802
1,622
96
You do know it's not clock normalized right?
I do, but the LunarLake Skymont core and Zen5c have the same frequency. (In the second article, I mean the same max frequency, but that's the best we can count on under these circumstances.)

You will of course say that Skymont on LunarLake is handicapped by the lack of L3. That is why we have the first article, where desktop Skymont is clocked at 4.6GHz, has access to a much more generous cache setup and lower-latency memory than Zen5c in Strix Point, and is still losing in some FP subtests to the Zen5c core.

Likewise with the Zen4 comparisons: if the FPU pipe mix were the deciding factor and the ability to do 4 FP adds or 4 FP muls per cycle dominated any benchmark, then Skymont would have pulled ahead of Zen4 despite the frequency disadvantage (a 20% frequency deficit versus a 100% per-cycle execution advantage). Yet it cannot dominate even the Zen5c mobile core, which suggests the pipe arrangement other cores have is sufficient in practice.

Anyway, the point is that I do not know of a benchmark that would single out and show Skymont's FPU pipe arrangement to be better in practice than LionCove/Zen5. So I was asking if you know of any.

Also, the comparison to Gracemont is important because the doubled FP units are the single biggest boost to FP performance: they account for the 60-70% gain versus the roughly 30% it would have been otherwise. FP would have stayed at ~30% without them!
Well, I was not saying doubling does not bring benefits...

If you take the existing execution unit config (whatever it is) and double the unit counts, performance in all existing applications increases significantly, without recompiling any code.
That is a questionable statement. It is too broad, so it can easily be proven wrong if we assume cache to be the resource that is getting doubled. Still, that was not what I was asking about.

There are many software talks about how technical complexity demands extra resources and adds development burden. Hence, every time you introduce a new feature that requires that kind of effort, adoption drops further. AVX was much less of an advantage over SSE, AVX2 much less over AVX, and AVX512 much less over AVX2. And a new one roughly every two years at that!
The reason for the low adoption rate is Intel's business strategy. If they adopted new instruction sets as baselines across the whole stack (meaning a $5 Celeron and a Xeon would support the same ISA regardless of performance), AVX512 would be enjoying wide adoption by now.

But could you please link the talks you mentioned? They are usually interesting to watch.

You know what stopped them? Losing their monopoly and coming close to bankruptcy! If wider vectors are such a magic thing, why don't they keep expanding them?
A vector register wider than a cache line does not make sense; even Intel engineers have commented as much. Marketing is, well, marketing. But from personal experience, having the register width match the cache-line width makes life easier (a 64-byte line is exactly one 512-bit register).
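To put numbers on it (assuming AVX-512F and the standard intrinsics; the array here is just for illustration): 16 floats x 4 bytes = 64 bytes, so one aligned 512-bit load covers exactly one cache line.

Code:
// One aligned 512-bit load spans exactly one 64-byte cache line (needs AVX-512F to build).
#include <immintrin.h>

alignas(64) static float line[16];        // 16 floats * 4 bytes = 64 bytes = one cache line

__m512 load_full_line() {
    return _mm512_load_ps(line);          // a single load, a single cache line touched
}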

The other problem is that you can't design a game for a subset of an architecture unless it's on a major console, because that makes it unportable to other architectures, and at least for AAA games you usually need all the sales you can get. Apart from platform exclusives, you have to design more or less to a lowest common denominator.
Teach people that object-oriented design is a tool, not a universal answer to every problem, and that data-oriented design also exists. If they have CPU-friendly code in the first place, then wherever they port the code it will do well, as the operating principles are the same across architectures.
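As a concrete (made-up) illustration of what data-oriented layout buys you, independent of the target architecture: keeping the fields a hot loop actually touches contiguous helps the cache and the auto-vectorizer everywhere.

Code:
// Sketch only; the types and fields are invented for illustration.
#include <vector>
#include <cstddef>

// Array-of-structs: updating positions drags unrelated fields through the cache too.
struct ParticleAoS { float x, y, z, mass, lifetime; };

// Struct-of-arrays: each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass, lifetime;
};

void integrate_x(ParticlesSoA& p, const std::vector<float>& vx, float dt) {
    // Only x and vx are streamed through the cache; this vectorizes on any ISA.
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += vx[i] * dt;
}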
 
  • Like
Reactions: CouncilorIrissa

poke01

Diamond Member
Mar 8, 2022
3,989
5,308
106
The video shows why the Intel laptop is priced higher. Just a better overall experience if you are not married to an internet browser. And the ARM laptop emitting more fan noise? Wow. Trust Qualcomm to make ARM look bad!
Qualcomm should’ve held off till V3 was ready. Oh well, they totally rushed their WoA launch once again. Hopefully the 6th time's the charm, right?
 
  • Haha
Reactions: igor_kavinski

camel-cdr

Member
Feb 23, 2024
31
97
51
AVX was much less of an advantage over SSE, AVX2 much less over AVX, and AVX512 much less over AVX2
While I agree with the principle, AVX512 was a much bigger upgrade over AVX2 than AVX2 over SSE4 imo, because it added much more powerful instructions.

The better solution is obviously proper ISA design that doesn't require a full rewrite if you want to widen your vector units.
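Something like SVE or RVV, in other words, where the same binary runs on any vector width. A rough sketch with the standard ARM SVE intrinsics (the function and array names are mine, not from any particular codebase; build with something like -march=armv8-a+sve):

Code:
// Vector-length-agnostic loop: works unchanged on 128-bit or 512-bit SVE hardware.
#include <arm_sve.h>
#include <cstdint>

void add_arrays(float* dst, const float* a, const float* b, uint64_t n) {
    for (uint64_t i = 0; i < n; i += svcntw()) {       // step by the hardware's 32-bit lane count
        svbool_t pg = svwhilelt_b32_u64(i, n);         // predicate masks off the tail elements
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
    }
}

RVV expresses the same thing with vsetvl, but the principle is identical: the code never hard-codes the vector width, so widening the units needs no rewrite.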