Question x86 and ARM architectures comparison thread.


DavidC1

Golden Member
Dec 29, 2023
1,693
2,773
96
Both Intel's E core architecture AND 18A with BSPDN are likely to limit the clock speeds that CWF can accomplish IMO.
Skymont on Arrow Lake runs at 4.6GHz, while Sierra Forest runs at 3GHz at max Turbo.

Why would it be limited? Methinks you have a bias clouding your judgment.
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,174
8,694
106
Why wouldn't it last?
Because ARM wants to make real money (aka sell merchant Si).
Presumably ARM by now has figured out how much the license needs to cost to keep it going
You don't "keep it going" as a publicly traded company.
It's like a freebie crack sample.
You get CSPs addicted to cheap IP, then you rugpull them. Pretty simple.
The problem is on AWS side to achieve the necessary scale/value to produce enough revenue to pay for the increasing development cost.
The hard, sweaty core R&D is outsourced to ARM.
With Neoverse CSS, everything but TO and post-Si costs is outsourced to ARM.
It's not like AMD isn't having the same problem given their need to offer so many SKUs.
They've been actually going in reverse, offering more and more specialized platforms and SKU options every gen.
 

OneEng2

Senior member
Sep 19, 2022
730
978
106
Die area, but adds something bigger in the negative which is increased difficulty of validation and risk made worse by Meltdown/Spectre era.
It doesn't seem to bother AMD, Intel, and IBM. A 5% die increase for 40% MT performance? Seems like a no-brainer to me.

144-core Crestmont-based Sierra Forest is already pretty competitive in Integer workloads such as Cloud, Virtualization, and Kernel Compile: https://www.servethehome.com/wp-con...rin-Linux-Kernel-Compile-Benchmark-scaled.jpg

288-core Clearwater Forest with Skymont cores will equal Turin D even if you assume only 2x gains. We would have seen most of the 2x gains had they released 288-core Sierra Forest.

There are many cases where Sierra Forest is substantially behind, and will continue to remain behind even against Turin D, but will improve substantially due to:
-2x FP capability in addition to substantial uarch advancements
-Large, under the core cache, connected by Foveros.
-Additional optimizations specific to certain workloads that we do not know.

That's why @511 said it's already close in SpecCPU.
Yea, well seems like it's getting spanked by lower core count Zen processors in DC tasks. Maybe I am missing something here?

I would hate to see what an EPYC 9965 would look like in this chart.... but then that would likely skew the chart so the other comparisons were much harder to see.


[Attached chart: geometric mean of all test results (composite)]

Skymont on Arrow Lake runs at 4.6GHz, while Sierra Forest runs at 3GHz at max Turbo.

Why would it be limited? Methinks you have a bias clouding your judgment.
Yea, exactly. On a DC processor it runs at 3GHz "max turbo" while AMD's Turin D Zen 5c cores boost to 3.7GHz with 192 cores.

Of course, we are talking Intel 3 vs N3E so it isn't exactly apples to apples.
 
  • Like
Reactions: Tlh97

mikegg

Golden Member
Jan 30, 2010
1,910
527
136
Software based power readings.
Exactly what I thought. Can't rely on software power readings since they can be measuring different things.

Notebookcheck's 3.6x M4 efficiency over Zen5 is likely far more accurate than David Huang's measurements. Measure it from the wall. It's by far the better way to evaluate real world efficiency.
 

mikegg

Golden Member
Jan 30, 2010
1,910
527
136
That's completely useless unless iso chassis.
You're comparing two different laptops with different panels, SSDs, networking, PDN and SOCs.
What? That's nuts. Do you have any clue what you're talking about? That's why you measure idle power first. Load power minus idle power. That's how Anandtech used to do it. That's how Notebookcheck does it. That's how any competent reviewer would do it.
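A minimal sketch of the "load power minus idle power" arithmetic described above. The function names and all wattage and score values are hypothetical, chosen only to show the calculation; they are not measurements from any review.

```python
def net_load_power(load_watts: float, idle_watts: float) -> float:
    """Power attributed to the workload: wall power under load minus wall power at idle."""
    if load_watts < idle_watts:
        raise ValueError("load power should not be below idle power")
    return load_watts - idle_watts


def points_per_watt(score: float, load_watts: float, idle_watts: float) -> float:
    """Benchmark score divided by the idle-corrected power."""
    return score / net_load_power(load_watts, idle_watts)


# Hypothetical readings for two laptops (illustrative numbers only).
laptop_a = points_per_watt(score=120.0, load_watts=25.0, idle_watts=10.0)
laptop_b = points_per_watt(score=100.0, load_watts=15.0, idle_watts=5.0)

# How many times more efficient laptop B is than laptop A under this method.
ratio = laptop_b / laptop_a
print(f"A: {laptop_a:.2f} pts/W, B: {laptop_b:.2f} pts/W, B/A: {ratio:.2f}x")
```

Subtracting idle removes the static part of the panel, SSD, and networking draw, but anything that scales with load outside the cores still ends up in the result.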

Using software core measurements is completely useless, since software core readings differ between designs in what they are measuring.

Clearly the M4 laptop's real-world advantage is more than the 30% ST power efficiency lead claimed over Zen5. The reality is closer to 3.6x when they're running at stock performance.
 
  • Like
Reactions: DavidC1

johnsonwax

Senior member
Jun 27, 2024
261
421
96
Because ARM wants to make real money (aka sell merchant Si).

You don't "keep it going" as a publicly traded company.
It's like a freebie crack sample.
You get CSPs addicted to cheap IP, then you rugpull them. Pretty simple.
Well, they're 35 years in now without the pull.
The hard, sweaty core R&D is outsourced to ARM.
With Neoverse CSS, everything but TO and post-Si costs is outsourced to ARM.
No sh*t. And they're paying ARM a license, and ARM distributes that design cost across all of their licenses.
They've been actually going in reverse, offering more and more specialized platforms and SKU options every gen.
Sure, they've been growing at Intel's expense. Look at Intel once they stopped growing. Are there more specialized platforms and SKU options across x86 every gen? That'll be an interesting thing to look at in 2-3 years. Not fair now while Intel is in turmoil, but x86 units are not growing meaningfully. I think x86 server units peaked in 2022 and x86 desktop in 2012.
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,174
8,694
106
Clearly the M4 laptop's real-world advantage is more than the 30% ST power efficiency lead claimed over Zen5. The reality is closer to 3.6x when they're running at stock performance.
wait, 1t power? ah who cares, this is going outta the window the next few generations.
Well, they're 35 years in now without the pull.
?
And they're paying ARM a license, and ARM distributes that design cost across all of their licenses.
They pay ARM pennies, and pocket all the resulting margin that would go towards merchant Si.
Which is not something that's gonna last.
Are there more specialized platforms and SKU options across x86 every gen?
yeah?
but x86 units are not growing meaningfully
Server CPU units are just not growing at all during the GPU spending crunch.
 

mikegg

Golden Member
Jan 30, 2010
1,910
527
136
You don't know for sure,
Exactly. The most efficient Windows laptop in the world is an M4 Mac running Parallels and Windows on ARM.

So we shouldn't pin the M4's efficiency advantage on macOS just because it makes sense for macOS to be more optimized for its own silicon.

If we're going by this, we can also say tests where Apple Silicon loses are most likely due to much longer x86 optimization. We can only go by the results. Apple doesn't sell Apple Silicon chips by themselves. AMD and Intel don't have their own OS.

Besides, we have load tests where scheduling matters little to none. In those tests, Apple Silicon still wins by a lot.
 

Geddagod

Golden Member
Dec 28, 2021
1,440
1,551
106
calculating that by dropping SMT you can make a core smaller and stuff more of them on chip to compensate, that is likely a very valid approach
The area savings are minuscule. Adding SMT is like a ~5% increase in area.
No I don’t believe they do. Qualcomm X Elite can access the entire L2 cache within the same cluster but cannot access another P-core cluster's L2 cache.
Ah you are right. My bad.
 

Anacapols

Junior Member
Mar 2, 2025
4
3
41
Replying to milkegg, spam filter being wonky again.

Firstly, notebookcheck measures a 7.8 watt idle power difference between the 395 and M4 pro (both with the display on), which should be a state where the cores are barely doing anything, and most of the power difference is from the SOC, memory, screen, idle tasks etc.

Secondly, the 395 in particular does very badly in their Cinebench R24 single-core efficiency test, with the 365 they tested being 53% more efficient despite having the same cores. So clearly parts other than the CPU core alone contribute a lot to the efficiency in these tests, and I don't think you can reasonably state that these tests are highly indicative of CPU architecture efficiency.

Lastly, while the 14-core M4 Pro is stated to be 324% more efficient in their single-core CB24 benchmark, it ends up at only 54% more efficient in the CB24 multi-core benchmark.
So to me there can only be 3 things going on here:
- Apple cores get massively less efficient in multicore compared to single-core, either due to some architectural trick that only works in single-core scenarios or very poor multicore scaling. While I wouldn't be surprised to see some of the former, the absolute multicore performance in properly scaling benchmarks makes me believe this is not very significant.
- AMD has some magic going on that allows them very superlinear scaling (which can easily be proven false; they scale a little better at best, assuming equal core counts), or their cores are massively less efficient in 1t tasks (which is true: they boost further up their efficiency curve to compete in ST performance at the cost of efficiency, whereas both are at good points in their efficiency curve in MT tests).
- Other factors influence 1t power draw, such as the idle power state, background work, other components etc. This conclusion is also reinforced by my first and second points above.

To me it seems obvious from the MT tests, where non-core power is a smaller fraction of the total power draw and where AMD is not boosting as far outside its efficiency sweet spot, that the core-to-core difference in CB24 is far closer to the 54% seen in multicore than to the 324% seen in single-core. It is possibly even lower if Apple still has some uncore power advantage here, as would seem to be the case given that the HX 370, with the same Zen 5 cores, is only 20% away from the best M4 result.
If you can rationally explain why the ST tests are representative of core efficiency and the MT tests are not I'd love to learn, but I just don't see any evidence pointing that way.
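A toy model of the argument above, assuming a simple split of system power into a core part and a fixed non-core part. All numbers are hypothetical and the two systems are deliberately given identical core power and scores, so any efficiency gap comes from the fixed overhead alone.

```python
def system_efficiency(score: float, core_watts: float, fixed_watts: float) -> float:
    """Score per watt of total power, where total = core power + fixed non-core power."""
    return score / (core_watts + fixed_watts)


# Hypothetical fixed (non-core) power: SoC fabric, memory, display, background work.
FIXED_X = 2.0  # watts, system X
FIXED_Y = 8.0  # watts, system Y

# Single-thread case: cores draw little, so the fixed part dominates the total.
st_x = system_efficiency(score=100.0, core_watts=5.0, fixed_watts=FIXED_X)
st_y = system_efficiency(score=100.0, core_watts=5.0, fixed_watts=FIXED_Y)
st_ratio = st_x / st_y

# Multi-thread case: cores draw far more, so the same fixed part matters less.
mt_x = system_efficiency(score=1000.0, core_watts=50.0, fixed_watts=FIXED_X)
mt_y = system_efficiency(score=1000.0, core_watts=50.0, fixed_watts=FIXED_Y)
mt_ratio = mt_x / mt_y

print(f"ST efficiency ratio X/Y: {st_ratio:.2f}x")
print(f"MT efficiency ratio X/Y: {mt_ratio:.2f}x")
```

In this sketch the cores are equally efficient by construction, yet the single-thread ratio comes out much larger than the multi-thread one. It does not say how large the real non-core gap is, only that a fixed offset inflates ST comparisons more than MT ones.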
 
  • Like
Reactions: MS_AT

DavidC1

Golden Member
Dec 29, 2023
1,693
2,773
96
Replying to milkegg, spam filter being wonky again.

Firstly, notebookcheck measures a 7.8 watt idle power difference between the 395 and M4 pro (both with the display on), which should be a state where the cores are barely doing anything, and most of the power difference is from the SOC, memory, screen, idle tasks etc.
That's inaccurate. You can't use AC wall power measurements. The ROG system gets 10 hours of battery life on web browsing, which isn't possible with 13.8W use.

Power management is only fully active on battery, not on AC.
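The battery-life point is plain energy arithmetic: capacity (Wh) = average power (W) x runtime (h). A small sketch follows; the 13.8W and 10 hour figures are the ones quoted above, while the 70 Wh pack size is a hypothetical value used only for illustration.

```python
def required_capacity_wh(avg_watts: float, hours: float) -> float:
    """Battery energy needed to sustain avg_watts for the given number of hours."""
    return avg_watts * hours


def average_draw_watts(capacity_wh: float, hours: float) -> float:
    """Average power draw implied by draining capacity_wh over the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return capacity_wh / hours


# Energy a 13.8 W average draw would need over 10 hours (roughly 138 Wh).
needed_wh = required_capacity_wh(13.8, 10.0)

# Hypothetical 70 Wh pack: average draw that a 10 hour runtime would imply.
HYPOTHETICAL_PACK_WH = 70.0
implied_watts = average_draw_watts(HYPOTHETICAL_PACK_WH, 10.0)

print(f"Needed for 13.8 W over 10 h: {needed_wh:.0f} Wh")
print(f"Implied draw for a {HYPOTHETICAL_PACK_WH:.0f} Wh pack over 10 h: {implied_watts:.1f} W")
```

Roughly 138 Wh is more than a typical laptop battery holds, which is the basis for saying the on-battery draw must be well below the AC idle figure.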
So to me there can only be 3 things going on here:
- Apple cores get massively less efficient in multicore compared to single-core, either due to some architectural trick that only works in single-core scenarios or very poor multicore scaling. While I wouldn't be surprised to see some of the former, the absolute multicore performance in properly scaling benchmarks makes me believe this is not very significant.
It's not inefficient. Apple only has 10 cores versus AMD's 16, and AMD adds SMT on top of that. If you want higher MT efficiency, you put more cores and clock them lower, that simple.

The fact that it still has a massive advantage is a testament to Apple design.
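A rough sketch of why "more cores, clocked lower" helps MT efficiency. It uses the textbook simplification that per-core dynamic power scales with f x V^2 and that voltage scales roughly with frequency, giving power proportional to f^3. That assumption ignores leakage, voltage floors, uncore power, and imperfect multi-core scaling, and every number below is hypothetical.

```python
def relative_power(cores: int, rel_freq: float) -> float:
    """Relative dynamic power under the simplified model P ~ cores * f^3."""
    return cores * rel_freq ** 3


def relative_throughput(cores: int, rel_freq: float) -> float:
    """Relative throughput assuming perfect scaling: cores * f."""
    return cores * rel_freq


# Hypothetical configurations tuned to deliver the same throughput.
narrow_cores, narrow_freq = 10, 1.0    # fewer cores at full clock
wide_cores, wide_freq = 16, 0.625      # more cores at a lower clock

narrow_throughput = relative_throughput(narrow_cores, narrow_freq)
wide_throughput = relative_throughput(wide_cores, wide_freq)

narrow_power = relative_power(narrow_cores, narrow_freq)
wide_power = relative_power(wide_cores, wide_freq)

print(f"Throughput: narrow={narrow_throughput}, wide={wide_throughput}")
print(f"Power:      narrow={narrow_power}, wide={wide_power}")
```

Under these idealized assumptions the wider configuration matches throughput at well under half the power. Real chips will fall short of that, but the direction is what the core-count argument relies on.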
What? That's nuts. Do you have any clue what you're talking about? That's why you measure idle power first. Load power minus idle power. That's how Anandtech used to do it. That's how Notebookcheck does it. That's how any competent reviewer would do it.
Wall measurements aren't accurate either.

Software measurement is actually pretty accurate, as long as you measure the same thing. You simply measure whole system power, that's it. Saying you are power efficient means nothing, if the battery life isn't better.

Apple systems actually offer that.
It doesn't seem to bother AMD, Intel, and IBM. A 5% die increase for 40% MT performance? Seems like a no-brainer to me.
And they are substantially behind in CPU uarch. Pretty sure that's no coincidence.

IBM is different because they cater entirely to a niche market, but one that is profitable for them.
 

Anacapols

Junior Member
Mar 2, 2025
4
3
41
While I agree with the 10 vs 16 core argument, it doesn't really apply here, as the 4+8 core Zen 5 part (370) scores significantly better against the 10+4 in terms of multicore efficiency than the 16-core part does.