Question x86 and ARM architectures comparison thread.

DavidC1 · Aug 4, 2025

OneEng2 said:
Both Intel's E core architecture AND 18A with BSPDN are likely to limit the clock speeds that CWF can accomplish IMO.

Skymont on Arrowlake runs at 4.6GHz, while Sierra Forest runs it at 3GHz at max Turbo.

Why would it be limited? Methinks you have a bias clouding your judgment.

adroc_thurston · Aug 4, 2025

johnsonwax said:
Why wouldn't it last?

Because ARM wants to make real money (aka sell merchant Si).

johnsonwax said:
Presumably ARM by now has figured out how much the license needs to cost to keep it going

You don't "keep it going" as a publicly traded company.
It's like a freebie crack sample.
You get CSPs addicted to cheap IP, then you rugpull them. Pretty simple.

johnsonwax said:
The problem is on AWS side to achieve the necessary scale/value to produce enough revenue to pay for the increasing development cost.

The hard, sweaty core R&D is outsourced to ARM.
With Neoverse CSS, everything but TO and post-Si costs is outsourced to ARM.

johnsonwax said:
It's not like AMD isn't having the same problem given their need to offer so many SKUs.

They've been actually going in reverse, offering more and more specialized platforms and SKU options every gen.

OneEng2 · Aug 4, 2025

DavidC1 said:
Die area, but it adds something bigger on the negative side: increased difficulty of validation, and risk made worse by the Meltdown/Spectre era.

It doesn't seem to bother AMD, Intel, and IBM. A 5% die-area increase for 40% more MT performance? Seems like a no-brainer to me.
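Taking the thread's round numbers at face value (roughly +5% core area for SMT and roughly +40% MT throughput; both are posters' estimates, not measurements), here is a quick sketch of throughput per unit area. It compares SMT against a hypothetical alternative of spending the same 5% area on extra non-SMT cores that scale perfectly:

```python
def throughput_per_area(area_increase: float, throughput_gain: float) -> float:
    """Relative MT throughput per unit of die area vs. a baseline core."""
    return (1.0 + throughput_gain) / (1.0 + area_increase)


# SMT: ~5% more area for ~40% more MT throughput (thread's estimates).
smt = throughput_per_area(area_increase=0.05, throughput_gain=0.40)

# Alternative (assumption): spend the same 5% area on more cores,
# with perfectly linear scaling, so throughput also rises by 5%.
more_cores = throughput_per_area(area_increase=0.05, throughput_gain=0.05)

print(f"SMT:        {smt:.3f}x throughput per area")
print(f"More cores: {more_cores:.3f}x throughput per area")
```

The 40% figure is highly workload dependent, and the sketch says nothing about the validation and side-channel costs raised in the quoted post.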

DavidC1 said:
144-core Crestmont-based Sierra Forest is already pretty competitive in integer workloads such as Cloud, Virtualization, and Kernel Compile: https://www.servethehome.com/wp-con...rin-Linux-Kernel-Compile-Benchmark-scaled.jpg

A 288-core Clearwater Forest with Skymont cores will equal Turin D even if you assume only 2x gains. We would have seen most of those 2x gains had they released a 288-core Sierra Forest.

There are many cases where Sierra Forest is substantially behind, and will continue to remain behind even against Turin D, but will improve substantially due to:
-2x FP capability in addition to substantial uarch advancements
-Large, under the core cache, connected by Foveros.
-Additional optimizations specific to certain workloads that we do not know.

That's why @511 said it's already close in SpecCPU.

Yea, well, it seems like it's getting spanked by lower-core-count Zen processors in DC tasks. Maybe I am missing something here?

I would hate to see what an EPYC 9965 would look like in this chart... but then it would likely skew the chart so much that the other comparisons would be much harder to see.

[Chart: geometric mean of all test results (composite)]

DavidC1 said:
Skymont on Arrowlake runs at 4.6GHz, while Sierra Forest runs it at 3GHz at max Turbo.

Why would it be limited? Methinks you have a bias clouding your judgment.

Yea, exactly. On a DC processor it runs at 3GHz "max turbo", while AMD's Turin D Zen 5c cores boost to 3.7GHz with 192 cores.

Of course, we are talking Intel 3 vs N3E so it isn't exactly apples to apples.

511 · Aug 4, 2025

OneEng2 said:
Yea, exactly. On a DC processor it runs at 3GHz "max turbo", while AMD's Turin D Zen 5c cores boost to 3.7GHz with 192 cores.

3.7 GHz is the max boost at a 500W TDP, not the all-core boost, vs. 144C at 3 GHz @ 250W TDP.
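A naive way to put these nameplate figures side by side is aggregate core-GHz per watt of TDP. This is only a sketch using the numbers quoted in the post: TDP is a limit rather than measured draw (the 500W figure is disputed in the reply below), whether 3.7 GHz is sustained all-core is also disputed, and core-GHz ignores IPC differences between the two core designs.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    cores: int
    clock_ghz: float  # clock figure quoted in the thread
    tdp_w: float      # nameplate TDP, not measured power


def core_ghz_per_watt(p: Part) -> float:
    """Aggregate core-GHz per TDP watt: a crude, IPC-blind ratio."""
    return p.cores * p.clock_ghz / p.tdp_w


turin_d = Part("EPYC 9965 (Turin D)", cores=192, clock_ghz=3.7, tdp_w=500)
sierra = Part("144C Sierra Forest", cores=144, clock_ghz=3.0, tdp_w=250)

for part in (turin_d, sierra):
    print(f"{part.name}: {core_ghz_per_watt(part):.3f} core-GHz/W")
```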

adroc_thurston · Aug 4, 2025

511 said:
3.7 GHz is the max boost at a 500W TDP, not the all-core boost

It is an all-core boost (same as Bergamo. Oh and it's far off 500W outside of heavy FP crunch).
They're CSP parts, they need 100% consistent, fully stable boosting behavior 100% of the time.

mikegg · Aug 4, 2025

Geddagod said:
Software based power readings.

Exactly what I thought. Can't rely on software power readings since they can be measuring different things.

Notebookcheck's 3.6x M4 efficiency over Zen5 is likely far more accurate than David Huang's measurements. Measure it from the wall. It's by far the better way to evaluate real world efficiency.

adroc_thurston · Aug 4, 2025

mikegg said:
It's by far the better way to evaluate real world efficiency.

That's completely useless unless it's iso-chassis.
You're comparing two different laptops with different panels, SSDs, networking, PDN and SoCs.

mikegg · Aug 4, 2025

adroc_thurston said:
That's completely useless unless it's iso-chassis.
You're comparing two different laptops with different panels, SSDs, networking, PDN and SoCs.

What? That's nuts. Do you have any clue what you're talking about? That's why you measure idle power first. Load power minus idle power. That's how Anandtech used to do it. That's how Notebookcheck does it. That's how any competent reviewer would do it.

Using software core measurements is completely useless, since software core readings differ between designs and in what they measure.

Clearly the M4's real-world laptop efficiency advantage is more than the 30% ST power efficiency claimed over Zen5. The reality is closer to 3.6x when they're running at stock performance.
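The load-minus-idle method described above can be sketched as follows. The readings are hypothetical placeholders, not Notebookcheck or AnandTech data. Note what the subtraction does and does not remove: it cancels the idle floor (panel, SSD, idle SoC), but any extra power that memory, fabric, or the rest of the platform draws under load stays in the delta, which is the crux of the iso-chassis objection.

```python
from dataclasses import dataclass


@dataclass
class WallReading:
    idle_w: float  # wall power at idle (display on)
    load_w: float  # wall power while the benchmark runs
    score: float   # benchmark score for the same run


def delta_efficiency(r: WallReading) -> float:
    """Benchmark points per watt of (load - idle) wall power."""
    delta_w = r.load_w - r.idle_w
    if delta_w <= 0:
        raise ValueError("load power must exceed idle power")
    return r.score / delta_w


# Hypothetical readings, for illustration only.
laptop_a = WallReading(idle_w=6.0, load_w=16.0, score=120.0)
laptop_b = WallReading(idle_w=14.0, load_w=34.0, score=120.0)

ratio = delta_efficiency(laptop_a) / delta_efficiency(laptop_b)
print(f"A is {ratio:.2f}x as efficient as B on load-minus-idle power")
```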

johnsonwax · Aug 4, 2025

adroc_thurston said:
Because ARM wants to make real money (aka sell merchant Si).

You don't "keep it going" as a publicly traded company.
It's like a freebie crack sample.
You get CSPs addicted to cheap IP, then you rugpull them. Pretty simple.

Well, they're 35 years in now without the pull.

adroc_thurston said:
The hard, sweaty core R&D is outsourced to ARM.
With Neoverse CSS, everything but TO and post-Si costs is outsourced to ARM.

No sh*t. And they're paying ARM a license, and ARM distributes that design cost across all of their licenses.

adroc_thurston said:
They've been actually going in reverse, offering more and more specialized platforms and SKU options every gen.

Sure, they've been growing at Intel's expense. Look at Intel once they stopped growing. Are there more specialized platforms and SKU options across x86 every gen? That'll be an interesting thing to look at in 2-3 years. Not fair now while Intel is in turmoil, but x86 units are not growing meaningfully. I think x86 server units peaked in 2022 and x86 desktop in 2012.

adroc_thurston · Aug 5, 2025

mikegg said:
Clearly the M4's real-world laptop efficiency advantage is more than the 30% ST power efficiency claimed over Zen5. The reality is closer to 3.6x when they're running at stock performance.

Wait, 1t power? Ah, who cares, this is going outta the window in the next few generations.

johnsonwax said:
Well, they're 35 years in now without the pull.

?

johnsonwax said:
And they're paying ARM a license, and ARM distributes that design cost across all of their licenses.

They pay ARM pennies, and pocket all the resulting margin that would go towards merchant Si.
Which is not something that's gonna last.

johnsonwax said:
Are there more specialized platforms and SKU options across x86 every gen?

yeah?

johnsonwax said:
but x86 units are not growing meaningfully

Server CPU units are just not growing at all during the GPU spending crunch.

mikegg · Aug 5, 2025

johnsonwax said:
You don't know for sure,

Exactly. The most efficient Windows laptop in the world is an M4 Mac running Parallels and Windows on ARM.

So we shouldn't pin M4's efficiency advantage on macOS just because it makes sense for macOS to be more optimized for Apple's own silicon.

If we're going by this, we can also say tests where Apple Silicon loses are most likely due to much longer x86 optimization. We can only go by the results. Apple doesn't sell Apple Silicon chips by themselves. AMD and Intel don't have their own OS.

Besides, we have load tests where scheduling matters little or not at all. In those tests, Apple Silicon still wins by a lot.

mikegg · Aug 5, 2025

adroc_thurston said:
Wait, 1t power? Ah, who cares, this is going outta the window in the next few generations.

Yes 1t. What? 1t matters a lot.

What are we doing here?

511 · Aug 5, 2025

adroc_thurston said:
It is an all-core boost (same as Bergamo. Oh and it's far off 500W outside of heavy FP crunch).
They're CSP parts, they need 100% consistent, fully stable boosting behavior 100% of the time.

https://www.amd.com/en/products/processors/server/epyc/9005-series/amd-epyc-9965.html

No?

poke01 · Aug 5, 2025

mikegg said:
Yes 1t. What? 1t matters a lot.

What are we doing here?

I think adroc means 1t power consumption will go out the window, not that 1t is not important.

mikegg · Aug 5, 2025

poke01 said:
I think adroc means 1t power consumption will go out the window, not that 1t is not important.

That hasn't happened. Even if it does, some architectures will still be more efficient than others.

adroc_thurston · Aug 5, 2025

mikegg said:
That hasn't happened

Already did. Will get worse once the aa64 club ships 5GHz cores.

511 · Aug 5, 2025

adroc_thurston said:
Already did. Will get worse once the aa64 club ships 5GHz cores.

that is due next year anyway

Geddagod · Aug 5, 2025

Jan Olšan said:
calculating that by dropping SMT you can make a core smaller and stuff more of them on chip to compensate, that is likely a very valid approach

The area savings are minuscule. Adding SMT is like a ~5% increase in area.

poke01 said:
No I don't believe they do. Qualcomm X Elite can access the entire L2 cache within the same cluster but cannot access another P-core cluster's L2 cache.

Ah you are right. My bad.

Anacapols · Aug 5, 2025

Replying to mikegg; the spam filter is being wonky again.

Firstly, Notebookcheck measures a 7.8 watt idle power difference between the 395 and the M4 Pro (both with the display on). That should be a state where the cores are barely doing anything, so most of the power difference comes from the SoC, memory, screen, idle tasks, etc.

Secondly, the 395 in particular does very badly in their Cinebench R24 single-core efficiency test: the 365 they tested is 53% more efficient despite having the same cores. So clearly parts other than the CPU core contribute a lot to efficiency in these tests, and I don't think you can reasonably claim these tests are highly indicative of CPU architecture efficiency.

Lastly, while the 14-core M4 Pro is stated to be 324% more efficient in their single-core CB24 benchmark, it ends up only 54% more efficient in the CB24 multi-core benchmark.

So to me there can only be 3 things going on here:
- Apple cores get massively less efficient in multicore compared to single-core, either due to some architectural trick that only works in single-core scenarios or due to very poor multicore scaling. While I wouldn't be surprised to see some of the former, the absolute multicore performance in properly scaling benchmarks makes me believe this is not very significant.
- AMD has some magic going on that gives them strongly superlinear scaling (which is easily shown to be false; at best they scale a little better, assuming equal core counts), or their cores are massively less efficient in 1t tasks (which is true: they boost further up their efficiency curve to compete in ST performance at the cost of efficiency, whereas both are at good points on their efficiency curves in MT tests).
- Other factors influence 1t power draw, such as the idle power state, background work, other components, etc. This conclusion is also reinforced by points 1 and 2.

To me it seems obvious that, based on the MT tests (where non-core power is a smaller fraction of total power draw and AMD is not boosting as far outside its efficiency sweet spot), the core-to-core difference in CB24 is far closer to the 54% seen in multicore than to the 324% seen in single-core. It is possibly even lower if Apple still has some uncore power advantage here, as seems to be the case, since the HX 370 with the same Zen 5 cores is only 20% away from the best M4 result.
If you can rationally explain why the ST tests are representative of core efficiency and the MT tests are not I'd love to learn, but I just don't see any evidence pointing that way.
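The third explanation, that fixed non-core power inflates the single-thread ratio, can be illustrated with a toy model in which every figure is hypothetical. Both chips deliver identical performance, chip B's cores need 1.5x the power of chip A's in both scenarios, but chip B also carries a larger fixed uncore/platform draw that is assumed not to grow under load. The model deliberately leaves out the other factor raised above (AMD boosting further up its efficiency curve in 1t).

```python
def efficiency_ratio(uncore_a: float, core_a: float,
                     uncore_b: float, core_b: float) -> float:
    """Perf/W of chip A divided by perf/W of chip B at equal performance."""
    # With equal performance, the perf/W ratio reduces to B's power over A's.
    return (uncore_b + core_b) / (uncore_a + core_a)


# Hypothetical watts: (uncore, core) for each chip.
scenarios = {
    "1t": dict(uncore_a=2.0, core_a=5.0, uncore_b=10.0, core_b=7.5),
    "nT": dict(uncore_a=2.0, core_a=40.0, uncore_b=10.0, core_b=60.0),
}

for name, w in scenarios.items():
    total = efficiency_ratio(**w)
    core_only = w["core_b"] / w["core_a"]
    print(f"{name}: whole-package {total:.2f}x, core-only {core_only:.2f}x")
```

With the same 1.5x core-only gap in both cases, the whole-package ratio comes out at 2.50x in 1t but only 1.67x in nT, because the fixed overhead is a much larger share of a single core's power.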

DavidC1 · Aug 5, 2025

Anacapols said:
Replying to mikegg; the spam filter is being wonky again.

Firstly, Notebookcheck measures a 7.8 watt idle power difference between the 395 and the M4 Pro (both with the display on). That should be a state where the cores are barely doing anything, so most of the power difference comes from the SoC, memory, screen, idle tasks, etc.

That's inaccurate. You can't use AC wall power measurements. The ROG system gets 10 hours of battery life on web browsing, which isn't possible with 13.8W use.

Power management is only fully active on battery, not on AC.
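The battery-life argument is simple energy arithmetic: average power equals battery capacity divided by runtime. A sketch, where the 10-hour runtime and 13.8W figure come from the post but the 70 Wh capacity is a hypothetical placeholder, not the ROG system's confirmed spec:

```python
def max_average_power_w(capacity_wh: float, runtime_h: float) -> float:
    """Highest average draw that still lasts runtime_h on capacity_wh."""
    return capacity_wh / runtime_h


def required_capacity_wh(power_w: float, runtime_h: float) -> float:
    """Battery energy needed to sustain power_w for runtime_h."""
    return power_w * runtime_h


CAPACITY_WH = 70.0  # hypothetical battery size, for illustration only
RUNTIME_H = 10.0    # web-browsing runtime cited in the post
WALL_IDLE_W = 13.8  # idle figure cited in the post

print(f"Max average draw for {RUNTIME_H:.0f} h: "
      f"{max_average_power_w(CAPACITY_WH, RUNTIME_H):.1f} W")
print(f"Capacity needed at {WALL_IDLE_W} W: "
      f"{required_capacity_wh(WALL_IDLE_W, RUNTIME_H):.0f} Wh")
```

Whatever the real capacity is, sustaining 13.8W for 10 hours needs about 138 Wh, so a 10-hour browsing result implies the on-battery average is well below the AC idle reading.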

Anacapols said:
So to me there can only be 3 things going on here:
- Apple cores get massively less efficient in multicore compared to single-core, either due to some architectural trick that only works in single-core scenarios or due to very poor multicore scaling. While I wouldn't be surprised to see some of the former, the absolute multicore performance in properly scaling benchmarks makes me believe this is not very significant.

It's not inefficient. Apple only has 10 cores versus AMD's 16, and AMD adds SMT on top of that. If you want higher MT efficiency, you put more cores and clock them lower, that simple.

The fact that it still has a massive advantage is a testament to Apple design.
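The "more cores, clocked lower" point follows from how dynamic power scales with voltage and frequency. A rough sketch, assuming per-core dynamic power proportional to V²·f with voltage scaling roughly linearly with frequency (so power ~ f³), perfect multi-core scaling, identical cores, and no static or uncore power. Real chips hit a voltage floor, so the actual gain is smaller.

```python
def relative_power(cores: int, freq: float) -> float:
    """Package power in arbitrary units under a crude P ~ f^3 per-core model."""
    return cores * freq ** 3


def iso_throughput_freq(cores: int, ref_cores: int, ref_freq: float) -> float:
    """Clock needed to match ref throughput, assuming throughput ~ cores * freq."""
    return ref_cores * ref_freq / cores


ref_cores, ref_freq = 10, 1.0  # normalized reference clock
wide_cores = 16
wide_freq = iso_throughput_freq(wide_cores, ref_cores, ref_freq)

gain = relative_power(ref_cores, ref_freq) / relative_power(wide_cores, wide_freq)
print(f"16 cores at {wide_freq:.3f}x clock: {gain:.2f}x perf/W at equal throughput")
```

The 10 and 16 core counts are taken from the post; the model treats both as the same core design, so it only shows the direction and rough size of the effect, not a prediction for M4 vs. Zen 5.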

mikegg said:
What? That's nuts. Do you have any clue what you're talking about? That's why you measure idle power first. Load power minus idle power. That's how Anandtech used to do it. That's how Notebookcheck does it. That's how any competent reviewer would do it.

Wall measurements aren't accurate either.

Software measurement is actually pretty accurate, as long as you measure the same thing. You simply measure whole-system power, that's it. Saying you are power efficient means nothing if the battery life isn't better.

Apple systems actually offer that.

OneEng2 said:
It doesn't seem to bother AMD, Intel, and IBM. A 5% die-area increase for 40% more MT performance? Seems like a no-brainer to me.

And they are substantially behind in CPU uarch. Pretty sure that's no coincidence.

IBM is different because they cater entirely to a niche market, but profitable for them.

Anacapols · Aug 5, 2025

While I agree with the 10 vs. 16 core argument, it doesn't really apply here, as the 4+8 core Zen 5 part (370) scores significantly better against the 10+4 in terms of multicore efficiency than the 16-core part does.

Panino Manino · Aug 5, 2025

I don't want to believe, I'm having cold chills... the main Anandtech website was DELETED...?

poke01 · Aug 5, 2025

Anacapols said:
While I agree with the 10 vs. 16 core argument, it doesn't really apply here, as the 4+8 core Zen 5 part (370) scores significantly better against the 10+4 in terms of multicore efficiency than the 16-core part does.

Until you match it with the base M4. The HX 370 isn't that great IMO.

The M4 4+6/10t is 16% slower compared to the 4+8/24t HX 370. I've used R23 which also favours x86.

Anacapols · Aug 5, 2025

poke01 said:
Until you match it with the base M4. The HX 370 isn't that great IMO.

The M4 4+6/10t is 16% slower compared to the 4+8/24t HX 370. I've used R23 which also favours x86.

Certainly; my comparison was mostly about efficiency against the 395 numbers on which mikegg based the 2.6x-or-greater efficiency claims. Anything in the 1.2x to 1.6x efficiency range is what I'd expect for core-only / multicore, and Apple cores are absolutely very fast.

mikegg · Aug 5, 2025

Anacapols said:
Secondly, the 395 in particular does very badly in their Cinebench R24 single-core efficiency test: the 365 they tested is 53% more efficient despite having the same cores. So clearly parts other than the CPU core contribute a lot to efficiency in these tests, and I don't think you can reasonably claim these tests are highly indicative of CPU architecture efficiency.

That's because 395 clocks higher than 365 and probably has more cache that needs to be fired up.

Anacapols said:
Lastly, while the 14-core M4 Pro is stated to be 324% more efficient in their single-core CB24 benchmark, it ends up only 54% more efficient in the CB24 multi-core benchmark.

That's because 395 has more cores and MT. More cores at a lower clock will increase MT efficiency greatly. AMD has been good at MT.
