Question x86 and ARM architectures comparison thread.

OneEng2 · Aug 2, 2025

poke01 said:
Yes, but that's only if you ignore the differences in physical and logical core counts. But that’s the point: you cannot do so when comparing uArch against real-world nT applications.

Why?

I guarantee that people will be judging Zen 6's 24 cores against Intel's 52. There is nothing unfair about that IMO.

In fact, that is going to be a very interesting comparison since it is likely both processors will be built on the same N2 process!

It isn't how many processors you have, it's what you pay for it, how well it works, and what it costs the company to make.

Also, efficiency only matters up to a point. In the DC market (as an example), it is still about how much performance you can get from the power that you can supply to a socket. If the socket can supply 600W (as Venice will, for example), I don't think anyone gives you credit for using only 400W if performance drops by 65%.

Geddagod · Aug 2, 2025

Zen 4 and Zen 5 both only have perf/watt ~matching M1.

A hypothetical Zen 5 on N3E will likely not give you the ~30% perf/watt jump to match the M4 in single thread perf/watt.
Credit to David Huang for both tests

gdansk · Aug 2, 2025

poke01 said:
Use this. https://www.seense.com/menubarstats/mxpg/

It uses macOS powermetrics.

Best I can do right now.

MacOS 15.4.1 CPU Only Mac Studio M4.txt Benchmarks - OpenBenchmarking.org

So 12C Strix Halo doesn't look too bad in comparison to 12C M4 Pro? Of course, not great either if you consider power. But does at least manage to put up better scores in the chop config.

Edit: Updated with M4P 14C. It about matches 12C Strix Halo (Linux) in this test.

MacOS 15.4.1 CPU Only Mac Studio M4.txt Benchmarks - OpenBenchmarking.org

Covfefe · Aug 2, 2025

Doug S said:
But a lot of people here seem to be trying to argue that ARM cores in general, or Apple's in particular, are somehow unsuitable for DC. That's ridiculous on its face. No one can point to a benchmark that shows Apple cores as not being appropriate for DC tasks. Test x number of Apple P cores against the same number of x86 P cores and unless you're talking tasks that "just happen" to be all about AVX512 (or alternatively all about SVE2) you can't find any big difference in either direction. The only thing people can point to is "well x86 scales up to 192 cores and Apple doesn't", trying to imply that this is proof that Apple can't. That's just ignorant reasoning. It is the exact same reasoning people used to use claiming Apple's cores weren't appropriate for PCs, because Apple was only using them in phones.

Phoronix's M4 benchmarks have several examples where it loses badly to x86 CPUs.

Apple M4 Mac Mini With macOS vs. Intel / AMD With Ubuntu Linux Performance Review - Phoronix

gdansk · Aug 2, 2025

Covfefe said:
Phoronix's M4 benchmarks have several examples where it loses badly to x86 CPUs.

Apple M4 Mac Mini With macOS vs. Intel / AMD With Ubuntu Linux Performance Review - Phoronix

View attachment 128153

There will be many of these but this particular example might be because Linux is the only operating system anyone* uses to run a web server. If you're comparing across operating systems you might accidentally be benching IO syscalls. One can say MacOS might be unfit for DC use but I doubt it's a hardware problem.

* except the weird few who still use Windows/IIS
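
To make the "accidentally benching IO syscalls" point concrete, here is a minimal sketch (not from the Phoronix article) that times a cheap `write()` to the null device. The function name `time_syscalls` and the iteration count are assumptions for illustration. The figure includes Python interpreter overhead, so it is only meaningful when the same Python build is compared across operating systems on similar hardware.

```python
import os
import time


def time_syscalls(n=200_000):
    """Return the average seconds per os.write() call to the null device."""
    fd = os.open(os.devnull, os.O_WRONLY)
    payload = b"x"
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, payload)  # one write syscall per iteration
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / n


if __name__ == "__main__":
    per_call = time_syscalls()
    print(f"{per_call * 1e9:.0f} ns per write(), interpreter overhead included")
```

A web server benchmark issues far more varied syscalls than this (accept, read, sendfile, epoll or kqueue), so treat the number as a rough indicator of kernel entry cost rather than a prediction of requests per second.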

Covfefe · Aug 2, 2025

gdansk said:
There will be many of these but this particular example might be because Linux is the only operating system anyone* uses to run a web server. If you're comparing across operating systems you might accidentally be benching IO syscalls. One can say MacOS might be unfit for DC use but I doubt it's a hardware problem.

* except the weird few who still use Windows/IIS

Could be, but it's worth noting that macOS does (or did) ship with a version of Apache server natively installed. While that doesn't guarantee perfectly optimized software, it does show that Apple has some level of confidence in their platform as a web server.

johnsonwax · Aug 3, 2025

gdansk said:
It's hard to compare microarchitecture when AMD is behind on process by their own profiteering choice. But people will always do it anyway because that's what we can.

All I ask is that the ARM fans try to understand that AMD has more or less slapped all ARM server attempts back to the safety of the hypervisors big enough to exploit ARM's discount IP. And why that might be. One contributing factor is that Apple is the exception to ARM implementations so far. Maybe Nuvia/Qualcomm can also pull it off but they haven't yet. It seems just as much a matter of time as it did five years ago, which is odd for something inevitable.

My argument has always been that there is a lesson in why Apple is the exception to ARM implementations.

It's obviously not an intrinsic feature of the ARM ISA or else all implementations would show it, but it's possibly an enabling feature. Nobody else is getting Apple's decode width, which may not be possible with x86 right now, and that favors single core over AMD scale SMP. And what are we considering in this comparison? Memory architecture? Asymmetric/heterogeneous cores?

I have argued that the reason why Apple Silicon is the exception is more due to business model than engineering. For example, how did Apple get a 5x improvement on releasing memory in M1 over x86? Is that an inherent property of the design or is that something they sought because it was more critical to performance than it would be on x86, and how did it come to be that was more critical?

There's been a debate raging in F1 regarding the Red Bull car performance difference between their #1 and #2 drivers. See, their #1 driver Max Verstappen does great in the car, and their #2 drivers have been generally terrible - like really terrible - averaging something like 10 positions behind Max. What does this say about the performance of the car relative to other teams? One driver can finish on the podium and the other can't even get in the points - and they've gone through 3 drivers like this. Nobody has ever seen anything like it. The leading theory is that the car is pretty mid but if you set it up precisely for Max's driving style and talent he, and only he, can get good results out of it. Essentially, the car is mid, but has the capacity to be great, but only in a controlled environment.

So is Apple Silicon performant because Apple has so much control over the entire environment and can tune it for that environment, and is the rest of ARM and x86 less performant because their business models don't allow for that and they need to engineer to a broad set of applications such that trade offs for performance can't be well controlled? Because Apple better controls the nature of how the CPU performs, they can make tradeoffs in favor of what it is most likely to be doing and how. So it can accept worse performance in areas that it sees infrequently in exchange for better performance in areas it sees frequently. By comparison, the component suppliers don't have that agency and need to balance performance across all potential situations which means they have to forgo that peak potential. They make up for some of that with their SKU spam by having variants that are better suited for some applications than others, but ultimately can't fully make up for it.

Microsoft quite a while ago figured out that they needed to follow Apple. So I think they by and large understand the benefits of Apple's model and their arrangement with Qualcomm could get them there. But I'm not sure Microsoft is able to go as far as Apple has. They don't have the degree of influence over their developers. x86 is still the primary business. But we'll see.

johnsonwax · Aug 3, 2025

Covfefe said:
Could be, but it's worth noting that macOS does (or did) ship with a version of Apache server natively installed. While that doesn't guarantee perfectly optimized software, it does show that Apple has some level of confidence in their platform as a web server.

I've run web servers on MacOS. There's lots of different web profiles. 250K requests per second is a high volume static site, fetching pages from cache. Nobody doing that is picking Apple. Hell, nobody doing that is doing it on prem. And Apache is not in any way optimized for Apple Silicon. It's there as a convenience. Docker is similarly terrible, which is why Apple has their own solution. I would not be surprised if Apple's was dramatically more performant, mainly because these kinds of applications really rely on very tightly optimized code.

I've run Nginx as a front-end for data science tools. Low requests per second, performance mixed to SSD read and data compute. Ran pretty well. Wanted it on prem inside the firewalls. That's much closer to the use case for a web server on Mac.

Your profile reminded me that Apple also shipped Java on MacOS for a year that wasn't Apple Silicon native because they had a contract with Oracle who couldn't get their act together. Literally everyone installed Azul and still does. Just because Apple ships it doesn't mean that it's good - sometimes that's what the contract is.

mikegg · Aug 3, 2025

Geddagod said:
View attachment 128141
Zen 4 and Zen 5 both only have perf/watt ~matching M1.
View attachment 128142
A hypothetical Zen 5 on N3E will likely not give you the ~30% perf/watt jump to match the M4 in single thread perf/watt.
Credit to David Huang for both tests

What is the actual SoC power? I'm not convinced that Zen5 is only 30% away from M4. How does David Huang measure power? Is it measured at the wall as load minus idle for both Zen5 and M4?

Cinebench ST perf/watt:

M4 Pro: 9.52 pts/W
Strix Halo 395: 2.62 pts/W

3.6x better ST perf/watt is closer to the real world experience of using a Zen5 laptop vs an M4 laptop.
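
As a quick arithmetic check on the two figures above (the variable names are made up for illustration; the numbers are the ones quoted in this post):

```python
# Cinebench ST perf/watt figures quoted above, in pts/W.
m4_pro_pts_per_w = 9.52
strix_halo_pts_per_w = 2.62

ratio = m4_pro_pts_per_w / strix_halo_pts_per_w
print(f"M4 Pro / Strix Halo 395: {ratio:.2f}x")  # prints 3.63x

# The ~30% gap discussed earlier in the thread corresponds to a 1.30x ratio.
claimed_ratio = 1.30
print(f"Disagreement between the two claims: {ratio / claimed_ratio:.1f}x")
```

The two claims measure different things (core-only power versus whole-system power), which is one possible reason they land so far apart.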

S'renne · Aug 3, 2025

johnsonwax said:
I've run web servers on MacOS. There's lots of different web profiles. 250K requests per second is a high volume static site, fetching pages from cache. Nobody doing that is picking Apple. Hell, nobody doing that is doing it on prem. And Apache is not in any way optimized for Apple Silicon. It's there as a convenience. Docker is similarly terrible, which is why Apple has their own solution. I would not be surprised if Apple's was dramatically more performant, mainly because these kinds of applications really rely on very tightly optimized code.

I've run Nginx as a front-end for data science tools. Low requests per second, performance mixed to SSD read and data compute. Ran pretty well. Wanted it on prem inside the firewalls. That's much closer to the use case for a web server on Mac.

Your profile reminded me that Apple also shipped Java on MacOS for a year that wasn't Apple Silicon native because they had a contract with Oracle who couldn't get their act together. Literally everyone installed Azul and still does. Just because Apple ships it doesn't mean that it's good - sometimes that's what the contract is.

Yeah, pretty sure part of what makes Apple Silicon work so well is also the macOS optimisations for the target audience.

mikegg · Aug 3, 2025

Covfefe said:
Phoronix's M4 benchmarks have several examples where it loses badly to x86 CPUs.

Apple M4 Mac Mini With macOS vs. Intel / AMD With Ubuntu Linux Performance Review - Phoronix

View attachment 128153

Is this because the M4 is actually slower, or is the setup just highly unoptimized, with the M4 running Asahi Linux (not official) and nginx carrying hand-coded x86 optimizations?

Anacapols · Aug 3, 2025

Replying to mikegg
It's stated in the figure that it's core (+cache for z4/5) power only, which is in line with zen generally being more ppw competitive in larger many threaded loads as the soc power is less of an overhead there.

My understanding is that apple has a decent core ppw advantage (0 to 50% depending on the workload) and a very large soc and operating system power management advantage that shows itself in 1t and lighter tasks (daily use).
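
A toy model of that argument, as a sketch only: treat package power as per-core power times active threads plus a fixed SoC ("uncore") overhead, and assume performance scales linearly with threads. Every number below is a hypothetical placeholder chosen to show the shape of the effect, not a measurement of any real chip.

```python
def perf_per_watt(threads, perf_per_core, core_watts, uncore_watts):
    """Perf/W for a chip with a fixed uncore cost and linear thread scaling."""
    total_perf = threads * perf_per_core
    total_power = threads * core_watts + uncore_watts
    return total_perf / total_power


# Hypothetical chips: A has a small core advantage and a much cheaper uncore.
chip_a = dict(perf_per_core=100, core_watts=5.0, uncore_watts=1.0)
chip_b = dict(perf_per_core=100, core_watts=6.0, uncore_watts=10.0)

ratio_1t = perf_per_watt(1, **chip_a) / perf_per_watt(1, **chip_b)
ratio_16t = perf_per_watt(16, **chip_a) / perf_per_watt(16, **chip_b)

print(f"1T perf/W advantage of A:  {ratio_1t:.2f}x")
print(f"16T perf/W advantage of A: {ratio_16t:.2f}x")
```

With these placeholder inputs the gap is largest at one thread and shrinks as more cores amortize the fixed overhead, which is the pattern described above. Real chips also change clocks and voltage with load, which this model ignores.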

mikegg · Aug 3, 2025

Anacapols said:
My understanding is that apple has a decent core ppw advantage (0 to 50% depending on the workload) and a very large soc and operating system power management advantage that shows itself in 1t and lighter tasks (daily use).

1. How do we know macOS has better power management? For all we know, macOS has worse power management than Windows and Linux but the SoC carries the OS. How does OS power management lower total SoC power when running ST loads? We seem to be making a ton of assumptions without any proof.

2. How do we know that AMD's ST power efficiency is truly only measuring the core in the exact same way the Apple Silicon is? I think the best way is to take ST load power and subtract idle power, both measured at the wall. This is how Notebookcheck does it. When power is measured this way, M4 is 3.6x more efficient than Zen5 in Cinebench. That seems way more in line with real world usage experience.
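
The load-minus-idle method from point 2 can be sketched as below. The function name and the example readings are hypothetical placeholders, not Notebookcheck's data or tooling.

```python
def perf_per_watt(score, load_watts, idle_watts):
    """Score divided by the extra wall power drawn while the benchmark runs."""
    delta = load_watts - idle_watts
    if delta <= 0:
        raise ValueError("load power must exceed idle power")
    return score / delta


# Placeholder readings: a score of 100 at 25 W under load and 5 W at idle.
print(perf_per_watt(score=100, load_watts=25.0, idle_watts=5.0))  # prints 5.0
```

Subtracting idle removes the display, SSD and other baseline draw, but anything that only powers up under load (fabric, memory controller, VRM losses) is still charged to the benchmark. That is why this method gives a different answer from a core-only sensor reading.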

Anacapols · Aug 3, 2025

Windows is quite infamous for its background services and bloat, whereas apple (at least on the iphone front) has always had quite the opposite reputation, which coupled with hardware specific optimisation leads me to believe apple has a significant advantage there. (edit: this also aligns with idle and 1t power/ppw benchmarks, really can't imagine windows has an advantage here)

As for power measurement, while I don't know if the two are exactly similar, I cannot imagine there are that many ways to define core only power, which under the assumption that both are decently good at measuring what they are designed to measure means the difference there should be reasonable. I can look for more concrete sources when I get home, someone must have done os service comparisons at some point.

Covfefe · Aug 3, 2025

The spam filter isn't letting me quote anyone, but to answer some of the responses about the phoronix apache benchmark.

All the benchmarks in the article were run on MacOS, not Asahi Linux. Nor were they testing nginx.

The anecdote about Oracle is interesting, but that's not happening here. Apache was natively compiled for ARM using Apple's own toolchain.

The Apache server project contains no assembly code, so it's not exactly optimized for x86, except for any compiler optimizations which the M4 would also have.

And ARM servers have been a thing for a while now, so the idea that Apache server is somehow half baked for ARM CPUs doesn't pass the smell test.

poke01 · Aug 3, 2025

Covfefe said:
The spam filter isn't letting me quote anyone, but to answer some of the responses about the phoronix apache benchmark.

All the benchmarks in the article were run on MacOS, not Asahi Linux. Nor were they testing nginx.

The anecdote about Oracle is interesting, but that's not happening here. Apache was natively compiled for ARM using Apple's own toolchain.

The Apache server project contains no assembly code, so it's not exactly optimized for x86, except for any compiler optimizations which the M4 would also have.

And ARM servers have been a thing for a while now, so the idea that Apache server is somehow half baked for ARM CPUs doesn't pass the smell test.

It’s macOS that’s the problem. Linux is truly the best at getting the most performance out of any CPU-based task.

These Apple chips are so powerful that macOS is the limiter in some cases. Take a look at this, this was on M1 testing on bare Linux vs macOS.

Apple M1 Performance On Linux: Benchmarks Better Than Expected For Its Alpha State Review - Phoronix

yottabit · Aug 3, 2025

Talking about server space, I think the most feasible thing Apple could target (with minimal adjustment to their architecture) would be HPC workloads. Either with a high memory bandwidth, all P-Core die for CPU workloads or more probably an MI300 type product which could probably just be an M-Ultra with some extra IO.

I mean, people were already building their own clusters of Macs for this even before LLMs took off. I doubt they have aspirations for it but it would be cool to see Apple silicon on the supercomputer charts. Also would get around their hesitancy to sell servers broadly, and be good PR. Tim Cook, are you listening??

poke01 · Aug 3, 2025

gdansk said:
Best I can do right now.

MacOS 15.4.1 CPU Only Mac Studio M4.txt Benchmarks - OpenBenchmarking.org

So 12C Strix Halo doesn't look too bad in comparison to 12C M4 Pro? Of course, not great either if you consider power. But does at least manage to put up better scores in the chop config.

Edit: Updated with M4P 14C. It about matches 12C Strix Halo (Linux) in this test.

MacOS 15.4.1 CPU Only Mac Studio M4.txt Benchmarks - OpenBenchmarking.org

Thanks for testing.
I think Apple does well despite the thread count difference.

AmericanLocomotive · Aug 3, 2025

mikegg said:
1. How do we know macOS has better power management? For all we know, macOS has worse power management than Windows and Linux but the SoC carries the OS. How does OS power management lower total SoC power when running ST loads? We seem to be making a ton of assumptions without any proof.

2. How do we know that AMD's ST power efficiency is truly only measuring the core in the exact same way the Apple Silicon is? I think the best way is to take ST load power and subtract idle power, both measured at the wall. This is how Notebookcheck does it. When power is measured this way, M4 is 3.6x more efficient than Zen5 in Cinebench. That seems way more in line with real world usage experience.

I don't believe Apple's "powermetrics" in MacOS gives any finer-grained data than "CPU power". I do feel like not enough people talk about Apple's "uncore" advantage when comparing to other architectures in terms of efficiency. The SoC/"uncore" stuff matters hugely at idle, low power, and in ST power consumption.

The tighter you couple everything, the more efficient everything else can be. Look at how much more efficient Strix Halo is in low power situations compared to Desktop Zen5, even though they both use separate chiplets. They achieved tremendous gains in efficiency just by moving the cores *closer* to the main SOC die, not even completely integrating them (yes I know the lesser mobile chips are monolithic). With the RAM being on-package with a hugely-wide bus on M-series chips, Apple takes that a step further. Short signal paths, lower power, more efficiency.

Now does all of that account for the ~2x ST ppw advantage? Unsure, but it definitely contributes.

poke01 · Aug 3, 2025

Let’s do a fun exercise. Let’s create a “Zen5 CCD” sort of chiplet: what would its die area be? Include the full caches etc. needed for a hypothetical M4 16 P core CPU in the mm2 calculation.

We often hear that M4 P core is too big for DC, so it would be interesting to see people’s perspectives.

Credit for die shot is Tech insights.

511 · Aug 3, 2025

On decode, both x86 vendors are set on clustered decode (Coyote Cove/Zen5). Atom has been doing it for ages.

gdansk · Aug 3, 2025

There is a 16C N3E Zen 5 which includes a last level cache and dual GMI at a known die size (85mm²). Weibo had posted these measurements months ago. Do you really think M4 P cores with a LLC and interconnects can beat that density? In any case, it's a mythical configuration.

If you're only looking for MT performance per area (i.e. some portion of the DC market) it's hard to beat. That it also has a better v/f curve than GR, especially at low voltages, is just a bonus.
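
For anyone attempting the die-area exercise, here is a small sketch of the arithmetic. The 85 mm² and 16-core figures are the ones quoted above. The helper names are made up, and `chiplet_area` expects per-core and shared-block areas that you would have to measure from a die shot yourself, since none are given in this thread.

```python
def area_per_core(die_area_mm2: float, cores: int) -> float:
    """Whole-die area divided by core count (cache and interconnect included)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return die_area_mm2 / cores


def chiplet_area(core_mm2: float, cores: int, shared_mm2: float) -> float:
    """Estimate a hypothetical chiplet: N cores plus shared cache/interconnect."""
    return core_mm2 * cores + shared_mm2


# 16C N3E Zen 5 die quoted above, LLC and dual GMI included.
print(f"{area_per_core(85.0, 16):.2f} mm2 per core")  # prints 5.31 mm2 per core
```

To compare, plug a measured M4 P-core area (including its share of L2) and an estimate for the last level cache and interconnect into `chiplet_area`, then divide by 16. Keep in mind that the node and cache capacity differ between the two designs, so the result is only a rough density comparison.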

poke01 · Aug 3, 2025

gdansk said:
There is a 16C N3E Zen 5 which includes a last level cache and dual GMI at a known die size (85mm²). Do you really think M4 P cores with a LLC and interconnects can beat that density?

If you go look up the die shots you will see that when including L3 each Zen 5 core is literally 0.5x the size of Zen 5 on GR.
If you're only looking for MT performance per area (i.e. some portion of the DC market) it's hard to beat. That it also has a better v/f curve than GR, especially at low voltages, is just a bonus.

It sure is a beauty.

OneEng2 · Aug 3, 2025

gdansk said:
Edit: Updated with M4P 14C. It about matches 12C Strix Halo (Linux) in this test.

Interesting. Thanks. Note: This is with the M4 having a full node advantage.

gdansk said:
There will be many of these but this particular example might be because Linux is the only operating system anyone* uses to run a web server. If you're comparing across operating systems you might accidentally be benching IO syscalls. One can say MacOS might be unfit for DC use but I doubt it's a hardware problem.

* except the weird few who still use Windows/IIS

In general, my belief is that the x86 infrastructure (OS, utilities, motherboards, off core, interconnects, etc.) outside the actual core structure is better developed than ARM.

Perhaps this will change in the future? For now, it seems like anything remotely resembling DC ARM gets pretty pulverized by x86.

johnsonwax said:
Nobody else is getting Apple's decode width, which may not be possible with x86 right now, and that favors single core over AMD scale SMP.

10 wide vs 8 wide. Perhaps someone can explain to me how the extra 2 would help Zen6 or NVL? In Zen 5, it seems like the paths are pretty optimized and already designed for more work than they can currently perform efficiently. Perhaps I am wrong?

johnsonwax said:
So is Apple Silicon performant because Apple has so much control over the entire environment and can tune it for that environment, and is the rest of ARM and x86 less performant because their business models don't allow for that and they need to engineer to a broad set of applications such that trade offs for performance can't be well controlled?

I am certain that there are great advantages of having vertically integrated designs that x86 can't hope to achieve.

yottabit said:
Talking about server space, think the most feasible thing Apple could target (with minimal adjustment to their architecture) would be HPC workloads. Either with a high memory bandwidth, all P-Core die for CPU workloads or more probably an MI300 type product which could probably just be an M- Ultra with some extra IO.

That might not be a bad idea; however, how would an M4 compete with Threadripper that can scale up to 96 cores (and a butt ton of memory channels to feed it)?

Covfefe · Aug 3, 2025

poke01 said:
It’s macOS that’s the problem. Linux is truly the best at getting the most performance out of any CPU-based task.

These Apple chips are so powerful that macOS is the limiter in some cases. Take a look at this, this was on M1 testing on bare Linux vs macOS.

Apple M1 Performance On Linux: Benchmarks Better Than Expected For Its Alpha State Review - Phoronix

View attachment 128170
View attachment 128172

I saw those Linux M2 and M1 articles. I figured they weren't worth sharing for a few reasons. They're 2+ years out of date, the core counts aren't directly comparable, and most importantly the software may be optimized for x86 more than ARM. But since you brought it up, this is how the M2 compared against its x86 counterpart.

In terms of the Apache benchmark, we can't say with certainty that it's all due to MacOS. Will the OS make a difference in some benchmarks? Undoubtedly. Is it enough to give the M4 a 3x speedup and catch the 9600x? Maybe, but probably not. At the very least I don't think it's fair to blame MacOS whenever the M4 loses.
