Question x86 and ARM architectures comparison thread.


OneEng2

Senior member
Sep 19, 2022
725
970
106
Yes, but that's only if you ignore the differences in physical and logical core counts. But that's the point: you cannot do so when comparing uArch against real-world nT applications.
Why?

I guarantee that people will be judging Zen 6's 24 cores against Intel's 52. There is nothing unfair about that IMO.

In fact, that is going to be a very interesting comparison since it is likely both processors will be built on the same N2 process!

It isn't how many processors you have, it's what you pay for it, how well it works, and what it costs the company to make.

Also, as far as efficiency goes, this is only important to a certain extent. In the DC market (as an example) it is still about how much performance you can get from the amount of power that you can supply to a socket. If the socket can supply 600W (as Venice will, for example), I don't think anyone gives you credit if you only use 400W but lose 65% of the performance.
 
  • Like
Reactions: booklib28 and Tlh97

Geddagod

Golden Member
Dec 28, 2021
1,430
1,541
106
1754186405419.png
Zen 4 and Zen 5 both only have perf/watt ~matching M1.
1754186783617.png
A hypothetical Zen 5 on N3E will likely not give you the ~30% perf/watt jump to match the M4 in single thread perf/watt.
Credit to David Huang for both tests
 

gdansk

Diamond Member
Feb 8, 2011
4,330
7,254
136
Use this. https://www.seense.com/menubarstats/mxpg/

It uses macOS powermetrics.
Best I can do right now.
So 12C Strix Halo doesn't look too bad in comparison to 12C M4 Pro? Of course, not great either if you consider power. But does at least manage to put up better scores in the chop config.

Edit: Updated with M4P 14C. It about matches 12C Strix Halo (Linux) in this test.

 

Covfefe

Junior Member
Jul 23, 2025
11
20
36
But a lot of people here seem to be trying to argue that ARM cores in general, or Apple's in particular, are somehow unsuitable for DC. That's ridiculous on its face. No one can point to a benchmark that shows Apple cores as not being appropriate for DC tasks. Test x number of Apple P cores against the same number of x86 P cores and unless you're talking tasks that "just happen" to be all about AVX512 (or alternatively all about SVE2) you can't find any big difference in either direction. The only thing people can point to is "well x86 scales up to 192 cores and Apple doesn't" trying to imply that this is proof that Apple can't. That's just ignorant reasoning. It is the exact same reasoning people used to use claiming Apple's cores weren't appropriate for PCs, because Apple was only using them in phones.
Phoronix's M4 benchmarks have several examples where it loses badly to x86 CPUs.


1000007767.jpg
 

gdansk

Diamond Member
Feb 8, 2011
4,330
7,254
136
Phoronix's M4 benchmarks have several examples where it loses badly to x86 CPUs.


View attachment 128153
There will be many of these but this particular example might be because Linux is the only operating system anyone* uses to run a web server. If you're comparing across operating systems you might accidentally be benching IO syscalls. One can say MacOS might be unfit for DC use but I doubt it's a hardware problem.

* except the weird few who still use Windows/IIS
 
  • Like
Reactions: Gideon and Tlh97

Covfefe

Junior Member
Jul 23, 2025
11
20
36
There will be many of these but this particular example might be because Linux is the only operating system anyone* uses to run a web server. If you're comparing across operating systems you might accidentally be benching IO syscalls. One can say MacOS might be unfit for DC use but I doubt it's a hardware problem.

* except the weird few who still use Windows/IIS
Could be, but it's worth noting that macOS does (or did) ship with a version of Apache server natively installed. While that doesn't guarantee perfectly optimized software, it does show that Apple has some level of confidence in their platform as a web server.
 
  • Like
Reactions: Tlh97

johnsonwax

Senior member
Jun 27, 2024
252
410
96
It's hard to compare microarchitecture when AMD is behind on process by their own profiteering choice. But people will always do it anyway because that's what we can do.

All I ask is that the ARM fans try to understand that AMD has more or less slapped all ARM server attempts back to the safety of the hypervisors big enough to exploit ARM's discount IP. And why that might be. One contributing factor is that Apple is the exception to ARM implementations so far. Maybe Nuvia/Qualcomm can also pull it off but they haven't yet. It seems just as much a matter of time as it did five years ago, which is odd for something inevitable.
My argument has always been that there is a lesson in why Apple is the exception to ARM implementations.

It's obviously not an intrinsic feature of the ARM ISA or else all implementations would show it, but it's possibly an enabling feature. Nobody else is getting Apple's decode width, which may not be possible with x86 right now, and that favors single core over AMD scale SMP. And what are we considering in this comparison? Memory architecture? Asymmetric/heterogeneous cores?

I have argued that the reason why Apple Silicon is the exception is more due to business model than engineering. For example, how did Apple get a 5x improvement on releasing memory in M1 over x86? Is that an inherent property of the design or is that something they sought because it was more critical to performance than it would be on x86, and how did it come to be that was more critical?

There's been a debate raging in F1 regarding the Red Bull car performance difference between their #1 and #2 drivers. See, their #1 driver Max Verstappen does great in the car, and their #2 drivers have been generally terrible - like really terrible - averaging something like 10 positions behind Max. What does this say about the performance of the car relative to other teams? One driver can finish on the podium and the other can't even get in the points - and they've gone through 3 drivers like this. Nobody has ever seen anything like it. The leading theory is that the car is pretty mid but if you set it up precisely for Max's driving style and talent he, and only he, can get good results out of it. Essentially, the car is mid, but has the capacity to be great, but only in a controlled environment.

So is Apple Silicon performant because Apple has so much control over the entire environment and can tune it for that environment, and is the rest of ARM and x86 less performant because their business models don't allow for that and they need to engineer to a broad set of applications such that trade offs for performance can't be well controlled? Because Apple better controls the nature of how the CPU performs, they can make tradeoffs in favor of what it is most likely to be doing and how. So it can accept worse performance in areas that it sees infrequently in exchange for better performance in areas it sees frequently. By comparison, the component suppliers don't have that agency and need to balance performance across all potential situations which means they have to forgo that peak potential. They make up for some of that with their SKU spam by having variants that are better suited for some applications than others, but ultimately can't fully make up for it.

Microsoft quite a while ago figured out that they needed to follow Apple. So I think they by and large understand the benefits of Apple's model and their arrangement with Qualcomm could get them there. But I'm not sure Microsoft is able to go as far as Apple has. They don't have the degree of influence over their developers. x86 is still the primary business. But we'll see.
 
  • Like
Reactions: Tlh97

johnsonwax

Senior member
Jun 27, 2024
252
410
96
Could be, but it's worth noting that macOS does (or did) ship with a version of Apache server natively installed. While that doesn't guarantee perfectly optimized software, it does show that Apple has some level of confidence in their platform as a web server.
I've run web servers on MacOS. There's lots of different web profiles. 250K requests per second is a high volume static site, fetching pages from cache. Nobody doing that is picking Apple. Hell, nobody doing that is doing it on prem. And Apache is not in any way optimized for Apple Silicon. It's there as a convenience. Docker is similarly terrible, which is why Apple has their own solution. I would not be surprised if Apple's was dramatically more performant, mainly because these kinds of applications really rely on very tightly optimized code.

I've run Nginx as a front-end for data science tools. Low requests per second, performance mixed to SSD read and data compute. Ran pretty well. Wanted it on prem inside the firewalls. That's much closer to the use case for a web server on Mac.

Your profile reminded me that Apple also shipped Java on MacOS for a year that wasn't Apple Silicon native because they had a contract with Oracle who couldn't get their act together. Literally everyone installed Azul and still does. Just because Apple ships it doesn't mean that it's good - sometimes that's what the contract is.
 
  • Like
Reactions: 511

mikegg

Golden Member
Jan 30, 2010
1,903
521
136
View attachment 128141
Zen 4 and Zen 5 both only have perf/watt ~matching M1.
View attachment 128142
A hypothetical Zen 5 on N3E will likely not give you the ~30% perf/watt jump to match the M4 in single thread perf/watt.
Credit to David Huang for both tests
What is the actual SoC power? I'm not convinced that Zen5 is only 30% away from M4. How does David Huang measure power? Is it through the wall doing load - idle for both Zen5 and M4?

Cinebench ST perf/watt:

M4 Pro: 9.52 pts/W
Strix Halo 395: 2.62 pts/W

3.6x better ST perf/watt is closer to the real world experience of using a Zen5 laptop vs an M4 laptop.
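
For anyone checking the arithmetic, the 3.6x figure follows directly from the two numbers quoted above. A minimal sketch; the variable names are just illustrative:

```python
# Cinebench ST perf/watt figures as quoted in this post (points per watt).
m4_pro_pts_per_w = 9.52
strix_halo_395_pts_per_w = 2.62

# Ratio of the two efficiency figures.
ratio = m4_pro_pts_per_w / strix_halo_395_pts_per_w
print(f"M4 Pro vs Strix Halo 395: {ratio:.1f}x")  # 3.6x
```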
 

S'renne

Member
Oct 30, 2022
143
108
86
I've run web servers on MacOS. There's lots of different web profiles. 250K requests per second is a high volume static site, fetching pages from cache. Nobody doing that is picking Apple. Hell, nobody doing that is doing it on prem. And Apache is not in any way optimized for Apple Silicon. It's there as a convenience. Docker is similarly terrible, which is why Apple has their own solution. I would not be surprised if Apple's was dramatically more performant, mainly because these kinds of applications really rely on very tightly optimized code.

I've run Nginx as a front-end for data science tools. Low requests per second, performance mixed to SSD read and data compute. Ran pretty well. Wanted it on prem inside the firewalls. That's much closer to the use case for a web server on Mac.

Your profile reminded me that Apple also shipped Java on MacOS for a year that wasn't Apple Silicon native because they had a contract with Oracle who couldn't get their act together. Literally everyone installed Azul and still does. Just because Apple ships it doesn't mean that it's good - sometimes that's what the contract is.
Yeah, pretty sure part of what makes Apple Silicon work so well is also the macOS optimisations for the target audience
 

Anacapols

Junior Member
Mar 2, 2025
2
2
41
Replying to mikegg
It's stated in the figure that it's core (+cache for z4/5) power only, which is in line with zen generally being more ppw competitive in larger many threaded loads as the soc power is less of an overhead there.

My understanding is that apple has a decent core ppw advantage (0 to 50% depending on the workload) and a very large soc and operating system power management advantage that shows itself in 1t and lighter tasks (daily use).
 
  • Like
Reactions: Gideon

mikegg

Golden Member
Jan 30, 2010
1,903
521
136
My understanding is that apple has a decent core ppw advantage (0 to 50% depending on the workload) and a very large soc and operating system power management advantage that shows itself in 1t and lighter tasks (daily use).
1. How do we know macOS has better power management? For all we know, macOS has worse power management than Windows and Linux but the SoC carries the OS. How does OS power management lower total SoC power when running ST loads? We seem to be making a ton of assumptions without any proof.

2. How do we know that AMD's ST power efficiency is truly only measuring the core in the exact same way the Apple Silicon is? I think the best way is to take ST load power subtract idle power taken from the wall. This is how Notebookcheck does it. When power is measured this way, M4 is 3.6x more efficient than Zen5 in Cinebench. That seems way more in line with real world usage experience.
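
The load-minus-idle method described in point 2 can be sketched as below. This is only an illustration of the calculation, not Notebookcheck's actual tooling; the function name is made up and the numbers in the usage line are placeholders, not measurements.

```python
def st_perf_per_watt(score: float, load_watts: float, idle_watts: float) -> float:
    """Points per watt, using wall power under ST load minus wall power at idle."""
    delta_watts = load_watts - idle_watts
    if delta_watts <= 0:
        raise ValueError("load power must be higher than idle power")
    return score / delta_watts


# Placeholder inputs only (NOT real measurements): score 100, 20 W load, 8 W idle.
example = st_perf_per_watt(score=100.0, load_watts=20.0, idle_watts=8.0)
print(f"{example:.2f} pts/W")
```

Subtracting idle power removes the display and other always-on components from the figure, but it still counts any uncore, memory, and VRM power that rises under load, so it is not the same thing as a core-only reading.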
 

Anacapols

Junior Member
Mar 2, 2025
2
2
41
Windows is quite infamous for its background services and bloat, whereas apple (at least on the iphone front) has always had quite the opposite reputation, which coupled with hardware specific optimisation leads me to believe apple has a significant advantage there. (edit: this also aligns with idle and 1t power/ppw benchmarks, really can't imagine windows has an advantage here)

As for power measurement, while I don't know if the two are exactly similar, I cannot imagine there are that many ways to define core only power, which under the assumption that both are decently good at measuring what they are designed to measure means the difference there should be reasonable. I can look for more concrete sources when I get home, someone must have done os service comparisons at some point.
 
  • Like
Reactions: OneEng2

Covfefe

Junior Member
Jul 23, 2025
11
20
36
The spam filter isn't letting me quote anyone, but to answer some of the responses about the phoronix apache benchmark.

All the benchmarks in the article were run on MacOS, not Asahi Linux. Nor were they testing nginx.

The anecdote about Oracle is interesting, but that's not happening here. Apache was natively compiled for ARM using Apple's own toolchain.

The Apache server project contains no assembly code, so it's not exactly optimized for x86, except for any compiler optimizations which the M4 would also have.

And ARM servers have been a thing for a while now, so the idea that Apache server is somehow half baked for ARM CPUs doesn't pass the smell test.
 
  • Like
Reactions: OneEng2 and Gideon

poke01

Diamond Member
Mar 8, 2022
3,877
5,203
106
The spam filter isn't letting me quote anyone, but to answer some of the responses about the phoronix apache benchmark.

All the benchmarks in the article were run on MacOS, not Asahi Linux. Nor were they testing nginx.

The anecdote about Oracle is interesting, but that's not happening here. Apache was natively compiled for ARM using Apple's own toolchain.

The Apache server project contains no assembly code, so it's not exactly optimized for x86, except for any compiler optimizations which the M4 would also have.

And ARM servers have been a thing for a while now, so the idea that Apache server is somehow half baked for ARM CPUs doesn't pass the smell test.
It’s macOS that’s the problem. Linux is truly the best at getting the most performance out of any CPU-based task.

These Apple chips are so powerful that macOS is the limiter in some cases. Take a look at this, this was on M1 testing on bare Linux vs macOS.


IMG_2348.jpeg
IMG_2350.jpeg
 

yottabit

Golden Member
Jun 5, 2008
1,633
761
146
Talking about server space, think the most feasible thing Apple could target (with minimal adjustment to their architecture) would be HPC workloads. Either with a high memory bandwidth, all P-Core die for CPU workloads or more probably an MI300 type product which could probably just be an M- Ultra with some extra IO.

I mean people are already building their own clusters of Macs for this even before LLMs took off. I doubt they have aspirations for it but it would be cool to see Apple silicon on the supercomputer charts. Also would get around their hesitancy to sell servers broadly, and be good PR. Tim Cook are you listening??
 
  • Like
Reactions: Gideon and poke01

poke01

Diamond Member
Mar 8, 2022
3,877
5,203
106
Best I can do right now.
So 12C Strix Halo doesn't look too bad in comparison to 12C M4 Pro? Of course, not great either if you consider power. But does at least manage to put up better scores in the chop config.

Edit: Updated with M4P 14C. It about matches 12C Strix Halo (Linux) in this test.

Thanks for testing.
I think Apple does well despite the thread count difference.
 
Apr 30, 2020
69
173
106
1. How do we know macOS has better power management? For all we know, macOS has worse power management than Windows and Linux but the SoC carries the OS. How does OS power management lower total SoC power when running ST loads? We seem to be making a ton of assumptions without any proof.

2. How do we know that AMD's ST power efficiency is truly only measuring the core in the exact same way the Apple Silicon is? I think the best way is to take ST load power subtract idle power taken from the wall. This is how Notebookcheck does it. When power is measured this way, M4 is 3.6x more efficient than Zen5 in Cinebench. That seems way more in line with real world usage experience.
I don't believe Apple's "powermetrics" in MacOS gives any finer-grained data than "CPU power". I do feel like not enough people talk about Apple's "uncore" advantage when comparing to other architectures in terms of efficiency. The SoC/"uncore" stuff matters hugely at idle, low power, and in ST power consumption.

The tighter you couple everything, the more efficient everything else can be. Look at how much more efficient Strix Halo is in low-power situations compared to desktop Zen 5, even though they both use separate chiplets. They achieved tremendous gains in efficiency just by moving the cores *closer* to the main SoC die, not even completely integrating them (yes I know the lesser mobile chips are monolithic). With the RAM being on-package with a hugely-wide bus on M-series chips, Apple takes that a step further. Short signal paths, lower power, more efficiency.

Now does all of that account for the ~2x ST ppw advantage? Unsure, but it definitely contributes.
 

poke01

Diamond Member
Mar 8, 2022
3,877
5,203
106
Let’s do a fun exercise. Let’s create a “Zen5 CCD” sorta chiplet, what would be its die area. Include the full caches etc needed for a hypothetical M4 16 P core CPU for the mm2 calculation.

We often hear that M4 P core is too big for DC, so it would be interesting to see people’s perspectives.

Credit for die shot is Tech insights.
IMG_2351.jpeg
 

Attachment: IMG_2352.jpeg

511

Diamond Member
Jul 12, 2024
3,240
3,176
106
On decode, both x86 vendors are set on cluster decode (Coyote Cove/Zen 5). Atom has been doing it for ages.
 

gdansk

Diamond Member
Feb 8, 2011
4,330
7,254
136
There is a 16C N3E Zen 5 which includes a last level cache and dual GMI at a known die size (85mm²). Weibo had posted these measurements months ago. Do you really think M4 P cores with a LLC and interconnects can beat that density? In any case, it's a mythical configuration.

If you're only looking for MT performance per area (i.e. some portion of the DC market) it's hard to beat. That it also has a better v/f curve than GR, especially at low voltages, is just a bonus.
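
Taking the figures in this post at face value (16 cores in 85mm², LLC and GMI included), the per-core area budget a hypothetical 16 P-core M4 chiplet would have to match works out as below. A rough sketch only; the variable names are illustrative and the 85mm² number is the one quoted from Weibo above, not something independently verified here.

```python
# Figures as quoted in the post: 16 cores, 85 mm² including LLC and dual GMI.
die_area_mm2 = 85.0
cores = 16

# Average area per core once cache and interconnect are amortized over all cores.
mm2_per_core = die_area_mm2 / cores
print(f"{mm2_per_core:.2f} mm² per core, LLC and interconnect included")  # 5.31
```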
 

poke01

Diamond Member
Mar 8, 2022
3,877
5,203
106
There is a 16C N3E Zen 5 which includes a last level cache and dual GMI at a known die size (85mm²). Do you really think M4 P cores with a LLC and interconnects can beat that density?

If you go look up the die shots you will see that when including L3 each Zen 5 core is literally 0.5x the size of Zen 5 on GR.
If you're only looking for MT performance per area (i.e. some portion of the DC market) it's hard to beat. That it also has a better v/f curve than GR, especially at low voltages, is just a bonus.
IMG_2353.png

It sure is a beauty.
 

OneEng2

Senior member
Sep 19, 2022
725
970
106
Edit: Updated with M4P 14C. It about matches 12C Strix Halo (Linux) in this test.
Interesting. Thanks. Note: This is with the M4 having a full node advantage.
There will be many of these but this particular example might be because Linux is the only operating system anyone* uses to run a web server. If you're comparing across operating systems you might accidentally be benching IO syscalls. One can say MacOS might be unfit for DC use but I doubt it's a hardware problem.

* except the weird few who still use Windows/IIS
In general, my belief is that the x86 infrastructure (OS, utilities, mother boards, off core, interconnects, etc, etc, etc) outside the actual core structure is better developed than ARM.

Perhaps this will change in the future? For now, it seems like anything remotely resembling DC ARM gets pretty pulverized by x86.
Nobody else is getting Apple's decode width, which may not be possible with x86 right now, and that favors single core over AMD scale SMP.
10 wide vs 8 wide. Perhaps someone can explain to me how the extra 2 would help Zen 6 or NVL? In Zen 5, it seems like the paths are pretty optimized and already designed for more work than they can currently perform efficiently. Perhaps I am wrong?
So is Apple Silicon performant because Apple has so much control over the entire environment and can tune it for that environment, and is the rest of ARM and x86 less performant because their business models don't allow for that and they need to engineer to a broad set of applications such that trade offs for performance can't be well controlled?
I am certain that there are great advantages of having vertically integrated designs that x86 can't hope to achieve.

Talking about server space, think the most feasible thing Apple could target (with minimal adjustment to their architecture) would be HPC workloads. Either with a high memory bandwidth, all P-Core die for CPU workloads or more probably an MI300 type product which could probably just be an M- Ultra with some extra IO.
That might not be a bad idea; however, how would an M4 compete with Threadripper that can scale up to 96 cores (and a butt ton of memory channels to feed it)?
 

Covfefe

Junior Member
Jul 23, 2025
11
20
36
It’s macOS that’s the problem. Linux is truly the best at getting the most performance out of any CPU-based task.

These Apple chips are so powerful that macOS is the limiter in some cases. Take a look at this, this was on M1 testing on bare Linux vs macOS.


View attachment 128170
View attachment 128172
I saw those Linux M2 and M1 articles. I figured they weren't worth sharing for a few reasons. They're 2+ years out of date, the core counts aren't directly comparable, and most importantly the software may be optimized for x86 more than ARM. But since you brought it up, this is how the M2 compared against its x86 counterpart.

1754242115871.jpeg

In terms of the Apache benchmark, we can't say with certainty that it's all due to MacOS. Will the OS make a difference in some benchmarks? Undoubtedly. Is it enough to give the M4 a 3x speedup and catch the 9600X? Maybe, but probably not. At the very least I don't think it's fair to blame MacOS whenever the M4 loses.
 
  • Like
Reactions: Schmide