Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is a first rumor about Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
  • massive AI accelerator

Source Coreteks:
 
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

itsmydamnation

Platinum Member
Feb 6, 2011
2,762
3,131
136
Zen2 has 2x 256 bit AVX2 SIMD units, so really only X1 from ARM themselves can compare to it directly now that it has 4x 128 bit NEON units.

I'll be really interested to see how those 2 compare on SIMD heavy loads once a server oriented variant of X1 is wrought in silicon.
Zen 2 has 4x 256bit AVX2 SIMD units, but they are not symmetric, Zen2 can do 2x 256bit FMA a cycle.
 

Gideon

Golden Member
Nov 27, 2007
1,619
3,645
136
Which is exactly the point. If it runs so well, then it means that Rosetta 2 is way better at compiling the code, and the performance hit is way lower, than fellow forum members' calculations suggest.

So that 800 points may not be 75% like people WANT it to be, but for example 90-95% of performance of Apple silicon on MacOS.

And it may be 1%, too. That has just as much evidence as 90-95%. Nobody gets 90%+ of native perf in a translation environment, even a really good one.

Go back to proclaiming that Apple is never, ever going to bring the Mac to ARM. That was more entertaining.

We actually have enough evidence to conclude that it must be very close to 75%:

1. We have A12Z iPad Pro Geekbench results with the SoC running at 2.5 GHz (just add .gb5 to the end to see clock rates in JSON format)
2. We have the A12Z Rosetta results on macOS, which (despite listing 2.4 GHz) run at essentially the same 2.5 GHz (.gb5 link to compare)

Here is the human-readable comparison link for both results:

arm: 1115 ST / 4670 MT (4x small cores + 4x BIG cores)
x86: 844 ST / 2943 MT (only 4x BIG cores)

Bear in mind that the iOS multithreaded result also uses the small cores (8 cores in total) while the macOS result does not, so the MT results are not directly comparable.
Single-threaded results, however, are comparable and amount to:
844 / 1115 = 0.757, which is ... about 75.7%


That's the very same SoC, at the same frequency. So unless you're claiming that the same SoC is somehow faster in the iPad than in the developer kit, Rosetta cannot really be more efficient than ~75% tops.
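
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the single-threaded ratio above. The two scores are the ones quoted in this post; the function and variable names are made up for illustration, and the result only says something about Geekbench 5 ST, not about other workloads.

```python
# Geekbench 5 single-thread scores quoted in the post (same A12Z SoC).
NATIVE_ST = 1115   # iPad Pro, native ARM build
ROSETTA_ST = 844   # developer kit, x86 build run through Rosetta 2


def translation_efficiency(translated: float, native: float) -> float:
    """Fraction of native performance retained under translation."""
    return translated / native


eff = translation_efficiency(ROSETTA_ST, NATIVE_ST)
print(f"Rosetta 2 ST efficiency: {eff:.1%}")
print(f"Implied translation overhead: {1 - eff:.1%}")
```

The MT scores are deliberately left out here, since the two runs use a different number of cores.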
 

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
We actually have enough evidence to conclude that it must be very close to 75%:

1. We have A12Z iPad Pro Geekbench results with the SoC running at 2.5 GHz (just add .gb5 to the end to see clock rates in JSON format)
2. We have the A12Z Rosetta results on macOS, which (despite listing 2.4 GHz) run at essentially the same 2.5 GHz (.gb5 link to compare)

Here is the human-readable comparison link for both results:

arm: 1115 ST / 4670 MT (4x small cores + 4x BIG cores)
x86: 844 ST / 2943 MT (only 4x BIG cores)

Bear in mind that the iOS multithreaded result also uses the small cores (8 cores in total) while the macOS result does not, so the MT results are not directly comparable.
Single-threaded results, however, are comparable and amount to:
844 / 1115 = 0.757, which is ... about 75.7%

Yep. But our friend Glo is insisting that it's actually because iOS is magically making benchmarks run faster because it's "heavily optimized for the hardware" and that won't apply to macOS. It reeks of desperation from someone who doesn't want to believe Apple is actually performance-competitive and who never thought we'd even be having this conversation.

I don't personally like Geekbench or think it means much, but it's better than nothing, and what we're seeing right now indicates a highly capable core and very good translation performance.
 

Gideon

Golden Member
Nov 27, 2007
1,619
3,645
136
As for the Tomb Raider arguments, these are also grasping at straws.

The game was running at a low resolution and detail level at around 30 FPS. At such a low FPS that game can be quite heavily GPU bound. Let's not forget it runs on Jaguar cores on consoles. Getting it to ~30 FPS at these settings shouldn't be a problem even for ~70% of A12Z performance.

Luckily Anandtech has Bench numbers for the game. Let's see:

A dual-core Intel Core i3-3225 (2C/4T, 55W, 3.3 GHz, Ivy Bridge) coupled with a GTX 1080 gets:
38.0 FPS @ 1080p Low.
28.4 FPS @ 4K Ultra (so, extremely CPU bound)

Newer dual-core CPUs at 1080p Low get:
63.1 FPS for the 3.4 GHz Core i3-7100T
57.7 FPS for the 3.2 GHz AMD Athlon 200GE

A 4-core A12Z can surely hold its own against a dual-core Ivy Bridge in these CPU-limited settings (~30 FPS at low detail/resolution), even when running at 70% efficiency, especially considering how well this game scales with cores.
 

Gideon

Golden Member
Nov 27, 2007
1,619
3,645
136
Yep. But our friend Glo is insisting that it's actually because iOS is magically making benchmarks run faster because it's "heavily optimized for the hardware" and that won't apply to macOS.
(emphasis mine)

That's a particularly misguided belief considering macOS 11 is essentially iOS with a more capable GUI (the kernel is the same and there are even rumors of iPhones running full-fledged macOS). Why on earth would it not get the same optimizations for the same CPUs?
 

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
(emphasis mine)

That's a particularly misguided belief considering macOS 11 is essentially iOS with a more capable GUI (the kernel is the same and there are even rumors of iPhones running full-fledged macOS). Why on earth would it not get the same optimizations for the same CPUs?

Indeed. Thus the full binary compatibility between iOS and macOS/ARM.

Additionally, both GB and SPEC attempt to avoid syscalls in critical loops precisely to take the OS out of the equation as much as possible. The idea that macOS is imposing some kind of massive overhead versus iOS is ridiculous.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Perhaps I had misjudged the IPC of modern phones. Out of curiosity I ran our test suite from work. It's single-threaded and essentially matches HTTP requests to firewall actions. I was able to run it on an (Android) Snapdragon 865 phone and a (Debian) desktop with LuaJIT v2.1.

Snapdragon 865 (~2.84 GHz) took on average of three runs: 81.74s (projected at 4.2GHz 55.27s)
R5 3600 (~4.2 GHz) took on average of three runs: 46.94s

So A77 is closer than I expected to Zen2 but of course it's always what you do with it.
Almost everybody misjudges the IPC of the latest ARM cores, and especially Apple's. I lived in heavy denial too until somebody pointed out that the Apple A12 has 6 ALUs, so I forced myself to read an article about the A12. I expected a good laugh and proof that this Apple ARM core is a slow snail like every other ARM core. So don't worry, it hurts to accept that an iPhone has faster ST performance than a Ryzen 3950X @ 4.6 GHz :D

Your results are interesting, showing A77 achieves 85% IPC of Zen2.
  • According to SPECint2006................. A77 achieves 108% IPC of Zen2.
  • According to GeekBench5.1............... A77 achieves 103% IPC of Zen2.
  • .............................................................. A78 achieves 110% IPC of Zen2 (+7% IPC, -5% area of A77).
  • ................................................................ X1 achieves 133% IPC of Zen2 (+30% IPC, +50% area of A77).
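
The 85% figure for the LuaJIT test suite can be reproduced with a short Python sketch. It assumes, as the quoted projection does, that runtime scales linearly with clock speed (which ignores memory latency and boost behaviour); the function name is illustrative only.

```python
def project_time(seconds: float, from_ghz: float, to_ghz: float) -> float:
    """Scale a measured runtime to another clock, assuming linear scaling."""
    return seconds * from_ghz / to_ghz


# Averages of three runs, as quoted above.
sd865_seconds = 81.74   # Snapdragon 865 (Cortex-A77) at ~2.84 GHz
zen2_seconds = 46.94    # Ryzen 5 3600 (Zen 2) at ~4.2 GHz

# What the A77 would take if it ran at the Ryzen's 4.2 GHz.
projected = project_time(sd865_seconds, from_ghz=2.84, to_ghz=4.2)

# Lower time is better, so relative per-clock performance is the inverse ratio.
relative_ipc = zen2_seconds / projected

print(f"Projected A77 time at 4.2 GHz: {projected:.2f} s")
print(f"A77 per-clock performance vs Zen2: {relative_ipc:.0%}")
```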

A Snapdragon 875 based on Cortex X1 will probably beat Zen2 even in your test suite. However, the Apple A13 is a different beast.

  • According to GeekBench5.1................. A13 achieves 182% IPC of Zen2.
  • According to GeekBench5.1................. A14 achieves 209% IPC of Zen2 (assuming +15% IPC jump).

  • In smartphones the A14 will increase clocks from 2.65 GHz to 2.8 GHz (the same 5% uplift as Cortex A78, 2.84 -> 3.0 GHz).
  • But in laptops the A14X can use the full 15% clock uplift available from the 5nm process: 2.65 * 1.15 = 3.05 GHz.
Projected GeekBench A14X score is (A13's 502 * 1.15 * 3.05) ............ 1760 pts in ST.
A14X Rosetta 2 emulation at 75% efficiency: 1760 * 0.75 = 1320 pts.

A14X could beat most desktop Ryzens/Intels despite running x86 emulation. It's like the Matrix came true for x86.
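
The projection above is just three multiplications, sketched here in Python. Every input except the A13's 502 pts/GHz is an assumption taken from this post (a 15% IPC uplift, a 3.05 GHz laptop clock, and the ~75% Rosetta 2 efficiency estimated earlier in the thread), so the output is only as good as those guesses.

```python
A13_PTS_PER_GHZ = 502        # Geekbench 5 ST points per GHz, from the post
ASSUMED_IPC_UPLIFT = 1.15    # assumption: A14 gains 15% IPC over A13
ASSUMED_CLOCK_GHZ = 3.05     # assumption: 2.65 GHz * 1.15 in a laptop
ROSETTA_EFFICIENCY = 0.75    # assumption: same ST efficiency as on A12Z

native_st = A13_PTS_PER_GHZ * ASSUMED_IPC_UPLIFT * ASSUMED_CLOCK_GHZ
translated_st = native_st * ROSETTA_EFFICIENCY

print(f"Projected native ST score:     {native_st:.0f}")
print(f"Projected translated ST score: {translated_st:.0f}")
```

Changing any one of the three assumed constants moves both results proportionally, which is why the later replies argue about the uplift figures rather than the arithmetic.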
 

Eug

Lifer
Mar 11, 2000
23,586
1,000
126
Projected GeekBench A14X score is (A13's 502 * 1.15 * 3.05) ............ 1760 pts in ST.
A14X Rosetta 2 emulation at 75% efficiency: 1760 * 0.75 = 1320 pts.

A14X could beat most desktop Ryzens/Intels despite running x86 emulation. It's like the Matrix came true for x86.
That seems a bit optimistic.

Pulling numbers outta my azz, I'll predict A14X achieves 1550 ST, 6200 MT native.

Anything above 1500 would be pretty damn impressive though.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
That seems a bit optimistic.

Pulling numbers outta my azz, I'll predict A14X achieves 1550 ST, 6200 MT native.

Anything above 1500 would be pretty damn impressive though.
Your numbers are outta your azz, but not mine.
My prediction is based on GeekBench PPC/IPC results: https://forums.anandtech.com/thread...u-cores-ipc-ppc-comparison.2580622/?view=date
  • A9 ...... 305 pts/GHz
  • A10 .... 329 pts/GHz .... 8% IPC uplift
  • A11 .... 390 pts/GHz .... 19% IPC uplift
  • A12 .... 441 pts/GHz .... 13% IPC uplift
  • A13 .... 502 pts/GHz .... 14% IPC uplift

  • A14 .... conservative 10% IPC uplift is 552 pts/GHz ...... and iPhone freq jump 2.65 -> 2.8 Ghz .... 552*2.8 = 1545 pts
  • A14 .... conservative 10% IPC uplift is 552 pts/GHz ...... and LAPTOP freq jump 2.65 -> 3.0 Ghz .... 552*3 = 1656 pts

1600 pts for A14X is the expected conservative value. Don't forget that SPECfp2006 shows the FPU is weaker (around 160% of Zen2) than INT IPC (184% of Zen2), so they may add a fourth FPU just like Cortex X1. And if A14 has SVE2 support, the IPC uplift in FPU-heavy tasks could easily be 15%. Don't forget A14 was planned from the start for laptops, so a stronger FPU is also expected for such a deployment.
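
The uplift column and the two A14 projections can be recomputed from the pts/GHz figures with a few lines of Python. The 10% A14 uplift and the 2.8 / 3.0 GHz clocks are the assumptions stated in this post, not measurements; variable names are illustrative.

```python
# Geekbench 5 ST points per GHz for each generation, as listed above.
pts_per_ghz = {"A9": 305, "A10": 329, "A11": 390, "A12": 441, "A13": 502}

chips = list(pts_per_ghz)
uplifts = [
    pts_per_ghz[cur] / pts_per_ghz[prev] - 1
    for prev, cur in zip(chips, chips[1:])
]
for chip, uplift in zip(chips[1:], uplifts):
    print(f"{chip}: {uplift:.0%} uplift over previous generation")

# Assumption from the post: a "conservative" 10% uplift for A14.
a14_pts_per_ghz = pts_per_ghz["A13"] * 1.10
for label, ghz in [("iPhone", 2.8), ("laptop", 3.0)]:
    print(f"A14 {label} at {ghz} GHz: {a14_pts_per_ghz * ghz:.0f} pts")
```

The unrounded results differ by about a point from the figures in the post, which rounds 552.2 down to 552 before multiplying.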
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
Almost everybody misjudges the IPC of the latest ARM cores, and especially Apple's. I lived in heavy denial too until somebody pointed out that the Apple A12 has 6 ALUs, so I forced myself to read an article about the A12. I expected a good laugh and proof that this Apple ARM core is a slow snail like every other ARM core. So don't worry, it hurts to accept that an iPhone has faster ST performance than a Ryzen 3950X @ 4.6 GHz :D

Your results are interesting, showing A77 achieves 85% IPC of Zen2.
  • According to SPECint2006................. A77 achieves 108% IPC of Zen2.
  • According to GeekBench5.1............... A77 achieves 103% IPC of Zen2.
  • .............................................................. A78 achieves 110% IPC of Zen2 (+7% IPC, -5% area of A77).
  • ................................................................ X1 achieves 133% IPC of Zen2 (+30% IPC, +50% area of A77).
I consider those all to be highly optimistic estimates. Graviton2 is Neoverse N1 and doesn't approach Zen 2 in "real world" benchmarks. And Graviton2 is said to run at higher nominal clocks (2.5 vs 2.25 GHz, though I don't know the actual clock rates achieved). Zen 2 is on average 51% faster at nominally lower clock speeds over 143 different benchmarks (much more than SPECint or Geekbench).
 

Eug

Lifer
Mar 11, 2000
23,586
1,000
126
Your numbers are outta your azz, but not mine.
My prediction is based on GeekBench PPC/IPC results: https://forums.anandtech.com/thread...u-cores-ipc-ppc-comparison.2580622/?view=date
  • A9 ...... 305 pts/GHz
  • A10 .... 329 pts/GHz .... 8% IPC uplift
  • A11 .... 390 pts/GHz .... 19% IPC uplift
  • A12 .... 441 pts/GHz .... 13% IPC uplift
  • A13 .... 502 pts/GHz .... 14% IPC uplift

  • A14 .... conservative 10% IPC uplift is 552 pts/GHz ...... and iPhone freq jump 2.65 -> 2.8 Ghz .... 552*2.8 = 1545 pts
  • A14 .... conservative 10% IPC uplift is 552 pts/GHz ...... and LAPTOP freq jump 2.65 -> 3.0 Ghz .... 552*3 = 1656 pts

1600 pts for A14X is the expected conservative value. Don't forget that SPECfp2006 shows the FPU is weaker (around 160% of Zen2) than INT IPC (184% of Zen2), so they may add a fourth FPU just like Cortex X1. And if A14 has SVE2 support, the IPC uplift in FPU-heavy tasks could easily be 15%. Don't forget A14 was planned from the start for laptops, so a stronger FPU is also expected for such a deployment.
They're outta your azz too because you're making up A14 "uplifts" by vigorous handwaving.
 

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
I consider those all to be highly optimistic estimates. Graviton2 is Neoverse N1 and doesn't approach Zen 2 in "real world" benchmarks. And Graviton2 is said to run at higher nominal clocks (2.5 vs 2.25 GHz, though I don't know the actual clock rates achieved). Zen 2 is on average 51% faster at nominally lower clock speeds over 143 different benchmarks (much more than SPECint or Geekbench).

Three points.

  • Some of those tests have x86-specific optimizations.
  • The Epyc has SMT; the N1 doesn't. That makes no difference for ST perf but has a significant impact on MT.
  • The Grav2 is, for some MT workloads, pretty badly limited by having a 32MB L3. This isn't a reflection on the N1 core itself.
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
The Epyc has SMT; the N1 doesn't. That makes no difference for ST perf but has a significant impact on MT.
Even without SMT it's 46% faster on average; Phoronix considered this and ran Epyc both with and without SMT.

Some of those tests have x86-specific optimizations.
Yes, software compatibility is x64's defining feature. How nice it is not to have to rewrite and re-optimize your entire stack.
 

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
Even without SMT it's 46% faster on average; Phoronix considered this and ran Epyc both with and without SMT.


Yes, software compatibility is x64's defining feature. How nice it is not to have to rewrite and re-optimize your entire stack.

So Epyc2 is only 3.4% faster with SMT than without (1.51 / 1.46)? Seems legit.
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
So Epyc2 is only 3.4% faster with SMT than without (1.51 / 1.46)? Seems legit.
Many of the tests are not multithreaded, and many more suffer beyond 64 threads (due to bus contention and lower achieved clock speeds) to the extent that the Epyc with SMT is actually slower in some multi-threaded tests and most single-threaded tests.

A better argument would be that perhaps it is more power efficient.
 

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
Many of the tests are not multithreaded, and many more suffer beyond 64 threads due to bus contention to the extent that the Epyc with SMT is slower in some tests.

I'm avoiding a sarcastic reply here; I will simply note that comparing 64-core chips, one of which has eight times the LLC of the other, on a set of tests including Coremark (seriously?) and Stress-NG is not what I would call a conclusive way to compare the core performance of two microarchitectures.

I've tended to see higher iso-clock perf on the loads I've run on N1's than on Zen2. That's almost entirely integer, but it's still the data I have to work with.
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
I'm avoiding a sarcastic reply here; I will simply note that comparing 64-core chips, one of which has eight times the LLC of the other, on a set of tests including Coremark (seriously?) and Stress-NG is not what I would call a conclusive way to compare the core performance of two microarchitectures.

I've tended to see higher iso-clock perf on the loads I've run on N1's than on Zen2. That's almost entirely integer, but it's still the data I have to work with.
I'm avoiding a sarcastic reply: Your workload is limited. Far less than a wide selection of tests included in the open benchmarking suite. Of course, the applicability of the results depends on your workload which you seem to know. So in your case, wonderful. In other cases, it's worth investigating but don't expect the miracles promised by Geekbench.
 

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
I'm avoiding a sarcastic reply: Your workload is limited. Far less than a wide selection of tests included in the open benchmarking suite. Of course, the applicability of the results depends on your workload which you seem to know. So in your case, wonderful. In other cases, it's worth investigating but don't expect the miracles promised by Geekbench.

Let's take a look at some of the ST tests in the Phoronix tests, then. Comparing Epyc 7352 (3.2GHz ST boost) to Grav2 (2.5GHz fixed), here's what we find...

PHPbench: Epyc2 does 22% higher performance at 28% higher clock.
PyPerformance Float: Epyc2 does 16% faster completion time, 28% higher clock
PyPerformance Regex Compile: Epyc2 does ~1.5% slower completion time, 28% higher clock
MLpack scikit_ica: Epyc2 does 37% faster completion time, 28% higher clock (this is a clear win for Zen2)
MLpack scikit_svm: Epyc2 does 30% faster completion time, 28% higher clock
MLpack scikit_linearridgeregression: Epyc2 does 30% slower completion time, 28% higher clock (ouch)
Pybench: Epyc2 does 31% faster completion time, 28% higher clock

I'm not seeing "N1 doesn't approach Zen2 IPC" here. I'm only seeing it on multithreaded loads where Grav2 is hobbled by its undersized L3. That's a valid criticism of the Graviton2 chip, but it doesn't say a whole lot about the N1 core itself. Based on the ST tests, it looks like they're at approximate parity.
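
A rough way to put the list above on an iso-clock footing is to divide each result by the 3.2 / 2.5 = 1.28 clock ratio. The Python sketch below does that under a loudly stated assumption: "N% faster" or "N% higher performance" is read as a performance ratio of 1 + N/100, and "N% slower" as 1 - N/100. The post does not say which convention Phoronix used, so treat the output as indicative only.

```python
import math

CLOCK_RATIO = 3.2 / 2.5  # Epyc 7352 ST boost vs Graviton2 fixed clock

# Epyc performance relative to Graviton2 (assumed reading: 1 +/- N/100).
epyc_vs_grav2 = {
    "PHPbench": 1.22,
    "PyPerformance Float": 1.16,
    "PyPerformance Regex Compile": 0.985,
    "MLpack scikit_ica": 1.37,
    "MLpack scikit_svm": 1.30,
    "MLpack scikit_linearridgeregression": 0.70,
    "Pybench": 1.31,
}

# Per-clock ratio: above 1.0 favours Zen2, below 1.0 favours N1.
per_clock = {name: r / CLOCK_RATIO for name, r in epyc_vs_grav2.items()}
for name, ratio in per_clock.items():
    print(f"{name:38s} Zen2/N1 per clock: {ratio:.2f}")

# Geometric mean is the usual way to average ratios.
geomean = math.prod(per_clock.values()) ** (1 / len(per_clock))
print(f"Geometric mean: {geomean:.2f}")
```

With only seven tests and one large outlier, the mean is sensitive to which tests are included, so it backs up "approximate parity" rather than any precise IPC figure.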

Are there other single-threaded tests I'm missing in the Phoronix article that would be a better basis for comparison?
 

name99

Senior member
Sep 11, 2010
404
303
136

800 Pts in Geekbench v5 on MacOS Big Sur single core score, 2600 pts Multicore score. A14Z, latest, and greatest from Apple. 4C/4C design.

2020 MacBook Air with 2C/4T:
1005 Pts, single threaded, 2000 pts multithreaded score.

Essentially means that ARM still has large disadvantage to x86, even in Apple designs.

Secondly, the scores in iOS platform, are extremely skewed by the platform's performance. On MacOs, the scores in GB5 are lower than on iOS. Which means, that simply iOS platform is extremely well optimized.

There is no more equal level comparison right now between both arch's on one platform, now. So yeah, ARM v9 at best will be tying with x86 Intel's. But still might be losing to AMD's designs.

I wonder what will Richie Rich say about those scores of A14Z under MacOS in GB5...

Oh FFS.
- That's an A12Z NOT an A14
- And it's the EMULATOR performance.

In other words
- x86 EMULATED
- on the chip from two years ago
- is 80% of the best Intel can offer in that class today

And you think this is a problem?

You clearly have no clue what you are talking about, and no interest in understanding tech.
 

Doug S

Platinum Member
Feb 8, 2020
2,248
3,478
136
They're outta your azz too because you're making up A14 "uplifts" by vigorous handwaving.

It's also silly and pointless, since there will be millions of phones with A14s in people's hands by the end of September, and we'll have actual GB5 results. That won't tell us how fast the chips inside the ARM Macs due a few months after that will be, but it will put a lower bound on their performance when compared to the A12Z in the developer Macs.
 

name99

Senior member
Sep 11, 2010
404
303
136
Yep. But our friend Glo is insisting that it's actually because iOS is magically making benchmarks run faster because it's "heavily optimized for the hardware" and that won't apply to macOS. It reeks of desperation from someone who doesn't want to believe Apple is actually performance-competitive and who never thought we'd even be having this conversation.

I don't personally like Geekbench or think it means much, but it's better than nothing, and what we're seeing right now indicates a highly capable core and very good translation performance.

According to Glo Apple has no magic bean that can make HW faster, but Apple DOES have a magic bean that can make all aspects of SW (emulator, compiler, OS) faster.

Hey, who cares where the magic bean lives exactly? I'll take a SW magic bean if that's what gives me SYSTEM performance...
 

Glo.

Diamond Member
Apr 25, 2015
5,703
4,548
136
Oh FFS.
- That's an A12Z NOT an A14
- And it's the EMULATOR performance.

In other words
- x86 EMULATED
- on the chip from two years ago
- is 80% of the best Intel can offer in that class today

And you think this is a problem?

You clearly have no clue what you are talking about, and no interest in understanding tech.
What makes you believe that Small cores, are completely unused in Geekbench on iOS, and are not yielding performance?

Wouldn't that 25% of performance lack compared to iOS come from the small cores?

I will answer, yes they were used for performance improvement of large cores, both in ST, and in Multicore. That was the whole point of Apple touting the benefits of small cores I think in A10 chips and so forth.

Everybody jumped to conclusion that it must be because of the emulation. Based only on armchair rough estimates based on the scores, themselves, without putting the TECHNOLOGY behind it in the context.

As I have said many times: Rosetta 2 is way more efficient at yielding performance than you guys believe, judging by the performance of Shadow of the Tomb Raider, which Apple demoed on this very development kit with A12Z.

So let me put this thought in your minds. What if reality is different from your beliefs, and those scores are actually legit, but only show big-core performance and IPC, excluding the small cores, which may actually have yielded a pretty decent performance boost on iOS, both ST and multicore? Apple touted many, many times that the benefit of their implementation of the big.LITTLE arch is that those cores can work simultaneously.

In the benchmarks on MacOS Big Sur - the small cores are not working.

Be it architecture, or... the fact that it is different platform, than iOS?

If all of this is correct, then everything falls into place.

P.S. If any of you would stop, how can I put this... love Apple design teams, for a second you would yourselves see that there might be different perspective for those benchmarks.

But jumping to the conclusion that it must be Rosetta's fault, and that there is no way in hell it can be more than 80% efficient, is way too premature, considering what Apple wants to achieve. If they want to migrate their whole platform, they have to offer at least 90% of native performance in Rosetta's compiler.
 

Eug

Lifer
Mar 11, 2000
23,586
1,000
126
What makes you believe that Small cores, are completely unused in Geekbench on iOS, and are not yielding performance?
Geekbench states only 4 cores are used on macOS.
All 8 cores are used in iPadOS.

Also, the multi-core multiplier on macOS (3.5X) is lower than iPadOS (4.2X), so it makes sense.
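
The two multipliers come straight from the scores posted earlier in the thread (1115 / 4670 on iPadOS, 844 / 2943 under Rosetta 2 on macOS). A minimal Python sketch, with illustrative names:

```python
# (single-thread, multi-thread) Geekbench 5 scores quoted in the thread.
scores = {
    "iPadOS native (8 cores)": (1115, 4670),
    "macOS Rosetta 2 (4 cores)": (844, 2943),
}

multipliers = {name: mt / st for name, (st, mt) in scores.items()}
for name, mult in multipliers.items():
    print(f"{name}: MT is {mult:.1f}x ST")
```

A multiplier near 3.5 is what four big cores alone would plausibly give, and the extra ~0.7x on iPadOS is consistent with the four small cores contributing to the MT run.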
 

Glo.

Diamond Member
Apr 25, 2015
5,703
4,548
136
Which would prove the point.

And if this is correct, then Rosetta 2 is way more efficient than people believe it to be.
 

name99

Senior member
Sep 11, 2010
404
303
136
(emphasis mine)

That's a particularly misguided belief considering macOS 11 is essentially iOS with a more capable GUI (the kernel is the same and there are even rumors of iPhones running full-fledged macOS). Why on earth would it not get the same optimizations for the same CPUs?

There is more-or-less a single OS kernel across Apple's OS's, ie Darwin. But there's a lot hidden in that "more or less".
There are different technologies that surface first on either the Mac or the iOS side for one reason or another (eg different driver model), the iOS model now being essentially moved over to macOS starting with Catalina.
More significantly the OS provides MECHANISM, but different devices choose different policies (eg whether or not to provide writing out swap) from the mechanisms that are available.

Of COURSE both platforms will pick up the same optimizations where relevant (cf, eg, the various optimizations made to the ObjC/Swift runtime this year)! But whether the OS is "the same" is not a sound-bite question; it depends on exactly what you mean by "the OS" and exactly what you consider to matter for the purposes of "being the same".

As for scalability, this is an issue where comparing Darwin with Linux or Windows obscures more than it reveals. Apple appears to be trying to resurrect the underlying philosophy of Mach, easier to make performant now that they control the CPU and can add helper functionality to the cores when relevant.
So Apple is apparently trying to undo the monolithic BSD image on top of Mach, decomposing it into server processes. We've seen that with various security daemons, with XPC, and now with moving drivers into separate user-space processes. Some big-ticket items like networking and VFS are still monolithic, but who knows what the ultimate plan for them is.

So right now Apple is kinda partway through a transition. Half the OS is still monolithic with all that implies
- higher performance, lower memory footprint
- horrible security, horrible RAS generally
- very difficult to modify the code
while half the OS has transitioned to the new world
- lower performance, higher memory footprint, yes BUT
- much easier to scale across multiple cores, and ultimately that may win as a more relevant advantage
- better security and RAS, and
- much easier to modify code (and thus to experiment with alternative, possibly better, ways of doing things)

Obviously this was the original Mach vision, never really realized. It was also the vision of a few experimental OS's like MS' Barrelfish (but like everything out of MS, some combination of fear of change and group territoriality killed Barrelfish and nothing much appears to have moved from it into production -- same story as the way large parts of iOS security are based on work done, then abandoned, by MS).

TL;DR
- yeah, Apple's OS performance is going to look lousy, especially for many-core, for a few years, as they modify a LOT of plumbing
- but there are good reasons to believe that the end result will be something spectacular. Fast on many-core scenarios (ie modern reality), much better RAS, and much easier to maintain and improve.