Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
24,048
1,679
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops (rough arithmetic check after this spec list)
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
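
For anyone wondering where the 2.6 Teraflops figure comes from, here is a quick arithmetic check. The per-EU ALU count and the ~1.28 GHz GPU clock are not official Apple numbers, just commonly reported figures, so treat them as assumptions.

Code:
import Foundation

// Rough FP32 throughput check for the 8-core M1 GPU.
// Assumptions (commonly reported, not official Apple specs): 8 FP32 ALUs per
// execution unit, ~1.278 GHz GPU clock, and 2 FLOPs per ALU per cycle (FMA).
let executionUnits = 128.0   // per the spec list above
let alusPerEU = 8.0          // assumed FP32 lanes per EU (128 x 8 = 1024 lanes)
let clockGHz = 1.278         // assumed GPU clock
let flopsPerCycle = 2.0      // one fused multiply-add counts as 2 FLOPs

let tflops = executionUnits * alusPerEU * clockGHz * flopsPerCycle / 1000.0
print(String(format: "~%.2f TFLOPS", tflops))   // ~2.62 TFLOPS, matching the 2.6 above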

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s (bandwidth arithmetic after this spec list)
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes
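
As a sanity check on the 100 GB/s figure: assuming the commonly reported 128-bit LPDDR5-6400 configuration (the bus width and transfer rate are not on Apple's spec page), the arithmetic works out like this.

Code:
import Foundation

// Peak memory bandwidth for M2, assuming a 128-bit LPDDR5-6400 interface
// (commonly reported figures, not Apple's own).
let busWidthBits = 128.0
let transfersPerSec = 6400.0e6             // 6400 MT/s
let bytesPerTransfer = busWidthBits / 8.0  // 16 bytes moved per transfer

let bandwidthGBps = transfersPerSec * bytesPerTransfer / 1e9
print(String(format: "~%.1f GB/s", bandwidthGBps))   // ~102.4 GB/s, rounded to "100 GB/s"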

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
Does anyone know if we have visual comparisons already between Mac version, Switch2 version and x64 versions?
The only reputable source that will do that is probably Notebookcheck, which will cover Mac and Windows. I doubt they will test the M4 models now; maybe when the M5 MacBooks release.
 
  • Like
Reactions: Mopetar

name99

Senior member
Sep 11, 2010
612
506
136
Is it true...
Even with the M3PRO, it's surprisingly difficult...
I'm not a gamer but that seems in line with the results I see for PC integrated graphics, which is the comparable tier to M4.
Apparently Intel integrated graphics simply cannot handle it, while AMD struggles, at the same sort of performance tier.

Seems like an interesting question would be how well Apple's upscaling works: if you dropped the "real" resolution to something more like 800x600 and used MetalFX to upscale, would the result be preferable?
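
For reference, the MetalFX path being described looks roughly like the sketch below: render at a low internal resolution (say 800x600) and let a spatial scaler produce the display-resolution image. This is a minimal sketch assuming you already have a Metal device, a per-frame command buffer, and low-res/output textures created with the usage flags MetalFX expects; it is not a drop-in implementation.

Code:
import Metal
import MetalFX

// Minimal sketch: build a MetalFX spatial scaler that upscales an 800x600
// render target to 1600x1200, then encode it into an existing command buffer.
func makeUpscaler(device: MTLDevice) -> MTLFXSpatialScaler? {
    let desc = MTLFXSpatialScalerDescriptor()
    desc.inputWidth = 800
    desc.inputHeight = 600
    desc.outputWidth = 1600
    desc.outputHeight = 1200
    desc.colorTextureFormat = .bgra8Unorm
    desc.outputTextureFormat = .bgra8Unorm
    desc.colorProcessingMode = .perceptual
    return desc.makeSpatialScaler(device: device)
}

func encodeUpscale(scaler: MTLFXSpatialScaler,
                   lowResColor: MTLTexture,      // the 800x600 frame just rendered
                   upscaledOutput: MTLTexture,   // the 1600x1200 texture to present
                   commandBuffer: MTLCommandBuffer) {
    scaler.colorTexture = lowResColor
    scaler.outputTexture = upscaledOutput
    scaler.encode(commandBuffer: commandBuffer)
}

Whether the upscaled 800x600 image actually looks better than native low-resolution rendering is exactly the open question; MetalFX also has a temporal scaler that generally looks better, but it needs motion vectors and depth from the engine.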
 

Io Magnesso

Senior member
Jun 12, 2025
578
152
71

Interesting, anyone else notice the same? Going from A17 Pro to A18 Pro?

Or M3 to M4?

More confirmation that N3B was a troubled node
Well, no matter how high-performance the Apple silicon Max chips are, throttling can't be helped if the thermal setup causes it.
What machine is this person using?
Is it a Mac Studio, or a MacBook Pro?
If it's the latter, there's of course a high chance that throttling will occur.
 

Io Magnesso

Senior member
Jun 12, 2025
578
152
71
Although it generates less heat, the M4 Max seems to consume more power than the M3 Max. The higher operating frequency does seem to be an advantage, though.
 

johnsonwax

Senior member
Jun 27, 2024
216
358
96
I love that the Mac gets one AAA game and it becomes the benchmarking holy grail overnight.

As a Mac gamer, if you want to see single-core/RAM performance, benchmark Factorio. Its internal benchmark (UPS, updates per second) is not based on rendering but on how fast it can run the simulation, with many players testing how far they can push the engine before UPS falls below its cap of 60. On x86 there are entire benchmarking suites just for this, but none on Mac. The GPU plays almost no role, and while the game has some degree of multithreading, it's still highly single-thread constrained because maintaining coherency in a simulation like this is pretty expensive.

Anecdotally, during an event last year where the game was pushed to its max via a mod that allowed it to be clustered, my M1 Max MBP was 15-20% faster than a 7950X3D, which was the fastest system benchmarked at the time. My sense is that an M4 Max would be even faster relative to a 9950X3D.
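
For anyone unfamiliar with UPS-style benchmarking, the idea is simply to run the simulation update loop flat out and count how many updates per second the CPU can sustain, independent of rendering. A toy sketch of the measurement (obviously not Factorio's actual code):

Code:
import Foundation

// Toy UPS (updates-per-second) benchmark: run a fixed number of simulation
// ticks as fast as possible and report the sustained update rate.
struct ToySimulation {
    var entities = [Double](repeating: 1.0, count: 100_000)

    mutating func tick() {
        // Stand-in for the game's single-threaded per-entity simulation work.
        for i in entities.indices {
            entities[i] = entities[i] * 1.000001 + 0.000001
        }
    }
}

var sim = ToySimulation()
let ticks = 600
let start = Date()
for _ in 0..<ticks {
    sim.tick()
}
let elapsed = Date().timeIntervalSince(start)
print(String(format: "%.1f UPS (Factorio caps the game at 60)", Double(ticks) / elapsed))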
 

OneEng2

Senior member
Sep 19, 2022
676
924
106
Why would anyone use a consumer design in DC? That’s like saying let’s use the 200mm2+ Strix Point design in DC. You are making no sense.
I am pointing out that you don't get something for nothing. You can't just say ARM is better, you have to acknowledge what it is better for and what design decisions made it better.

Also, note that design decisions that work for a mobile device don't work as well for a DC device ..... or high end desktop.
 
  • Like
Reactions: Io Magnesso

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
You can't just say ARM is better, you have to acknowledge what it is better for and what design decisions made it better.

Also, note that design decisions that work for a mobile device don't work as well for a DC device ..... or high end desktop.

Here is the confusion. You bring up DC for consumer products; why would Apple or other ARM vendors add extra features that are useless for smartphones or for ARM SoCs with big iGPUs in laptops? For example, Lunar Lake doesn't have SMT or dGPU support, just like M4, because these are targeted at laptops/tablets.


What we discussed is the CPU core itself, not the whole product like M4. @DavidC1 mentioned "other ARM vendors" as well and there is nothing in the ARM ISA that prevents adding most of the features below.

Not x86 compatible - You can recompile to ARM
Tied to apple everything - There are other ARM cores that reach M4 IPC
Not designed well for heavy sustained loads (better at bursts of processing power) - What does that even mean? This has nothing to do with the CPU core but rather the cooling solution. The M4 goes in fanless tablets and in laptops that have active cooling.
Real-world multi-core work suffers (does great on synthetic benchmarks like Geekbench) - Yeah, because it only has 4 P-cores; if it had 96 M4 P-cores it would be a different story. If we look at actual real-world tests, the M4 Pro and M4 Max, which have more P-cores, are on par with or better than a 16-core Strix Halo.
No external graphics interface - Not related to ISA
No PCI external interface - Not related to ISA
No AVX (only proprietary SIMD and limited at that). - ARM has SVE2, which could be added in future M4 revisions.
Relatively large die (168mm2) on N3E (Turin D 16c CCD is about 85mm2) - Because it includes a LOT more IP than just CPU cores and cache. It's like comparing Lunar Lake's 140mm2 to a Turin-D CCD.

TLDR: No one will use the M4 SoC in a server. They would use multiple M4 P-cores (NOT the whole M4 SoC), like AMD does with Turin, and add a bunch of DC-related stuff like SMT, SVE2, ECC memory, and PCIe slots as needed.
 
  • Like
Reactions: ashFTW

Geddagod

Golden Member
Dec 28, 2021
1,391
1,480
106
Here is the confusion. You bring up DC for consumer products; why would Apple or other ARM vendors add extra features that are useless for smartphones or for ARM SoCs with big iGPUs in laptops? For example, Lunar Lake doesn't have SMT or dGPU support, just like M4, because these are targeted at laptops/tablets.


What we discussed is the CPU core itself, not the whole product like M4. @DavidC1 mentioned "other ARM vendors" as well and there is nothing in the ARM ISA that prevents adding most of the features below.



TLDR: No one will use the M4 SoC in a server. They would use multiple M4 P-cores (NOT the whole M4 SoC), like AMD does with Turin, and add a bunch of DC-related stuff like SMT, SVE2, ECC memory, and PCIe slots as needed.
Hard to tell if Apple cores will perform as well in server as they are doing currently in client.
I don't think Apple's cache hierarchy will cut it in servers. Which is also why I'm very interested in seeing how Qualcomm's server CPUs go, since they use a similar hierarchy.
That and also relevant asterisks surrounding vectorized perf ofc.
 

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
Hard to tell if Apple cores will perform as well in server as they are doing currently in client.
I don't think Apple's cache hierarchy will cut it in servers. Which is also why I'm very interested in seeing how Qualcomm's server CPUs go, since they use a similar hierarchy.
That and also relevant asterisks surrounding vectorized perf ofc.
I'm sure Qualcomm/Apple will tweak the core if they ever want to do business in DC. I remember the same rhetoric being used about whether Apple's cores would ever perform as well on a real desktop OS rather than in a closed mobile OS.

Apple's CPU core itself is good. Even with half the shared L2 cache the difference is less than ~10%.

SoC                    A18       A18 Pro
Codename               Tupai     Tahiti
P-core cluster L2      8 MB      16 MB
SLC                    12 MB     24 MB
GPU                    5-core    6-core
ProRes Encode/Decode   No        Yes
 
  • Like
Reactions: Mopetar

Doug S

Diamond Member
Feb 8, 2020
3,316
5,766
136
I love that the Mac gets one AAA game and it becomes the benchmarking holy grail overnight.

Well, it's because it's a game already used for benchmarking on Windows PCs, so it gives data points people have never had before.

All these people rushing to do benchmarks are ignoring whether this is a quality port of Cyberpunk - to what extent did they hand-optimize assembly language sequences (e.g. for SIMD code, where the compiler often doesn't generate the best code or even use SIMD at all) on x86? Because I'm willing to bet they didn't do the same on the Mac. And how much does Apple's deferred rendering affect performance when it's running something designed around the way DX12 renders?

Maybe this is a top-quality port that put as much effort into getting it running well on the Mac as into getting it running well on the PC. But I doubt it, primarily because that doesn't make sense: the Windows gaming market is massive compared to the Mac gaming market, so it wouldn't make sense to put more than a fraction of the effort into the port.
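
To make the SIMD point concrete, here is the kind of difference that gets hand-tuned per platform. The scalar loop may or may not be auto-vectorized by the compiler, while the second version uses explicit 4-wide SIMD; on x86 the hand-tuned path would typically be SSE/AVX intrinsics, on Apple Silicon NEON. Swift's portable SIMD types are used here purely for illustration.

Code:
// Scalar version: relies on the compiler to auto-vectorize (it often doesn't,
// or not optimally, which is why hot loops get hand-tuned per ISA).
func dotScalar(_ a: [Float], _ b: [Float]) -> Float {
    var sum: Float = 0
    for i in 0..<min(a.count, b.count) {
        sum += a[i] * b[i]
    }
    return sum
}

// Explicit SIMD version: processes 4 floats per step using SIMD4<Float>.
// Assumes the element count is a multiple of 4 to keep the sketch short.
func dotSIMD(_ a: [Float], _ b: [Float]) -> Float {
    var acc = SIMD4<Float>(repeating: 0)
    var i = 0
    while i + 4 <= min(a.count, b.count) {
        let va = SIMD4<Float>(a[i], a[i + 1], a[i + 2], a[i + 3])
        let vb = SIMD4<Float>(b[i], b[i + 1], b[i + 2], b[i + 3])
        acc += va * vb   // four multiplies and adds per loop iteration
        i += 4
    }
    return acc.x + acc.y + acc.z + acc.w
}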
 

Io Magnesso

Senior member
Jun 12, 2025
578
152
71
Here is the confusion. You bring up DC for consumer products; why would Apple or other ARM vendors add extra features that are useless for smartphones or for ARM SoCs with big iGPUs in laptops? For example, Lunar Lake doesn't have SMT or dGPU support, just like M4, because these are targeted at laptops/tablets.


What we discussed is the CPU core itself, not the whole product like M4. @DavidC1 mentioned "other ARM vendors" as well and there is nothing in the ARM ISA that prevents adding most of the features below.



TLDR: No one will use the M4 SoC in a server. They would use multiple M4 P-cores (NOT the whole M4 SoC), like AMD does with Turin, and add a bunch of DC-related stuff like SMT, SVE2, ECC memory, and PCIe slots as needed.
It would be nice if Apple's architecture had that much design freedom.
So far, I don't think Apple's architecture is being considered for servers.
If you wanted to repurpose it for that, you would have to rethink the architecture.

And in fact, Lunar Lake is not unable to use a dGPU.
A dGPU doesn't need a special communication bus; it uses general-purpose PCIe.
Lunar Lake has 8 PCIe lanes: 4 lanes of PCIe 5.0 and 4 lanes of PCIe 4.0.
Using those, you can attach a dGPU, even though the bandwidth is narrow (rough numbers in the sketch below).
I think Acer had an LNL laptop with a dGPU.
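
Rough numbers for that narrow link, using the standard PCIe per-lane rates (about 1.97 GB/s per PCIe 4.0 lane and 3.94 GB/s per PCIe 5.0 lane after encoding overhead):

Code:
import Foundation

// Approximate usable bandwidth of Lunar Lake's 4 + 4 PCIe lanes,
// using standard per-lane rates (after 128b/130b encoding overhead).
let gen5PerLaneGBps = 3.94   // PCIe 5.0, 32 GT/s per lane
let gen4PerLaneGBps = 1.97   // PCIe 4.0, 16 GT/s per lane

let gen5x4 = 4.0 * gen5PerLaneGBps   // ~15.8 GB/s
let gen4x4 = 4.0 * gen4PerLaneGBps   // ~7.9 GB/s
print(String(format: "x4 Gen5: ~%.1f GB/s, x4 Gen4: ~%.1f GB/s", gen5x4, gen4x4))
// For comparison, a desktop dGPU normally gets x16 Gen4/Gen5 (~31.5 / ~63 GB/s).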
 

Jan Olšan

Senior member
Jan 12, 2017
542
1,076
136
Well, it's because it's a game already used for benchmarking on Windows PCs, so it gives data points people have never had before.

All these people rushing to do benchmarks are ignoring whether this is a quality port of Cyberpunk - to what extent did they hand-optimize assembly language sequences (e.g. for SIMD code, where the compiler often doesn't generate the best code or even use SIMD at all) on x86? Because I'm willing to bet they didn't do the same on the Mac. And how much does Apple's deferred rendering affect performance when it's running something designed around the way DX12 renders?

Maybe this is a top-quality port that put as much effort into getting it running well on the Mac as into getting it running well on the PC. But I doubt it, primarily because that doesn't make sense: the Windows gaming market is massive compared to the Mac gaming market, so it wouldn't make sense to put more than a fraction of the effort into the port.
What you are talking about is a CPU performance bottleneck in games. That is a thing, but it is more the uncommon case than the default. Most AAA games are going to be GPU-limited even on high-performance dedicated GPUs. If we are talking integrated GPUs, a CPU bottleneck actually being your worry is orders of magnitude rarer. AND since Apple cores are supposed to be so great, that should mean there's less of that (unless it turns out their performance profile is not gaming-friendly; see how plain cores vs. the same cores with 3D V-Cache behave, for illustration).

It is going to be almost exclusively GPU performance that limits performance and determines what FPS you get on Apple hardware (that's also why binary translation works decently with games - the CPU cost is hidden because the game is GPU-bottlenecked and there is CPU performance headroom anyway; importantly, the GPU driver runs as a native binary).

The SIMD and other optimizations you mention... I would not really expect such a game to be extensively tuned like that on either platform. What you do want is the GPU driver being optimised well, but that is on Apple. And the performance of its compiler and other code is still a smaller factor than the optimization of the actual GPU acceleration (scheduling onto the GPU units and so on). This part is the rocket science of game GPUs. This part is why there is almost a duopoly in the market, as only AMD and Nvidia are able to really do state of the art (and gotta say Intel did a great job getting as close as they now are, as the only other vendor in the world).

The GPU performance of a game is something that, generally speaking, the GPU vendors assist with. They optimise their driver complex for individual games, or assist the devs with changing (GPU-bound) code of the game to perform better. How well Apple works there, I have no idea, but it is kinda on them to step up. Honestly I would expect Apple to go beyond the usual here and heavily support this effort given how it is gonna be a poster benchmark game of the platform.
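
One crude way to see whether a frame is CPU- or GPU-bound on Metal is to compare the wall-clock time the CPU spent on the frame against the GPU execution time the command buffer reports. A rough sketch, assuming you already track the CPU-side frame time yourself:

Code:
import Metal
import Foundation

// Rough CPU-bound vs GPU-bound check for one frame.
// `commandBuffer` is the frame's main command buffer (the handler must be added
// before commit); `cpuFrameTime` is the wall-clock seconds the CPU spent
// building and submitting the frame.
func classifyFrame(commandBuffer: MTLCommandBuffer, cpuFrameTime: Double) {
    commandBuffer.addCompletedHandler { cb in
        let gpuTime = cb.gpuEndTime - cb.gpuStartTime   // seconds of GPU execution
        if gpuTime > cpuFrameTime {
            print(String(format: "GPU-bound frame: GPU %.2f ms vs CPU %.2f ms",
                         gpuTime * 1000, cpuFrameTime * 1000))
        } else {
            print(String(format: "CPU-bound frame: CPU %.2f ms vs GPU %.2f ms",
                         cpuFrameTime * 1000, gpuTime * 1000))
        }
    }
}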
 
Last edited:

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
How well Apple works there, I have no idea, but it is kinda on them to step up. Honestly I would expect Apple to go beyond the usual here and heavily support this effort given how it is gonna be a poster benchmark game of the platform.
I think this game will become the standard on Mac for benchmarking ray tracing and raster improvements on Apple GPUs, something to showcase in reviews of new hardware. They certainly added new features to Metal, like denoising and frame generation, which are coming later this year.
This might be the closest Apple has ever worked with a game dev.
 
  • Like
Reactions: Mopetar

name99

Senior member
Sep 11, 2010
612
506
136
That would be worth the Nobel Prize in physics.
No, just knowing what words mean... Hint: what's the relationship between power and energy?

What the original poster said is essentially correct. I explained this repeatedly when the M2 came out; I'm not going to waste my time doing so again.
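
The word distinction in question: power is a rate (watts = joules per second) while energy is a total (joules = watts x seconds), so a chip that draws more power but finishes the job sooner can still use less energy. A toy example with made-up numbers:

Code:
// Energy = power x time. Higher power does not automatically mean higher
// energy if the work finishes faster ("race to idle"). Illustrative numbers only.
let chipA = (powerW: 20.0, seconds: 10.0)   // faster, higher power
let chipB = (powerW: 15.0, seconds: 15.0)   // slower, lower power

let energyA = chipA.powerW * chipA.seconds  // 200 J
let energyB = chipB.powerW * chipB.seconds  // 225 J
print("Chip A: \(energyA) J, Chip B: \(energyB) J")
// Chip A draws more power yet spends less total energy on the same task.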
 

johnsonwax

Senior member
Jun 27, 2024
216
358
96
What you are talking about is a CPU performance bottleneck in games. That is a thing, but it is more the uncommon case than the default. Most AAA games are going to be GPU-limited even on high-performance dedicated GPUs. If we are talking integrated GPUs, a CPU bottleneck actually being your worry is orders of magnitude rarer. AND since Apple cores are supposed to be so great, that should mean there's less of that (unless it turns out their performance profile is not gaming-friendly; see how plain cores vs. the same cores with 3D V-Cache behave, for illustration).

It is going to be almost exclusively GPU performance that limits performance and determines what FPS you get on Apple hardware (that's also why binary translation works decently with games - the CPU cost is hidden because the game is GPU-bottlenecked and there is CPU performance headroom anyway; importantly, the GPU driver runs as a native binary).

The SIMD and other optimizations you mention... I would not really expect such a game to be extensively tuned like that on either platform. What you do want is the GPU driver being optimised well, but that is on Apple. And the performance of its compiler and other code is still a smaller factor than the optimization of the actual GPU acceleration (scheduling onto the GPU units and so on). This part is the rocket science of game GPUs. This part is why there is almost a duopoly in the market, as only AMD and Nvidia are able to really do state of the art (and gotta say Intel did a great job getting as close as they now are, as the only other vendor in the world).

The GPU performance of a game is something that, generally speaking, the GPU vendors assist with. They optimise their driver complex for individual games, or assist the devs with changing (GPU-bound) code of the game to perform better. How well Apple works there, I have no idea, but it is kinda on them to step up. Honestly I would expect Apple to go beyond the usual here and heavily support this effort given how it is gonna be a poster benchmark game of the platform.
It's trickier than that. The reason Cyberpunk performed so poorly on the PS4 is that the game was designed to stream assets, and the relatively slow spinning drive in a stock PS4 couldn't do what even a crappy SSD, let alone the PS5's stock SSD, could do. On the PS4 the game was IO-constrained, not GPU- or CPU-constrained. Cyberpunk on a PS4 often didn't even turn the fan on because the game was so frequently just waiting on HD seeks. How you shove 168 GB of assets across a storage bus, then keep them in CPU RAM, then shove them over a PCIe bus, and prioritize what stays on the GPU is no small feat, as any one of those going into the weeds tanks your performance.

But agreed, the burden falls on Apple. They've got Metal as a new entity to deal with, and with a wildly different memory model. Some of that will be much easier (that whole memory/memory pipeline, for one) and some harder. Metal will have entirely different trade-offs. Feeding the GPU is a lot simpler, but Apple's GPU memory speed is much slower than Nvidia's. These are all design decisions of Apple's choosing, and they are all unique in the industry. At least PS5/Xbox/PC all have the same general shape and bottlenecks. Apple is presenting the industry with the Cell processor all over again: in theory it might be better, but if you have to completely re-optimize your game to get to "better", the economics may not exist to allow that to happen.