Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
24,048
1,679
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops (rough arithmetic check after this spec list)
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
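
For anyone wondering where the 2.6 Teraflops figure comes from, here is a quick arithmetic check. The per-EU ALU count and the ~1.28 GHz GPU clock are not official Apple numbers, just commonly reported figures, so treat them as assumptions.

Code:
import Foundation

// Rough FP32 throughput check for the 8-core M1 GPU.
// Assumptions (commonly reported, not official Apple specs): 8 FP32 ALUs per
// execution unit, ~1.278 GHz GPU clock, and 2 FLOPs per ALU per cycle (FMA).
let executionUnits = 128.0   // per the spec list above
let alusPerEU = 8.0          // assumed FP32 lanes per EU (128 x 8 = 1024 lanes)
let clockGHz = 1.278         // assumed GPU clock
let flopsPerCycle = 2.0      // one fused multiply-add counts as 2 FLOPs

let tflops = executionUnits * alusPerEU * clockGHz * flopsPerCycle / 1000.0
print(String(format: "~%.2f TFLOPS", tflops))   // ~2.62 TFLOPS, matching the 2.6 above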

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s (bandwidth arithmetic after this spec list)
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes
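
As a sanity check on the 100 GB/s figure: assuming the commonly reported 128-bit LPDDR5-6400 configuration (the bus width and transfer rate are not on Apple's spec page), the arithmetic works out like this.

Code:
import Foundation

// Peak memory bandwidth for M2, assuming a 128-bit LPDDR5-6400 interface
// (commonly reported figures, not Apple's own).
let busWidthBits = 128.0
let transfersPerSec = 6400.0e6             // 6400 MT/s
let bytesPerTransfer = busWidthBits / 8.0  // 16 bytes moved per transfer

let bandwidthGBps = transfersPerSec * bytesPerTransfer / 1e9
print(String(format: "~%.1f GB/s", bandwidthGBps))   // ~102.4 GB/s, rounded to "100 GB/s"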

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
Does anyone know if we have visual comparisons already between Mac version, Switch2 version and x64 versions?
The only reputable source that will do that is probably Notebookcheck, which will cover Mac and Windows. I doubt they will test the M4 models now; maybe when the M5 MacBooks release.
 
  • Like
Reactions: Mopetar

name99

Senior member
Sep 11, 2010
612
506
136
Is it true...
Even with the M3PRO, it's surprisingly difficult...
I'm not a gamer but that seems in line with the results I see for PC integrated graphics, which is the comparable tier to M4.
Apparently Intel integrated graphics simply cannot handle it, while AMD struggles, at the same sort of performance tier.

Seems like an interesting question would be how well Apple's upscaling works: if you dropped the "real" resolution to something more like 800x600 and used MetalFX to upscale, would the result be preferable?
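
For reference, the MetalFX path being described looks roughly like the sketch below: render at a low internal resolution (say 800x600) and let a spatial scaler produce the display-resolution image. This is a minimal sketch assuming you already have a Metal device, a per-frame command buffer, and low-res/output textures created with the usage flags MetalFX expects; it is not a drop-in implementation.

Code:
import Metal
import MetalFX

// Minimal sketch: build a MetalFX spatial scaler that upscales an 800x600
// render target to 1600x1200, then encode it into an existing command buffer.
func makeUpscaler(device: MTLDevice) -> MTLFXSpatialScaler? {
    let desc = MTLFXSpatialScalerDescriptor()
    desc.inputWidth = 800
    desc.inputHeight = 600
    desc.outputWidth = 1600
    desc.outputHeight = 1200
    desc.colorTextureFormat = .bgra8Unorm
    desc.outputTextureFormat = .bgra8Unorm
    desc.colorProcessingMode = .perceptual
    return desc.makeSpatialScaler(device: device)
}

func encodeUpscale(scaler: MTLFXSpatialScaler,
                   lowResColor: MTLTexture,      // the 800x600 frame just rendered
                   upscaledOutput: MTLTexture,   // the 1600x1200 texture to present
                   commandBuffer: MTLCommandBuffer) {
    scaler.colorTexture = lowResColor
    scaler.outputTexture = upscaledOutput
    scaler.encode(commandBuffer: commandBuffer)
}

Whether the upscaled 800x600 image actually looks better than native low-resolution rendering is exactly the open question; MetalFX also has a temporal scaler that generally looks better, but it needs motion vectors and depth from the engine.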
 

Io Magnesso

Senior member
Jun 12, 2025
578
152
71

Interesting, anyone else notice the same? Going from A17 Pro to A18 Pro?

Or M3 to M4?

More confirmation that N3B was a troubled node
Well, no matter how high-performance the Apple silicon Max chips are, throttling can't be helped if the thermal setup causes it.
What machine is this person using?
Is it a Mac Studio, or a MacBook Pro?
If it's the latter, there's of course a high chance that throttling will occur.
 

Io Magnesso

Senior member
Jun 12, 2025
578
152
71
Although it generates less heat, the M4 Max seems to consume more power than the M3 Max. The higher operating frequency does seem to be an advantage, though.
 

johnsonwax

Senior member
Jun 27, 2024
216
358
96
I love that the Mac gets one AAA game and it becomes the benchmarking holy grail overnight.

As a Mac gamer, if you want to see single-core/RAM performance, benchmark Factorio. Its internal benchmark (UPS, updates per second) is not based on rendering but on how fast it can run the simulation, with many players testing how far they can push the engine before UPS falls below its cap of 60. On x86 there are entire benchmarking suites just for this, but none on Mac. The GPU plays almost no role, and while the game has some degree of multithreading, it's still highly single-thread constrained because maintaining coherency in a simulation like this is pretty expensive.

Anecdotally, during an event last year where the game was pushed to its max via a mod that allowed it to be clustered, my M1 Max MBP was 15-20% faster than a 7950X3D, which was the fastest system benchmarked at the time. My sense is that an M4 Max would be even faster relative to a 9950X3D.
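
For anyone unfamiliar with UPS-style benchmarking, the idea is simply to run the simulation update loop flat out and count how many updates per second the CPU can sustain, independent of rendering. A toy sketch of the measurement (obviously not Factorio's actual code):

Code:
import Foundation

// Toy UPS (updates-per-second) benchmark: run a fixed number of simulation
// ticks as fast as possible and report the sustained update rate.
struct ToySimulation {
    var entities = [Double](repeating: 1.0, count: 100_000)

    mutating func tick() {
        // Stand-in for the game's single-threaded per-entity simulation work.
        for i in entities.indices {
            entities[i] = entities[i] * 1.000001 + 0.000001
        }
    }
}

var sim = ToySimulation()
let ticks = 600
let start = Date()
for _ in 0..<ticks {
    sim.tick()
}
let elapsed = Date().timeIntervalSince(start)
print(String(format: "%.1f UPS (Factorio caps the game at 60)", Double(ticks) / elapsed))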
 

OneEng2

Senior member
Sep 19, 2022
676
924
106
Why would anyone use a consumer design in DC? That’s like saying let’s use the 200mm2+ Strix Point design in DC. You are making no sense.
I am pointing out that you don't get something for nothing. You can't just say ARM is better, you have to acknowledge what it is better for and what design decisions made it better.

Also, note that design decisions that work for a mobile device don't work as well for a DC device ..... or high end desktop.
 
  • Like
Reactions: Io Magnesso

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
You can't just say ARM is better, you have to acknowledge what it is better for and what design decisions made it better.

Also, note that design decisions that work for a mobile device don't work as well for a DC device ..... or high end desktop.

Here is the confusion. You bring up DC for consumer products; why would Apple or other ARM vendors add extra features that are useless for smartphones or for ARM SoCs with big iGPUs in laptops? For example, Lunar Lake doesn't have SMT or dGPU support, just like M4, because these are targeted at laptops/tablets.


What we discussed is the CPU core itself, not the whole product like M4. @DavidC1 mentioned "other ARM vendors" as well and there is nothing in the ARM ISA that prevents adding most of the features below.

Not x86 compatible - You can recompile to ARM
Tied to apple everything - There are other ARM cores that reach M4 IPC
Not designed well for heavy sustained loads (better at bursts of processing power) - What does that even mean? This has nothing to do with the CPU core but rather the cooling solution. The M4 goes in fanless tablets and in laptops that have active cooling.
Real-world multi-core work suffers (does great on synthetic benchmarks like Geekbench) - Yeah, because it only has 4 P-cores; if it had 96 M4 P-cores it would be a different story. If we look at actual real-world tests, the M4 Pro and M4 Max, which have more P-cores, are on par with or better than a 16-core Strix Halo.
No external graphics interface - Not related to ISA
No PCI external interface - Not related to ISA
No AVX (only proprietary SIMD and limited at that). - ARM has SVE2, which could be added in future M4 revisions.
Relatively large die (168mm2) on N3E (Turin D 16c CCD is about 85mm2) - Because it includes a LOT more IP than just CPU cores and cache. It's like comparing Lunar Lake's 140mm2 to a Turin-D CCD.

TLDR: No one will use the M4 SoC in a server. They would use multiple M4 P-cores (NOT the whole M4 SoC), like AMD does with Turin, and add a bunch of DC-related stuff like SMT, SVE2, ECC memory, and PCIe slots as needed.
 
  • Like
Reactions: ashFTW

Geddagod

Golden Member
Dec 28, 2021
1,391
1,480
106
Here is the confusion. You bring up DC for consumer products; why would Apple or other ARM vendors add extra features that are useless for smartphones or for ARM SoCs with big iGPUs in laptops? For example, Lunar Lake doesn't have SMT or dGPU support, just like M4, because these are targeted at laptops/tablets.


What we discussed is the CPU core itself, not the whole product like M4. @DavidC1 mentioned "other ARM vendors" as well and there is nothing in the ARM ISA that prevents adding most of the features below.



TLDR: No one will use the M4 SoC in a server. They would use multiple M4 P-cores (NOT the whole M4 SoC), like AMD does with Turin, and add a bunch of DC-related stuff like SMT, SVE2, ECC memory, and PCIe slots as needed.
Hard to tell if Apple cores will perform as well in server as they are doing currently in client.
I don't think Apple's cache hierarchy will cut it in servers. Which is also why I'm very interested in seeing how Qualcomm's server CPUs go, since they use a similar hierarchy.
That and also relevant asterisks surrounding vectorized perf ofc.
 

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
Hard to tell if Apple cores will perform as well in server as they are doing currently in client.
I don't think Apple's cache hierarchy will cut it in servers. Which is also why I'm very interested in seeing how Qualcomm's server CPUs go, since they use a similar hierarchy.
That and also relevant asterisks surrounding vectorized perf ofc.
I'm sure Qualcomm/Apple will tweak the core if they ever want to do business in DC. I remember the same rhetoric being used about whether Apple's cores would ever perform as well on a real desktop OS rather than in a closed mobile OS.

Apple's CPU core itself is good. Even with half the shared L2 cache the difference is less than ~10%.

SoC                    A18       A18 Pro
Codename               Tupai     Tahiti
P-core cluster L2      8 MB      16 MB
SLC                    12 MB     24 MB
GPU                    5-core    6-core
ProRes Encode/Decode   No        Yes
 
  • Like
Reactions: Mopetar

Doug S

Diamond Member
Feb 8, 2020
3,316
5,766
136
I love that the Mac gets one AAA game and it becomes the benchmarking holy grail overnight.

Well, it's because it's a game already used for benchmarking on Windows PCs, so it gives data points people have never had before.

All these people rushing to do benchmarks are ignoring whether this is a quality port of Cyberpunk - to what extent did they hand-optimize assembly language sequences (e.g. for SIMD code, where the compiler often doesn't generate the best code or even use SIMD at all) on x86? Because I'm willing to bet they didn't do the same on the Mac. And how much does Apple's deferred rendering affect performance when it's running something designed around the way DX12 renders?

Maybe this is a top-quality port that put as much effort into getting it running well on the Mac as into getting it running well on the PC. But I doubt it, primarily because that doesn't make sense: the Windows gaming market is massive compared to the Mac gaming market, so it wouldn't make sense to put more than a fraction of the effort into the port.
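
To make the SIMD point concrete, here is the kind of difference that gets hand-tuned per platform. The scalar loop may or may not be auto-vectorized by the compiler, while the second version uses explicit 4-wide SIMD; on x86 the hand-tuned path would typically be SSE/AVX intrinsics, on Apple Silicon NEON. Swift's portable SIMD types are used here purely for illustration.

Code:
// Scalar version: relies on the compiler to auto-vectorize (it often doesn't,
// or not optimally, which is why hot loops get hand-tuned per ISA).
func dotScalar(_ a: [Float], _ b: [Float]) -> Float {
    var sum: Float = 0
    for i in 0..<min(a.count, b.count) {
        sum += a[i] * b[i]
    }
    return sum
}

// Explicit SIMD version: processes 4 floats per step using SIMD4<Float>.
// Assumes the element count is a multiple of 4 to keep the sketch short.
func dotSIMD(_ a: [Float], _ b: [Float]) -> Float {
    var acc = SIMD4<Float>(repeating: 0)
    var i = 0
    while i + 4 <= min(a.count, b.count) {
        let va = SIMD4<Float>(a[i], a[i + 1], a[i + 2], a[i + 3])
        let vb = SIMD4<Float>(b[i], b[i + 1], b[i + 2], b[i + 3])
        acc += va * vb   // four multiplies and adds per loop iteration
        i += 4
    }
    return acc.x + acc.y + acc.z + acc.w
}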
 

Io Magnesso

Senior member
Jun 12, 2025
578
152
71
Here is the confusion. You bring up DC for consumer products; why would Apple or other ARM vendors add extra features that are useless for smartphones or for ARM SoCs with big iGPUs in laptops? For example, Lunar Lake doesn't have SMT or dGPU support, just like M4, because these are targeted at laptops/tablets.


What we discussed is the CPU core itself, not the whole product like M4. @DavidC1 mentioned "other ARM vendors" as well and there is nothing in the ARM ISA that prevents adding most of the features below.



TLDR: No one will use the M4 SoC in a server. They would use multiple M4 P-cores (NOT the whole M4 SoC), like AMD does with Turin, and add a bunch of DC-related stuff like SMT, SVE2, ECC memory, and PCIe slots as needed.
It would be nice if Apple's architecture had that much design freedom.
So far, I don't think Apple's architecture is being considered for servers.
If you wanted to repurpose it for that, you would have to rethink the architecture.

And in fact, Lunar Lake is not unable to use a dGPU.
A dGPU doesn't need a special communication bus; it uses general-purpose PCIe.
Lunar Lake has 8 PCIe lanes: 4 lanes of PCIe 5.0 and 4 lanes of PCIe 4.0.
Using those, you can attach a dGPU, even though the bandwidth is narrow (rough numbers in the sketch below).
I think Acer had an LNL laptop with a dGPU.
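
Rough numbers for that narrow link, using the standard PCIe per-lane rates (about 1.97 GB/s per PCIe 4.0 lane and 3.94 GB/s per PCIe 5.0 lane after encoding overhead):

Code:
import Foundation

// Approximate usable bandwidth of Lunar Lake's 4 + 4 PCIe lanes,
// using standard per-lane rates (after 128b/130b encoding overhead).
let gen5PerLaneGBps = 3.94   // PCIe 5.0, 32 GT/s per lane
let gen4PerLaneGBps = 1.97   // PCIe 4.0, 16 GT/s per lane

let gen5x4 = 4.0 * gen5PerLaneGBps   // ~15.8 GB/s
let gen4x4 = 4.0 * gen4PerLaneGBps   // ~7.9 GB/s
print(String(format: "x4 Gen5: ~%.1f GB/s, x4 Gen4: ~%.1f GB/s", gen5x4, gen4x4))
// For comparison, a desktop dGPU normally gets x16 Gen4/Gen5 (~31.5 / ~63 GB/s).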
 

Jan Olšan

Senior member
Jan 12, 2017
542
1,076
136
Well, it's because it's a game already used for benchmarking on Windows PCs, so it gives data points people have never had before.

All these people rushing to do benchmarks are ignoring whether this is a quality port of Cyberpunk - to what extent did they hand-optimize assembly language sequences (e.g. for SIMD code, where the compiler often doesn't generate the best code or even use SIMD at all) on x86? Because I'm willing to bet they didn't do the same on the Mac. And how much does Apple's deferred rendering affect performance when it's running something designed around the way DX12 renders?

Maybe this is a top-quality port that put as much effort into getting it running well on the Mac as into getting it running well on the PC. But I doubt it, primarily because that doesn't make sense: the Windows gaming market is massive compared to the Mac gaming market, so it wouldn't make sense to put more than a fraction of the effort into the port.
What you are talking about is a CPU performance bottleneck in games. That is a thing, but it is more the uncommon case than the default. Most AAA games are going to be GPU-limited even on high-performance dedicated GPUs. If we are talking integrated GPUs, a CPU bottleneck actually being your worry is orders of magnitude rarer. AND since Apple cores are supposed to be so great, that should mean there's less of that (unless it turns out their performance profile is not gaming-friendly; see how plain cores vs. the same cores with 3D V-Cache behave, for illustration).

It is going to be almost exclusively GPU performance that limits performance and determines what FPS you get on Apple hardware (that's also why binary translation works decently with games - the CPU cost is hidden because the game is GPU-bottlenecked and there is CPU performance headroom anyway; importantly, the GPU driver runs as a native binary).

The SIMD and other optimizations you mention... I would not really expect such a game to be extensively tuned like that on either platform. What you do want is the GPU driver being optimised well, but that is on Apple. And the performance of its compiler and other code is still a smaller factor than the optimization of the actual GPU acceleration (scheduling onto the GPU units and so on). This part is the rocket science of game GPUs. This part is why there is almost a duopoly in the market, as only AMD and Nvidia are able to really do state of the art (and gotta say Intel did a great job getting as close as they now are, as the only other vendor in the world).

The GPU performance of a game is something that, generally speaking, the GPU vendors assist with. They optimise their driver complex for individual games, or assist the devs with changing (GPU-bound) code of the game to perform better. How well Apple works there, I have no idea, but it is kinda on them to step up. Honestly I would expect Apple to go beyond the usual here and heavily support this effort given how it is gonna be a poster benchmark game of the platform.
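
One crude way to see whether a frame is CPU- or GPU-bound on Metal is to compare the wall-clock time the CPU spent on the frame against the GPU execution time the command buffer reports. A rough sketch, assuming you already track the CPU-side frame time yourself:

Code:
import Metal
import Foundation

// Rough CPU-bound vs GPU-bound check for one frame.
// `commandBuffer` is the frame's main command buffer (the handler must be added
// before commit); `cpuFrameTime` is the wall-clock seconds the CPU spent
// building and submitting the frame.
func classifyFrame(commandBuffer: MTLCommandBuffer, cpuFrameTime: Double) {
    commandBuffer.addCompletedHandler { cb in
        let gpuTime = cb.gpuEndTime - cb.gpuStartTime   // seconds of GPU execution
        if gpuTime > cpuFrameTime {
            print(String(format: "GPU-bound frame: GPU %.2f ms vs CPU %.2f ms",
                         gpuTime * 1000, cpuFrameTime * 1000))
        } else {
            print(String(format: "CPU-bound frame: CPU %.2f ms vs GPU %.2f ms",
                         cpuFrameTime * 1000, gpuTime * 1000))
        }
    }
}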
 
Last edited:

poke01

Diamond Member
Mar 8, 2022
3,780
5,120
106
How well Apple works there, I have no idea, but it is kinda on them to step up. Honestly I would expect Apple to go beyond the usual here and heavily support this effort given how it is gonna be a poster benchmark game of the platform.
I think this game will become the standard on Mac for benchmarking ray tracing and raster improvements on Apple GPUs, something to showcase in reviews of new hardware. They certainly added new features to Metal, like denoising and frame generation, which are coming later this year.
This might be the closest Apple has ever worked with a game dev.
 
  • Like
Reactions: Mopetar

name99

Senior member
Sep 11, 2010
612
506
136
That would be worth the Nobel Prize in physics.
No, just knowing what words mean... Hint: what's the relationship between power and energy?

What the original poster said is essentially correct. I explained this repeatedly when the M2 came out; I'm not going to waste my time doing so again.
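
The word distinction in question: power is a rate (watts = joules per second) while energy is a total (joules = watts x seconds), so a chip that draws more power but finishes the job sooner can still use less energy. A toy example with made-up numbers:

Code:
// Energy = power x time. Higher power does not automatically mean higher
// energy if the work finishes faster ("race to idle"). Illustrative numbers only.
let chipA = (powerW: 20.0, seconds: 10.0)   // faster, higher power
let chipB = (powerW: 15.0, seconds: 15.0)   // slower, lower power

let energyA = chipA.powerW * chipA.seconds  // 200 J
let energyB = chipB.powerW * chipB.seconds  // 225 J
print("Chip A: \(energyA) J, Chip B: \(energyB) J")
// Chip A draws more power yet spends less total energy on the same task.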
 

johnsonwax

Senior member
Jun 27, 2024
216
358
96
What you are talking about is a CPU performance bottleneck in games. That is a thing, but it is more the uncommon case than the default. Most AAA games are going to be GPU-limited even on high-performance dedicated GPUs. If we are talking integrated GPUs, a CPU bottleneck actually being your worry is orders of magnitude rarer. AND since Apple cores are supposed to be so great, that should mean there's less of that (unless it turns out their performance profile is not gaming-friendly; see how plain cores vs. the same cores with 3D V-Cache behave, for illustration).

It is going to be almost exclusively GPU performance that limits performance and determines what FPS you get on Apple hardware (that's also why binary translation works decently with games - the CPU cost is hidden because the game is GPU-bottlenecked and there is CPU performance headroom anyway; importantly, the GPU driver runs as a native binary).

The SIMD and other optimizations you mention... I would not really expect such a game to be extensively tuned like that on either platform. What you do want is the GPU driver being optimised well, but that is on Apple. And the performance of its compiler and other code is still a smaller factor than the optimization of the actual GPU acceleration (scheduling onto the GPU units and so on). This part is the rocket science of game GPUs. This part is why there is almost a duopoly in the market, as only AMD and Nvidia are able to really do state of the art (and gotta say Intel did a great job getting as close as they now are, as the only other vendor in the world).

The GPU performance of a game is something that, generally speaking, the GPU vendors assist with. They optimise their driver complex for individual games, or assist the devs with changing (GPU-bound) code of the game to perform better. How well Apple works there, I have no idea, but it is kinda on them to step up. Honestly I would expect Apple to go beyond the usual here and heavily support this effort given how it is gonna be a poster benchmark game of the platform.
It's trickier than that. The reason Cyberpunk performed so poorly on the PS4 is that the game was designed to stream assets, and the relatively slow spinning drive in a stock PS4 couldn't do what even a crappy SSD, let alone the PS5's stock SSD, could do. On the PS4 the game was IO-constrained, not GPU- or CPU-constrained. Cyberpunk on a PS4 often didn't even turn the fan on because the game was so frequently just waiting on HD seeks. How you shove 168 GB of assets across a storage bus, then keep them in CPU RAM, then shove them over a PCIe bus, and prioritize what stays on the GPU is no small feat, as any one of those going into the weeds tanks your performance.

But agreed, the burden falls on Apple. They've got Metal as a new entity to deal with, and with a wildly different memory model. Some of that will be much easier (that whole memory/memory pipeline, for one) and some harder. Metal will have entirely different trade-offs. Feeding the GPU is a lot simpler, but Apple's GPU memory speed is much slower than Nvidia's. These are all design decisions of Apple's choosing, and they are all unique in the industry. At least PS5/Xbox/PC all have the same general shape and bottlenecks. Apple is presenting the industry with the Cell processor all over again: in theory it might be better, but if you have to completely re-optimize your game to get to "better", the economics may not exist to allow that to happen.