
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
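Rough sanity check on the 2.6 Teraflops figure: assuming the commonly cited layout of 8 FP32 ALUs per execution unit and a ~1.278 GHz GPU clock (both assumptions, not stated in this post), the arithmetic lines up:

```python
# Theoretical M1 GPU throughput estimate (ALU count and clock are assumptions).
eus = 128            # execution units, from the spec list above
alus_per_eu = 8      # assumption: 8 FP32 ALUs per execution unit
clock_ghz = 1.278    # assumption: commonly reported M1 GPU clock
flops_per_cycle = 2  # a fused multiply-add counts as 2 FLOPs

tflops = eus * alus_per_eu * flops_per_cycle * clock_ghz / 1000
print(round(tflops, 2))  # → 2.62, close to Apple's 2.6 Teraflops figure
```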

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265/HEVC, ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
Now do the GPU. Every M5/M5P/M5M has one so there's no reason this NumKong library shouldn't be leveraging it.
NumKong focuses on CPU only in C/C++ and already targets many architectures.
I'm considering using GPU for some computational projects but what should I use? CUDA, OpenCL, Metal, something else? If I'm only interested in what I can run locally, Metal would be the way to go but then my code wouldn't run anywhere else. IMHO CPU vs GPU computations are very different targets.

EDIT: Though I agree with you that if what you want to demonstrate is the highest performance on a given platform, you should consider all possibilities.
 

If NumKong is just a benchmark which only claims to measure CPU speed that's fine, leave out the GPU. If it is intended for real work, offering a way to leverage SME but not a way to leverage the GPU that every single Apple Silicon Mac has is just plain idiotic. Real work doesn't care what it is run on, it cares about getting done faster and/or more power efficiently.

For your own work if you want it to get the best performance on the Mac it makes sense to use the GPU. It is more difficult on x86 because you can't count on every x86 system to have a GPU at all, or if it does which of the three major GPU types it is. If it is YOUR project then you should probably target whatever system you have that has the highest performance GPU, and not care if it runs faster than "CPU only" speeds on anything else.
 
If you write to MLX it can target both Metal and CUDA. So you can run it locally and also deploy on Blackwell, etc.
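A minimal sketch of what backend-agnostic array code can look like in practice. The fallback order and library choices here are illustrative assumptions; MLX itself goes a step further by dispatching the same source to Metal or CUDA, but even a plain try/except lets NumPy-style code run anywhere:

```python
# Pick the best available array backend at import time, falling back to
# NumPy on the CPU so the same code runs everywhere (assumed fallback order).
try:
    import mlx.core as xp   # Apple Silicon: Metal (and now CUDA) backend
    BACKEND = "mlx"
except ImportError:
    try:
        import cupy as xp   # NVIDIA GPUs: CUDA backend
        BACKEND = "cupy"
    except ImportError:
        import numpy as xp  # universal CPU fallback
        BACKEND = "numpy"

def sum_of_squares(n: int) -> float:
    """Same NumPy-style expression regardless of which backend was picked."""
    a = xp.arange(n, dtype=xp.float32)
    return float((a * a).sum())
```

On a machine with none of the GPU libraries installed this still runs on NumPy, and `sum_of_squares(4)` evaluates to 14.0 on every backend.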
 