Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).
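For reference, the 2.6 TFLOPS GPU figure quoted above can be roughly reproduced from the published execution unit count. This is a back-of-the-envelope sketch, assuming 8 FP32 ALUs per execution unit and a GPU clock of roughly 1.28 GHz (both assumptions come from third-party reporting, not from this post):

```python
# Rough sanity check of the M1 GPU throughput figure quoted above.
# Assumptions (not in the post): 8 FP32 ALUs per execution unit and a
# GPU clock of roughly 1.278 GHz.

eus = 128            # execution units (from Apple's spec)
alus_per_eu = 8      # assumed FP32 ALUs per EU -> 1024 ALUs total
clock_ghz = 1.278    # assumed GPU clock
flops_per_fma = 2    # one fused multiply-add counts as 2 FLOPs

tflops = eus * alus_per_eu * flops_per_fma * clock_ghz / 1000
print(f"~{tflops:.2f} TFLOPS")  # ~2.62, in line with the quoted 2.6 Teraflops
```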

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes
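As a rough cross-check of the M2 figures above, both the ~100 GB/s memory bandwidth and the 3.6 TFLOPS GPU number fall out of simple arithmetic. The sketch below assumes LPDDR5-6400 on a 128-bit bus and a GPU clock of roughly 1.4 GHz with 128 FP32 ALUs per GPU core (assumptions from third-party reporting, not stated in this post):

```python
# Back-of-the-envelope check of the M2 figures listed above.
# Assumptions (not in the post): LPDDR5-6400 on a 128-bit bus, and a
# GPU clock of roughly 1.4 GHz with 128 FP32 ALUs per GPU core.

# Memory bandwidth: transfers/s * bus width in bytes
bandwidth_gbs = 6400e6 * (128 / 8) / 1e9
print(f"memory bandwidth ~ {bandwidth_gbs:.1f} GB/s")  # ~102.4, quoted as "100 GB/s"

# GPU compute: cores * ALUs/core * 2 FLOPs per FMA * clock
gpu_cores = 10
tflops = gpu_cores * 128 * 2 * 1.4e9 / 1e12
print(f"GPU compute ~ {tflops:.1f} TFLOPS")            # ~3.6, matching the quoted figure
```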

M3 Family discussion here:


M4 Family discussion here:

 

smalM

Member
Sep 9, 2019
I always thought the Pro/Max would go chiplet and it would look like this:
"i/o die" - E-cluster, NPU, USB, TB, en-/decoder, display controller etc.
"compute die" - MI, SLC, P-cluster, GPU-cluster
Pro 1x i/o + 1x compute
Max 1x i/o + 2x compute
Ultra 1x i/o + 4x compute + 1x i/o

And then along came the M3 Pro, and the M3 Max didn't include UltraFusion.
So much for my ability to predict anything Apple....
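To make the speculated tiering above concrete, here is a toy enumeration of those hypothetical die combinations. This is purely the poster's guess, not Apple's actual design:

```python
# Toy model of the chiplet layout speculated above (hypothetical only,
# not Apple's actual design): each tier pairs "i/o" dies with "compute" dies.

speculated_tiers = {
    "Pro":   {"io": 1, "compute": 1},
    "Max":   {"io": 1, "compute": 2},
    "Ultra": {"io": 2, "compute": 4},  # one i/o die on each end of the compute dies
}

for name, dies in speculated_tiers.items():
    total = dies["io"] + dies["compute"]
    print(f"{name}: {dies['io']}x i/o + {dies['compute']}x compute = {total} dies")
```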
 

FlameTail

Diamond Member
Dec 15, 2021
I always thought the Pro/Max would go chiplet and it would look like this:
"i/o die" - E-cluster, NPU, USB, TB, en-/decoder, display controller etc.
"compute die" - MI, SLC, P-cluster, GPU-cluster
Pro 1x i/o + 1x compute
Max 1x i/o + 2x compute
Ultra 1x i/o + 4x compute + 1x i/o
There are power and latency penalties when using so many tiles. That's why Intel reduced the number of tiles in Lunar Lake to two.
 

Doug S

Platinum Member
Feb 8, 2020
I thought about a chop, but the separate design for the M3 Pro makes me think otherwise. I don't see how they could include CPU resources in the chop, and I also don't see how the Pro and Max go back to having the same CPU. 12 P cores is way too many for the Pro, yet how could they give the Max any less?

Who says they have to chop it in the same way as M1? The "chop" portion doesn't have to just include more GPU cores, it could include another P core cluster, etc.
 

Doug S

Platinum Member
Feb 8, 2020
There are power and latency penalties when using so many tiles. That's why Intel reduced the number of tiles in Lunar Lake to two.

I think a lot of that is due to Intel's ring bus, since communication between chiplets will often have to pass through other chiplets. Having eight chiplets on something like that is gonna be a problem. If all dies were directly connected (like Apple's patent for a four-chip, Ultra-like solution) or everything passed through an intermediate switch die (maybe that's what AMD's IOD is, but I'm not sure), the power and latency issues are reduced.
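To put rough numbers on that intuition, here is a small sketch comparing average hop counts for the three topologies mentioned: a ring of chiplets, a fully connected set of dies, and a star where all traffic crosses a single switch/IO die. The hop counts are purely illustrative and ignore link width, protocol overhead, and on-die routing:

```python
# Crude comparison of average hop counts for the topologies mentioned above.
# Illustrative only; real interconnects care about far more than hop count.

def ring_avg_hops(n):
    """Average shortest-path hops between distinct nodes on a ring of n chiplets."""
    hops = [min(d, n - d)
            for a in range(n) for b in range(n) if a != b
            for d in [abs(a - b)]]
    return sum(hops) / len(hops)

def fully_connected_avg_hops(n):
    return 1.0  # every die has a direct link to every other die

def star_avg_hops(n):
    return 2.0  # leaf -> switch die -> leaf (assuming traffic is between non-switch dies)

n = 8
print(f"ring of {n} chiplets: {ring_avg_hops(n):.2f} hops on average")   # ~2.29
print(f"fully connected:    {fully_connected_avg_hops(n):.2f} hop")
print(f"star via switch:    {star_avg_hops(n):.2f} hops")
```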