
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
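Rough sanity check on the 2.6 Teraflops figure: assuming the commonly cited layout of 8 FP32 ALUs per execution unit and a ~1.278 GHz GPU clock (both assumptions, not stated in this post), the arithmetic lines up:

```python
# Theoretical M1 GPU throughput estimate (ALU count and clock are assumptions).
eus = 128            # execution units, from the spec list above
alus_per_eu = 8      # assumption: 8 FP32 ALUs per execution unit
clock_ghz = 1.278    # assumption: commonly reported M1 GPU clock
flops_per_cycle = 2  # a fused multiply-add counts as 2 FLOPs

tflops = eus * alus_per_eu * flops_per_cycle * clock_ghz / 1000
print(round(tflops, 2))  # → 2.62, close to Apple's 2.6 Teraflops figure
```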

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265/HEVC, ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
Now do the GPU. Every M5/M5P/M5M has one so there's no reason this NumKong library shouldn't be leveraging it.
NumKong focuses on CPU only in C/C++ and already targets many architectures.
I'm considering using GPU for some computational projects but what should I use? CUDA, OpenCL, Metal, something else? If I'm only interested in what I can run locally, Metal would be the way to go but then my code wouldn't run anywhere else. IMHO CPU vs GPU computations are very different targets.

EDIT: Though I agree with you that if what you want to demonstrate is the highest performance on a given platform, you should consider all possibilities.
 

If NumKong is just a benchmark which only claims to measure CPU speed that's fine, leave out the GPU. If it is intended for real work, offering a way to leverage SME but not a way to leverage the GPU that every single Apple Silicon Mac has is just plain idiotic. Real work doesn't care what it is run on, it cares about getting done faster and/or more power efficiently.

For your own work if you want it to get the best performance on the Mac it makes sense to use the GPU. It is more difficult on x86 because you can't count on every x86 system to have a GPU at all, or if it does which of the three major GPU types it is. If it is YOUR project then you should probably target whatever system you have that has the highest performance GPU, and not care if it runs faster than "CPU only" speeds on anything else.
 
If you write to MLX it can target both Metal and CUDA. So you can run it locally and also deploy on Blackwell, etc.
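A minimal sketch of what backend-agnostic array code can look like in practice. The fallback order and library choices here are illustrative assumptions; MLX itself goes a step further by dispatching the same source to Metal or CUDA, but even a plain try/except lets NumPy-style code run anywhere:

```python
# Pick the best available array backend at import time, falling back to
# NumPy on the CPU so the same code runs everywhere (assumed fallback order).
try:
    import mlx.core as xp   # Apple Silicon: Metal (and now CUDA) backend
    BACKEND = "mlx"
except ImportError:
    try:
        import cupy as xp   # NVIDIA GPUs: CUDA backend
        BACKEND = "cupy"
    except ImportError:
        import numpy as xp  # universal CPU fallback
        BACKEND = "numpy"

def sum_of_squares(n: int) -> float:
    """Same NumPy-style expression regardless of which backend was picked."""
    a = xp.arange(n, dtype=xp.float32)
    return float((a * a).sum())
```

On a machine with none of the GPU libraries installed this still runs on NumPy, and `sum_of_squares(4)` evaluates to 14.0 on every backend.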
 