Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,586
1,000
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 Gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants) across all iDevices, aside from occasional slight clock speed differences.

EDIT:

[Screenshot: M1 Pro / M1 Max configurations]

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), ProRes

M3 Family discussion here:

 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Yeah, they are newish in that there are a few notable changes, but the bulk of the architecture is still the same. I hope we see a new ground-up architecture with the A18, same as what AMD is doing with Zen 5.
They are already sporting a super wide core - I don't think going significantly wider a la Zen 5 is going to gain them much at this point.

Beyond a certain width you hit diminishing returns.

Going from 4-wide to 6-wide nets a bigger gain than going from 6 to 8, let alone 8 to 10, even though each step widens by the same amount.

Unless they can somehow architect a 13-16 wide µArch without an explosive power and area increase, but that seems like a stretch.

Unless some big breakthrough in CPU design happens I think we will see perf hit a hard wall without a drastic change to the underlying hardware device and materials, perhaps something like antiferromagnetic or photonic logic with topological insulator based metal layers.
 
  • Like
Reactions: Apokalupt0

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Not always true. Esp. if they manage to enhance the accompanying blocks.

The point was purely about throwing silicon at the problem.

The impact of diminishing returns is already starting to be a buzzkill for Apple I imagine.

Given that ARM Ltd's best 4-wide CPU core far outperforms Apple's initial 6-wide design, that point should be kinda obvious by now to anyone paying attention.

On that note, are the A7xx cores still 4 wide?

Anyone got the spec sheet on this? I seem to remember a Google Docs thing floating around some time ago.

If so I wonder if Chaberton/A730 will continue to be 4 wide.
 

Doug S

Platinum Member
Feb 8, 2020
2,254
3,487
136
Not always true. Esp. if they manage to enhance the accompanying blocks.

Doesn't matter, there are always diminishing returns for widening, because not all code has sufficient parallelism. Doesn't help as much to go from 8 to 10 wide if even under ideal circumstances the code you're running only exceeds 8 instructions that can be issued/retired at once 10 or 20 percent of the time. But maybe when you went from 6 to 8 it was 20 to 30 percent that could benefit.
 
  • Like
Reactions: Mopetar and Ajay

naukkis

Senior member
Jun 5, 2002
705
576
136
Doesn't matter, there are always diminishing returns for widening, because not all code has sufficient parallelism. Doesn't help as much to go from 8 to 10 wide if even under ideal circumstances the code you're running only exceeds 8 instructions that can be issued/retired at once 10 or 20 percent of the time. But maybe when you went from 6 to 8 it was 20 to 30 percent that could benefit.

There's something that might give good results from very wide cores but isn't yet utilized: hardware loop unrolling. It's complex to do, but once done it makes it possible to run every iteration of a loop on its own slice of the hardware, making good use of very wide execution resources. Proper ISA support would make implementing that kind of parallelism much easier, though.
 
  • Like
Reactions: soresu
Jul 27, 2020
16,208
10,261
106
There's something that might give good results from very wide cores but isn't yet utilized: hardware loop unrolling. It's complex to do, but once done it makes it possible to run every iteration of a loop on its own slice of the hardware, making good use of very wide execution resources. Proper ISA support would make implementing that kind of parallelism much easier, though.
Sounds intriguing. On that note, why don't compilers automatically emit SIMD code for loops where it's "obvious" that the task can be parallelized? Or how about generating executable code with its own virtual machine that analyzes the code as it runs? There would be overhead for small inputs, but given a large input the VM would "see" that execution is taking too long, pause it, parallelize the loop across multiple threads, and then resume from where it left off.
 

FlameTail

Platinum Member
Dec 15, 2021
2,239
1,209
106
Doesn't matter, there are always diminishing returns for widening, because not all code has sufficient parallelism. Doesn't help as much to go from 8 to 10 wide if even under ideal circumstances the code you're running only exceeds 8 instructions that can be issued/retired at once 10 or 20 percent of the time. But maybe when you went from 6 to 8 it was 20 to 30 percent that could benefit.
So if making the core wider will only bring diminishing returns, what are they gonna do!?

Are IPC gains dead?
 

Apokalupt0

Junior Member
Feb 14, 2024
10
10
41
The point was purely about throwing silicon at the problem.

The impact of diminishing returns is already starting to be a buzzkill for Apple I imagine.

Given that ARM Ltd's best 4-wide CPU core far outperforms Apple's initial 6-wide design, that point should be kinda obvious by now to anyone paying attention.

On that note, are the A7xx cores still 4 wide?

Anyone got the spec sheet on this? I seem to remember a Google Docs thing floating around some time ago.

If so I wonder if Chaberton/A730 will continue to be 4 wide.
The A715 went from 4-wide to 5-wide.
 
  • Like
Reactions: soresu