
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops (see the back-of-envelope sketch after this list)
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
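
For what it's worth, the 2.6 Teraflops figure falls straight out of the ALU count and clock. A back-of-envelope sketch in C, assuming the commonly reported ~1.278 GHz GPU clock and 128 FP32 ALUs per GPU core (neither is an Apple-published number):

```c
/* Back-of-envelope check of the M1 GPU's 2.6 TFLOPS figure (a sketch, not a spec). */
#include <stdio.h>

int main(void) {
    double alus  = 8 * 128;            /* 8 GPU cores x 128 FP32 ALUs each (assumed) */
    double clock = 1.278e9;            /* Hz; commonly reported, not official        */
    double flops = alus * 2.0 * clock; /* FMA counts as 2 FLOPs per ALU per clock    */
    printf("%.2f TFLOPS\n", flops / 1e12); /* prints ~2.62 */
    return 0;
}
```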

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), the same across all iDevices (aside from occasional slight clock-speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
What does Apple use for its AI compute needs? Just cloud credits from Amazon/Alphabet? They are not creating their own models, so it's all inference at the moment.

They are creating their own models, they just aren't good enough yet, hence the deal to essentially buy a fork of Gemini to use for now.

They've been using M2 Ultra based servers for running their own stuff for a while now. Undoubtedly they also use some external cloud compute as well. They seem to be pretty judicious about building their own capacity, they always use third party cloud as "overflow" even for iCloud storage.

That has turned out to be a really good move in hindsight, since more and more countries look to be following China's lead of requiring that cloud storage for their citizens be on servers controlled by companies that are subject to their laws and not those of the US.
 
Because Microsoft dead-ended their own hardware. So consumers then face a choice to move forward - a new PC that Microsoft may dead-end again, or a Mac.
In reality, today's Macs will be "dead-ended" by Apple long before today's x86 PCs lose Windows support. Just look at history.

You can still put Windows 11 on 2008-2011 computers. It's not officially supported but it works and they get all the security patches. What do you get from Apple for comparable machines?
(And Linux users can add an unknown number of extra years after any hypothetical date at which these computers actually stop working with Windows, whereas I can't confidently expect a "life after Apple" for M-series MacBooks/Mac minis with how Linux support limps along currently.)

Possible exception: the Qualcomm devices, due to the same lack of standards in the Arm world. Those are probably the only Windows computers where I worry about them possibly staying usable for less than 10 years due to a potential end of Windows support.
Enter one at $599.
Like there aren't enough Windows laptops at that price point...
 
You can still put Windows 11 on 2008-2011 computers. It's not officially supported but it works and they get all the security patches. What do you get from Apple for comparable machines?
Not officially, but you can get full support via OpenCore on older x86 Macs, which will run the latest macOS.
Like there aren't enough Windows laptops at that price point...

Those people who couldn't spend $1000 on a MacBook would've gotten a cheaper Windows laptop, especially students. Now you can get a MacBook for as low as $500. That will impact sales of lower-end Windows laptops, and the Neo has been selling very well.
 
And Linux users can add an unknown number of extra years after any hypothetical date at which these computers actually stop working with Windows, whereas I can't confidently expect a "life after Apple" for M-series MacBooks/Mac minis with how Linux support limps along currently
M1 and M2 support Linux. M3 and later are being worked on.
The Qualcomm devices, due to the same lack of standards in the Arm world

Qualcomm doesn’t use ACPI yet. Ampere CPUs do support ACPI.
I suspect the Qualcomm laptops will get Linux support and will be usable after Windows support dies, as Linux can run on anything.
 

The funniest part of the review is that he said Panther Lake's CPU performance was equivalent to a 12-core M2 Max, a three-year-old CPU on N5. Kinda shows how far behind Intel is.
 
M1 and M2 support Linux. M3 and later are being worked on.


Qualcomm doesn’t use ACPI yet. Ampere CPUs do support ACPI.
I suspect the Qualcomm laptops will get Linux support and will be usable after Windows support dies, as Linux can run on anything.

Snapdragon laptops most assuredly have ACPI. They just use PEP, which is unsupported on Linux.
 
Looks like a 1 MB L2 cache has been added for the M5 Max P-core.
The question is: why add it now?
Hopefully Geekerwan looks further into this plus the perf/IPC impacts in his video about the SoC.
 
How is this news? M5 vanilla also has it.
Was not reported anywhere.
Hells, arstechnica doesn't even report it now.
[chart: nT scaling]
If the purpose of the pL2 is to keep traffic off the SL3 and help nT scaling, how much does this help since the number of cores per SL3 didn't change?
Both the M4 Max and M5 Max have a cluster of 6 P-cores on 16 MB of SL2/3. If the M5 Max had more cores in a cluster, that reasoning would make more sense.
 
Was not reported anywhere.
Super visible on any dieshot and any latency ladder.
Hells, arstechnica doesn't even report it now.
>arsetechnica
If the purpose of the pL2 is to keep traffic off the SL3 and help nT scaling, how much does this help since the number of cores per SL3 didn't change?
Bandwidth.
6 cores with 64B@clk private L2 each are gucci.
how much does this help since the number of cores per SL3 didn't change?
Effective bandwidth in a bunch of nT loads should be a solid chunk higher.
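
To put a number behind "bandwidth": a minimal C sketch of the kind of nT read-bandwidth probe that would show this, with each thread's working set sized to sit inside a ~1 MB private L2. Thread count, buffer size, and pass count are illustrative choices, not measurements:

```c
/* nT bandwidth probe (sketch): per-thread hot sets that fit a private L2
 * should scale aggregate bandwidth ~linearly with thread count; bump
 * BUFBYTES past the L2 size and the threads start contending on SL3/SLC. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NTHREADS 6            /* e.g. one thread per P-core in a cluster */
#define BUFBYTES (512 * 1024) /* per-thread working set; fits a ~1 MB L2 */
#define PASSES   2000

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void *reader(void *arg) {
    volatile uint64_t *buf = arg;  /* volatile: keep the loads in the binary */
    size_t n = BUFBYTES / sizeof(uint64_t);
    uint64_t sum = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];         /* streaming reads over the hot set */
    return (void *)(uintptr_t)sum;
}

int main(void) {
    pthread_t t[NTHREADS];
    void *bufs[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        bufs[i] = calloc(1, BUFBYTES);
    double t0 = now_sec();
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reader, bufs[i]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    double dt = now_sec() - t0;
    double gbytes = (double)NTHREADS * PASSES * BUFBYTES / 1e9;
    printf("aggregate: %.1f GB/s\n", gbytes / dt);
    return 0;
}
```

Compile with `cc -O2 -pthread`; compare the aggregate figure at different NTHREADS, then again with BUFBYTES pushed past the private L2 capacity.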
 
It was talked about at length a few pages ago.
Yes... I was there lol
Problem is that no one posted any proof of it. I'm aware of littletree on baidu posting that earlier too, but I never saw any latency tables/die shots showing it yet.
Super visible on any dieshot
No M5 die shots out yet
and any latency ladder.
First one I've seen. People just don't test that anymore.
Oh, and the screenshot I posted is a month old lol, I just didn't see it till now. To be fair though, it's also posted on baidu...
>arsetechnica
I don't think any other website even talked about the cache hierarchy on the M5 max at all...
Effective bandwidth in a bunch of nT loads should be a solid chunk higher.
Was this much of a problem for them before anyway?
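
And for anyone who wants to run their own latency ladder: a minimal pointer-chasing sketch in C. The stride, working-set range, and hop count are arbitrary choices, not Geekerwan's methodology; the new private L2 should show up as an extra latency plateau between the L1 and SLC steps:

```c
/* Pointer-chasing "latency ladder" (sketch): walk one randomly permuted
 * cycle at growing working-set sizes; ns/load steps up at each cache
 * boundary (L1 -> private L2 -> shared L2/SLC -> DRAM). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STRIDE 128        /* bytes between nodes; > one line, dodges prefetch */
#define HOPS   (1L << 24) /* dependent loads timed per working-set size      */

int main(void) {
    for (size_t kb = 16; kb <= 256 * 1024; kb *= 2) {
        size_t n = kb * 1024 / STRIDE;
        char *mem = malloc(n * STRIDE);
        size_t *perm = malloc(n * sizeof *perm);
        for (size_t i = 0; i < n; i++) perm[i] = i;
        for (size_t i = n - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
            size_t j = rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < n; i++)            /* link one random cycle */
            *(void **)(mem + perm[i] * STRIDE) =
                mem + perm[(i + 1) % n] * STRIDE;

        void * volatile p = mem + perm[0] * STRIDE;
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long h = 0; h < HOPS; h++)
            p = *(void **)p;                      /* serialized dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &b);

        double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
        printf("%8zu KB : %6.2f ns/load\n", kb, ns / HOPS);
        free(mem);
        free(perm);
    }
    return 0;
}
```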
 
baidu really is a great (untapped IMO) resource.
Sometimes. That particular thread isn't great barring 2 or 3 people there; most of the discussion revolves around the nonsensical RT levels from Imagination PR.

Also, not only in that thread, but the number of people who regard SER as the second coming of Christ is becoming insufferable, simply because NV forced it into SM 6.9, when there are better ways to push for ray-coherence sorting and to keep material shading from destroying the SIMDs. Not that SER is bad in itself, but it's more of a band-aid for the problem.
 
Was not reported anywhere.
Hells, arstechnica doesn't even report it now.

If the purpose of the pL2 is to keep traffic off the SL3 and help nT scaling, how much does this help since the number of cores per SL3 didn't change?
Both the M4 Max and M5 Max have a cluster of 6 P-cores on 16 MB of SL2/3. If the M5 Max had more cores in a cluster, that reasoning would make more sense.
I think it's related to the increased latency incurred by having the CPU on a separate die from the SLC and memory controllers. Apple Silicon has been a GPU with memory and SLC that then feeds some of that bandwidth to the CPU, ANE, and the rest of the SoC.

While we are still waiting for die shots and proper full microbenching, I still believe that the entire memory interface and SLC are on the GPU die, so the CPU die has to use the Fusion die-to-die interconnect. I suspect the private L2 helps mitigate the added latency from the newly introduced hop.
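
A back-of-envelope AMAT illustration of that argument, with made-up cycle counts (none of these are measured M5 latencies): the higher the private L2's hit rate, the less the cross-die hop shows up in the average.

```c
/* AMAT sketch with ILLUSTRATIVE, assumed latencies: a private L2 hit is
 * local; a miss pays the SLC latency plus the die-to-die hop. */
#include <stdio.h>

int main(void) {
    double l2_cyc  = 8.0;   /* private L2 hit latency (assumed)       */
    double slc_cyc = 40.0;  /* SLC latency (assumed)                  */
    double hop_cyc = 12.0;  /* added die-to-die hop penalty (assumed) */
    for (double hit = 0.80; hit <= 0.951; hit += 0.05)
        printf("L2 hit %2.0f%% -> AMAT %5.1f cycles\n", hit * 100.0,
               hit * l2_cyc + (1.0 - hit) * (slc_cyc + hop_cyc));
    return 0;
}
```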
 
I didn't even know what zhihu is and I tend to stay away from Chinese sites (and from most social networks).
Ah sorry, I was going off memory; I wanted to ping @name99, given he has previously linked the site and talked about registering here:
What's the difference between a Fetch Predictor and a Branch Predictor?
When should a Branch Predictor deliver predictions?
What's the data provided by a Fetch Predictor?
Why am I using the term Fetch Predictor, not BTB?

Sorry, but there is something about Fetch, even more than the rest of the CPU, that makes the x86 contingent lose their goddamn minds. EVERY TIME I try to discuss the issue, it's like talking to a brick wall. I'm sick of it and wasting time on it.

Here's the proof. Verilator is by far the toughest "easy-ish to measure" load from the point of view of the FRONT-END.
SPEC is useless for testing front end, the code working set is tiny.
"Server" workloads are what you want, but most of those are difficult to run, verilator is the one fairly easy case to run.
This is from James Aslan's site, https://zhuanlan.zhihu.com/p/704707254, which being a mainland site is a freaking pain in the ass to deal with! You will have to register if you want to see anything, and you will never be able to comment because commenting requires a second stage of registration that requires a mainland phone number.

Regardless, the point is that when we push the front-end hard, ARM (and especially team Apple/ex-Apple) does vastly better than team x86. [Zen 4 is basically the same sort of level as Intel].
And yet team x86 refuse to listen every time you tell them they are doing it wrong...

(Firestorm was M1, Avalanche was M2.
Even Blizzard, the M2 small core, does slightly better than Raptor Cove! That's on a different graph, but it achieves 2.50 on VTop1.)

OK, with all that rant out of the way, go read:
especially volume 4. That will tell you how to handle instruction flow PROPERLY.
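
To make the Fetch Predictor vs Branch Predictor split concrete, here's a toy C sketch of the two interfaces. The table sizes, bimodal counters, and 64-byte fetch-block granularity are illustrative, not any shipping design:

```c
/* Toy contrast: a branch predictor answers "is THIS branch taken?", one
 * branch at a time. A fetch predictor answers, every cycle, "given the
 * current fetch-BLOCK address, which block do I fetch NEXT?" -- folding
 * branch presence, direction, and target into one lookup, so the front
 * end steers itself before any instruction is decoded. */
#include <stdint.h>
#include <stdio.h>

#define ENTRIES    4096
#define BLOCK_SIZE 64   /* assumed fetch-block granularity */

/* Branch predictor: one 2-bit saturating counter per branch PC (bimodal). */
static uint8_t dir[ENTRIES];
static int predict_taken(uint64_t pc) { return dir[pc % ENTRIES] >= 2; }
static void train_taken(uint64_t pc, int taken) {
    uint8_t *c = &dir[pc % ENTRIES];
    if (taken) { if (*c < 3) (*c)++; } else { if (*c > 0) (*c)--; }
}

/* Fetch predictor: indexed by fetch-block address, yields the next block. */
static uint64_t nxt[ENTRIES];
static uint64_t predict_next(uint64_t blk) {
    uint64_t n = nxt[(blk / BLOCK_SIZE) % ENTRIES];
    return n ? n : blk + BLOCK_SIZE;  /* default: fall through sequentially */
}
static void train_next(uint64_t blk, uint64_t actual_next) {
    nxt[(blk / BLOCK_SIZE) % ENTRIES] = actual_next;
}

int main(void) {
    /* A hot loop in the block at 0x1000 whose backward branch at 0x1040
     * jumps to 0x1000: train both predictors on it. */
    for (int i = 0; i < 2; i++) train_taken(0x1040, 1);
    train_next(0x1000, 0x1000);
    printf("branch 0x1040 predicted taken? %d\n", predict_taken(0x1040));
    printf("after block 0x1000, fetch 0x%llx\n",
           (unsigned long long)predict_next(0x1000));
    return 0;
}
```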
 