
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops (see the back-of-envelope sketch after this list)
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
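
For what it's worth, the 2.6 Teraflops figure falls straight out of the ALU count and clock. A back-of-envelope sketch in C, assuming the commonly reported ~1.278 GHz GPU clock and 128 FP32 ALUs per GPU core (neither is an Apple-published number):

```c
/* Back-of-envelope check of the M1 GPU's 2.6 TFLOPS figure (a sketch, not a spec). */
#include <stdio.h>

int main(void) {
    double alus  = 8 * 128;            /* 8 GPU cores x 128 FP32 ALUs each (assumed) */
    double clock = 1.278e9;            /* Hz; commonly reported, not official        */
    double flops = alus * 2.0 * clock; /* FMA counts as 2 FLOPs per ALU per clock    */
    printf("%.2f TFLOPS\n", flops / 1e12); /* prints ~2.62 */
    return 0;
}
```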

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), the same across all iDevices (aside from occasional slight clock-speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
What does Apple use for its AI compute needs? Just cloud credits from Amazon/Alphabet? They are not creating their own models, so it's all inference at the moment.

They are creating their own models, they just aren't good enough yet, hence the deal to essentially buy a fork of Gemini to use for now.

They've been using M2 Ultra based servers for running their own stuff for a while now. Undoubtedly they also use some external cloud compute as well. They seem to be pretty judicious about building their own capacity, they always use third party cloud as "overflow" even for iCloud storage.

That has turned out to be a really good move in hindsight, since more and more countries look to be following China's lead of requiring that cloud storage for their citizens be on servers controlled by companies that are subject to their laws and not those of the US.
 
Because Microsoft dead-ended their own hardware. So consumers then face a choice to move forward - a new PC that Microsoft may dead-end again, or a Mac.
In reality, today's Macs will be "dead-ended" by Apple long before today's x86 PCs lose Windows support. Just look at history.

You can still put Windows 11 on 2008-2011 computers. It's not officially supported but it works and they get all the security patches. What do you get from Apple for comparable machines?
(And Linux users can add an unknown number of extra years after any hypothetical date at which these computers actually stop working with Windows, whereas I can't confidently expect a "life after Apple" for M-series MacBooks/Mac minis with how Linux support limps along currently.)

Possible exception: the Qualcomm devices, due to the same lack of standards in the Arm world. Those are probably the only Windows computers where I worry about them possibly staying usable for less than 10 years due to a potential end of Windows support.
Enter one at $599.
Like there aren't enough Windows laptops at that price point...
 
You can still put Windows 11 on 2008-2011 computers. It's not officially supported but it works and they get all the security patches. What do you get from Apple for comparable machines?
Not officially, but you can get full support via OpenCore on older x86 Macs, which will run the latest macOS.
Like there aren't enough Windows laptops at that price point...

Those people who couldn't spend $1000 on a MacBook would've gotten a cheaper Windows laptop, especially students. Now you can get a MacBook for as low as $500. That will impact sales of lower-end Windows laptops, and the Neo has been selling very well.
 
And Linux users can add an unknown number of extra years after any hypothetical date at which these computers actually stop working with Windows, whereas I can't confidently expect a "life after Apple" for M-series MacBooks/Mac minis with how Linux support limps along currently
M1 and M2 support Linux. M3 and later are being worked on.
The Qualcomm devices, due to the same lack of standards in the Arm world

Qualcomm doesn’t use ACPI yet. Ampere CPUs do support ACPI.
I suspect the Qualcomm laptops will get Linux support and will be usable after Windows support dies, as Linux can run on anything.
 

The funniest part of the review is that he said Panther Lake's CPU performance was equivalent to a 12-core M2 Max, a three-year-old CPU on N5. Kinda shows how far behind Intel is.
 
M1 and M2 support Linux. M3 and later are being worked on.


Qualcomm doesn’t use ACPI yet. Ampere CPUs do support ACPI.
I suspect the Qualcomm laptops will get Linux support and will be usable after Windows support dies, as Linux can run on anything.

Snapdragon laptops most assuredly have ACPI. They just use PEP, which is unsupported on Linux.
 
Looks like a 1 MB L2 cache has been added for the M5 Max P-core.
The question is: why add it now?
Hopefully Geekerwan looks further into this plus the perf/IPC impacts in his video about the SoC.
 
How is this news? M5 vanilla also has it.
Was not reported anywhere.
Hells, arstechnica doesn't even report it now.
[chart: nT scaling]
If the purpose of the pL2 is to keep traffic off the SL3 and help nT scaling, how much does this help since the number of cores per SL3 didn't change?
Both the M4 Max and M5 Max have a cluster of 6 P-cores on 16 MB of SL2/3. If the M5 Max had more cores in a cluster, that reasoning would make more sense.
 
Was not reported anywhere.
Super visible on any dieshot and any latency ladder.
Hells, arstechnica doesn't even report it now.
>arsetechnica
If the purpose of the pL2 is to keep traffic off the SL3 and help nT scaling, how much does this help since the number of cores per SL3 didn't change?
Bandwidth.
6 cores with 64B@clk private L2 each are gucci.
how much does this help since the number of cores per SL3 didn't change?
Effective bandwidth in a bunch of nT loads should be a solid chunk higher.
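
To put a number behind "bandwidth": a minimal C sketch of the kind of nT read-bandwidth probe that would show this, with each thread's working set sized to sit inside a ~1 MB private L2. Thread count, buffer size, and pass count are illustrative choices, not measurements:

```c
/* nT bandwidth probe (sketch): per-thread hot sets that fit a private L2
 * should scale aggregate bandwidth ~linearly with thread count; bump
 * BUFBYTES past the L2 size and the threads start contending on SL3/SLC. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NTHREADS 6            /* e.g. one thread per P-core in a cluster */
#define BUFBYTES (512 * 1024) /* per-thread working set; fits a ~1 MB L2 */
#define PASSES   2000

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void *reader(void *arg) {
    volatile uint64_t *buf = arg;  /* volatile: keep the loads in the binary */
    size_t n = BUFBYTES / sizeof(uint64_t);
    uint64_t sum = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];         /* streaming reads over the hot set */
    return (void *)(uintptr_t)sum;
}

int main(void) {
    pthread_t t[NTHREADS];
    void *bufs[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        bufs[i] = calloc(1, BUFBYTES);
    double t0 = now_sec();
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reader, bufs[i]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    double dt = now_sec() - t0;
    double gbytes = (double)NTHREADS * PASSES * BUFBYTES / 1e9;
    printf("aggregate: %.1f GB/s\n", gbytes / dt);
    return 0;
}
```

Compile with `cc -O2 -pthread`; compare the aggregate figure at different NTHREADS, then again with BUFBYTES pushed past the private L2 capacity.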
 
It was talked about at length a few pages ago.
Yes... I was there lol
Problem is that no one posted any proof of it. I'm aware of littletree on baidu posting that earlier too, but I never saw any latency tables/die shots showing it yet.
Super visible on any dieshot
No M5 die shots out yet
and any latency ladder.
First one I've seen. People just don't test that anymore.
Oh, and the screenshot I posted is a month old lol, I just didn't see it till now. To be fair though, it's also posted on baidu...
>arsetechnica
I don't think any other website even talked about the cache hierarchy on the M5 max at all...
Effective bandwidth in a bunch of nT loads should be a solid chunk higher.
Was this much of a problem for them before anyway?
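
And for anyone who wants to run their own latency ladder: a minimal pointer-chasing sketch in C. The stride, working-set range, and hop count are arbitrary choices, not Geekerwan's methodology; the new private L2 should show up as an extra latency plateau between the L1 and SLC steps:

```c
/* Pointer-chasing "latency ladder" (sketch): walk one randomly permuted
 * cycle at growing working-set sizes; ns/load steps up at each cache
 * boundary (L1 -> private L2 -> shared L2/SLC -> DRAM). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STRIDE 128        /* bytes between nodes; > one line, dodges prefetch */
#define HOPS   (1L << 24) /* dependent loads timed per working-set size      */

int main(void) {
    for (size_t kb = 16; kb <= 256 * 1024; kb *= 2) {
        size_t n = kb * 1024 / STRIDE;
        char *mem = malloc(n * STRIDE);
        size_t *perm = malloc(n * sizeof *perm);
        for (size_t i = 0; i < n; i++) perm[i] = i;
        for (size_t i = n - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
            size_t j = rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < n; i++)            /* link one random cycle */
            *(void **)(mem + perm[i] * STRIDE) =
                mem + perm[(i + 1) % n] * STRIDE;

        void * volatile p = mem + perm[0] * STRIDE;
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long h = 0; h < HOPS; h++)
            p = *(void **)p;                      /* serialized dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &b);

        double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
        printf("%8zu KB : %6.2f ns/load\n", kb, ns / HOPS);
        free(mem);
        free(perm);
    }
    return 0;
}
```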
 
baidu really is a great (untapped IMO) resource.
Sometimes. That particular thread isn't great barring 2 or 3 people there; most of the discussion revolves around the nonsensical RT levels from Imagination PR.

Also, not only in that thread, but the number of people who regard SER as the second coming of Christ is becoming insufferable, simply because NV forced it into SM 6.9, when there are better ways to push for ray-coherence sorting and to keep material shading from destroying the SIMDs. Not that SER is bad in itself, but it's more of a band-aid for the problem.
 
Was not reported anywhere.
Hells, arstechnica doesn't even report it now.

If the purpose of the pL2 is to keep traffic off the SL3 and help nT scaling, how much does this help since the number of cores per SL3 didn't change?
Both the M4 Max and M5 Max have a cluster of 6 P-cores on 16 MB of SL2/3. If the M5 Max had more cores in a cluster, that reasoning would make more sense.
I think it's related to the increased latency incurred by having the CPU on a separate die from the SLC and memory controllers. Apple Silicon has been a GPU with memory and SLC that then feeds some of that bandwidth to the CPU, ANE, and the rest of the SoC.

While we are still waiting for die shots and proper full microbenching, I still believe that the entire memory interface and SLC are on the GPU die, so the CPU die has to use the Fusion die-to-die interconnect. I suspect the private L2 helps mitigate the added latency from the newly introduced hop.
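
A back-of-envelope AMAT illustration of that argument, with made-up cycle counts (none of these are measured M5 latencies): the higher the private L2's hit rate, the less the cross-die hop shows up in the average.

```c
/* AMAT sketch with ILLUSTRATIVE, assumed latencies: a private L2 hit is
 * local; a miss pays the SLC latency plus the die-to-die hop. */
#include <stdio.h>

int main(void) {
    double l2_cyc  = 8.0;   /* private L2 hit latency (assumed)       */
    double slc_cyc = 40.0;  /* SLC latency (assumed)                  */
    double hop_cyc = 12.0;  /* added die-to-die hop penalty (assumed) */
    for (double hit = 0.80; hit <= 0.951; hit += 0.05)
        printf("L2 hit %2.0f%% -> AMAT %5.1f cycles\n", hit * 100.0,
               hit * l2_cyc + (1.0 - hit) * (slc_cyc + hop_cyc));
    return 0;
}
```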
 
I didn't even know what zhihu is and I tend to stay away from Chinese sites (and from most social networks).
Ah sorry, I was going off memory; I wanted to ping @name99, given he has previously linked the site and talked about registering here:
What's the difference between a Fetch Predictor and a Branch Predictor?
When should a Branch Predictor deliver predictions?
What's the data provided by a Fetch Predictor?
Why am I using the term Fetch Predictor, not BTB?

Sorry, but there is something about Fetch, even more than the rest of the CPU, that makes the x86 contingent lose their goddamn minds. EVERY TIME I try to discuss the issue, it's like talking to a brick wall. I'm sick of it and wasting time on it.

Here's the proof. Verilator is by far the toughest "easy-ish to measure" load from the point of view of the FRONT-END.
SPEC is useless for testing front end, the code working set is tiny.
"Server" workloads are what you want, but most of those are difficult to run, verilator is the one fairly easy case to run.
This is from James Aslan's site, https://zhuanlan.zhihu.com/p/704707254, which being a mainland site is a freaking pain in the ass to deal with! You will have to register if you want to see anything, and you will never be able to comment because commenting requires a second stage of registration that requires a mainland phone number.

Regardless, the point is that when we push the front-end hard, ARM (and especially team Apple/ex-Apple) does vastly better than team x86. [Zen 4 is basically the same sort of level as Intel].
And yet team x86 refuse to listen every time you tell them they are doing it wrong...

(Firestorm was M1, Avalanche was M2.
Even Blizzard, the M2 small core, does slightly better than Raptor Cove! That's on a different graph, but it achieves 2.50 on VTop1.)

OK, with all that rant out of the way, go read:
especially volume 4. That will tell you how to handle instruction flow PROPERLY.
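
To make the Fetch Predictor vs Branch Predictor split concrete, here's a toy C sketch of the two interfaces. The table sizes, bimodal counters, and 64-byte fetch-block granularity are illustrative, not any shipping design:

```c
/* Toy contrast: a branch predictor answers "is THIS branch taken?", one
 * branch at a time. A fetch predictor answers, every cycle, "given the
 * current fetch-BLOCK address, which block do I fetch NEXT?" -- folding
 * branch presence, direction, and target into one lookup, so the front
 * end steers itself before any instruction is decoded. */
#include <stdint.h>
#include <stdio.h>

#define ENTRIES    4096
#define BLOCK_SIZE 64   /* assumed fetch-block granularity */

/* Branch predictor: one 2-bit saturating counter per branch PC (bimodal). */
static uint8_t dir[ENTRIES];
static int predict_taken(uint64_t pc) { return dir[pc % ENTRIES] >= 2; }
static void train_taken(uint64_t pc, int taken) {
    uint8_t *c = &dir[pc % ENTRIES];
    if (taken) { if (*c < 3) (*c)++; } else { if (*c > 0) (*c)--; }
}

/* Fetch predictor: indexed by fetch-block address, yields the next block. */
static uint64_t nxt[ENTRIES];
static uint64_t predict_next(uint64_t blk) {
    uint64_t n = nxt[(blk / BLOCK_SIZE) % ENTRIES];
    return n ? n : blk + BLOCK_SIZE;  /* default: fall through sequentially */
}
static void train_next(uint64_t blk, uint64_t actual_next) {
    nxt[(blk / BLOCK_SIZE) % ENTRIES] = actual_next;
}

int main(void) {
    /* A hot loop in the block at 0x1000 whose backward branch at 0x1040
     * jumps to 0x1000: train both predictors on it. */
    for (int i = 0; i < 2; i++) train_taken(0x1040, 1);
    train_next(0x1000, 0x1000);
    printf("branch 0x1040 predicted taken? %d\n", predict_taken(0x1040));
    printf("after block 0x1000, fetch 0x%llx\n",
           (unsigned long long)predict_next(0x1000));
    return 0;
}
```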
 