Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
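As a rough sanity check on the 2.6 teraflops figure: peak FP32 throughput is ALU count × 2 FLOPs per fused multiply-add × clock. This is a back-of-the-envelope sketch, not Apple's published method. It assumes 8 FP32 ALUs per execution unit (1,024 ALUs across the 128 EUs) and a GPU clock of roughly 1.278 GHz; neither number is in the spec list above.

```python
def peak_fp32_tflops(alus: int, clock_ghz: float, flops_per_alu_cycle: int = 2) -> float:
    """Theoretical peak: every ALU retires one FMA (counted as 2 FLOPs) per cycle."""
    # ALUs x FLOPs per cycle x GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return alus * flops_per_alu_cycle * clock_ghz / 1000.0


GPU_CORES = 8
EXECUTION_UNITS = 128   # from the spec list
THREADS = 24576         # from the spec list
ALUS_PER_EU = 8         # assumption, not in the spec list
CLOCK_GHZ = 1.278       # assumption, not in the spec list

alus = EXECUTION_UNITS * ALUS_PER_EU
print(f"{alus} FP32 ALUs at {CLOCK_GHZ} GHz -> {peak_fp32_tflops(alus, CLOCK_GHZ):.2f} TFLOPS")
print(f"{THREADS // GPU_CORES} concurrent threads per GPU core")
```

Under those assumptions the result lands at about 2.6 TFLOPS, so the quoted figure is a theoretical peak rather than a sustained number.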

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18-hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20-hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread (forums.anandtech.com)

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread (forums.anandtech.com)

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread (forums.anandtech.com)

M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), and ProRes
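The 100 GB/s figure lines up with simple bus arithmetic: peak bandwidth = bus width in bytes × transfers per second. The memory configurations in this sketch are assumptions, not stated in the thread: 128-bit LPDDR5-6400 for M2, 128-bit LPDDR4X-4266 for M1, and a 1024-bit LPDDR5-6400 interface, which would also account for the 819 GB/s quoted for M3 Ultra later in the thread.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak DRAM bandwidth in GB/s (1e9 bytes/s): bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mts * 1e6 / 1e9


# Assumed (bus width in bits, MT/s) per chip; these are not from the post.
configs = {
    "M1 (LPDDR4X-4266, 128-bit)": (128, 4266),
    "M2 (LPDDR5-6400, 128-bit)": (128, 6400),
    "M3 Ultra (LPDDR5-6400, 1024-bit)": (1024, 6400),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```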

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread (forums.anandtech.com)

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread (forums.anandtech.com)

MS_AT · Mar 7, 2025

Eug said:
M3 Ultra doesn't do so hot in Geekbench 6 CPU, vs. M4 Max.

M4 does have SME going for it, which further boosts the score.

Nothingness · Mar 7, 2025

MS_AT said:
M4 does have SME going for it, which further boosts the score.

Exactly, and to illustrate that: https://browser.geekbench.com/v6/cpu/compare/10895775?baseline=10898551

The frequency is also lower (4.05 GHz vs. 4.5 GHz).
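To separate the two effects: if IPC were equal and scores scaled linearly with frequency, the clock difference alone would account for roughly a 10% single-thread gap. Both conditions are simplifying assumptions (memory-bound subtests scale less than linearly with clock), and whatever gap remains would come from IPC and SME.

```python
def clock_only_ratio(f_low_ghz: float, f_high_ghz: float) -> float:
    """Score ratio expected from clock alone, assuming equal IPC and linear clock scaling."""
    return f_low_ghz / f_high_ghz


# Peak P-core clocks quoted above: 4.05 GHz (M3 Ultra) vs 4.5 GHz (M4 Max).
ratio = clock_only_ratio(4.05, 4.5)
print(f"clock alone: {ratio:.2f}x, i.e. about {1 - ratio:.0%} lower")
```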

moinmoin · Mar 7, 2025

jdubs03 said:
Yeah there are diminishing returns with more cores in GB6;

There are essentially no returns past 6 cores except in a single subtest. Furthermore, the "MT" score rewards fast cache setups at low core counts, something Apple excels at and improves further with every generation.

GB6's "MT" is a waste of time outside of core-limited, single-task consumer hardware.
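Amdahl's law is one simple way to model why a single cooperative task stops scaling: with parallel fraction p, the speedup on n cores is 1/((1 − p) + p/n), capped at 1/(1 − p). The sketch assumes p = 0.8 purely for illustration; the thread gives no per-subtest parallel fractions for GB6.

```python
def amdahl_speedup(n_cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n) for parallel fraction p."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


# Hypothetical task that is 80% parallelizable (illustrative, not measured from GB6).
P = 0.8

for n in (1, 2, 4, 6, 8, 12, 16, 24, 32):
    s = amdahl_speedup(n, P)
    # Efficiency = speedup per core; it falls steadily as cores are added.
    print(f"{n:2d} cores: {s:4.2f}x speedup, {s / n:6.1%} efficiency")

print(f"ceiling as n -> infinity: {1.0 / (1.0 - P):.1f}x")
```

With these numbers, 6 cores already reach 3x out of a possible 5x, and going from 6 to 32 cores adds less than 1.5x more.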

mvprod123 · Mar 7, 2025

Eug said:
M3 Ultra doesn't do so hot in Geekbench 6 CPU, vs. M4 Max. I guess the real test will be for the GPU.

M3 Ultra:

Mac15,14 - Geekbench: benchmark results for a Mac15,14 with an Apple M3 Ultra processor (browser.geekbench.com)


M4 Max vs. M3 Ultra:

MacBook Pro (16-inch, 2024) vs Mac15,14 - Geekbench


M3 Ultra Geekbench GPU

Mac15,14 - Geekbench: benchmark results for a Mac15,14 with an Apple M3 Ultra processor (browser.geekbench.com)

smalM · Mar 7, 2025

moinmoin said:
GB6's "MT" is a waste of time outside of core-limited, single-task consumer hardware.

GB6's MT is a cooperative test.
It shows you what you can expect from a many-core CPU on such tasks.
It is not a waste of time; it is a matter of "do you know what you measure with this test?"
The test you are looking for is GB5 MT.

moinmoin · Mar 7, 2025

smalM said:
GB6's MT is a cooperative test.

No, it's mostly an uncooperative, single-task test with a limited number of threads, not worth the "multi" label it keeps getting.

smalM said:
it is a matter of "do you know what you measure with this test?"

Not "MT" in any case.

naukkis · Mar 7, 2025

moinmoin said:
No, it's mostly an uncooperative, single-task test with a limited number of threads, not worth the "multi" label it keeps getting.

Not "MT" in any case.

Do you see that the MT score is only a bit higher than ST? That's how those workloads scale to MT. For most use cases, MT performance measured by running multiple independent ST threads concurrently won't tell you anything about real MT performance. And because of Amdahl's law, ST performance is a very important part of MT scaling. That cooperative MT scaling is what users actually need: the score represents a chip's MT performance on well-programmed MT workloads, including gaming. GB5 or Cinebench won't tell you anything about that kind of MT performance.
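The Amdahl argument can also be run backwards: if the MT/ST ratio is read as a speedup S on n threads, the Karp-Flatt metric e = (1/S − 1/n)/(1 − 1/n) estimates the serial fraction that would explain it. This is a sketch with a hypothetical ratio. Treating a GB6 score ratio as a clean speedup is itself an assumption, since the subtests differ and clocks drop under all-core load.

```python
def karp_flatt(speedup: float, n_threads: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/n) / (1 - 1/n)."""
    if n_threads < 2:
        raise ValueError("need at least 2 threads to estimate a serial fraction")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / n_threads) / (1.0 - 1.0 / n_threads)


# Hypothetical measurement: MT score / ST score = 4.0 on a 16-thread run.
e = karp_flatt(4.0, 16)
print(f"estimated serial fraction: {e:.2f}")
# Under Amdahl's law, that serial fraction caps speedup at 1/e whatever the core count.
print(f"implied speedup ceiling: {1.0 / e:.1f}x")
```

A serial fraction that stays roughly constant as n grows points to inherent serial work; one that rises with n points to synchronization overhead instead.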

Nothingness · Mar 7, 2025

moinmoin said:
No, it's mostly an uncooperative, single-task test with a limited number of threads, not worth the "multi" label it keeps getting.

The renderer test scales well. Others show limitations (either due to structurally not scaling, like most software, or poor choice/programming of workload).

I agree that using the global MT score doesn't give a good picture of MT scaling. But OTOH, using benchmarks that scale perfectly is just as stupid and misleading.

mvprod123 · Mar 7, 2025

M4 Max Mac Studio

Mac16,9 - Geekbench: benchmark results for a Mac16,9 with an Apple M4 Max processor (browser.geekbench.com)

mvprod123 · Mar 7, 2025

M3 Ultra Neural Engine

Mac15,14 - Geekbench: benchmark results for a Mac15,14 with an Apple M3 Ultra processor (browser.geekbench.com)

soresu · Mar 7, 2025

Kinda expected the TOPS figures to be higher given Strix Point has 50.

trivik12 · Mar 7, 2025

M3 Ultra is so mid for the price at which the Mac Studio is selling. Thankfully there are enough Apple loonies to buy this overpriced system, and I am happy as an AAPL stockholder.

Eug · Mar 7, 2025

trivik12 said:
M3 Ultra is so mid for the price at which the Mac Studio is selling. Thankfully there are enough Apple loonies to buy this overpriced system, and I am happy as an AAPL stockholder.

The Mac Studio Mx Ultra probably sells less than 2% out of all Macs.

oak8292 · Mar 7, 2025

trivik12 said:
M3 Ultra is so mid for the price at which the Mac Studio is selling. Thankfully there are enough Apple loonies to buy this overpriced system, and I am happy as an AAPL stockholder.

My speculation is that Apple is buying most of these for their server farms. The I/O was improved for DRAM and storage. I'd guess they 'benchmark' just fine on the loads Apple wants in the cloud.

okoroezenwa · Mar 7, 2025

trivik12 said:
M3 Ultra is so mid for the price at which the Mac Studio is selling. Thankfully there are enough Apple loonies to buy this overpriced system, and I am happy as an AAPL stockholder.

Sure thing 👍

okoroezenwa · Mar 7, 2025

soresu said:
Kinda expected the TOPS figures to be higher given Strix Point has 50.

How come? Seems like it’s in line with other M3 devices.

soresu · Mar 7, 2025

okoroezenwa said:
How come? Seems like it’s in line with other M3 devices.

Was talking about M4 Max.

Doug S · Mar 7, 2025

moinmoin said:
No, it's mostly an uncooperative, single-task test with a limited number of threads, not worth the "multi" label it keeps getting.

Not "MT" in any case.

In the real world, loads that increase in performance with every core you throw at them are the exception, not the rule. Once there is any interdependence between threads, there is a point where more cores don't help at all, and in some cases can even hurt.
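The "can even hurt" case is what Gunther's Universal Scalability Law models. Unlike Amdahl's law, it adds a coherency term (beta) that grows with n², so throughput eventually declines past a peak. The coefficients below are made up for illustration; real values would have to be fitted to measurements of a specific workload and chip.

```python
import math


def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Universal Scalability Law: relative throughput C(n) = n / (1 + a(n-1) + b*n(n-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_cores(alpha: float, beta: float) -> float:
    """Core count that maximizes C(n), from dC/dn = 0: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients: alpha models contention (serialization),
# beta models coherency cost (cross-core / cross-cluster synchronization).
ALPHA, BETA = 0.03, 0.005

for n in (1, 2, 4, 8, 14, 16, 24, 32):
    print(f"{n:2d} cores: {usl_capacity(n, ALPHA, BETA):5.2f}x")
print(f"peak near {usl_peak_cores(ALPHA, BETA):.1f} cores")
```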

I think John ought to keep the current MT test, add back the GB5-style test, and label them "cooperative MT" and "parallel MT" respectively. Then people who want CB- or SPECrate-type numbers that go up and up until you run out of memory bandwidth (or memory) can get them, while people who care about things like cross-core/cross-cluster/cross-chiplet synchronization overhead can make relevant comparisons.

That said, I'm not sure GB6 is really testing what I'm talking about. I haven't looked into it; I don't pay much attention to that OR to CB-like numbers, since for the most part everything has more than enough cores for the kind of stuff I do.
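A benchmark reporting both numbers could be structured roughly like this. It is a hypothetical Python sketch, not how Geekbench or Cinebench work. The "task" (SHA-256 over sixteen 1 MiB chunks, then a hash of the digests) is invented, and the sketch relies on CPython's hashlib releasing the GIL for large buffers so that threads actually overlap. "Cooperative" splits one task across the threads and waits on a serial combine step; "parallel" gives each thread its own independent copy, GB5/SPECrate style.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical task: hash sixteen 1 MiB chunks, then hash the concatenated digests.
# CPython's hashlib releases the GIL for large buffers, so chunk hashes can overlap.
CHUNK = bytes(1 << 20)
CHUNKS_PER_TASK = 16


def hash_chunk(index: int) -> bytes:
    h = hashlib.sha256(index.to_bytes(4, "little"))  # per-chunk prefix so digests differ
    h.update(CHUNK)
    return h.digest()


def combine(digests: list[bytes]) -> bytes:
    # Serial step that must wait for every chunk: the Amdahl part of the task.
    return hashlib.sha256(b"".join(digests)).digest()


def task_serial() -> bytes:
    return combine([hash_chunk(i) for i in range(CHUNKS_PER_TASK)])


def task_cooperative(pool: ThreadPoolExecutor) -> bytes:
    # Same task split across the pool; map() returns results in chunk order.
    return combine(list(pool.map(hash_chunk, range(CHUNKS_PER_TASK))))


def cooperative_mt(threads: int, repeats: int = 3) -> float:
    """Tasks per second when all threads work on one task at a time."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        start = time.perf_counter()
        for _ in range(repeats):
            task_cooperative(pool)
        return repeats / (time.perf_counter() - start)


def parallel_mt(threads: int) -> float:
    """Tasks per second when each thread runs its own independent copy (n-copy style)."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        start = time.perf_counter()
        list(pool.map(lambda _: task_serial(), range(threads)))
        return threads / (time.perf_counter() - start)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} threads: cooperative {cooperative_mt(n):6.1f} tasks/s, "
              f"parallel {parallel_mt(n):6.1f} tasks/s")
```

One would expect the parallel figure to keep climbing with thread count on a many-core machine, while the cooperative one is held back by the fixed 16-chunk split and the per-task wait for every chunk.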

naukkis · Mar 8, 2025

Doug S said:
In the real world, loads that increase in performance with every core you throw at them are the exception, not the rule. Once there is any interdependence between threads, there is a point where more cores don't help at all, and in some cases can even hurt.

I think John ought to keep the current MT test, add back the GB5-style test, and label them "cooperative MT" and "parallel MT" respectively. Then people who want CB- or SPECrate-type numbers that go up and up until you run out of memory bandwidth (or memory) can get them, while people who care about things like cross-core/cross-cluster/cross-chiplet synchronization overhead can make relevant comparisons.

That said, I'm not sure GB6 is really testing what I'm talking about. I haven't looked into it; I don't pay much attention to that OR to CB-like numbers, since for the most part everything has more than enough cores for the kind of stuff I do.

Multithreading is actually about getting the best performance out of a process. GB6 has an MT test. GB5 and SPEC don't even try to measure MT performance; they are just throughput tests that measure single-thread throughput with n copies run concurrently. MT performance needs fast cores and a fast interconnect between cores; a single-thread n-rate test only needs cores and memory bandwidth. N-rate tests are totally unrelated to any desktop or mobile workloads and should not be referred to as MT benchmarks. SPEC got it right, but GB5 just outright lies about being an MT benchmark.
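For contrast, this is roughly how an n-copy "rate" score is formed, in the spirit of SPEC CPU's rate metrics (SPEC's run rules have the exact definition; all numbers here are hypothetical). Each copy is independent, so the ratio grows with the copy count as long as wall time stays flat, which is the point above: nothing in it exercises communication between cores.

```python
from statistics import geometric_mean


def rate_ratio(copies: int, reference_s: float, elapsed_s: float) -> float:
    """n-copy throughput ratio: copies x (reference time / measured wall time)."""
    return copies * reference_s / elapsed_s


# Hypothetical per-benchmark results with 16 independent copies each.
ratios = [
    rate_ratio(16, 1000.0, 500.0),   # benchmark A
    rate_ratio(16, 1500.0, 1200.0),  # benchmark B
]

# An overall score is typically a geometric mean of the per-benchmark ratios.
overall = geometric_mean(ratios)
print(ratios, f"{overall:.2f}")
```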

smalM · Mar 10, 2025

M3 Ultra GB5 CPU
M3 Ultra GB5 Compute

moinmoin · Mar 10, 2025

Sorry for going off topic with this line of discussion; I'll stop after this round of responses.

naukkis said:
Do you see that the MT score is only a bit higher than ST? That's how those workloads scale to MT. For most use cases, MT performance measured by running multiple independent ST threads concurrently won't tell you anything about real MT performance. And because of Amdahl's law, ST performance is a very important part of MT scaling. That cooperative MT scaling is what users actually need: the score represents a chip's MT performance on well-programmed MT workloads, including gaming. GB5 or Cinebench won't tell you anything about that kind of MT performance.

What you are talking about is the MT capability of software, not hardware. I would wager that the majority of people run benchmarks to see the highest possible performance of their system's hardware, not of some software (which in GB's case is completely opaque to boot). Most will try the software in question if they want to know how it performs; most won't even think of looking at GB6's "MT" for that purpose. And on your PC, do you kill all tasks to run exactly one task? Then GB6's "MT" is perfect for you indeed. For everyone else, the traditional hardware MT is much more reflective of how the hardware will perform when many applications run at the same time, whether they are multithreaded or not.

Nothingness said:
The renderer test scales well.

I referred to that subtest before.

moinmoin said:
There are essentially no returns past 6 cores except in a single subtest.

Nothingness said:
Others show limitations (either due to structurally not scaling, like most software, or poor choice/programming of workload).

I agree that using the global MT score doesn't give a good picture of MT scaling. But OTOH, using benchmarks that scale perfectly is just as stupid and misleading.

I think most PC users don't worry about scaling per se; that's a topic for programmers. What PC users worry about is their PCs slowing down when they don't want them to. Now if somebody buys a PC with many cores, it's to be able to throw more stuff at its CPU. Throwing more stuff at a CPU usually doesn't mean heavier applications (this would get into details for programmers again, many applications being constrained by ST or very limited MT, etc.) but exactly that: more applications at the same time without the PC bogging down.

Doug S said:
In the real world, loads that increase in performance with every core you throw at them are the exception, not the rule. Once there is any interdependence between threads, there is a point where more cores don't help at all, and in some cases can even hurt.

I think John ought to keep the current MT test, add back the GB5-style test, and label them "cooperative MT" and "parallel MT" respectively. Then people who want CB- or SPECrate-type numbers that go up and up until you run out of memory bandwidth (or memory) can get them, while people who care about things like cross-core/cross-cluster/cross-chiplet synchronization overhead can make relevant comparisons.

That said, I'm not sure GB6 is really testing what I'm talking about. I haven't looked into it; I don't pay much attention to that OR to CB-like numbers, since for the most part everything has more than enough cores for the kind of stuff I do.

In the real world, many loads run at the same time, sometimes interfering with each other. It's the exception that exactly one load runs completely undisturbed for its whole run, but exactly this exception is what all kinds of benchmarks use, since it's the scientific, reproducible, clean-room approach. But nobody uses PCs this way.

While all real MT benchmarks so far are bad for different reasons, they mostly do at least one job sufficiently well: using all available multicore performance and thus showing the user a theoretical upper limit for a given system.

GB6's "MT", though, combines the worst of both worlds: it's still the clean-room approach, and it benchmarks some opaque software's multithreading capability (which stops scaling beyond 6 cores except in one subtest) on given hardware while using the term traditionally used to benchmark the limits of that hardware. While that kind of benchmark has its uses (it's excellent for showing the quality of a CPU's cache setup, which, on topic, is what Apple excels at), it would better be described as an extended ST benchmark, in the sense of "single task" plus "limited multithreading".

naukkis said:
Multithreading is actually about getting the best performance out of a process. GB6 has an MT test. GB5 and SPEC don't even try to measure MT performance; they are just throughput tests that measure single-thread throughput with n copies run concurrently. MT performance needs fast cores and a fast interconnect between cores; a single-thread n-rate test only needs cores and memory bandwidth. N-rate tests are totally unrelated to any desktop or mobile workloads and should not be referred to as MT benchmarks. SPEC got it right, but GB5 just outright lies about being an MT benchmark.

I see it exactly the other way around. But you show it's mostly a matter of wording:

MT is traditionally used to refer to N-rate tests. And people buying PCs with many cores are usually more interested in overall throughput possible than in a single task's cooperative performance. For the latter most would look at ST instead, which is why I would propose to term "cooperative MT" more as an extension of ST than a replacement of how MT is commonly understood up to now.

naukkis · Mar 10, 2025

moinmoin said:
MT is traditionally used to refer to N-rate tests. And people buying PCs with many cores are usually more interested in overall throughput possible than in a single task's cooperative performance. For the latter most would look at ST instead, which is why I would propose to term "cooperative MT" more as an extension of ST than a replacement of how MT is commonly understood up to now.

In that case Intel got its hybrid scheme wrong: they would only need one big core for ST and as many small cores as possible for the best throughput. But in reality there really aren't any stressful plain-ST workloads that matter; everything is at least somewhat multithreaded. But process multithreading in most cases only scales up to about 8 threads, because of Amdahl's law.

So the average Joe, when choosing and comparing CPUs, should focus on GB6 MT, not plain ST and certainly not those single-thread n-rate tests. With GB6 MT even nowadays gaming performance is pretty comparable to those MT results. Users who need n-rate performance basically don't even need benchmarks; they just need to compare core counts and memory bandwidth, the more the better. But 99% of desktop users and nearly 100% of mobile users do not get more performance from more cores. Showing them benchmarks that suggest otherwise only benefits hardware makers: they can sell more expensive hardware to people, and it won't perform any better in real use cases. In some cases it might actually be the worse option, and to overcome that, those hardware makers need to offer "performance utilities" that simply disable the badly performing cores to keep them from degrading the user experience.

MS_AT · Mar 10, 2025

naukkis said:
With GB6 MT even nowadays gaming performance is pretty comparable to those MT results.

9800X3D begs to differ

unless you claim the 9950X or the newest Intel CPUs are better at gaming than it

Nothingness · Mar 10, 2025

MS_AT said:
9800X3D begs to differ unless you claim the 9950X or the newest Intel CPUs are better at gaming than it

That's because some of the GB6 MT tests scale much better than game engines. Half kidding 😉

Anyway, game engines target a limited number of cores and have a potential hard bottleneck out of their control (the GPU and its drivers), so it's unsurprising they don't scale well beyond a point.
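That explanation fits a toy model: per-frame CPU work scales Amdahl-style but only up to the number of threads the engine spawns, and the frame rate is set by whichever of the CPU or GPU is slower. All parameter values here are hypothetical. In this model, cores beyond the engine's thread count do nothing, while a faster core (a lower single-thread frame time, as a large L3 cache might give) still helps whenever the GPU is not the limit.

```python
def cpu_frame_ms(st_frame_ms: float, parallel_fraction: float,
                 cores: int, engine_threads: int) -> float:
    """Amdahl-style CPU time per frame; the engine never uses more than engine_threads."""
    used = min(cores, engine_threads)
    return st_frame_ms * ((1.0 - parallel_fraction) + parallel_fraction / used)


def fps(cores: int, st_frame_ms: float = 20.0, parallel_fraction: float = 0.7,
        engine_threads: int = 8, gpu_frame_ms: float = 8.0) -> float:
    """Frame rate when CPU and GPU work overlap: the slower side sets the pace."""
    cpu_ms = cpu_frame_ms(st_frame_ms, parallel_fraction, cores, engine_threads)
    return 1000.0 / max(cpu_ms, gpu_frame_ms)


for cores in (1, 4, 8, 16):
    print(f"{cores:2d} cores: {fps(cores):6.1f} fps")

# With a faster GPU, a faster core (lower single-thread frame time) pulls ahead.
print(f"16 cores, fast GPU:           {fps(16, gpu_frame_ms=4.0):6.1f} fps")
print(f" 8 faster cores, fast GPU:    {fps(8, st_frame_ms=16.0, gpu_frame_ms=4.0):6.1f} fps")
```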

mikegg · Mar 11, 2025

trivik12 said:
M3 Ultra is so mid for the price at which the Mac Studio is selling. Thankfully there are enough Apple loonies to buy this overpriced system, and I am happy as an AAPL stockholder.

M3 Ultra is a bargain for those wanting to run DeepSeek or other large LLMs at home.

Even compared to workstations such as the ones Puget Systems sells, it's a bargain.

A 32-core Threadripper + 512 GB of RAM + RTX 4060 Ti is already $12k.

Meanwhile, an M3 Ultra has a 32-core CPU and an 80-core GPU with 512 GB of 819 GB/s RAM for $9.5k.
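Memory size and bandwidth dominate this comparison because the model has to fit in memory, and single-stream token generation is largely memory-bandwidth bound: each generated token reads the active weights once. A rough ceiling is therefore bandwidth ÷ bytes of active weights. The sketch ignores KV-cache traffic, compute limits, and quantization overhead. The model figures (671 B total / 37 B active parameters, 4-bit weights) are assumptions for illustration, and real throughput will be lower.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_param / 8.0


def decode_tokens_per_s_ceiling(bandwidth_gbs: float, active_params_billion: float,
                                bits_per_param: float) -> float:
    """Upper bound for batch-1 decoding if every token streams all active weights once."""
    return bandwidth_gbs / weights_gb(active_params_billion, bits_per_param)


# Assumed model shape (a large mixture-of-experts model): 671 B total, 37 B active per token.
TOTAL_B, ACTIVE_B, BITS = 671.0, 37.0, 4.0
MEMORY_GB, BANDWIDTH_GBS = 512.0, 819.0   # M3 Ultra figures from the post

print(f"weights: {weights_gb(TOTAL_B, BITS):.1f} GB of {MEMORY_GB:.0f} GB")
print(f"decode ceiling: {decode_tokens_per_s_ceiling(BANDWIDTH_GBS, ACTIVE_B, BITS):.1f} tokens/s")
```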
