
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
basically if any single part screws up you are screwed
Yes, everything has to be balanced, and that's very difficult to achieve and can be difficult to measure in advance. Note this applies to all parts of a design, though mistakes in the memory hierarchy tend to strike harder.

I think he's the real GW (this perhaps means he's done painting his house 😀).
 
Yes, everything has to be balanced, and that's very difficult to achieve and can be difficult to measure in advance. Note this applies to all parts of a design, though mistakes in the memory hierarchy tend to strike harder.
we have seen a practical example of that for 2 generations in an HVM product 🤣 🤣
 
Yes, everything has to be balanced, and that's very difficult to achieve and can be difficult to measure in advance. Note this applies to all parts of a design, though mistakes in the memory hierarchy tend to strike harder.
This is what struck me when looking at uarch diagrams of Apple (and also Qualcomm Oryon) CPU designs. They appear to be very well 'balanced'.

This really shows in the PPA of those cores. Apple's P-cores have similar area to ARM's Cortex X, while having superior performance and efficiency.

Oryon Prime cores have similar peak performance to Apple's P-cores, while being about a gen behind in IPC/performance-per-watt, but the core is only 3/4 the size.
 


"Apple ANE Successfully Reverse-Engineered! Is the 38 TOPS Performance Just a Numbers Game?

I just came across a hardcore open-source project by the blogger maderix: he reverse-engineered Apple’s private APIs, bypassed CoreML, and managed to run neural network training directly on the Apple Neural Engine (ANE)!

Wait — what exactly is ANE?
The ANE is the neural network accelerator inside Apple silicon chips. On the M4, it currently features 16 compute cores, and Apple officially claims 38 TOPS of performance. However, it has always been a black box: you can only access it through the CoreML framework. There are no public APIs, no documentation, no ISA — nothing.

So this guy basically peeled away the CoreML layer. Using reverse-engineering techniques (such as dyld_info scanning and method swizzling to intercept CoreML calls), he reconstructed the entire compilation and execution pipeline. Most importantly, he figured out the in-memory compilation path, allowing MIL (similar to NVIDIA’s PTX) to be compiled directly into ANE binaries in memory. This potentially makes training large models on ANE much more feasible.

During the reverse-engineering process, several explosive findings emerged:

First, ANE is fundamentally a convolution engine, not a matrix multiplication engine. If you rewrite the same computation as a convolution, throughput can increase by up to 3×. Apple’s own ml-ane-transformers reference implementation hints at this pattern, but they’ve never stated it explicitly.

Second, ANE appears to contain roughly 32MB of on-chip SRAM. This was inferred from performance cliffs observed during matrix multiplication scaling tests.

Third, a single operator can only achieve about 30% of ANE’s peak performance. That’s because the 16 ANE cores are organized in a pipeline. If you submit only one operation, most cores remain idle. To fully utilize the hardware, you need to chain together 16–64 operations in a single computation graph submission. That way, different cores can process different pipeline stages simultaneously, pushing utilization up to around 94%.

Finally — and perhaps most controversially — the “38 TOPS” figure may be a numbers game. The author ran identical operations in FP16 and INT8 and observed identical throughput. The conclusion: when executing INT8 workloads, ANE likely dequantizes them to FP16 internally before computation. Apple’s “38 TOPS INT8” claim may simply be 19 TFLOPS FP16 multiplied by two — essentially a marketing figure. The real peak performance appears to be 19 TFLOPS FP16.

Another interesting detail: ANE features hardware-level power gating. When idle, its power consumption is truly 0 mW — not low-power standby, but completely powered off with zero leakage. That level of power management is seriously impressive and extremely mobile-friendly.

Of course, beyond the performance claims, the reverse-engineering process itself is highly educational. The two blog posts are packed with technical depth — far more than I can summarize here. If you’re interested, I highly recommend reading the original article, “inside-the-m4-apple-neural-engine.” This is just a brief introduction to spark your curiosity."
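
The "matmul as convolution" claim in the quote is easy to sanity-check on the CPU, since it's just a mathematical identity: a matrix multiply A (M×K) · B (K×N) is the same thing as a 1×1 convolution over M "pixels" with K input channels and N output channels. A minimal NumPy sketch of the identity (nothing ANE-specific here):

```python
import numpy as np

# A matmul A @ B, with A of shape (M, K) and B of shape (K, N),
# expressed as a 1x1 convolution: A's rows become "pixels",
# K the input channels, N the output channels.
M, K, N = 8, 16, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N))

# Reference result via plain matrix multiplication.
ref = A @ B

# 1x1 "convolution": input laid out as (H=M, W=1, C_in=K),
# weights as (C_out=N, C_in=K, kH=1, kW=1). With a 1x1 kernel the
# convolution reduces to a per-pixel dot product over channels.
x = A.reshape(M, 1, K)            # (H, W, C_in)
w = B.T.reshape(N, K, 1, 1)       # (C_out, C_in, kH, kW)
out = np.einsum('hwc,ocij->hwo', x, w).reshape(M, N)

assert np.allclose(ref, out)
```

The interesting part of the claim isn't the identity itself but that ANE's convolution datapath reportedly runs the conv formulation up to 3× faster than the naive matmul graph.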
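
The single-operator utilization point is essentially pipeline fill/drain overhead. A toy first-order model (illustrative only: the ~30% and ~94% figures above come from the author's measurements, which a simple fill/drain formula won't reproduce exactly, since each "op" itself streams many tiles through the pipe):

```python
# Toy pipeline-utilization model: with S pipeline stages and N ops
# submitted back-to-back, completing all ops takes N + S - 1 ticks,
# so average stage utilization is N / (N + S - 1).
def utilization(num_ops, stages=16):
    return num_ops / (num_ops + stages - 1)

for n in (1, 16, 64):
    print(f"{n:3d} ops -> {utilization(n):.0%} utilization")
```

The shape matches the quoted observation: a lone op leaves most of a deep pipeline idle, and chaining tens of ops per graph submission amortizes the fill/drain cost away.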
 

iPhone 17e
A19 (4-core GPU)
N1+C1X
Starting storage at 256GB
MagSafe
Ceramic Shield 2 with improved anti-reflection

iPad Air
M4 (8-core CPU with 3 P-cores + 5 E-cores and a 9-core GPU)
N1+C1X
 
The M4 iPad Air got 12 GB RAM. That surprised me, especially since my M4 iPad Pro only has 8 GB. This means that M4 has variants with 8 GB, 12 GB, 16 GB, 24 GB, and 32 GB. I wonder just how many of those base RAM M4 iPad Pros actually have 12 GB RAM (with 4 GB inactive). All of them, or just some?

The base 256 GB storage in the 17e also surprised me. The 17e getting MagSafe was no surprise though, but then again that didn't really affect my kid with the 16e, since a $10 MagSafe case was all that was necessary to get the magnetic mount. (The charging speed is slower on the 16e, but that hasn't been a real world issue since my kid almost always just charges overnight anyways.)
 


"Apple ANE Successfully Reverse-Engineered! Is the 38 TOPS Performance Just a Numbers Game? ..."

"Explosive findings" only if you never bothered to read my PDFs, where I explained the convolution engine in greater detail, along with the pipelining.
I also explain how the doubled INT8 fits with the FP16 MACs, and if the authors read that, they might have a better idea of how to tap into that doubled INT8 performance. (I suspect it requires using the FP16 datapath to load both INT8s side by side then execute what looks like a SIMD INT8[2] operation.)
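
The packing idea suggested here is the classic SWAR trick: two INT8 values sit side by side in each 16-bit lane, and one 16-bit operation acts on both sub-lanes at once, with a mask step suppressing the carry between them. A toy NumPy sketch of that idea (purely illustrative, not Apple's actual ANE datapath):

```python
import numpy as np

def pack2(lo, hi):
    """Pack two int8 arrays into one uint16 lane each (hi byte | lo byte)."""
    return (hi.astype(np.uint8).astype(np.uint16) << 8) | lo.astype(np.uint8)

def unpack2(p):
    """Split uint16 lanes back into (lo, hi) int8 arrays."""
    lo = (p & 0xFF).astype(np.uint8).view(np.int8)
    hi = (p >> 8).astype(np.uint8).view(np.int8)
    return lo, hi

def add2(x, y):
    """Bytewise add on packed lanes, suppressing the carry between the
    two INT8 sub-lanes (classic SWAR carry-isolation trick)."""
    return ((x & 0x7F7F) + (y & 0x7F7F)) ^ ((x ^ y) & 0x8080)

lo1 = np.array([10, -50], dtype=np.int8); hi1 = np.array([3, 100], dtype=np.int8)
lo2 = np.array([5, 60], dtype=np.int8);   hi2 = np.array([-4, 27], dtype=np.int8)

lo_sum, hi_sum = unpack2(add2(pack2(lo1, hi1), pack2(lo2, hi2)))
assert np.array_equal(lo_sum, lo1 + lo2)   # [15, 10]
assert np.array_equal(hi_sum, hi1 + hi2)   # [-1, 127]
```

If the ANE works this way, the doubled INT8 rate would only show up when the compiler (or a reverse-engineered path) actually emits the packed SIMD INT8[2] form, which would explain why naive INT8 submissions measure at FP16 speed.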
 
The M4 iPad Air is an odd product. 3 P-cores, 5 E-cores, 9 GPU cores - a dog's breakfast of bins.

Wonder if that's it? Since Apple doesn't advertise specs like SLC size when they announce products, they could bin there as well and ship with 3/4 of the SLC of a "regular" M4, and no one would know unless someone benchmarks where the knees of the memory latency graph are located.
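
Finding those latency-curve knees is a standard pointer-chasing measurement: walk a randomly permuted cyclic list (to defeat prefetchers) at growing working-set sizes and watch per-access time step up at each cache boundary. A rough Python sketch of the method (interpreter overhead dominates here, so real probes such as lmbench's lat_mem_rd do this in C, but the shape of the technique is the same):

```python
import time
import numpy as np

def chase_ns(num_elems, iters=200_000):
    """Average ns per dependent load over a random single-cycle walk."""
    rng = np.random.default_rng(1)
    perm = rng.permutation(num_elems)
    next_idx = np.empty(num_elems, dtype=np.int64)
    next_idx[perm[:-1]] = perm[1:]   # one cycle visiting every element
    next_idx[perm[-1]] = perm[0]
    i = 0
    t0 = time.perf_counter()
    for _ in range(iters):
        i = next_idx[i]              # each load depends on the previous one
    return (time.perf_counter() - t0) / iters * 1e9

# Sweep working-set sizes; latency steps up as the set spills out of
# each cache level -- those steps are the "knees".
for kib in (32, 256, 2048, 16384, 65536):
    print(f"{kib:>6} KiB: {chase_ns(kib * 1024 // 8):6.1f} ns/access")
```

On an M4 with a binned SLC, the knee between the last cache plateau and DRAM latency would show up at roughly 3/4 of the working-set size seen on a "regular" M4.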
 
So even the iPhone 17e has 256 GB base storage.

So I doubt the low-cost MacBook will have a 128 GB option.

They should bump the MacBook Air's base up to 512 GB, and the MacBook Pro's to 1 TB.

The RAM/NAND specs for the products announced this week were probably set in stone before DRAM/NAND pricing started to get crazy last fall. I wouldn't expect them to increase base configs on products coming out this fall just because of what they did with the products announced this week. And not just this fall: I bet they hold the line on the current DRAM/NAND configs vs. the previous versions they're replacing in most products until the bubble bursts.

The fact that Apple customers have been so willing to pay for DRAM and NAND upgrades, and that Apple has charged far more than those upgrades actually cost, helps them out. It still hurts their margin (i.e., instead of making 90% profit when a user goes to a higher NAND tier of iPhone, maybe they'll only make 70%), but if holding the line on DRAM/NAND configs for the next couple of years causes more customers to get those upgrades over time, that will make up a lot of the margin they'd lose by holding firm on list prices for the base config.

It sounds like Apple plans to hold the line on pricing as much as possible. They can use the fact that everyone else will be under FAR greater margin pressure, which will force competitors to raise their prices (and drop entry-level products entirely), as a way for Apple's products to gain market share.

When they add expensive new technology, like OLED displays on the MacBook Pro, they might use that as an opportunity to increase prices, just as they did when they made that switch with the iPhone. But whatever roadmap they may have had for bringing OLED displays to less expensive MacBooks down the line is probably on hold until the memory market stabilizes.
 