Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
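The GPU figures above hang together if all three rates come from one shared clock. Here is a quick Python check. It assumes, without support from the post, 8 FP32 ALUs per execution unit, 8 texture units and 4 pixel units per GPU core, and a clock of about 1.278 GHz. The per-core unit counts are back-solved from the published rates, not confirmed by Apple.

```python
# Sketch: are the M1 GPU spec-sheet rates consistent with a single clock?
# ASSUMPTIONS (not from the post): 8 ALUs per EU, 8 texture units and
# 4 pixel units per core, ~1.278 GHz clock. Unit counts are back-solved.

GPU_CORES = 8
EUS = 128            # execution units, from the spec list
ALUS_PER_EU = 8      # assumption -> 1024 FP32 ALUs
TMUS_PER_CORE = 8    # assumption
ROPS_PER_CORE = 4    # assumption
CLOCK_GHZ = 1.278    # assumption


def peak_tflops(alus: int, clock_ghz: float) -> float:
    """Peak FP32 rate: each ALU retires one fused multiply-add (2 FLOPs) per clock."""
    return alus * 2 * clock_ghz / 1000.0


def fill_rate(units: int, clock_ghz: float) -> float:
    """Giga-results per second for units that each produce one result per clock."""
    return units * clock_ghz


alus = EUS * ALUS_PER_EU
print(f"FP32:  {peak_tflops(alus, CLOCK_GHZ):.2f} TFLOPS")                      # spec: 2.6
print(f"Texel: {fill_rate(GPU_CORES * TMUS_PER_CORE, CLOCK_GHZ):.1f} GTexels/s")  # spec: 82
print(f"Pixel: {fill_rate(GPU_CORES * ROPS_PER_CORE, CLOCK_GHZ):.1f} GPixels/s")  # spec: 41
```

All three published numbers fall out of the same ~1.28 GHz clock, which is why they look so tidy relative to each other.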

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), and ProRes

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M5 Family discussion here:

Page 431 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

Nothingness · May 23, 2024

SarahKerrigan said:
@Doug S expressed skepticism that SPEC is, at this point, getting anything out of SME with any actually-existing compiler. I agree with him; I seriously doubt it is. (SPECfp might be a little bit, but probably not to any sort of breakthrough degree.)

Feel free to do a run yourself and check. That would be interesting and valuable information.

I did express my doubt too. I think some SPEC tests can benefit from SVE, but I'd be pleasantly surprised if any SME instruction was emitted (and no, SSVE doesn't count as SME).

All people I know who work on HPC code rely on intrinsics no matter the platform (or even generate specialized vector code at runtime). I know it's just a small sample, but still.

Doug S · May 23, 2024

SpudLobby said:
Mx chip platform power for ST has always been more than like 5W though. So you have to keep that in mind. More like 5-9W (idle normalized). The second thing is that Apple is getting more performance, around 25-30% more at 20-22% less power simultaneously. Or just like 30-35% more at the same power. (Minus SME stuff). Still more than an easy node change and frequency boost, but with some modest IPC gains, process gains alone they should be able to narrow the gap.

But yes dude your freakouts on this were unnecessary. You should freak out if 8 Gen 4 sucks, that’s our tell if Oryon, where they had extra time to do phydes and use N3E, sucks in phones. That and if V2 is like single digit IPC improvement. Otherwise, time to chill out.

See I don't see it as a problem if a chip is able to use a lot of power for single thread. If I have a single thread load, and a CPU with a 100 watt TDP, I'd love it if it was able to usefully turbo up enough to run that one core at 100 watts to finish my single thread load as quickly as possible. That's not realistic, I know, but to the extent it is possible I'd like that to happen. When there is more than a single core load then that single core wouldn't be allowed to go as high and that's fine. We're already used to the situation where when you spin up more cores you take a frequency hit, especially in mobile where the total power budgets are much lower.

This sort of behavior wouldn't be useful in a server CPU because you aren't going to see single core only loads on it - if you do you bought the wrong thing. But in a PC and most definitely in a phone, sure single core loads are a real thing and to the extent they can be made faster I'm there for it.

If an Intel or AMD CPU uses a bunch of power in a single core load, it's excused, because "that's turbo". Maybe M4 is running in a "turbo" mode of sorts when running GB6 ST. Apple doesn't publish frequency specs, and whether you call the frequency at which it runs an ST load until it has to slow down "turbo" or "standard frequency" doesn't really matter. It amounts to the same thing either way.
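SpudLobby's figures, quoted at the top of Doug's post, are 25-30% more performance at 20-22% less power. They translate into a perf-per-watt gain when you divide the performance ratio by the power ratio. Here is a quick Python check over the two ends of those ranges. They are rough forum estimates, not measurements.

```python
# Perf-per-watt ratio = performance ratio / power ratio.
# Inputs are the rough ranges quoted from SpudLobby, not measured data.


def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """New perf/W divided by old perf/W."""
    return perf_ratio / power_ratio


low = perf_per_watt_gain(1.25, 0.80)   # least favourable end of the quoted ranges
high = perf_per_watt_gain(1.30, 0.78)  # most favourable end
print(f"perf/W improves by {low - 1:.0%} to {high - 1:.0%}")
```

Taken at face value, those ranges imply roughly a 56-67% perf/W improvement. That is a much larger gain than either the performance or the power figure suggests on its own.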

SpudLobby · May 23, 2024

Doug S said:
See I don't see it as a problem if a chip is able to use a lot of power for single thread. If I have a single thread load, and a CPU with a 100 watt TDP, I'd love it if it was able to usefully turbo up enough to run that one core at 100 watts to finish my single thread load as quickly as possible. That's not realistic, I know, but to the extent it is possible I'd like that to happen. When there is more than a single core load then that single core wouldn't be allowed to go as high and that's fine. We're already used to the situation where when you spin up more cores you take a frequency hit, especially in mobile where the total power budgets are much lower.

This sort of behavior wouldn't be useful in a server CPU because you aren't going to see single core only loads on it - if you do you bought the wrong thing. But in a PC and most definitely in a phone, sure single core loads are a real thing and to the extent they can be made faster I'm there for it.

If an Intel or AMD CPU uses a bunch of power in a single core load, it's excused, because "that's turbo". Maybe M4 is running in a "turbo" mode of sorts when running GB6 ST. Apple doesn't publish frequency specs, and whether you call the frequency at which it runs an ST load until it has to slow down "turbo" or "standard frequency" doesn't really matter. It amounts to the same thing either way.

I am actually fine with where Apple and Qualcomm apparently have it, at 11-13W peak with very very steep slopes and low power floors. My problem is that you start to change phydes and bloat cores when you aim for what Intel/AMD do. I want a bunch of Zen 5Cs in a laptop. I agree though re turbo.

Nothingness said:
I did express my doubt too. I think some SPEC tests can benefit from SVE, but I'd be pleasantly surprised if any SME instruction was emitted (and no, SSVE doesn't count as SME).

All people I know who work on HPC code rely on intrinsics no matter the platform (or even generate specialized vector code at runtime). I know it's just a small sample, but still.

Agreed.

smalM · May 23, 2024

FlameTail said:
Power consumption exploded.

What a pity Geekerwan seems not to know the difference between power and energy.

FlameTail · May 24, 2024

Memory bandwidth of hypothetical Apple M5 Ultra with LPDDR6-10667 and 1536 bit bus.

(10.667 Gbps÷8 bits) × 1536 bits

= 2048 GB/s
= ~2 TB/s

That is an insane amount of memory bandwidth.
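FlameTail's formula is peak bandwidth = pin rate × bus width ÷ 8 bits per byte. Below is a minimal Python version using the post's hypothetical inputs: LPDDR6-10667 and a 1536-bit bus, neither of which is an announced Apple spec. At 10.667 Gbps the result is about 2048 GB/s. The 2054.4 GB/s figure corresponds to rounding the pin rate up to 10.7 Gbps. Either way it is roughly 2 TB/s.

```python
# Peak DRAM bandwidth = per-pin data rate x bus width / 8 bits per byte.
# Inputs are the hypothetical M5 Ultra figures from the post.


def peak_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (decimal gigabytes), ignoring protocol overhead."""
    return pin_rate_gbps * bus_width_bits / 8


exact = peak_bandwidth_gbs(10.667, 1536)   # LPDDR6-10667 as stated
rounded = peak_bandwidth_gbs(10.7, 1536)   # pin rate rounded up to 10.7 Gbps
print(f"{exact:.1f} GB/s vs {rounded:.1f} GB/s")
```

This is a raw peak number. Sustained bandwidth on real hardware will be lower.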

SarahKerrigan · May 24, 2024

smalM said:
What a pity Geekerwan seems not to know the difference between power and energy.

Dissipating ~60% more power for a ~20% shorter period of time doesn't translate to less energy use.
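The point follows from E = P × t, so relative energy is the product of relative power and relative runtime. Suppose the new run draws 1.6× the power and finishes in 0.8× the time. It then uses 1.6 × 0.8 = 1.28× the energy, about 28% more. To break even, it would have to finish in 1/1.6 = 0.625× the time, which is 37.5% less time. The ratios below are the approximate ones from the post.

```python
# Energy = average power x time, so energy ratios are power ratio x time ratio.


def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy of the new run divided by energy of the old run."""
    return power_ratio * time_ratio


def break_even_time_ratio(power_ratio: float) -> float:
    """Runtime ratio at which a higher-power run uses exactly the same energy."""
    return 1.0 / power_ratio


ratio = relative_energy(1.60, 0.80)
print(f"energy ratio: {ratio:.2f} ({ratio - 1:+.0%} energy)")
print(f"break-even runtime ratio: {break_even_time_ratio(1.60):.3f}")
```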

SpudLobby · May 24, 2024

smalM said:
What a pity Geekerwan seems not to know the difference between power and energy.

Lol.

SarahKerrigan said:
Dissipating ~60% more power for a ~20% shorter period of time doesn't translate to less energy use.

I was about to say similar and just couldn’t help but leave it alone, people really think they’re clever by mentioning energy 😹. This place is amazing.

Doug S · May 24, 2024

FlameTail said:
Memory bandwidth of hypothetical Apple M5 Ultra with LPDDR6-10667 and 1536 bit bus.

(10.667 Gbps÷8 bits) × 1536 bits

= 2048 GB/s
= ~2 TB/s

That is an insane amount of memory bandwidth.

That also happens to be the data transfer rate between the two M1 Max dies in an M1 Ultra. They're gonna need more and/or faster I/Os between the dies when they go to LPDDR6.

Eug · May 24, 2024

FlameTail said:
Memory bandwidth of hypothetical Apple M5 Ultra with LPDDR6-10667 and 1536 bit bus.

(10.667 Gbps÷8 bits) × 1536 bits

= 2048 GB/s
= ~2 TB/s

That is an insane amount of memory bandwidth.

Doug S said:
That also happens to be the data transfer rate between the two M1 Max dies in an M1 Ultra. They're gonna need more and/or faster I/Os between the dies when they go to LPDDR6.

UltraFusion M1 was advertised to be 2.5 TB/s. Same goes for UltraFusion M2.

FlameTail · May 24, 2024

Eug said:
UltraFusion M1 was advertised to be 2.5 TB/s. Same goes for UltraFusion M2.

Nvidia Blackwell interconnect is 10 TB/s.

So there's definitely room for improvement.

FlameTail · May 24, 2024

There are 2 bizarre things that happened in the tech world recently, which I still haven't come to terms with:

1. Apple downgrading the memory bus of M3 Pro to 192 bit.

2. Qualcomm disabling Core Boost in the Snapdragon X Plus, and even some Elite SKUs.

SteinFG · May 24, 2024

FlameTail said:
Apple downgrading the memory bus of M3 Pro to 192 bit.

Steinfg said:
Nvidia downgrading the memory bus of 4070 to 192 bit.

Corporate wants you to find the difference

The real answer is, they probably felt 150 GB/s is enough, and with the introduction of 12 GB RAM packages, they switched from 32 GB (4x8) to 36 GB (3x12) on their top-end M3 Pro chip.
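SteinFG's numbers follow from the same bandwidth formula plus capacity per slice of bus. The sketch below makes several assumptions. It assumes LPDDR5-6400 on both chips. It treats each bus as 64-bit slices to match the 4x8 and 3x12 breakdown in the post, although the physical package count on the boards may differ. M2 Pro's 256-bit bus is background knowledge, not something stated in the post.

```python
# Peak bandwidth and top capacity for a bus split into 64-bit slices.
# ASSUMPTIONS: LPDDR5-6400 on both chips; 8 GB per slice on M2 Pro (256-bit),
# 12 GB per slice on M3 Pro (192-bit), following the 4x8 / 3x12 breakdown.

SLICE_WIDTH_BITS = 64


def config(bus_bits: int, gb_per_slice: int, pin_rate_gbps: float = 6.4):
    """Return (slices, total capacity in GB, peak bandwidth in GB/s)."""
    slices = bus_bits // SLICE_WIDTH_BITS
    bandwidth = pin_rate_gbps * bus_bits / 8
    return slices, slices * gb_per_slice, bandwidth


for name, bus, per_slice in [("M2 Pro", 256, 8), ("M3 Pro", 192, 12)]:
    n, cap, bw = config(bus, per_slice)
    print(f"{name}: {n} x {per_slice} GB = {cap} GB, {bw:.1f} GB/s")
```

Dropping a quarter of the bus cuts peak bandwidth from about 205 GB/s to about 154 GB/s. Denser 12 GB slices still raise the top capacity from 32 GB to 36 GB. That matches Apple's rounded 200 GB/s and 150 GB/s figures.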

SarahKerrigan · May 24, 2024

FlameTail said:
There are 2 bizarre things that happened in the tech world recently, which I still haven't come to terms with:

1. Apple downgrading the memory bus of M3 Pro to 192 bit.

2. Qualcomm disabling Core Boost in the Snapdragon X Plus, and even some Elite SKUs.

3. Itanium end of life?

Doug S · May 24, 2024

Eug said:
UltraFusion M1 was advertised to be 2.5 TB/s. Same goes for UltraFusion M2.

I thought it was 2 TB/s, but either way they probably need somewhere between double and triple the memory bandwidth given all the intra-cache transfers between each die's SLC, and perhaps even direct sharing between L2s in the CPU and GPU (I'm not sure how "fused" UltraFusion is).

Not that such an increase would be an issue, but I think M4 is probably the line in the sand for the hope of an "Apple Silicon Extreme". If there's an M4 Ultra but nothing further, we should not expect to ever see more than two dies linked together, previous Apple patents showing four-die connectivity notwithstanding.

smalM · May 24, 2024

SarahKerrigan said:
Dissipating ~60% more power for a ~20% shorter period of time doesn't translate to less energy use.

Yeah sure, 60% more power usage and 28% more energy usage, no difference at all, totally the same.

SarahKerrigan · May 24, 2024

smalM said:
Yeah sure, 60% more power usage and 28% more energy usage, no difference at all, totally the same.

Are you okay?

Glo. · May 24, 2024

FlameTail said:
There are 2 bizarre things that happened in the tech world recently, which I still haven't come to terms with:

1. Apple downgrading the memory bus of M3 Pro to 192 bit.

2. Qualcomm disabling Core Boost in the Snapdragon X Plus, and even some Elite SKUs.

Considering how short-lived the M3 series was, it's understandable why they cut the bus.

FlameTail · May 24, 2024

M3, M3 Pro, M3 Max

All three M3 generation parts, from the lowest end to the highest end, have a 17 TOPS NPU. This is interesting. The NPU does not scale up in size/performance for the higher end parts, like CPU/GPU does. Why not?

Will it remain this way for future generations too?

poke01 · May 24, 2024

FlameTail said:
The NPU does not scale up in size/performance for the higher end parts, like CPU/GPU does. Why not?

Will it remain this way for future generations too?

It may in future M series now that NPU is an important factor

Eug · May 25, 2024

iPad Pro 13" Chip ID

www.ifixit.com

iFixit says with "high certainty" that the M4 iPad Pro memory chip used is indeed 6 GB, Micron LPDDR5X. (Geekerwan had called it 6 GB custom LPDDR5.)

https://twitter.com/x/status/1794071111979938167

okoroezenwa · May 25, 2024

poke01 said:
It may in future M series now that NPU is an important factor

Agreed. Maybe we could see M4 Pro and Max with successively larger NPUs this time around. Or at least I hope.

FlameTail · May 25, 2024

How much memory bandwidth does 1 TOPS use?
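There is no single answer to this. The DRAM traffic behind 1 TOPS is 1000 GB/s divided by the arithmetic intensity, meaning the operations performed per byte fetched from memory, and that depends entirely on the workload. Below is a roofline-style sketch. It assumes the ~17 TOPS NPU figure cited in the thread and an M3-class 128-bit LPDDR5-6400 bus (102.4 GB/s). The intensities are illustrative values, not measurements.

```python
# Roofline-style estimate: required DRAM bandwidth = op rate / arithmetic intensity.
# ASSUMPTIONS: ~17 TOPS NPU, 102.4 GB/s DRAM, illustrative intensities.


def required_bandwidth_gbs(tops: float, ops_per_byte: float) -> float:
    """GB/s of DRAM traffic needed to sustain `tops` at a given intensity."""
    return tops * 1000.0 / ops_per_byte


def attainable_tops(tops_peak: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Throughput is capped by compute or by memory, whichever limit is lower."""
    return min(tops_peak, bandwidth_gbs * ops_per_byte / 1000.0)


NPU_TOPS = 17.0
DRAM_GBS = 102.4

for intensity in (2, 20, 200):
    need = required_bandwidth_gbs(NPU_TOPS, intensity)
    got = attainable_tops(NPU_TOPS, DRAM_GBS, intensity)
    print(f"{intensity:>4} ops/byte: needs {need:8.1f} GB/s, attains {got:5.2f} TOPS")
```

At low intensity, each TOPS needs hundreds of GB/s, so the NPU is starved by memory. Only at high intensity does the full rated TOPS become reachable on a 100 GB/s-class bus.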

Doug S · May 25, 2024

poke01 said:
It may in future M series now that NPU is an important factor

Yeah I think they didn't really have a whole lot for the NPU to do, particularly in Macs, so it wasn't worth scaling in Pro/Max. It was a solution looking for a problem. While they're still not sure what the problem is, judging from stock market price surges and Microsoft "AI PC" hype the solution is clearly "more TOPS!"

I still think over time we'll see the GPU and NPU merge. When the NPU was this tiny little corner it wasn't worth the bother, but if the NPU grows significantly while the GPU will of course continue to be very important, there is a lot to be gained from combining the two. Yes it means some work since there isn't a 100% overlap in their function, and there will need to be a way of dynamically partitioning so it can tilt from almost entirely GPU to almost entirely NPU depending on the load, but the gains from such a merger are too great to ignore.

We might see it as soon as next year, but probably 2026 unless they've already been planning it for a while.

roger_k · May 25, 2024

Doug S said:
See I don't see it as a problem if a chip is able to use a lot of power for single thread. If I have a single thread load, and a CPU with a 100 watt TDP, I'd love it if it was able to usefully turbo up enough to run that one core at 100 watts to finish my single thread load as quickly as possible. That's not realistic, I know, but to the extent it is possible I'd like that to happen. When there is more than a single core load then that single core wouldn't be allowed to go as high and that's fine. We're already used to the situation where when you spin up more cores you take a frequency hit, especially in mobile where the total power budgets are much lower.

I welcome it when a CPU uses up the entire available thermal range, but this has to stay within reasonable limits. I do not think that 50+ watts for single-threaded operation is reasonable. A desktop might get away with it (even though it's a massive waste), but it is simply unacceptable for laptops. I do not want my power to shoot up beyond the CPU TDP when opening a new browser tab.

I do not see any excuses for contemporary mobile CPUs drawing more power than enthusiast-class desktops did ten years ago. That is not good engineering, and that is not honest advertising. I like Apple's hardware because their thermal design targets make sense to me. And they can still hit performance records despite using much less power than the competition. This is the path the industry should follow, not the massive power inflation we have witnessed in the last decade. And frankly, TDP should become recognized as a fraudulent advertising practice. The spec sheet should show CPU power consumption across the frequency range, not some detached-from-reality number that makes the CPU maker look good.

name99 · May 25, 2024

FlameTail said:
Nvidia Blackwell interconnect is 10 TB/s.

So there's definitely room for improvement.

That's to multiple devices.
I believe the Blackwell chip-to-chip link is 1.8TB/s so still slightly behind Apple.

(Of course to be fair we know nvLink scales, in a way that we believe is true for UltraFusion but have not actually seen; AND nvLink can cover longer distances.)
