Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
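Those GPU throughput numbers are internally consistent; here is a quick sanity check (the per-EU ALU count and the clock below are estimates, not Apple-published figures):

```python
# Sanity-checking Apple's M1 GPU numbers. The ALU count per EU and the
# ~1.278 GHz clock are assumptions/estimates, not Apple-published figures.
EUS = 128           # execution units, from the spec list above
ALUS_PER_EU = 8     # assumed FP32 lanes per EU (gives 1024 ALUs total)
CLOCK_HZ = 1.278e9  # commonly reported estimate of the M1 GPU clock

alus = EUS * ALUS_PER_EU
flops = alus * 2 * CLOCK_HZ  # 2 FLOPs/cycle per ALU via fused multiply-add
print(f"{flops / 1e12:.2f} TFLOPS")  # ≈ 2.62, matching the 2.6 TFLOPS claim
```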

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, and ProRes

M3 Family discussion here:

 

Eug

Lifer
I wonder if the higher end chips will have the same clock speed. It seems Apple is being very generous with how many GPU cores they can "chop" before declaring a chip unusable. I think the higher end chips could have a somewhat higher clock speed despite having way more cores.

How would that be reflected in the efficiency cores? Could Apple allow a comparatively even higher efficiency core clock speed, or even a partial clock speed "turbo boost" for efficiency cores in a better cooled part, while still maintaining reasonable efficiency (for a machine with a much bigger battery)?

Sorry, this is just n00b spitballin', trying to match the design to the rumour.
The other point is that there is no guarantee the efficiency cores of the Pro chips are the exact same design. Not only could they be clocked faster, they could also be a higher-IPC design.

BTW, if the 128 core rumour for the Mac Pro GPU turns out to be true, how fast do you think it would be? That's 16 times as many GPU cores as M1, but there is no guarantee that it would be the exact same design.
 

moinmoin

Diamond Member
I'm starting to think Apple's approach to the cores is more similar to AMD's Zen than to ARM's big.LITTLE, but with two big advantages.

With core designs there are essentially three different approaches:
  • Optimize the chip for absolute performance. This is the way the desktop x86 chips went, pursuing ever higher frequencies. To achieve those, the design is made less dense and thus needs a bigger area.
  • Optimize the chip for the lowest possible power usage. This is the approach ARM's little cores and Intel's Atom cores took; in both cases features that consume a lot of power are either cut or significantly trail those of full-featured performance/high-frequency chips.
  • Optimize for power efficiency. This is (or used to be?) the standard approach for new ARM designs: the chip is full featured and very efficient at it, but there is no headroom for pushing performance further by increasing the frequency. This both ensures the chip is always efficient and allows for a very dense design.

AMD's approach with Zen has been to optimize the cores for power efficiency, but then relinquish the density to achieve higher absolute performance. The result is that Zen can be used across the whole range of products: mobile, desktop and server, as well as ST and manycore MT workloads. The downside is that Zen cores are relatively big compared to ARM cores on the same process node. And especially with Zen 3 we can see how power efficiency carries the MT performance, while higher ST performance on too many cores at once is limited by TDP.

And I think that balance between MT performance and ST performance within a shared TDP is what led Apple to the 8+2 configuration. Apple's efficiency cores are optimized for power efficiency and as such perfect for MT workloads, all while wasting little space for the cores. Apple's high performance cores are optimized for absolute performance and as such perfect for ST workloads. In the ideal case all 10 cores can max out at the same time without breaking the TDP budget. The advantage versus AMD's approach then is that with this design Apple can ensure both predictable reproducible MT and ST performance at once, whereas for Zen at unlimited TDP MT workloads would turn the chip into a highly inefficient all core ST performance furnace.
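The shared-TDP tradeoff can be sketched with a toy model. Every wattage below is an illustrative assumption for a large laptop chip, not a measured Apple figure:

```python
# Toy model of the shared-TDP argument above. All wattages are illustrative
# assumptions, not measured Apple figures.
P_CORE_W = 4.0   # assumed draw of one performance core at full clock
E_CORE_W = 0.5   # assumed draw of one efficiency core at full clock
TDP_W = 33.5     # assumed package budget

def fits_budget(p_cores, e_cores):
    total = p_cores * P_CORE_W + e_cores * E_CORE_W
    return total, total <= TDP_W

for p, e in [(8, 2), (8, 4), (10, 0)]:
    watts, ok = fits_budget(p, e)
    print(f"{p}+{e}: {watts:.1f} W, all cores maxed within budget: {ok}")
```

With these made-up numbers, 8+2 is the configuration where every core can run flat out inside the budget, which is the "ideal case" described above.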
 

Doug S

Platinum Member
I am betting the rumor is wrong and they have 4 lower power cores.

These are just pure rumor right now. IMO 8+4 makes more sense. If the other laptops can make use of 4 low power cores to save battery life, then so can a higher end MBP. Long battery life is a significant bonus regardless of your laptop size.

Plus those low power cores are VERY tiny, taking up marginal die space.


The other laptops have 4 efficiency cores because years ago Apple decided on 4 for the iPad Pro. And that decision could very well have been as simple as "let's take the iPhone chip and paste in two more big cores, double the GPU cores and LPDDR width, but otherwise leave everything the same".

Had they designed the M1 from scratch targeting only the Mac, maybe it would have only had 2 efficiency cores. The power management of a Mac laptop isn't going to be the same as the iPhone, probably they care a bit less about improving battery life by pushing as much stuff as possible to the efficiency cores, and a bit more about improving performance by using the big cores to a greater extent since battery isn't quite as scarce a resource as it is on a phone's form factor.

I don't know why you would think "4" is somehow the proper number of efficiency cores. Why not 2? Why not 6? For that matter, we don't even know for sure that the Mac Pro will enable all 8 efficiency cores - or for that matter ANY of them. There are arguments to be made that for power users, having all threads run at the same performance level is more important than the tiny potential performance gain from adding 8 small cores to a big MT task running across 40 big cores, or having it use half a watt less when idling (and with that many big cores running flat out, adding small cores competing for memory access could potentially REDUCE throughput in some situations).
 

Heartbreaker

Diamond Member
I don't know why you would think "4" is somehow the proper number of efficiency cores. Why not 2? Why not 6?

Having the same efficiency core count means they can fit in the same low-power envelope as the M1 when doing basic operations, so they could use the next chip in a more powerful 13" MBP as well as the 16" MBP. And again, efficiency cores are so tiny they barely affect die size.
 

Mopetar

Diamond Member
Part of the reason efficiency cores are much more efficient is that they have a more limited instruction set, which is less of a problem for someone like Apple, since they control both hardware and software and can design both with a shared goal in mind.

However, it doesn't get around the issue that if your software needs the kind of instructions that efficiency cores don't have, they're effectively useless. For something like a Mac Pro I think there's even less need for them since those computers will mainly be crunching away on some important task with minimal other tasks being done.

Having a large number of efficiency cores is unlikely to scale well for the tasks they were included to handle, and they probably don't work nearly as well as the performance cores on the types of workloads being run on a Mac Pro.
 

eek2121

Platinum Member
4K/8K raw movie processing? Scientific calculations? Comparable things these machines are often used for?
RAM does not matter as much for video editing these days thanks to speedy solid state storage.

What scientific packages are available and widely used on a Mac?
What does running the accesses across the PCIe bus do to memory latency? How about the granularity of read/write sizes?
I always bring this question up and never get a proper response. PCIe latency is significantly higher than traditional RAM latency.
This is something I've been wondering about, but the design of the current Mac Pro (and any other desktop honestly) is the antithesis of what the M1 is. The Mac Pro is highly modular with user replaceable RAM, storage, graphics, accelerator cards. The M1 essentially takes all of that minus the storage and expansion cards and puts it on a single package. How will Apple build a new Mac Pro with these two opposite philosophies? I guess this is why there is so much discussion going on about it.

I don't see them taking away user expandable storage or RAM on the Mac Pro given the target audience. GPU options could be much more limited than before if they go the integrated route without support for graphics cards. If they do this and want to have expandable memory, do they have to forgo the unified memory model of the M1 and potentially have on-package memory act as a large L3 cache? Or will it be dedicated to the GPU, with expandable memory being main memory? My prediction is that if they don't support graphics cards, they might stack HBM2 on package and have DDR4/5 for main memory.

Apple wants to control everything top to bottom. My bet is that the Mac Pro becomes a super beefy high-end Mac. Much of the expandability will disappear.


I think someone asked this already but I'm not sure we got an answer:

Why 8+2 and not 8+4? I was reading parts of this thread from last year again and just about all of us just assumed it was going to be 8+4 (although a few people tossed around the idea of a 6+4 part).

But now that I think of it, the only machine where this would matter would be the MacBook Pro. 8+4 offers little advantage for the Mac mini and iMac. It would offer some advantage for the MacBook Pro, but less so because the higher end MacBook Pros are going to have more robust battery support. Truly the biggest advantage of an X+4 configuration is with the low power machines like the iPad Pro, MacBook Air, and 13" MacBook Pro.

Plus, for a Mac laptop, perhaps Apple just didn't want to go with more than 10 cores, and in the context of Pro laptop with a big battery, 8+2 makes more sense than 6+4.
I am betting the rumor is wrong and they have 4 lower power cores.

These are just pure rumor right now. IMO 8+4 makes more sense. If the other laptops can make use of 4 low power cores to save battery life, then so can a higher end MBP. Long battery life is a significant bonus regardless of your laptop size.

Plus those low power cores are VERY tiny, taking up marginal die space.

The source is pretty good when it comes to accuracy.

8 big cores are going to pull quite a bit of power, and the MacBook Pro doesn't exactly have top-tier cooling.

EDIT: The M1 in the MacBook Air has an estimated TDP of 10W, and the one in higher-end Macs has an estimated TDP of 28W.
 

naukkis

Senior member
Part of the reason efficiency cores are much more efficient is that they have a more limited instruction set, which is less of a problem for someone like Apple since the control both hardware and software and can design both with a shared goal in mind.

What? As far as I know, every big.LITTLE configuration has identical instruction set support for both big and little cores, Apple's included. With incompatible instruction sets it would be impossible to switch between core clusters, making the whole arrangement pretty much useless.
 

moinmoin

Diamond Member
Part of the reason efficiency cores are much more efficient is that they have a more limited instruction set, which is less of a problem for someone like Apple, since they control both hardware and software and can design both with a shared goal in mind.
The biggest reason Apple doesn't have this problem is that there is no deviation between the two kinds of cores. What Intel rather arbitrarily packs into the x86 instruction set (and AMD so far sadly doesn't dare to deviate from), Apple packs into separate dedicated and as such way more efficient accelerators and engines on the same SoC.
 

Doug S

Platinum Member
Part of the reason efficiency cores are much more efficient is that they have a more limited instruction set, which is less of a problem for someone like Apple, since they control both hardware and software and can design both with a shared goal in mind.

However, it doesn't get around the issue that if your software needs the kind of instructions that efficiency cores don't have, they're effectively useless. For something like a Mac Pro I think there's even less need for them since those computers will mainly be crunching away on some important task with minimal other tasks being done.

Having a large number of efficiency cores is unlikely to scale well for the tasks they were included to handle, and they probably don't work nearly as well as the performance cores on the types of workloads being run on a Mac Pro.

I'm unaware of ANY difference in instruction set between the little and big cores. At least not for user level code. Possibly they don't support the same virtualization instructions, but I'd need to see proof to be convinced of that.
 

Doug S

Platinum Member
8 big cores are going to pull quite a bit of power, and the MacBook Pro doesn't exactly have top-tier cooling.

EDIT: The M1 in the MacBook Air has an estimated TDP of 10W, and the one in higher-end Macs has an estimated TDP of 28W.

They have 8 big cores whether it is an 8+2 or an 8+4 chip; the number of little cores is irrelevant to the max TDP of a design.
 

Roland00Address

Platinum Member
Just a reminder: 2 extra efficiency cores are not going to increase performance that much, maybe a few percent, and I doubt even that will happen.

The only benefit at all is better battery life/efficiency, and I bet Apple has lots of data on 8+2 vs 8+4, especially if that 8+2 is doubled or quadrupled when they use multiple dies connected by an interconnect.
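A back-of-envelope bound on the upside, assuming (and it is only an assumption) that one efficiency core delivers roughly a quarter to a third of a performance core's throughput:

```python
# Back-of-envelope bound on what two extra efficiency cores buy in MT
# throughput. The E-to-P throughput ratio is an assumption, not a benchmark.
E_TO_P_RATIO = 0.28  # assume one E core ≈ 28% of a P core's throughput

def mt_throughput(p_cores, e_cores, ratio=E_TO_P_RATIO):
    # total throughput in "performance-core equivalents"
    return p_cores + e_cores * ratio

base = mt_throughput(8, 2)
more = mt_throughput(8, 4)
gain = (more - base) / base * 100
print(f"8+4 over 8+2: +{gain:.1f}% MT throughput")  # ≈ +6.5% with this ratio
```

The exact figure swings with the assumed ratio, but the order of magnitude stays "single-digit percent".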
 

B-Riz

Golden Member
4K/8K raw movie processing? Scientific calculations? Comparable things these machines are often used for?

I have no links ATM, but I have come across anecdotal articles about users who went Threadripper/Windows because Apple kind of abandoned the serious workstation-level PC...
 

B-Riz

Golden Member
I'm starting to think Apple's approach to the cores is more similar to AMD's Zen than to ARM's big.LITTLE, but with two big advantages.

With core designs there are essentially three different approaches:
  • Optimize the chip for absolute performance. This is the way the desktop x86 chips went, pursuing ever higher frequencies. To achieve those, the design is made less dense and thus needs a bigger area.
  • Optimize the chip for the lowest possible power usage. This is the approach ARM's little cores and Intel's Atom cores took; in both cases features that consume a lot of power are either cut or significantly trail those of full-featured performance/high-frequency chips.
  • Optimize for power efficiency. This is (or used to be?) the standard approach for new ARM designs: the chip is full featured and very efficient at it, but there is no headroom for pushing performance further by increasing the frequency. This both ensures the chip is always efficient and allows for a very dense design.

AMD's approach with Zen has been to optimize the cores for power efficiency, but then relinquish the density to achieve higher absolute performance. The result is that Zen can be used across the whole range of products: mobile, desktop and server, as well as ST and manycore MT workloads. The downside is that Zen cores are relatively big compared to ARM cores on the same process node. And especially with Zen 3 we can see how power efficiency carries the MT performance, while higher ST performance on too many cores at once is limited by TDP.

And I think that balance between MT performance and ST performance within a shared TDP is what led Apple to the 8+2 configuration. Apple's efficiency cores are optimized for power efficiency and as such perfect for MT workloads, all while wasting little space for the cores. Apple's high performance cores are optimized for absolute performance and as such perfect for ST workloads. In the ideal case all 10 cores can max out at the same time without breaking the TDP budget. The advantage versus AMD's approach then is that with this design Apple can ensure both predictable reproducible MT and ST performance at once, whereas for Zen at unlimited TDP MT workloads would turn the chip into a highly inefficient all core ST performance furnace.

Of note: AMD has the legacy of x86 it is designing around, whereas, good or bad, Apple was able to take a clean-sheet, fresh approach (optimized for good performance and low production cost) for the M1 and use what they learned from years of phone and tablet work, AND they have the resources for robust OS support.
 

B-Riz

Golden Member
I can stomach this price for a luxury Chromebox; I'll join up tomorrow! :laughing:


I got my goods the other day, but have not had time to even unpack and set it up.

I feel like I bought a piece of personal computing history, maybe the people that bought an Apple IIe back in the day felt the same way?

I think an M1 Mini at $600 or an AMD 4600G system ~$500 seems to be the best options for new personal computers. Intel needs to get their stuff together, lol.
 

moinmoin

Diamond Member
Of note: AMD has the legacy of x86 it is designing around, whereas, good or bad, Apple was able to take a clean-sheet, fresh approach (optimized for good performance and low production cost) for the M1 and use what they learned from years of phone and tablet work, AND they have the resources for robust OS support.
Indeed. I hope AMD sometime gets another chance to do an x64-style disruption of x86, and uses it to overhaul it in a major way. But that's for another thread.
 

Eug

Lifer
I got my goods the other day, but have not had time to even unpack and set it up.

I feel like I bought a piece of personal computing history, maybe the people that bought an Apple IIe back in the day felt the same way?

I think an M1 Mini at $600 or an AMD 4600G system ~$500 seems to be the best options for new personal computers. Intel needs to get their stuff together, lol.
The Apple //e came after the Apple ][ line was already firmly established. A ton of people had already bought the Apple ][ and Apple ][ plus. The main difference with the //e, besides lower cost, was that it added lower-case support, but you could mod the ][ and ][ plus to support this.

I am waiting on the Mac mini. I can't get myself to buy a headless machine that supports dual displays only if one of them is connected over HDMI. I also won't buy an 8 GB Mac in 2021, because I tend to keep my Macs a very long time. In my experience, 8 GB is the sweet spot in 2021 for light usage, but memory requirements go up by roughly 50% every 5 years or so.
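That rule of thumb compounds like this (purely illustrative, starting from the 8 GB sweet spot):

```python
# Compounding the "memory needs grow ~50% every 5 years" rule of thumb,
# starting from the 8 GB sweet spot of 2021. Purely illustrative.
GROWTH_PER_5_YEARS = 1.5

def projected_need_gb(base_gb, years):
    return base_gb * GROWTH_PER_5_YEARS ** (years / 5)

for years in (5, 10):
    print(f"2021 + {years} yr: ~{projected_need_gb(8, years):.0f} GB")
```

By this arithmetic an 8 GB machine looks tight at ~12 GB of need in five years and ~18 GB in ten, which is why 16 GB is the safer buy for a Mac kept a decade.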
 

eek2121

Platinum Member
They have 8 big cores whether it is an 8+2 or an 8+4 chip; the number of little cores is irrelevant to the max TDP of a design.

That is completely inaccurate. The increased number of cores changes the thermal properties of a chip and can make it harder to cool. Apple is not exempt from the laws of physics/thermodynamics. A 12-core chip will always be more difficult to cool than a 10-core chip, especially when also combined with a larger GPU. They would either have to go down in frequency or decrease the core count.
 

Doug S

Platinum Member
That is completely inaccurate. The increased number of cores changes the thermal properties of a chip and can make it harder to cool. Apple is not exempt from the laws of physics/thermodynamics. A 12-core chip will always be more difficult to cool than a 10-core chip, especially when also combined with a larger GPU. They would either have to go down in frequency or decrease the core count.

That's what I was saying. I was responding to a claim that having more efficiency cores was necessary to fit in a smaller TDP window (but I now see I quoted the wrong post when doing so).
 

Roland00Address

Platinum Member
That is completely inaccurate. The increased number of cores changes the thermal properties of a chip and can make it harder to cool. Apple is not exempt from the laws of physics/thermodynamics. A 12-core chip will always be more difficult to cool than a 10-core chip, especially when also combined with a larger GPU. They would either have to go down in frequency or decrease the core count.
Not disagreeing with your logic but it is also more complicated.

We have a dark silicon problem: as transistor density increases, you roughly double the number of transistors per mm², but the power needed to drive each transistor does not shrink at the same rate, so some of the silicon has to sit unused, not all the time, but at least some of the time when you are at max TDP.

Dr. Ian Cutress of AnandTech talks about this starting at the 10-minute mark, in the "Caveats" section of this video,


with the money quote from Dr. Sophie Wilson starting at 11:20:


Dr. Sophie Wilson is one of the big wigs in technology who was there for important milestones, such as helping create the first ARM chips in the '80s.

So yeah, I agree: how you design the chip and spread it all out matters a whole lot, and it is not easy; in fact it is complicated.

It may be advantageous to have more small cores to spread the TDP throughout the chip, especially if those parts can run at a better voltage level than the faster/larger cores. But the opposite can also occur; it depends on factors we cannot see. We just have to trust the engineers, and the computer simulations they are doing, to design the chip to the best of their ability. The video I linked above makes the point I am trying to make better than my words can.
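The dark-silicon squeeze can be illustrated with idealized numbers (the scaling factors below are assumptions, not measurements for any real node):

```python
# Toy illustration of the dark-silicon squeeze described above. The scaling
# factors are idealized assumptions, not measurements for any real node.
DENSITY_GAIN = 2.0  # transistors per mm^2 roughly doubles with each shrink
POWER_GAIN = 1.6    # but power per transistor only improves ~1.6x per shrink

active_fraction = 1.0  # fraction of transistors that can switch at max TDP
for node in range(1, 4):
    # at a fixed power budget per mm^2, the active fraction shrinks by
    # POWER_GAIN / DENSITY_GAIN with every node
    active_fraction *= POWER_GAIN / DENSITY_GAIN
    print(f"after {node} shrink(s): {active_fraction:.0%} of the silicon can be active at once")
```

Even with generous power scaling, a growing slice of the die has to stay dark at max TDP, which is exactly the budget-spreading problem described above.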
 

eek2121

Platinum Member
Not disagreeing with your logic but it is also more complicated.

We have a dark silicon problem: as transistor density increases, you roughly double the number of transistors per mm², but the power needed to drive each transistor does not shrink at the same rate, so some of the silicon has to sit unused, not all the time, but at least some of the time when you are at max TDP.

Dr. Ian Cutress of AnandTech talks about this starting at the 10-minute mark, in the "Caveats" section of this video,


with the money quote from Dr. Sophie Wilson starting at 11:20:


Dr. Sophie Wilson is one of the big wigs in technology who was there for important milestones, such as helping create the first ARM chips in the '80s.

So yeah, I agree: how you design the chip and spread it all out matters a whole lot, and it is not easy; in fact it is complicated.

It may be advantageous to have more small cores to spread the TDP throughout the chip, especially if those parts can run at a better voltage level than the faster/larger cores. But the opposite can also occur; it depends on factors we cannot see. We just have to trust the engineers, and the computer simulations they are doing, to design the chip to the best of their ability. The video I linked above makes the point I am trying to make better than my words can.

Of course it is more complicated. AMD has some designs (Zen+) that use dark silicon to improve clocks and heat density. Adding 2 more active cores, low power or not, is going to take more than it gives.

I'd love to see Apple release an 8+4 (or an 8+8) chip that works in a mobile configuration, but I'd be shocked if the leak is inaccurate. Usually those type of leaks are pretty dead on.

Note that I use a Mac in a professional capacity. I dislike macOS for a number of reasons, including poor window management; I would prefer KDE on Linux, or Windows with Docker, any day of the week over macOS. I like what they are doing with the hardware, but macOS is a dog.
 

Mopetar

Diamond Member
What? As far as I know, every big.LITTLE configuration has identical instruction set support for both big and little cores, Apple's included. With incompatible instruction sets it would be impossible to switch between core clusters, making the whole arrangement pretty much useless.

There have been a few that don't. The Exynos 9810, for example, had the big and little cores on different versions of the ISA (one was ARMv8.0, the other ARMv8.2), which could cause some problems.
 

B-Riz

Golden Member
Can you actually buy that from a reputable source?

The chip or the system?

I should have said 5600G's, the 4600G is being phased out in OEM systems.

Costco stores, HP website, and Office Max / Office Depot have the 3 / 5 / 7 5K series APU systems in stock; I bought and returned a 5700G from Office Depot, as I got a 5900X and then could reuse the 3900X in the media server instead of using the 5700G system.

https://www.hp.com/us-en/shop/pdp/hp-pavilion-desktop-tp01-2155m-bundle-pc - 5300G

https://www.costco.com/hp-pavilion-desktop---amd-ryzen-5-5600g.product.100767850.html - 5600G

https://www.officedepot.com/a/products/5448005/HP-Pavilion-TP01-2066-Desktop-PC/ - 5700G
 

Eug

Lifer
While I'm usually not one to obsess over thinness, I think the Mac mini could go thinner. This would also help the Mac-mini-as-server types. Even with a faster SoC, I suspect they could go thinner.

Unlike older models, the M1 Mac mini is largely empty space inside.

Right now the Mac mini is 1.4 in / 3.6 cm tall.
Well, that's the new rumour now. The leaker is a bit of a toss... Anyhow, here is the rendered guess:


mac-mini-ports.jpg

Note though that this suggests the power supply gets moved to an external power brick.