Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,587
1,001
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
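
As a rough sanity check on the 2.6 Teraflops figure, here is a minimal sketch of the usual peak-FP32 arithmetic. The 128 execution units come from the list above; the 8 ALUs per execution unit and the roughly 1.278 GHz GPU clock are assumptions (commonly reported figures, not stated in this post), and the function name is hypothetical.

```python
def peak_fp32_tflops(execution_units: int, alus_per_eu: int, clock_ghz: float) -> float:
    """Peak FP32 throughput, counting one fused multiply-add as 2 FLOPs per ALU per cycle."""
    flops_per_cycle = execution_units * alus_per_eu * 2
    # flops_per_cycle * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return flops_per_cycle * clock_ghz / 1000.0


# 128 EUs is from the spec list; 8 ALUs/EU and 1.278 GHz are ASSUMED values.
m1_gpu_tflops = peak_fp32_tflops(execution_units=128, alus_per_eu=8, clock_ghz=1.278)
print(f"M1 8-core GPU peak: {m1_gpu_tflops:.2f} TFLOPS")
```

Under those assumptions the result rounds to the quoted 2.6 Teraflops; a different clock or ALU count would shift it accordingly.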

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors
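
The 100 GB/s figure can be reproduced with the standard peak-bandwidth arithmetic. This is only a sketch: the 6400 MT/s transfer rate and the 128-bit bus width are assumptions (not stated in the list above), and the function name is hypothetical.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


# ASSUMED: LPDDR5 at 6400 MT/s on a 128-bit bus.
m2_bandwidth = peak_bandwidth_gbs(transfer_rate_mts=6400, bus_width_bits=128)
print(f"M2 peak bandwidth: {m2_bandwidth:.1f} GB/s")
```

With those inputs the theoretical peak comes out slightly above 100 GB/s, which is consistent with the rounded number quoted above.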

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes

M3 Family discussion here:

 

mikegg

Golden Member
Jan 30, 2010
1,756
411
136
If they are just making 40 core by attaching 4 copies of a 10 core part together, then there is nothing new being fabbed to worry about. Apple cloud with Apple SoCs doesn't seem to make much sense.
They'd have to 2x the Ultra bandwidth, design the Max SoC for connecting to 2 chips, design the GPU for 4x transparency, optimize macOS for 4 Max SoCs connected together, continuously provide support for an extremely niche chip that requires a ton of high level hardware and software customization.

If it's so easy, they would have released it already because we're past their 2 year deadline for migrating all Macs to Apple Silicon.

Apple Cloud makes sense because Apple would be able to allow users to rent an "M3 Extreme" SoC in the cloud while running on an Air in a cafe. Anyone could harness the power of the most powerful Apple Silicon easily. There's a growing market of power users and businesses who need to rent much faster hardware in the cloud for specific tasks. Apple could make it a seamless experience.

Some cloud providers already do this by buying many physical Mac Minis. I believe AWS does it.
 

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,273
106
Maybe they can have two different versions of the M2 Max? One meant to be used as a single die in the MacBooks, etc., and the other meant to be used as chiplets to make the M2 Ultra and Extreme?

The chiplet variant can contain the DDR5 controllers and the interconnect IO for the UltraFusion interposer thingy. This saves them die space on the laptop variant.
 

Doug S

Platinum Member
Feb 8, 2020
2,267
3,519
136
Isn't it possible to put the DDR5 controller separately on the PCB or make it on a separate die and stack it on top of the M2 Max die?

It would need to be connected to all four SoCs and not really save any money - instead of a slightly larger Max SoC you have a special DDR5 controller SoC produced at very low volumes.
 

Doug S

Platinum Member
Feb 8, 2020
2,267
3,519
136
Maybe they can have two different versions of the M2 Max? One meant to be used as a single die in the MacBooks, etc., and the other meant to be used as chiplets to make the M2 Ultra and Extreme?

The chiplet variant can contain the DDR5 controllers and the interconnect IO for the UltraFusion interposer thingy. This saves them die space on the laptop variant.


Then that special Ultra/Extreme only SoC costs a lot more because its mask set costs are amortized across far fewer units, unless they can make it another chop die. I honestly don't think they care all that much about increasing the cost of the laptop die a bit. It isn't as if Apple is selling laptops containing a 'Max' for peanuts.
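
To put the amortization argument in numbers, here is a minimal sketch. Every figure in it (the mask set cost and both unit volumes) is a made-up placeholder for illustration, not a real Apple or TSMC number, and the function name is hypothetical.

```python
def mask_cost_per_unit(mask_set_cost_usd: float, units_shipped: int) -> float:
    """One-time mask set cost spread evenly over every unit that uses that die."""
    return mask_set_cost_usd / units_shipped


# All values below are HYPOTHETICAL placeholders.
MASK_SET_COST_USD = 30_000_000
SHARED_DIE_UNITS = 5_000_000      # one die shared by laptops and desktops
DEDICATED_DIE_UNITS = 100_000     # a die used only in Ultra/Extreme parts

shared_cost = mask_cost_per_unit(MASK_SET_COST_USD, SHARED_DIE_UNITS)
dedicated_cost = mask_cost_per_unit(MASK_SET_COST_USD, DEDICATED_DIE_UNITS)
print(f"shared die:    ${shared_cost:,.2f} of mask cost per unit")
print(f"dedicated die: ${dedicated_cost:,.2f} of mask cost per unit")
```

Whatever the real numbers are, the per-unit mask cost scales inversely with volume, which is the reason a low-volume dedicated die looks expensive next to a slightly larger shared one.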
 

MadRat

Lifer
Oct 14, 1999
11,910
238
106
I would guess that Apple already supported connections to an external memory controller as paged copies. This was a predictable wall and BSD already tackled it ages ago. I cannot believe macOS wouldn't have already incorporated this support. You can therefore do up to 256 GB of soldered-in RAM and expand from there, only you won't see much benefit until external RAM probably exceeds pre-packaged memory. Most users may as well use a solid state drive as virtual memory at that point. The target audience of corporate users would probably shell out for terabytes of DDR5 if it's an option.
 

scineram

Senior member
Nov 1, 2020
361
283
106
They'd have to 2x the Ultra bandwidth, design the Max SoC for connecting to 2 chips, design the GPU for 4x transparency, optimize macOS for 4 Max SoCs connected together, continuously provide support for an extremely niche chip that requires a ton of high level hardware and software customization.

If it's so easy, they would have released it already because we're past their 2 year deadline for migrating all Macs to Apple Silicon.

Apple Cloud makes sense because Apple would be able to allow users to rent an "M3 Extreme" SoC in the cloud while running on an Air in a cafe. Anyone could harness the power of the most powerful Apple Silicon easily. There's a growing market of power users and businesses who need to rent much faster hardware in the cloud for specific tasks. Apple could make it a seamless experience.

Some cloud providers already do this by buying many physical Mac Minis. I believe AWS does it.
Not sure, this excuse sounds like BS. With chiplets, connecting 4 M2 Max shouldn't be expensive.
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
If they are just making 40 core by attaching 4 copies of a 10 core part together, then there is nothing new being fabbed to worry about. Apple cloud with Apple SoCs doesn't seem to make much sense.

I'm assuming they found some issue that made connecting all four either tank some aspect of the performance or create some software headaches they wouldn't be able to fix in time for the product launch.

Putting 4 chips together isn't too much more challenging than 2, and the cost to develop the chip and manufacture them has already been spent. I'm assuming the market conditions haven't changed so much that the product couldn't be sold.

Unless I'm missing something obvious, that would suggest either a hardware or software issue affecting scaling of performance in one or more key areas. If an Ultra gets 75%+ of the performance for half the cost, then it's harder to sell an Extreme. Better to just get it ironed out and release in the future.
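
The "75% of the performance for half the cost" point is a performance-per-cost comparison, and can be sketched as below. Both chips are normalized to a hypothetical Extreme at 1.0 performance and 1.0 cost; the 0.75 and 0.5 values are the ones from the post, and the function name is made up for illustration.

```python
def perf_per_cost(relative_perf: float, relative_cost: float) -> float:
    """Performance delivered per unit of cost, both relative to a baseline."""
    return relative_perf / relative_cost


# Baseline: a hypothetical Extreme, normalized to 1.0 performance at 1.0 cost.
extreme_value = perf_per_cost(relative_perf=1.0, relative_cost=1.0)
# Ultra in the poorly-scaling scenario described above.
ultra_value = perf_per_cost(relative_perf=0.75, relative_cost=0.5)

print(f"Ultra delivers {ultra_value / extreme_value:.2f}x the performance per dollar")
```

In that scenario the Ultra is the better value by a wide margin, so the Extreme would only appeal to buyers who need the last 25% at any price.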
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
I'm assuming they found some issue that made connecting all four either tank some aspect of the performance or create some software headaches they wouldn't be able to fix in time for the product launch.

Putting 4 chips together isn't too much more challenging than 2, and the cost to develop the chip and manufacture them has already been spent. I'm assuming the market conditions haven't changed so much that the product couldn't be sold.

Unless I'm missing something obvious, that would suggest either a hardware or software issue affecting scaling of performance in one or more key areas. If an Ultra gets 75%+ of the performance for half the cost, then it's harder to sell an Extreme. Better to just get it ironed out and release in the future.
I kind of wonder if Apple has figured out how to satisfy 80% plus of potential Mac Pro users with a future generation of the Mac Studio. Maybe something based on an M3 design with stacked RAM to accommodate their needs. I imagine an M3 <ultra> Max would be on TSMC N3E and can pack a fair bit more in terms of CPU and GPU cores.
 

Doug S

Platinum Member
Feb 8, 2020
2,267
3,519
136
I kind of wonder if Apple has figured out how to satisfy 80% plus of potential Mac Pro users with a future generation of the Mac Studio. Maybe something based on an M3 design with stacked RAM to accommodate their needs. I imagine an M3 Max would be on TSMC N3E and can pack a fair bit more in terms of CPU and GPU cores.


I think Apple would want to satisfy more than 80% of them. They can satisfy well over 95% of them with an M2 "Extreme" that has 384 GB with no room for memory expansion. The number of people who would want 2x the compute/GPU are a lot higher than the number of people who want more than 384 GB.
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
I think Apple would want to satisfy more than 80% of them. They can satisfy well over 95% of them with an M2 "Extreme" that has 384 GB with no room for memory expansion. The number of people who would want 2x the compute/GPU are a lot higher than the number of people who want more than 384 GB.
Meant Max Ultra (Extreme, or whatever they call it). If Apple can satisfy 80% or more of 'Pro' users with Mac Studio Ultra (with whatever improvements they need to make), then catering to the small remainder of users needing more may not make any financial sense. Macs make up a small % of the desktop market. Mac Pros are a tiny slice of that market, and those who need high-end Mac Pros are a very, very small part of it (that's essentially my point).

Anyway, typical of Apple products, rumors abound and facts are few.
 

Doug S

Platinum Member
Feb 8, 2020
2,267
3,519
136
Meant Max Ultra (Extreme, or whatever they call it). If Apple can satisfy 80% or more of 'Pro' users with Mac Studio Ultra (with whatever improvements they need to make); then catering to the small remainder of users needing more may not make any financial sense. Macs to make up much of the desktop market. Mac Pros are a tiny size of the market, those who need high end Mac Pros are a very small part of the market (that's essentially my point).

Anyway, typical of Apple products, rumors abound and facts are few.


I think that's basically what the Mac Pro will be - a larger Studio to accommodate the larger SoC with greater cooling needs. I think there's a decent chance of a few PCIe x4 slots for higher end networking/storage that even the latest and greatest version of TB isn't fast enough for, but like > 384 GB that's a niche market they may decide to bypass.

I think the "Extreme" Studio you're talking about would satisfy a lot more than 80% of the Mac Pro userbase.
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
I think that's basically what the Mac Pro will be - a larger Studio to accommodate the larger SoC with greater cooling needs. I think there's a decent chance of a few PCIe x4 slots for higher end networking/storage that even the latest and greatest version of TB isn't fast enough for, but like > 384 GB that's a niche market they may decide to bypass.

I think the "Extreme" Studio you're talking about would satisfy a lot more than 80% of the Mac Pro userbase.
Thankfully, you could understand my garbled sentences :D
 

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,273
106
Apparently, with the M1 generation, Apple ran into some bottlenecks when scaling performance. Rumours said that they already had the M1 Extreme chip ready to ship but cancelled it at the last minute because of poor performance scaling due to a limited TLB size. Apparently, said TLB size restriction was removed in the M2 generation.
 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
Apparently, with the M1 generation, Apple ran into some bottlenecks when scaling performance. Rumours said that they already had the M1 Extreme chip ready to ship but cancelled it at the last minute because of poor performance scaling due to a limited TLB size. Apparently, said TLB size restriction was removed in the M2 generation.
From what I read, M1 Extreme / M1 Max x 4 is simply not supported by the M1 series design, nor is it supported by macOS.

I.e., such a giant SoC with M1 Max x 4 was never going to happen.
 

Doug S

Platinum Member
Feb 8, 2020
2,267
3,519
136
The odd part is that Studio is already prepared for cooling more than is currently needed by any available top end SoC configuration.


I mentioned that when it was released, and that an "Extreme" version of Studio down the road could be a possibility. I'm unsure if double the SoCs and double the LPDDR stacks will physically fit - though as those LPDDR stacks are custom and larger than they need to be maybe smaller ones are incoming.

The real reason may be Apple going overkill on cooling because they don't ever want to run the fan at higher speeds where it becomes more than slightly audible. You can't call something "Studio", and then have it be too noisy to be anyplace one would consider a "studio".
 

mikegg

Golden Member
Jan 30, 2010
1,756
411
136
Not sure, this excuse sounds like BS. With chiplets, connecting 4 M2 Max shouldn't be expensive.
I think people are underestimating the technical difficulty of trying to get 4x SoCs glued together and have it scale, with enough bandwidth, OS optimization, and app optimization.

Remember that the M1 Ultra was the first-ever transparent multi-die GPU. Previously, no one had ever done it before. Now Apple needs to make it work with 4 dies. I'm sure they have to invent a lot of new techniques to make this work and scale. This stuff isn't cheap.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Here's my M1 Pro CPU wattage data during ST portion of Geekbench5.

Wattage info is shown every 1 second.
What software did you use to measure power consumption? Also, you should use Cinebench rather than Geekbench. Geekbench doesn’t load the CPU as heavily as other software.
None of that matters for the SPEC results and those are quite good for Apple SoCs and have been for years. Apple is making an excellent CPU and any specialized hardware and OS optimizations that can add performance are just gravy on top.

If either AMD or Intel cared to they could release their own customized *nix distribution that does the same.
If I ever run SPEC for a living, I will be sure to…ah forget it!
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
I think people are underestimating the technical difficulty of trying to get 4x SoCs glued together and have it scale, with enough bandwidth, OS optimization, and app optimization.

Remember that the M1 Ultra was the first-ever transparent multi-die GPU. Previously, no one had ever done it before. Now Apple needs to make it work with 4 dies. I'm sure they have to invent a lot of new techniques to make this work and scale. This stuff isn't cheap.

Not underestimating.

It's just that when the choice is building a unique Mac Pro large die, or utilizing connected smaller Mac dies, then the choice will obviously be connecting smaller Mac dies. Mac Pro volume is much too low to have its own unique SoC tapeout.

You don't even need to have full high speed between all 4 dies. You could just connect two M2 Ultras with a slower connection between them. The OS can handle the localization issues.
 

Doug S

Platinum Member
Feb 8, 2020
2,267
3,519
136
Not underestimating.

It's just that when the choice is building a unique Mac Pro large die, or utilizing connected smaller Mac dies, then the choice will obviously be connecting smaller Mac dies. Mac Pro volume is much too low to have its own unique SoC tapeout.

You don't even need to have full high speed between all 4 dies. You could just connect two M2 Ultras with a slower connection between them. The OS can handle the localization issues.


There was a single die layout (i.e. single tapeout) used for M1 Pro, M1 Max and M1 Ultra. The Pro was a "chop" (i.e. did not expose the bottom part of the mask with the extra cores/controllers/etc. that are only present in Max). It isn't clear whether all M1 Max had the 10,000 I/O pads to connect between dies or if it was also a "chop", though the fact that die photos revealed that extra stuff seems to indicate it was not a chop and that M1 Max used as a standalone SoC had some 'wasted' die area.

Apple has patents showing the 10,000 I/Os for connecting two dies that also show 30,000 I/Os for connecting four dies, so it is pretty clear the strategy is for full speed between all dies to allow the GPU to scale performance.
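
One way to read the 10,000 and 30,000 figures is as a fully connected topology where every die has a direct full-speed link to every other die. The sketch below assumes exactly that, plus a fixed 10,000 I/Os per die per link; both are assumptions for illustration and are not confirmed by the patents, and the function name is hypothetical.

```python
from itertools import combinations


def full_mesh(dies: int, ios_per_link: int) -> tuple[int, int]:
    """Return (number of die-to-die links, I/Os needed on each die) for a full mesh."""
    links = list(combinations(range(dies), 2))   # every unordered pair of dies
    ios_per_die = (dies - 1) * ios_per_link      # one link to each other die
    return len(links), ios_per_die


# ASSUMED: 10,000 I/Os per die per link, direct links between every pair of dies.
for n in (2, 4):
    link_count, ios = full_mesh(dies=n, ios_per_link=10_000)
    print(f"{n} dies: {link_count} link(s), {ios:,} I/Os per die")
```

Under that reading, two dies need one link and 10,000 I/Os each, while four dies need six links and 30,000 I/Os each, which lines up with the numbers mentioned above.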
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
If I ever run SPEC for a living, I will be sure to…ah forget it!

You do understand that the individual SPEC benchmarks are actual programs that are chosen because they represent particular types of applications and workloads, right? You can see what they cover here: https://www.spec.org/cpu2017/Docs/overview.html#benchmarks

No benchmarks are perfect indicators of general performance or performance in a specific application, but SPEC has a broad set of performance measures. You might not care about all of them, but some may be indicative of performance in an area you care about.