Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,579
992
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
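
As a rough sanity check on the GPU figures above, here is a minimal sketch that derives a few implied numbers. The ALU count per execution unit and the 2 FLOPs per ALU per clock are assumptions (they are not stated in the post), so the implied clock is only as good as those assumptions.

```python
# Figures quoted in the post for the 8-core M1 GPU.
execution_units = 128
max_threads = 24576
teraflops = 2.6
gigatexels_per_s = 82
gigapixels_per_s = 41

# Assumptions (not stated in the post): 8 FP32 ALUs per execution unit,
# and one fused multiply-add (counted as 2 FLOPs) per ALU per clock.
ALUS_PER_EU = 8
FLOPS_PER_ALU_PER_CLOCK = 2

threads_per_eu = max_threads // execution_units
texels_per_pixel = gigatexels_per_s / gigapixels_per_s
total_alus = execution_units * ALUS_PER_EU
implied_clock_ghz = teraflops * 1e12 / (total_alus * FLOPS_PER_ALU_PER_CLOCK) / 1e9

print(f"threads per EU: {threads_per_eu}")
print(f"texels per pixel: {texels_per_pixel:.1f}")
print(f"implied GPU clock: {implied_clock_ghz:.2f} GHz")
```

Under those assumptions the quoted 2.6 Teraflops works out to a clock of roughly 1.27 GHz.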

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), ProRes

M3 Family discussion here:

 

eek2121

Platinum Member
Aug 2, 2005
2,883
3,860
136
Apple's own marketing slide mentions M1 Ultra is the final member of the M1 family so I think there is a decent chance Mac Pro's SoC might not be based on M1.

One of two things is happening:

  1. The Mac Pro is dead, replaced by the Mac Studio.
  2. The Mac Pro will live, but with a different chip altogether.

Apple mentioned one more Mac was due to be released for their transition to be complete, and everyone assumes it is the Mac Pro, however the M1 lacks the features of x86 processors which means they'd need a new chip. They also, on the flip side, killed the 27" iMac, so now you have to ask yourself what the final product will be. It may be a product that doesn't even exist today. One theory I have is that the Mac Studio is designed to replace the Mac Pro, and the final product will be a server, possibly rack mounted. We will see.
 

Eug

Lifer
Mar 11, 2000
23,579
992
126
One of two things is happening:

  1. The Mac Pro is dead, replaced by the Mac Studio.
  2. The Mac Pro will live, but with a different chip altogether.

Apple mentioned one more Mac was due to be released for their transition to be complete, and everyone assumes it is the Mac Pro, however the M1 lacks the features of x86 processors which means they'd need a new chip. They also, on the flip side, killed the 27" iMac, so now you have to ask yourself what the final product will be. It may be a product that doesn't even exist today. One theory I have is that the Mac Studio is designed to replace the Mac Pro, and the final product will be a server, possibly rack mounted. We will see.
Actually that Apple guy specifically said right in the Apple event video that the Mac Pro is coming. So we are 100% sure about that.

Also, way back in 2020, Mark Gurman said both a 20-core chip and likely a 40-core chip were coming. Two years later his 20-core claim has been proven correct. Now we are waiting for that 40-core part.
 

ashFTW

Senior member
Sep 21, 2020
302
225
96
One of two things is happening:

  1. The Mac Pro is dead, replaced by the Mac Studio.
  2. The Mac Pro will live, but with a different chip altogether.

Apple mentioned one more Mac was due to be released for their transition to be complete, and everyone assumes it is the Mac Pro, however the M1 lacks the features of x86 processors which means they'd need a new chip. They also, on the flip side, killed the 27" iMac, so now you have to ask yourself what the final product will be. It may be a product that doesn't even exist today. One theory I have is that the Mac Studio is designed to replace the Mac Pro, and the final product will be a server, possibly rack mounted. We will see.
They explicitly said the last one to transition would be the Mac Pro. Review the last 3 minutes of the Peek Performance event video.
 

Eug

Lifer
Mar 11, 2000
23,579
992
126
So the Mac Pro replacement, if it does exist, won't be M1 but rather M2?

Edit: never mind, just saw Eug's guess of M1 Ultra Duo
Apple's own marketing slide mentions M1 Ultra is the final member of the M1 family so I think there is a decent chance Mac Pro's SoC might not be based on M1.
Below was a prediction on the chip architectures before the Mac Studio was released.

Specifically, he states that while M1 Max x 2 makes design sense, M1 Max x 4 simply isn't possible because of the design restrictions. He further goes on to state that (M1 Max x 2) x 2 isn't going to happen either.

So that would point to a separate chip design for the Mac Pro.


M1 Max × 4 is not a thing because the M1 Max, all the drivers around it, and everything we know about the hardware, says it can only support a 2-die configuration, but none of the news outlets are paying attention to the people actually reverse engineering the platform so...

So my bets are: M1 Pro/Max for the new Mac Mini, M1 Max × 2 for the new iMac Pro (and maaaybe a "Mac Mini Pro"?), and T6500 × N for a new, proper, monster Mac Pro.

would a 2 socket x 2 die configuration be possible?

No.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,219
5,220
136
would a 2 socket x 2 die configuration be possible?

No.

I'd like to see some actual reasons why this isn't possible.

Die limitations do not apply to NUMA configurations.

AFAICT, NUMA is always a possible option, because it's just two independent CPU/memory systems in looser association. There have been dual socket Mac Pros, and there are dual socket Threadripper boards as well.

It may be a sub-optimal option that Apple doesn't want to do, but I see nothing that indicates it's impossible.
 

Eug

Lifer
Mar 11, 2000
23,579
992
126
I'd like to see some actual reasons why this isn't possible.

Die limitations do not apply to NUMA configurations.

AFAICT, NUMA is always a possible option. Because it's just two independent CPU/Memory systems in looser association.

It may be a sub-optimal option that Apple doesn't want to do, but I see nothing that indicates it's impossible.
This stuff is beyond my knowledge, but take these posts as you will:


 

Heartbreaker

Diamond Member
Apr 3, 2006
4,219
5,220
136
Can't read the second one without logging in, and I refuse to get a Twitter account (same applies to Facebook and similar). I hate it when people lock content on social media sites.

I guess it's stating that macOS is not NUMA aware. I suppose that is an issue, but not necessarily an insurmountable one.
 
  • Like
Reactions: Mopetar

Doug S

Platinum Member
Feb 8, 2020
2,191
3,378
136
Why would Apple start adding sockets when they have demonstrated they can handle putting multiple SoCs on the same interposer? You think they can somehow manage only two, not four? Also, the "hardware can only support a two die configuration" thing is due to interrupts, which can't be overcome by having four dies spread across two sockets.

They made changes in the A15 generation to support additional address bits clearly targeted at larger systems, they'd take care of other stuff that wouldn't really affect the iPhone like interrupts, widening data paths to better handle all that memory bandwidth, and so on. Though I'm still thinking there's a good chance they base the M2 on the A16 core. The longer it takes for the first M2 systems to see the light of day, the more likely that becomes. They'd also mostly settled into an even year thing for the 'X' series chips - the only exception after the A5X was the A9X, but they were introducing the iPad Pro that year so they needed to slip one in on an odd year (and A8 -> A9 was their largest ever yearly performance jump so they couldn't miss it for their 'Pro' device)
 

Mopetar

Diamond Member
Jan 31, 2011
7,794
5,898
136
Maybe the obvious answer is that there's no M1 Mac Pro and we don't get one until the M2 comes out, which coincidentally is designed such that more than two chips can be connected via interposer.

The only other possibility I can think of is new silicon that acts as a bridge between two M1 Max chips and just adds additional cores or other resources. However I don't think the Mac Pro has enough volume for that to be a reasonable approach.
 

Eug

Lifer
Mar 11, 2000
23,579
992
126
They made changes in the A15 generation to support additional address bits clearly targeted at larger systems, they'd take care of other stuff that wouldn't really affect the iPhone like interrupts, widening data paths to better handle all that memory bandwidth, and so on. Though I'm still thinking there's a good chance they base the M2 on the A16 core. The longer it takes for the first M2 systems to see the light of day, the more likely that becomes. They'd also mostly settled into an even year thing for the 'X' series chips - the only exception after the A5X was the A9X, but they were introducing the iPad Pro that year so they needed to slip one in on an odd year (and A8 -> A9 was their largest ever yearly performance jump so they couldn't miss it for their 'Pro' device)
Why bother adding these capabilities to A15 if they were going to base the Mac Pro chips on A16?

Anyhow, my expectation is that the new Mac Pro will at least be previewed at WWDC in June.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,219
5,220
136
Why would Apple start adding sockets when they have demonstrated they can handle putting multiple SoCs on the same interposer? You think they can somehow manage only two, not four? Also, the "hardware can only support a two die configuration" thing is due to interrupts, which can't be overcome by having four dies spread across two sockets.

There is a big ramp up in interconnects needed to connect 4 vs connecting 2 chips.

You need 1 bus/chip to connect 2 chips. You need 3 buses/chip to connect 4 chips (with one hop).

Apple's fusion bus is insane, with 10,000 pins. It takes up one end of the chip. It's not really practical to add 2 more of those.

It almost seems like Apple could have just fabbed the two halves of the M1 Ultra as one big beast chip with that bridge already connecting them, then cut them in half when using them as Max/Pro chips and left the connection intact for Ultra chips. No need for a silicon interposer then.
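
The bus counts above follow from requiring every chip to reach every other chip in one hop, i.e. a fully connected topology. A minimal sketch of that arithmetic (the function names are just illustrative):

```python
def links_per_chip(n_chips: int) -> int:
    """Direct links each chip needs so every other chip is one hop away."""
    return n_chips - 1


def total_links(n_chips: int) -> int:
    """Point-to-point links in a fully connected group of chips."""
    return n_chips * (n_chips - 1) // 2


for n in (2, 4):
    print(f"{n} chips: {links_per_chip(n)} link(s) per chip, {total_links(n)} total")
```

Going from 2 to 4 chips triples the links on each chip and takes the total from 1 to 6, which is the "big ramp up" being described.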
 

Doug S

Platinum Member
Feb 8, 2020
2,191
3,378
136
There is a big ramp up in interconnects needed to connect 4 vs connecting 2 chips.

You need 1 bus/chip to connect 2 chips. You need 3 buses/chip to connect 4 chips (with one hop).

Apple's fusion bus is insane, with 10,000 pins. It takes up one end of the chip. It's not really practical to add 2 more of those.

It almost seems like Apple could have just fabbed the two halves of the M1 Ultra as one big beast chip with that bridge already connecting them, then cut them in half when using them as Max/Pro chips and left the connection intact for Ultra chips. No need for a silicon interposer then.


It is 10,000 pins with 2.5 TB/sec of bandwidth. That's massive overkill for connecting two chips that each have 400 GB/sec of memory bandwidth at their disposal. I think this is the solution they'll be using for connecting four chips together. Four chips could each point the same edge at one another, with a third of the pins going to each of the other chips. They'd need a second layer on the interposer so that the wires between two chips can route underneath (which is why I suggested that part of the reason there are so many pins is that they use differential signaling, though the wires would intersect at a 90° angle, so maybe crosstalk isn't too bad).

I mean, they said there were over 10,000 connections and 2.5 TB/sec of bandwidth. They didn't say the M1 Ultra was using it all. I can't really figure out what it possibly could use that for, there isn't nearly enough possible cross die communication even with max snoop traffic on the caches.
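
To put rough numbers on the "a third of the pins to each chip" idea, here is a small sketch. It assumes the 2.5 TB/sec splits evenly and scales linearly with pin count, which is a simplification and not something Apple has stated.

```python
LINK_BANDWIDTH_GB_S = 2500   # 2.5 TB/sec quoted for the die-to-die interconnect
MEMORY_BANDWIDTH_GB_S = 400  # memory bandwidth per chip quoted in the post


def per_peer_bandwidth(total_gb_s: float, n_chips: int) -> float:
    """Bandwidth to each peer if the pins are split evenly among the other chips."""
    return total_gb_s / (n_chips - 1)


for n in (2, 4):
    share = per_peer_bandwidth(LINK_BANDWIDTH_GB_S, n)
    ratio = share / MEMORY_BANDWIDTH_GB_S
    print(f"{n} chips: {share:.0f} GB/s per peer, {ratio:.2f}x one chip's memory bandwidth")
```

Under that assumption each of the three links in a four-chip layout would still carry about twice one chip's memory bandwidth. Whether that is enough for cache and GPU traffic is a separate question.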
 

ashFTW

Senior member
Sep 21, 2020
302
225
96
It is 10,000 pins with 2.5 TB/sec of bandwidth. That's massive overkill for connecting two chips that each have 400 GB/sec of memory bandwidth at their disposal. I think this is the solution they'll be using for connecting four chips together. Four chips could each point the same edge at one another, with a third of the pins going to each of the other chips. They'd need a second layer on the interposer so that the wires between two chips can route underneath (which is why I suggested that part of the reason there are so many pins is that they use differential signaling, though the wires would intersect at a 90° angle, so maybe crosstalk isn't too bad).

I mean, they said there were over 10,000 connections and 2.5 TB/sec of bandwidth. They didn't say the M1 Ultra was using it all. I can't really figure out what it possibly could use that for, there isn't nearly enough possible cross die communication even with max snoop traffic on the caches.

Such a 2x2 grid arrangement, with 1/3 each of the pins from a chip going to the other 3 chips, sounds elegant as long as the bandwidth between the chips is sufficient. Note that there will be more CPU and GPU cores on each M2 Max compared to M1 Max. Also, I think that it can’t all be routed on a single plane, so the interposer will have to have two layers of links. Additionally, some links will be longer than others; I don’t know if that’s an issue when you want the 4 chiplets to appear as one monolithic chip.
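
The two-layer and unequal-length points can be illustrated with a toy geometric model. The layout below (four dies at the corners of a unit square, straight-line links) is purely hypothetical and says nothing about how a real interposer would be routed.

```python
from itertools import combinations
from math import dist

# Hypothetical layout: four dies at the corners of a unit square.
positions = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}

# Every pair of dies gets one direct link (fully connected, one hop).
links = list(combinations(positions, 2))
lengths = {pair: dist(positions[pair[0]], positions[pair[1]]) for pair in links}


def side(a, b, c):
    """Sign tells which side of the line a->b the point c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses(l1, l2):
    """True if two links share no die and their straight segments intersect."""
    if set(l1) & set(l2):
        return False
    (p, q), (r, s) = [(positions[a], positions[b]) for a, b in (l1, l2)]
    return side(p, q, r) * side(p, q, s) < 0 and side(r, s, p) * side(r, s, q) < 0


crossing_pairs = [(a, b) for a, b in combinations(links, 2) if crosses(a, b)]

for pair, length in lengths.items():
    print(pair, f"{length:.3f}")
print("links that cross:", crossing_pairs)
```

In this toy layout the two diagonal links are about 41% longer than the four edge links, and they cross each other, which is why a single routing plane would not be enough.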
 

Justinus

Diamond Member
Oct 10, 2005
3,165
1,507
136
It is 10,000 pins with 2.5 TB/sec of bandwidth. That's massive overkill for connecting two chips that each have 400 GB/sec of memory bandwidth at their disposal. I think this is the solution they'll be using for connecting four chips together. Four chips could each point the same edge at one another, with a third of the pins going to each of the other chips. They'd need a second layer on the interposer so that the wires between two chips can route underneath (which is why I suggested that part of the reason there are so many pins is that they use differential signaling, though the wires would intersect at a 90° angle, so maybe crosstalk isn't too bad).

I mean, they said there were over 10,000 connections and 2.5 TB/sec of bandwidth. They didn't say the M1 Ultra was using it all. I can't really figure out what it possibly could use that for, there isn't nearly enough possible cross die communication even with max snoop traffic on the caches.

Not only does that displace the memory chips which are positioned adjacent to the appropriate chip, but it also implies the only bandwidth that matters is memory bandwidth.

For the chips to appear to software as a monolithic die, the throughput from every core to every other core has to be uniform, or within margins. Considering cache can run in the TB/s range, you can't split that bus up among more than 2 chips without the cache taking a severe bandwidth hit.

I seriously doubt more than two chips on a package is in the cards for this design.
 

ashFTW

Senior member
Sep 21, 2020
302
225
96
Why bother adding these capabilities to A15 if they were going to base the Mac Pro chips on A16?

Anyhow, my expectation is that the new Mac Pro will at least be previewed at WWDC in June.
Functionality is often added in smaller increments even if it’s not needed right away, so that it can be tested earlier and further built upon. Basic engineering.
 
  • Like
Reactions: ftt, Doug S and Eug

ashFTW

Senior member
Sep 21, 2020
302
225
96
Is M2, which is expected late this year, essentially just a 4 nm die shrink of M1?
It will have the next generation of CPU, GPU, and Neural Engine cores. The number of cores may be increased as well. There will also be improvements to other subsystems like image processing, Secure Enclave etc.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
It will have the next generation of CPU, GPU, and Neural Engine cores. The number of cores may be increased as well. There will also be improvements to other subsystems like image processing, Secure Enclave etc.

I've heard the cpu config is likely to be the same.

Adding two gpu units would be hardly any new work for designers, but they would have to redo lithography steps.

To me it didn't sound like much of an architectural change. More like a higher stepping and some cut 'n' paste.

If they really wanted to make some small architectural improvement, the lowest hanging fruit would be just to double the 4 energy efficient cores to 8. But it seems like they don't want to touch the cpu section.
 

ashFTW

Senior member
Sep 21, 2020
302
225
96
I've heard the cpu config is likely to be the same.

Adding two gpu units would be hardly any new work for designers, but they would have to redo lithography steps.

To me it didn't sound like much of an architectural change. More like a higher stepping and some cut 'n' paste.

If they really wanted to make some small architectural improvement, the lowest hanging fruit would be just to double the 4 energy efficient cores to 8. But it seems like they don't want to touch the cpu section.
Every generation of A chip has brought substantial improvements in most parts of the chip. I don’t see why it would be any different for the M chips since they borrow heavily from the A chips. M2 is likely going to be based on A16, in my opinion, skipping A15.
 
  • Like
Reactions: amd6502

Heartbreaker

Diamond Member
Apr 3, 2006
4,219
5,220
136
It is 10,000 pins with 2.5 TB/sec of bandwidth. That's massive overkill for connecting two chips that each have 400 GB/sec of memory bandwidth at their disposal. I think this is the solution they'll be using for connecting four chips together. Four chips could each point the same edge at one another, with a third of the pins going to each of the other chips. They'd need a second layer on the interposer so that the wires between two chips can route underneath (which is why I suggested that part of the reason there are so many pins is that they use differential signaling, though the wires would intersect at a 90° angle, so maybe crosstalk isn't too bad).

IMO, it's not overkill. This is not a bus to connect single purpose chiplet to common memory.

This is a bus to let the M1 Ultra act as if it is one big chip containing all the parts of both SoCs. All the GPU units work together, the caches work together, the NPUs work together, the external I/O works together, the media encoders work together, without any NUMA or SLI issues. When connected it really is like one big chip, and that needs a connection to everything at high speed and low latency, just as if it were all on the same chip.

It's pretty freaking amazing.
 

Eug

Lifer
Mar 11, 2000
23,579
992
126
Geekbench Results Impressive MT results


Those M1 Ultra Geekbench 5 scores are looking legit. Here is another:


1785 / 23942

And here is the M1 Max Mac Studio:


1776 / 12780

I just want to see one hit 1800 single-core! ;)
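
For reference, a quick sketch of what those two results imply about multi-core scaling. It assumes both entries are full configurations (a 10-core M1 Max and a 20-core M1 Ultra), so ideal multi-core scaling would be 2.0x.

```python
# Geekbench 5 scores quoted above: (single-core, multi-core).
m1_ultra = (1785, 23942)
m1_max = (1776, 12780)

# Assumption: the Ultra result has twice the CPU cores of the Max result.
IDEAL_SCALING = 2.0

mt_scaling = m1_ultra[1] / m1_max[1]
st_ratio = m1_ultra[0] / m1_max[0]
scaling_efficiency = mt_scaling / IDEAL_SCALING
gap_to_1800 = 1800 - m1_ultra[0]

print(f"multi-core scaling: {mt_scaling:.2f}x ({scaling_efficiency:.0%} of ideal)")
print(f"single-core ratio: {st_ratio:.3f}")
print(f"points short of 1800 single-core: {gap_to_1800}")
```

So the Ultra lands at about 1.87x the Max in multi-core, with single-core essentially unchanged.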

BTW, Ars has pointed out that Apple seems to have changed the naming standard. The identifier just says "Mac13". I would have expected "MacStudio13" in line with the other Macs.
 

ashFTW

Senior member
Sep 21, 2020
302
225
96
IMO, it's not overkill. This is not a bus to connect single purpose chiplet to common memory.

This is a bus to let the M1 Ultra act as if it is one big chip containing all the parts of both SoCs. All the GPU units work together, the caches work together, the NPUs work together, the external I/O works together, the media encoders work together, without any NUMA or SLI issues. When connected it really is like one big chip, and that needs a connection to everything at high speed and low latency, just as if it were all on the same chip.

It's pretty freaking amazing.
Yes, amazing indeed! It’s hard to understand how exactly this works.

Contrast this to 4 chiplet Sapphire Rapids, where all the EMIB chiplets have to do is bridge the mesh links from each chiplet to two other chiplets. I don’t know what the total mesh bandwidth is, but the solution makes the 4 chiplets work as if it was one large monolithic chip.