Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
24,176
1,816
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
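
As a back-of-the-envelope check, the GPU throughput figures above are mutually consistent if you assume the community-reported ~1.28 GHz GPU clock, 8 FP32 ALUs per execution unit, 64 texture units, and 32 ROPs; none of those inputs are Apple-published numbers, so treat this as a rough sketch:

```python
# Rough consistency check of the M1 GPU figures listed above.
# Assumed, not Apple-published: ~1.278 GHz GPU clock, 8 FP32 ALUs per EU,
# 64 texture units, and 32 ROPs (community-reported numbers).

GPU_CLOCK_GHZ = 1.278   # assumed GPU clock
EUS           = 128     # execution units (from the spec list above)
ALUS_PER_EU   = 8       # assumed FP32 lanes per EU
TMUS          = 64      # assumed texture units
ROPS          = 32      # assumed render output units

fp32_tflops = EUS * ALUS_PER_EU * 2 * GPU_CLOCK_GHZ / 1000  # FMA = 2 FLOPs/cycle
texel_rate  = TMUS * GPU_CLOCK_GHZ                          # gigatexels/s
pixel_rate  = ROPS * GPU_CLOCK_GHZ                          # gigapixels/s

print(f"{fp32_tflops:.1f} TFLOPS, {texel_rate:.0f} Gtexels/s, {pixel_rate:.0f} Gpixels/s")
# -> 2.6 TFLOPS, 82 Gtexels/s, 41 Gpixels/s
```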

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options are 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock-speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), and ProRes
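
For what it's worth, the 100 GB/s memory bandwidth figure lines up with the commonly reported 128-bit LPDDR5-6400 interface; both of those inputs are assumptions on my part rather than numbers from Apple's spec sheet:

```python
# Quick check of the "100 GB/s" figure above, assuming (not from Apple's
# spec sheet) a 128-bit LPDDR5 interface running at 6400 MT/s.
BUS_WIDTH_BITS    = 128
TRANSFER_RATE_MTS = 6400

bandwidth_gbs = BUS_WIDTH_BITS / 8 * TRANSFER_RATE_MTS / 1000
print(f"{bandwidth_gbs:.1f} GB/s")   # -> 102.4 GB/s, marketed as "100 GB/s"
```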

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 

jdubs03

Golden Member
Oct 1, 2013
1,498
1,090
136
The "its for the Studio" would make sense, especially if Gurman was right that the Macbook Pro comes in March.

Or maybe the M5 Max MacBook Pro comes out now and the M5 Pro follows a month later. If Apple is capacity constrained like Cook said, it makes sense to release the higher-profit Max SKUs first and let the Pro lag.

Though if the new chiplet/packaging stuff gives you more configuration freedom, maybe there is no "Pro" any longer (the "MacBook Pro with M5 Pro" was always an unwieldy name anyway). If you go for the lowest-end/default setup, maybe that's comparable to what was a Pro, but if you beef it up much it's more like a Max or maybe even beyond it (i.e. if you took the minimum/default GPU core count and maxed out your CPU core count, it might be close to an Ultra in CPU capability).

We'll have to see how much choice there is for MacBook Pro configuration. I think it might have just two levels each for CPU and GPU, but who knows.
As long as they offer at least the same number of options for CPU and GPU cores as before, fine by me. Across all SKUs.
 

name99

Senior member
Sep 11, 2010
696
582
136
If you have the option of making one giant die at 85% yield, or 2 smaller dies at 90-95% yield, at a lower cost per die or combined, why would you not?
Because
1. advanced packaging is not free.
2. advanced packaging has its own yield losses.
3. energy concerns: crossing a die boundary is always more expensive than staying within a single die.
4. as a general point, yield in the old-fashioned sense is vastly less important than claimed. Almost any modern design is designed for yield (e.g. redundancy in SRAMs, and multiple cores that can operate independently even if one of them is forced inactive).
Far more important to modern designs is parametric yield (i.e. does the design function at the targeted speed?). But parametric yield doesn't magically go up just by splitting a mid-sized chip into two smaller chips.

Obviously there are optionality wins to using multiple dies.
But optionality wins are different from the claims in the article.
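
To put rough numbers on points 1, 2 and 4, here's a toy cost-per-good-part comparison under a simple Poisson yield model; every input (defect density, wafer cost, packaging cost and assembly yield) is an assumed, illustrative figure, not real foundry data:

```python
import math

# Toy cost-per-good-part comparison: one ~400 mm^2 die vs. two ~200 mm^2
# chiplets plus advanced packaging. Every input below (defect density, wafer
# cost, packaging cost and yield) is an assumed, illustrative number.

WAFER_DIAM_MM = 300
WAFER_COST    = 17_000   # assumed $ per leading-edge wafer
D0            = 0.10     # assumed defects per cm^2
PKG_COST      = 30       # assumed $ per advanced package
PKG_YIELD     = 0.95     # assumed assembly/bonding yield

def dies_per_wafer(area_mm2):
    r = WAFER_DIAM_MM / 2
    return math.pi * r**2 / area_mm2 - math.pi * WAFER_DIAM_MM / math.sqrt(2 * area_mm2)

def die_yield(area_mm2):
    return math.exp(-(area_mm2 / 100) * D0)   # simple Poisson yield model

def cost_per_good_die(area_mm2):
    return WAFER_COST / (dies_per_wafer(area_mm2) * die_yield(area_mm2))

mono     = cost_per_good_die(400)
chiplets = (2 * cost_per_good_die(200) + PKG_COST) / PKG_YIELD

print(f"monolithic 400 mm^2:      ~${mono:.0f} per good die")
print(f"2 x 200 mm^2 + packaging: ~${chiplets:.0f} per assembled part")
# With these made-up inputs the two land within a few percent of each other:
# the die-yield gain is largely offset by packaging cost and packaging yield.
```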
 

Doug S

Diamond Member
Feb 8, 2020
3,845
6,797
136
Because
1. advanced packaging is not free.
2. advanced packaging has its own yield losses.
3. energy concerns: crossing a die boundary is always more expensive than staying within a single die.
4. as a general point, yield in the old-fashioned sense is vastly less important than claimed. Almost any modern design is designed for yield (e.g. redundancy in SRAMs, and multiple cores that can operate independently even if one of them is forced inactive).
Far more important to modern designs is parametric yield (i.e. does the design function at the targeted speed?). But parametric yield doesn't magically go up just by splitting a mid-sized chip into two smaller chips.

Obviously there are optionality wins to using multiple dies.
But optionality wins are different from the claims in the article.

Apple was ALREADY using advanced packaging even with monolithic dies for the LPDDR integration, and the SoIC being used for M5 P/M/U is supposedly both cheaper and better for package yield than the CoWoS they've used in previous Apple Silicon generations.
 

fkoehler

Senior member
Feb 29, 2008
223
197
116
I know this is the Apple thread, but you're on AT still.
AMD has proven that tiles and related packaging are more economical in server/desktop, which is why Intel finally threw in the towel as they were getting destroyed with monolithic.

This has been discussed for 7-8+ years now.

Aside from the fact that tiling allows for greater yield, I wonder if they're able to increase that further by not running as much redundancy as monolithic requires, yielding more good dies, with the bad ones simply binned.
 

name99

Senior member
Sep 11, 2010
696
582
136
I know this is the Apple thread, but you're on AT still.
AMD has proven that tiles and related packaging are more economical in server/desktop, which is why Intel finally threw in the towel as they were getting destroyed with monolithic.

This has been discussed for 7-8+ years now.

Aside from the fact that tiling allows for greater yield, I wonder if they're able to increase that further by not running as much redundancy as monolithic requires, yielding more good dies, with the bad ones simply binned.
As an aside, Intel made very little money on Lunar Lake. And they dialed back the tile count going from Arrow Lake (CPU, GPU, SoC, I/O extender tile) to just three for Panther Lake (CPU, GPU, SoC)...

Please, for the love of god, read WHAT I WROTE, not what you THINK I wrote...
What you are discussing between Intel vs AMD is optionality. Which I mentioned.
I'm not saying "tiles are bad". I'm saying tiles are not *cheaper* than monolithic, both in raw dollars and in using more energy.
 

fkoehler

Senior member
Feb 29, 2008
223
197
116
Not going to argue.

You said:
"1. Separate GPU and CPU chiplets have various advantages, but lower cost compared to a monolithic SoC are not one of those advantages."

That's a blanket statement that is arguable now, and was half a decade ago.

If it were cheaper to continue with monolithic, Intel and TSMC would not have bothered, as their competitors would have eaten their market share.
One could argue that with all associated costs involved, the total BOM, and thus cost, is higher. However, that totally misses the large cost savings seen as significantly more dies pass QA and allow economies of scale to lower that BOM.

Just pointing out your blanket statement.
 

poke01

Diamond Member
Mar 8, 2022
4,881
6,223
106
Not going to argue.

You said:
"1. Separate GPU and CPU chiplets have various advantages, but lower cost compared to a monolithic SoC are not one of those advantages."

That's a blanket statement that is arguable now, and was half a decade ago.

If it were cheaper to continue with monolithic, Intel and TSMC would not have bothered, as their competitors would have eaten their market share.
One could argue that with all associated costs involved, the total BOM, and thus cost, is higher. However, that totally misses the large cost savings seen as significantly more dies pass QA and allow economies of scale to lower that BOM.

Just pointing out your blanket statement.
I agree as well; chiplets are cheaper because you can mix and match nodes. It's easier to manage SKUs as well.

Also, per SemiAnalysis, the reason Apple is moving to chiplets is that Apple will be using older nodes for things other than the GPU and CPU.
 

johnsonwax

Senior member
Jun 27, 2024
473
678
96
I agree as well; chiplets are cheaper because you can mix and match nodes. It's easier to manage SKUs as well.

Also, per SemiAnalysis, the reason Apple is moving to chiplets is that Apple will be using older nodes for things other than the GPU and CPU.
Will note that Apple does not SKU spam like the component suppliers do. There are market reasons why they want to mix/match tiles that at least don't apply as readily to Apple, so I can see why Apple hasn't been as attracted to that approach. That said, I do think they have a bit of a problem of needing at least a bit of CPU/GPU variety. If you just want a load of CPU, you gotta buy a lot of GPU on the same die to get it.
 

Doug S

Diamond Member
Feb 8, 2020
3,845
6,797
136
Will note that Apple does not SKU spam like the component suppliers do. There are market reasons why they want to mix/match tiles that at least don't apply as readily to Apple, so I can see why Apple hasn't been as attracted to that approach. That said, I do think they have a bit of a problem of needing at least a bit of CPU/GPU variety. If you just want a load of CPU, you gotta buy a lot of GPU on the same die to get it.

There are also the custom servers for Apple's datacenters, and they'd want chips loaded with GPU tiles with minimal CPU resources for AI, and chips loaded with CPU tiles with zero GPU resources for general purpose servers. Makes sense to design tiles to handle both those roles plus their consumer lineup.
 

The Hardcard

Senior member
Oct 19, 2021
354
448
136
There are also the custom servers for Apple's datacenters, and they'd want chips loaded with GPU tiles with minimal CPU resources for AI, and chips loaded with CPU tiles with zero GPU resources for general purpose servers. Makes sense to design tiles to handle both those roles plus their consumer lineup.
I sure hope they are willing to share at least some information about the PCC in comparison to consumer M5s. Better if they get giddy again and put out die shots.
 

Mopetar

Diamond Member
Jan 31, 2011
8,531
7,795
136
There are also the custom servers for Apple's datacenters, and they'd want chips loaded with GPU tiles with minimal CPU resources for AI, and chips loaded with CPU tiles with zero GPU resources for general purpose servers. Makes sense to design tiles to handle both those roles plus their consumer lineup.

It makes sense for the consumer space as well for a company that wants to offer a wider variety of configurations. Apple doesn't quite go this far, but they certainly could.

If they designed for the ability to offer customers the option to choose different chiplet combinations, they could offer a lot of flexibility based on whether someone wants more CPU/GPU/NPU cores.

The manufacturing has a ways to go to enable that in a way that's responsive to the market and consumer demands, but a lot of people have needs that lean towards one type of core over others.
 

Doug S

Diamond Member
Feb 8, 2020
3,845
6,797
136
It makes sense for the consumer space as well for a company that wants to offer a wider variety of configurations. Apple doesn't quite go this far, but they certainly could.

If they designed for the ability to offer customers the option to choose different chiplet combinations, they could offer a lot of flexibility based on whether someone wants more CPU/GPU/NPU cores.

The manufacturing has a ways to go to enable that in a way that's responsive to the market and consumer demands, but a lot of people have needs that lean towards one type of core over others.

Theoretically yes, but the problem is that these configurations are fixed at packaging time, so they would need to stock more SKUs in inventory and risk overproducing one and underproducing another. It's the same as with the DRAM being part of the package: there might be a market for a Mini or MacBook Pro or Studio with 2x or 4x the current maximum RAM, which they could do using bigger stacks and/or denser chips. But even when DRAM was plentiful and cheap they didn't feel such SKUs were worth it to stock.

So we might see basically the same offerings as today, even if there might be people who want to build a Studio that goes crazy on GPU tiles with no additional CPU tiles to build AI clusters (like some are already doing with Minis or M3 Ultra Studios), and others who would love a headless Studio with no GPU tiles and maxed-out CPU tiles they could use as a compute server. Apple would already be making those SKUs, because that's what they're likely to be using for their internal needs, but might still not think it worth offering those options to customers (though if they were, that AI-focused Studio would likely be a big seller).
 

name99

Senior member
Sep 11, 2010
696
582
136
Not going to argue.

You said:
"1. Separate GPU and CPU chiplets have various advantages, but lower cost compared to a monolithic SoC are not one of those advantages."

That's a blanket statement that is arguable now, and was half a decade ago.

If it were cheaper to continue with monolithic, Intel and TSMC would not have bothered, as their competitors would have eaten their market share.
One could argue that with all associated costs involved, the total BOM, and thus cost, is higher. However, that totally misses the large cost savings seen as significantly more dies pass QA and allow economies of scale to lower that BOM.

Just pointing out your blanket statement.
You keep assuming that chiplets have a cost advantage. This just isn't true as a blanket statement. It's not even true at the highest end (where chiplets work best because of optionality and reticle limits) let alone at the consumer level.

https://newsletter.semianalysis.com/p/cpus-are-back-the-datacenter-cpu discusses this in the sections about Sierra Forest-AP and Clearwater Forest. Both suffer from costing too much (in part because of the low PACKAGING yields), while offering too little improvement (in part because of the INTER-CHIPLET energy costs).
"
With such limited performance gains despite much higher costs from low hybrid bonding yields, it is no wonder that Intel barely mentioned Clearwater Forest in their latest Q4 ’25 earnings. Our take is that Intel does not want to produce these chips in high volumes which hurt margins and would rather keep this as a yield learning vehicle for Foveros Direct.
"

A similar article is https://newsletter.semianalysis.com/p/intel-emerald-rapids-backtracks-on

Chiplets are merely the most recent example of Intel doing its Intel thing: marketing-driven engineering. Desperate for some way to look relevant a few years ago, they seized on chiplets (sorry, "tiles") as their differentiating factor, and pushed them into everything.
Problem is, this was dumb! Chiplets are, like any engineering tool, a solution to some problems and not others, a tool with COSTS as well as benefits; they are not a universal solution. If they were, why stop at Pro and Max chiplets? Why not make the A19 out of three chiplets and the M5 out of six chiplets?

You are welcome to drink Intel Kool-Aid. I've been in this business long enough to
1. have seen pretty much every prediction of mine about some Intel issue eventually come true,
2. and, in spite of that, to see that Intel worship continues undiminished. Even people who don't think of themselves as Intel-heads still subtly take in the surrounding propaganda claims about the supposed direction of computing (e.g. the superiority of tiles uber alles).
 

name99

Senior member
Sep 11, 2010
696
582
136
So we might see basically the same offerings as today, even if there might be people who want to build a Studio that goes crazy on GPU tiles with no additional CPU tiles to build AI clusters (like some are already doing with Minis or M3 Ultra Studios), and others who would love a headless Studio with no GPU tiles and maxed-out CPU tiles they could use as a compute server. Apple would already be making those SKUs, because that's what they're likely to be using for their internal needs, but might still not think it worth offering those options to customers (though if they were, that AI-focused Studio would likely be a big seller).
How much CPU do you assume is required for an "AI cluster"?
I suspect it's quite a bit, and more going forward. The main point of the article quoted earlier, https://newsletter.semianalysis.com/p/cpus-are-back-the-datacenter-cpu , is that data center CPU sales are rising dramatically in tandem with GPU sales.

If you think about it, this makes sense. It's not only the obvious point that some CPU is required to control and command those GPUs, it's that the most interesting aspects of AI going forward are probably going to lean on CPU intelligence as much as GPU throughput, things like RAG (requires looking up data via a network stack, or locally in the file system) or agents (the GPU generates the agent plan, but the plan "execution" generally consists of running a bunch of CPU programs).

Customers might think they want a SoC optimized for the specifics of their code as of March 2026, but Apple has always played the long game, giving customers what they (will) need rather than what they (think) they want.

I'm not sure the compute server market exists much anymore (I guess maybe something like GitHub), but even for this market, to the extent that Apple cares about serving it, an easier and more forward-looking solution might be a mechanism that allows the Pro more memory capacity rather than disaggregated chiplets. (The Pro gives you basically the same degree of CPU as the Max, just with less GPU but also, crucially, less memory capacity.)

At the end of the day, it's history more than anything else that has us in this situation where we think of GPUs as a different thing from CPUs, rather than just writing a single program that the compiler generates to make optimal use of CPU, NEON, AMX, and GPU as appropriate. My expectation is that Apple would love to get to that point, and it's mainly been the need to continue supporting non-Apple GPUs that has prevented this, a need that's going away...

All of which means Apple has even less reason to push designs with crippled GPU relative to CPU. Their goal is to get developers using all the silicon resources available, and they're not going to do something dumb like Intel and then ship the equivalent of chips without AVX512 or QAT or DSA and thereby slow developer adoption.
 

fkoehler

Senior member
Feb 29, 2008
223
197
116
You keep assuming that chiplets have a cost advantage.

If I'm drinking Kool-Aid, it would be AMD's, so again your assumptions are wrong.

Without writing a book, below are the commonly cited reasons why chiplets/tiles are economically favorable.
  • Higher yields on smaller dies
  • Better wafer utilization
  • Mixed-process node optimization (sketched below)
  • Lower non-recurring engineering (NRE) in many cases
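As an illustration of the mixed-process-node point, here is a sketch of the usual argument that only the logic which benefits from the leading edge should pay leading-edge wafer prices; the areas and per-mm^2 costs are assumed round numbers, not actual foundry pricing:

```python
# Sketch of the mixed-process-node argument above: only the logic that
# benefits from the leading edge pays leading-edge wafer prices. All areas
# and per-mm^2 costs are assumed round numbers, not actual foundry pricing.

COST_PER_MM2 = {"N3": 0.25, "N6": 0.08}   # assumed $ per mm^2 of good silicon

# Hypothetical 400 mm^2 SoC: ~250 mm^2 of CPU/GPU logic that scales well,
# plus I/O, PHYs and analog that barely shrink on newer nodes (and grow a
# little when moved to N6).
monolithic_n3 = 400 * COST_PER_MM2["N3"]
split         = 250 * COST_PER_MM2["N3"] + 160 * COST_PER_MM2["N6"]

print(f"all-N3 monolithic:       ~${monolithic_n3:.0f}")   # ~$100
print(f"N3 compute + N6 I/O die: ~${split:.0f}")            # ~$75, before packaging
# The saving still has to cover the packaging adder and any assembly yield
# loss, which is exactly the trade-off being argued back and forth here.
```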
You picked several mediocre Intel products, and that is somehow proof, but AMD kicking Intel's ass doing the same thing, while paying TSMC to manufacture for them, isn't?

All you're really doing is proving Intel is 2nd place to TSMC.

Even Apple is now doing chiplet/tiles, and they're pretty notorious for not throwing BOM around willy-nilly.

Not sure why you think chiplets are somehow the root of any deficiencies in the litho industry.
If staying monolithic were easier, and made a product cheaper, we would not be having this discussion, as Intel or TSMC would be laughing at the other's stupidity.

The reduction in yield losses is counter-balanced by fabrics and placement, and the folks actually making the chips have decided.

You are one of the few seemingly calling an entire industry fools.
 

Doug S

Diamond Member
Feb 8, 2020
3,845
6,797
136
How much CPU do you assume is required for an "AI cluster"?
I suspect it's quite a bit, and more going forward. The main point of the article quoted earlier, https://newsletter.semianalysis.com/p/cpus-are-back-the-datacenter-cpu , is that data center CPU sales are rising dramatically in tandem with GPU sales.

If you think about it, this makes sense. It's not only the obvious point that some CPU is required to control and command those GPUs, it's that the most interesting aspects of AI going forward are probably going to lean on CPU intelligence as much as GPU throughput, things like RAG (requires looking up data via a network stack, or locally in the file system) or agents (the GPU generates the agent plan, but the plan "execution" generally consists of running a bunch of CPU programs).

Customers might think they want a SoC optimized for the specifics of their code as of March 2026, but Apple has always played the long game, giving customers what they (will) need rather than what they (think) they want.

I'm not sure the compute server market exists much anymore (I guess maybe something like GitHub), but even for this market, to the extent that Apple cares about serving it, an easier and more forward-looking solution might be a mechanism that allows the Pro more memory capacity rather than disaggregated chiplets. (The Pro gives you basically the same degree of CPU as the Max, just with less GPU but also, crucially, less memory capacity.)

At the end of the day, it's history more than anything else that has us in this situation where we think of GPUs as a different thing from CPUs, rather than just writing a single program that the compiler generates to make optimal use of CPU, NEON, AMX, and GPU as appropriate. My expectation is that Apple would love to get to that point, and it's mainly been the need to continue supporting non-Apple GPUs that has prevented this, a need that's going away...

All of which means Apple has even less reason to push designs with crippled GPU relative to CPU. Their goal is to get developers using all the silicon resources available, and they're not going to do something dumb like Intel and then ship the equivalent of chips without AVX512 or QAT or DSA and thereby slow developer adoption.

You only need enough CPU resources in the AI server itself to marshal the data in/out and keep it fed for optimal performance. What to do with the results it produces is a "some other server" problem and that's where you have all the CPU cores.

How big each of those resources is as a share of overall CPU core resources does not seem to be a solved problem from what I can gather. There are a lot of different solutions with a lot of different mixes. Which one is best, who knows.
 

name99

Senior member
Sep 11, 2010
696
582
136
If I'm drinking Kool-Aid, it would be AMD's, so again your assumptions are wrong.

Without writing a book, below are the commonly cited reasons why chiplets/tiles are economically favorable.
  • Higher yields on smaller dies
  • Better wafer utilization
  • Mixed-process node optimization
  • Lower non-recurring engineering (NRE) in many cases
You picked several mediocre Intel products, and that is somehow proof, but AMD kicking Intel's ass doing the same thing, while paying TSMC to manufacture for them, isn't?

All you're really doing is proving Intel is 2nd place to TSMC.

Even Apple is now doing chiplet/tiles, and they're pretty notorious for not throwing BOM around willy-nilly.

Not sure why you think chiplets are somehow the root of any deficiencies in the litho industry.
If staying monolithic were easier, and made a product cheaper, we would not be having this discussion, as Intel or TSMC would be laughing at the other's stupidity.

The reduction in yield losses is counter-balanced by fabrics and placement, and the folks actually making the chips have decided.

You are one of the few seemingly calling an entire industry fools.
Look at what I said. I said "1. Separate GPU and CPU chiplets have various advantages, but lower cost compared to a monolithic SoC are not one of those advantages."
I stand by that statement.

AMD is not a case of trying to maximize overall yield by using smaller chiplets; it is a case of optionality. They can churn out one type of die and then use different numbers of binned variants of it in different products. That works for their target market (PC vendors who want dozens of different SKUs).

But they don't use the same chiplet architecture for Strix Point, even though that's 233mm^2 (compared to 70mm^2 for the Granite Ridge CCD).
Why not? Why not use the same Granite Ridge pieces and add a GPU chiplet?
Because Strix Point ships in substantially higher volumes, optionality is less important than the absolute lowest cost per unit manufactured. And at 233mm^2 it's still monolithic.

Apple is in the same sort of situation as AMD. They are not at reticle limits (Max is around 450mm^2) so they don't HAVE to go chiplet for Max (of course Ultra is in a sense going chiplet to go beyond reticle limits).
And 450mm^2, together with design for yield and binning, means they're not being killed by any sort of yield issues (they would almost certainly lose more yield to packaging than they are losing today to a monolithic 450mm^2 die).
So LOWER COST is not a relevant factor. Which is what I said and continue to say!

Is optionality an issue (as it very much is for AMD)? Well, that's a much more interesting question, and one about which I have no special insight.

1. They obviously don't have a variety of PC-style vendors all asking for slightly different SKUs; they have an existing SKU segmentation that seems to work pretty well. I discussed in a post above why I *think* that segmentation has no reason to change.

2. What about "servers" to populate Apple data centers? Do they want dense large packages (like the highest end 256 or 512 core level designs in the ARM/x86 space)? Or do they want racks that are specifically primarily/only CPU and other racks primarily/only GPU?
I think none of us know!

There are a few recent Apple patents for large rack designs and, while these do not answer the above question, they do suggest that Apple doesn't feel a need to slavishly copy the way the current industry has solved various problems. (They do, for example, suggest a radically different solution to the question of how to disaggregate compute vs memory so as to provide compute with varying amounts of very high bandwidth reasonably low-latency DRAM from a large shared pool - like CXL but turned up to 11.)
 

name99

Senior member
Sep 11, 2010
696
582
136
You only need enough CPU resources in the AI server itself to marshal the data in/out and keep it fed for optimal performance. What to do with the results it produces is a "some other server" problem and that's where you have all the CPU cores.

How big each of those resources is as a share of overall CPU core resources does not seem to be a solved problem from what I can gather. There are a lot of different solutions with a lot of different mixes. Which one is best, who knows.
That's an answer that's relevant to a data warehouse. It's NOT relevant to a home compute server a la clawd, or even a departmental server (eg an M5 Ultra/Extreme).

I replied to your "even if there might be people who want to build a Studio that goes crazy on GPU tiles with no additional CPU tiles to build AI clusters (like some are already doing with Minis or M3 Ultra Studios), and others who would love a headless Studio with no GPU tiles and maxed-out CPU tiles they could use as a compute server."

It's in that context that I think Apple will continue to provide its fairly simple matrix
(CPU-biased [Pro] vs GPU-biased [Max], each at two performance levels), under the expectation that it works, and works well, for anyone doing anything less than building an actual data center.

And if they are building an actual data center? See my previous post!
 

QuickyDuck

Member
Nov 6, 2023
73
79
61
Intel owns so much advanced packaging capacity that it can't afford to let it bleed money by sitting idle.

AMD reuses chiplets across product lines and generations, while every Intel product line these days has its own chiplet solution with no commonality between product lines.
 

fkoehler

Senior member
Feb 29, 2008
223
197
116
I give up.
AMD has saved untold billions with its approach, a party Intel came late to.
On top of that, AMD is paying TSMC to make its products, incurring a substantial cost to profit that Intel can avoid.
AMD would likely not be in business today if it had tried to stay monolithic. Or at most, it would be a bit player and we'd still be on hex-cores with Intel.