Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, and ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 

Mopetar

Diamond Member
Jan 31, 2011
Theoretically yes; the problem is that these configurations are fixed at packaging time, so they would need to stock more SKUs in inventory and risk overproducing one and underproducing another.

It's obviously not something that can be done now, but I can see a push toward applying the same just-in-time (JIT) approach to semiconductor manufacturing and packaging.

If robotics advances enough over the next decade there's little reason not to move manufacturing of finished products closer to the markets where they'll be sold.

Companies only need to stock the most popular configurations that will sell through quickly enough that there's less risk of overproduction. If anything custom can be made to order and delivered in a week or two most customers will be happy with a slight wait.

There's still the roadblock of longer lead time for manufacturing the silicon. If that takes several months from a wafer start to shipping chiplet, it will struggle to adjust to any rapid shifts in consumer demand.
 

fkoehler

Senior member
Feb 29, 2008
Not quite sure where exactly I said chiplets are a perfect technology.
Anyone with sense would understand it's an approach that reduces the risk of die loss or wastage.
Nothing is free, and fabrics/etc. cost money, but you get numerous benefits from mixing and matching litho processes in a way monolithic simply doesn't.

You're the one seeming to believe it's the wrong approach, for... reasons.
Personally, I'll trust the folks who make their living competing for the cheapest production cost as likely having just a bit more data than myself or anyone else outside the field.
 

mvprod123

Senior member
Jun 22, 2024
Good results. The M5 Max will close the gap in GPU, and the M6 Max will come out ahead. Mobile GPUs in the RTX 6000 series are not expected until approximately 2028.

 

oak8292

Senior member
Sep 14, 2016
Not quite sure where exactly I said chiplets are a perfect technology.
Anyone with sense would understand it's an approach that reduces the risk of die loss or wastage.
Nothing is free, and fabrics/etc. cost money, but you get numerous benefits from mixing and matching litho processes in a way monolithic simply doesn't.

You're the one seeming to believe it's the wrong approach, for... reasons.
Personally, I'll trust the folks who make their living competing for the cheapest production cost as likely having just a bit more data than myself or anyone else outside the field.
Probably underrepresented in this conversation is the need for die volume to amortize non-recurring engineering (NRE) and mask costs. Somewhere in the not-too-distant future just about everything may be chiplets. This has been coming for years and was even in Moore's papers from 50 years ago.

A presentation at IEDM in 2015 provided graphs of transistor cost versus die volume. There was an increase in transistor cost for volumes less than 10 million die just going from 28 nm to 16 nm. At 300 million die the projection was that transistor costs start to flatten with continuing shrinks. That means essentially nobody is gaining anything from transistor cost with new nodes; there can still be benefits in power and performance.
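The volume economics described above can be sketched with a toy model: a fixed NRE (design plus mask) cost amortized over unit volume, on top of the recurring per-good-die cost. All numbers below are hypothetical, chosen only to illustrate the shape of the curve, not actual foundry pricing.

```python
# Sketch of why die volume matters for per-unit cost.
# All inputs are made-up, illustrative numbers.

def unit_cost(wafer_cost, dies_per_wafer, yield_rate, nre, volume):
    """Per-good-die cost: recurring wafer cost plus NRE amortized over volume."""
    recurring = wafer_cost / (dies_per_wafer * yield_rate)
    return recurring + nre / volume

# Hypothetical advanced-node numbers: $20k wafer, 300 candidate dies,
# 80% yield, $500M total NRE (design + masks).
for volume in (10e6, 100e6, 300e6):
    cost = unit_cost(20_000, 300, 0.8, 500e6, volume)
    print(f"{volume / 1e6:>5.0f}M units: ${cost:.2f} per good die")
```

With these made-up numbers, NRE dominates the per-unit cost at 10M units and nearly vanishes at 300M, which is the flattening-with-volume behavior the IEDM graphs described.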


AMD was the first company to need chiplets to increase die volume. As pointed out, they were uncompetitive with monolithic die. Using the minimum number of die for the maximum number of SKUs isn't novel: at one point Intel was using five distinct die to cover the consumer space and three die for servers. At AMD, necessity was the mother of invention.

There are obvious advantages to monolithic but if the margins aren’t there or the power advantages aren’t there to cover the cost then chiplets are the answer. Apple Mac volumes aren’t high. The Ultra is low volume and it is already multi-chip. Doug’s thoughts about the servers being added to the chiplet mix for these processors really makes sense and should improve cost and maybe even update frequency.

Apple appears to update based on sales volume to cover engineering costs. The Mac Pro is virtually dead and my assumption is that reflects sales.
 

name99

Senior member
Sep 11, 2010
Not quite sure where exactly I said chiplets are a perfect technology.
Anyone with sense would understand it's an approach that reduces the risk of die loss or wastage.
Nothing is free, and fabrics/etc. cost money, but you get numerous benefits from mixing and matching litho processes in a way monolithic simply doesn't.

You're the one seeming to believe it's the wrong approach, for... reasons.
Personally, I'll trust the folks who make their living competing for the cheapest production cost as likely having just a bit more data than myself or anyone else outside the field.
...............................................
Starting point:
<<
If you have the option of making one giant die at 85% yield, or 2 smaller dies at 90-95% yield, at a lower cost per die or combined, why would you not?
Because
1. advanced packaging is not free.
2. advanced packaging has its own yield losses.
3. energy concerns: crossing a die boundary is always more expensive than staying within a single die.
4. as a general point, yield in the old-fashioned sense is vastly less important than claimed. Almost any modern design is designed for yield (eg redundancy in SRAMs, and multiple cores that can operate independently even if one of them is forced inactive).
Far more important to modern designs is parametric yield (ie does the design function at the targeted speed). But parametric yield doesn't magically go up just by splitting a mid-sized chip into two smaller chips.

Obviously there are optionality wins to using multiple dies.
But optionality wins are different from the claims in the article.
>>

Next:
<<
I know this is the Apple thread, but you're on AT still.
AMD has proven tile and related packaging is more economical in Server/Desktop, which is why Intel finally threw in the towel as they were getting destroyed with monolithic.

This has been discussed for 7-8+ years now.

Aside from the fact that tiling allows for greater yield, I wonder if they're actually able to increase that further by not running as much redundancy as monolithic requires, yielding more good dies, with the bad ones just binned.
>>

My POINT was that "tile and related packaging" is NOT automatically cheaper, and certainly not automatically "better". Thus answering fkoehler's question.
I have given you multiple lines of evidence to that fact. Most significantly, the fact that the hyperscaler ARM designs (where optionality is unimportant) are uniformly based on a monolithic piece of logic that's reticle limited. And even Intel walked back from SPR as four (mid-sized) logic chiplets to two (reticle-limited) logic chiplets as of EMR.

If you still insist that chiplets are always cheaper, well, it's not my business to disabuse you of that notion. I've said everything I need to say on this issue.
 

MerryCherry

Member
Jan 25, 2026
Many indicators tell us that single-thread performance is more important than multi-thread performance, and not just in games and other applications where it is difficult to extract parallelism beyond 8 to 12 cores. That is why I am still a big proponent of ST performance, and why I maintain that the first 8 to 12 threads an application grabs are the most important. That is the reason for my "ST Investigation" thread.

Even an application that everyone for the most part agrees is "ridiculously multithreaded" relies heavily on those first 8 to 12 cores.

Follow me here.

Here is a chart of CB R26 MT performance from my 9950X, tested from 1 to 16 threads. If I move beyond 16 threads it will start using SMT (logical) threads and skew the results. As you can see, the throughput per GHz (Pts./GHz) decreases from 105 with 1 thread to 83 with 16 threads. That is roughly a 20% decrease in per-thread utilization for this "ridiculously multithreaded" benchmark.

While Skymont/Darkmont, etc. have become remarkably performant since the Gracemont days, by the time Cinebench grabs the first E-core (on Nova Lake) at thread #17, less than 80% of the available per-thread CB performance will be extracted from that thread. I don't know how this application scales beyond 16 physical threads, but if the trend continues it does not bode well for what many call the E-core "Cinebench accelerators" in Nova Lake.

There are factors at work here beyond linear scaling. Zen 6 may only have 24 total logical threads, but threads 17 through 24, which we assume will continue this diminishing-returns trend in CB, will be going up against less performant Intel E-cores. By the time Zen 6 runs out of logical threads, CB might only be extracting 35 or 40% performance from each one, meaning the battle for Cinebench MT supremacy may already be lost for Intel by that 24th thread.

Compared to R23, which I'm guessing was more "core local," the Redshift engine in CB R26 may stress L3 and main memory, thus starving cores of resources as thread count increases. Seeing as I doubt Maxon will revise the Redshift engine for Intel (but who knows ;) ), those large BLLCs might be exactly for this purpose: to ensure the CB accelerators have enough resources to keep accelerating CB.

Or I'm completely wrong about this, and the Redshift engine just doesn't scale linearly with threads for some other reason. Which is not good news for Intel as far as this benchmark goes.
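A quick back-of-the-envelope version of the scaling argument, using only the two data points quoted in the post (105 pts/GHz at 1 thread, 83 pts/GHz at 16 threads) and a naive linear fit; this is purely illustrative and says nothing about how Cinebench actually behaves past 16 threads.

```python
# Diminishing-returns sketch from two measured points on a 9950X:
# 105 pts/GHz at 1 thread, 83 pts/GHz at 16 threads (quoted above).
# A linear fit between them is the simplest possible model.

def per_ghz_throughput(threads, t1=105.0, t16=83.0):
    """Linearly interpolated/extrapolated per-thread efficiency."""
    slope = (t16 - t1) / (16 - 1)          # ~ -1.47 pts/GHz per added thread
    return t1 + slope * (threads - 1)

for n in (1, 16, 17, 24):
    eff = per_ghz_throughput(n)
    print(f"thread {n:>2}: {eff:5.1f} pts/GHz ({eff / 105.0 * 100:.0f}% of 1T)")
```

Under this (optimistic) linear fit, thread #17 still delivers roughly 78% of the single-thread efficiency and thread #24 roughly 68%; the 35-40% figure in the post assumes the decline steepens once SMT threads or E-cores come into play.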


[Attachment 138271: chart of CB R26 MT scaling from 1 to 16 threads on the 9950X]
And this is why Apple Silicon is great. Their focus on ST performance.

And also shows that the Geekbench 6 Multicore Test (which everyone spits on), is very relevant.

Many client applications are ST-bound and only lightly multithreaded.
 

Doug S

Diamond Member
Feb 8, 2020
Microsoft's Cobalt 200 has two logic chiplets.

How big are they, though? Nvidia is using multiple reticle-sized dies now. Cerebras is using the entire wafer as a "chip" (and I can't help wondering why they aren't stacking a second die on top composed entirely of SRAM, in a big-boy version of AMD's X3D line).
 

Covfefe

Member
Jul 23, 2025
How big are they, though? Nvidia is using multiple reticle-sized dies now. Cerebras is using the entire wafer as a "chip" (and I can't help wondering why they aren't stacking a second die on top composed entirely of SRAM, in a big-boy version of AMD's X3D line).
By my pixel counting, 500-600mm^2. So yes, in this case one chip would be too big for the reticle limit. Though other hyperscalers have used tiny chiplets before for I/O. I guess that's why name99 specified logic chiplets.
 

fkoehler

Senior member
Feb 29, 2008

If you still insist that chiplets are always cheaper, well it's not my business to disabuse you of the fact. I've said everything I need to say on this issue.

Not sure anyone has made any claim that chiplets are always cheaper.
AMD could likely make a single die for Ryzen.
Even on TSMC's better nodes, the amount of redundancy required to avoid scrapping dies, or at best binning them, is clearly not going to be a money maker.

What you seem to be suggesting is similar to saying Ford/Chevy/Toyota are missing the boat by not getting raw steel inputs and doing everything in-house to finally churn out an end product vehicle.

Not a yield expert, but AMD/Intel seem to feel chiplets and multi-chip is the most economical way to advance.
You are seemingly positive they are wrong for some reason.
 

coercitiv

Diamond Member
Jan 24, 2014
My POINT was that "tile and related packaging" is NOT automatically cheaper, and certainly not automatically "better".
If I may be allowed to add an equivalent explanation: for a given node, application and packaging tech, there's a die size X from which chiplets start to make sense financially. All of the parameters matter in this equation. Being able to reuse said chiplets across the product portfolio can also help offset the design and (advanced) packaging cost, driving that X value down.
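The crossover die size X described above can be illustrated with a toy model using the classic Poisson yield approximation, Y = exp(-A * D0). Defect density, wafer cost, and packaging cost/yield below are all invented for illustration; only the shape of the comparison matters.

```python
# Toy monolithic-vs-chiplet cost comparison under a Poisson yield model.
# All constants are hypothetical, chosen only to show the crossover.
import math

WAFER_COST = 17_000    # $ per 300mm wafer (hypothetical)
WAFER_AREA = 70_000    # usable mm^2, ignoring edge loss
D0 = 0.001             # defects per mm^2 (hypothetical)

def cost_per_good_die(area_mm2):
    dies = WAFER_AREA / area_mm2
    yield_rate = math.exp(-area_mm2 * D0)   # Poisson yield model
    return WAFER_COST / (dies * yield_rate)

def monolithic(area):
    return cost_per_good_die(area)

def chiplet(area, pkg_cost=20.0, pkg_yield=0.98):
    # Two half-size dies plus an assembly step with its own cost and yield.
    return (2 * cost_per_good_die(area / 2) + pkg_cost) / pkg_yield

for a in (100, 400, 800):
    print(f"{a:>3} mm^2 of logic: mono ${monolithic(a):.0f} "
          f"vs 2-chiplet ${chiplet(a):.0f}")
```

With these particular numbers the crossover lands somewhere around 400 mm^2: below it the packaging overhead outweighs the yield savings, above it the monolithic yield loss dominates, which is exactly the "all of the parameters matter" point.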

Here's what AMD had to share with us on their 3rd gen Ryzen chiplet costs:

[AMD slides: 3rd-gen Ryzen chiplet vs. monolithic cost comparison]
 

name99

Senior member
Sep 11, 2010
Not quite sure where exactly I said chiplets are a perfect technology.
Anyone with sense would understand it's an approach that reduces the risk of die loss or wastage.
Nothing is free, and fabrics/etc. cost money, but you get numerous benefits from mixing and matching litho processes in a way monolithic simply doesn't.

You're the one seeming to believe it's the wrong approach, for... reasons.
Personally, I'll trust the folks who make their living competing for the cheapest production cost as likely having just a bit more data than myself or anyone else outside the field.

Quoting this for the THIRD time...
<<<
If you have the option of making one giant die at 85% yield, or 2 smaller dies at 90-95% yield, at a lower cost per die or combined, why would you not?
Because
1. advanced packaging is not free.
>>>

<<<
I know this is the Apple thread, but you're on AT still.
AMD has proven tile and related packaging is more economical in Server/Desktop, which is why Intel finally threw in the towel as they were getting destroyed with monolithic.
>>>

I'm now out.
You're obviously arguing in bad faith: constantly bobbing and weaving, constantly rewording what I said, then shifting to some other issue when I point out the rewording; refusing to accept the *cost* point I have been trying to stress repeatedly; and showing zero interest in learning from anyone, even when given copious material to explain a particular point.
 

name99

Senior member
Sep 11, 2010
If I may be allowed to add an equivalent explanation: for a given node, application and packaging tech, there's a die size X from which chiplets start to make sense financially. All of the parameters matter in this equation. Being able to reuse said chiplets across the product portfolio can also help offset the design and (advanced) packaging cost, driving that X value down.

Here's what AMD had to share with us on their 3rd gen Ryzen chiplet costs:

[AMD slides: 3rd-gen Ryzen chiplet vs. monolithic cost comparison]
Look EXACTLY at what I said, not what you think I said.
1. I agree with optionality. Which is what AMD mainly cares about.

2. Obviously you can use packaging to go beyond reticle limit, no-one denies that.

3. I am arguing against ONE very particular claim, namely the claim that yields are bad enough in current logic processes, but good enough on the packaging side, that it's a net COST win to break down a logic chip into small pieces (substantially smaller than reticle) and build a package from those pieces.
I've provided ample evidence that in fact everyone who is faced with the problem of creating a large logic package, from Cerebras down, doesn't actually do things that way. For the most part they build the logic die up to the reticle limit. (Of course they then tie pieces together to go beyond the reticle limit, as in 2. above.)

Look EXACTLY at what AMD are claiming above. In particular they are not making the "better yield" argument that was the starting point for my original post. They're making arguments mainly about optionality, but also IP reuse (and in fact mask reuse). None of which I deny. Everything I have said is attacking this claim that "chiplets are cheaper because better yield".
 

Covfefe

Member
Jul 23, 2025
Look EXACTLY at what I said, not what you think I said.
1. I agree with optionality. Which is what AMD mainly cares about.

2. Obviously you can use packaging to go beyond reticle limit, no-one denies that.

3. I am arguing against ONE very particular claim, namely the claim that yields are bad enough in current logic processes, but good enough on the packaging side, that it's a net COST win to break down a logic chip into small pieces (substantially smaller than reticle) and build a package from those pieces.
I've provided ample evidence that in fact everyone who is faced with the problem of creating a large logic package, from Cerebras down, doesn't actually do things that way. For the most part they build the logic die up to the reticle limit. (Of course they then tie pieces together to go beyond the reticle limit, as in 2. above.)

Look EXACTLY at what AMD are claiming above. In particular they are not making the "better yield" argument that was the starting point for my original post. They're making arguments mainly about optionality, but also IP reuse (and in fact mask reuse). None of which I deny. Everything I have said is attacking this claim that "chiplets are cheaper because better yield".
Those slides unequivocally show that chiplets reduce manufacturing cost. So yes, AMD is making the "better yield" argument.