Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,780
1,351
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, ProRes

M3 Family discussion here:


M4 Family discussion here:

 

smalM

Member
Sep 9, 2019
69
69
91
The Pro just needs to go back to its former state - 8P, 256-bit RAM, half GPU cores.

CPU:
the Pro CPU was 2x 4P-cluster and is now 1x 6P-cluster.
Apple won't go back to a 2x 4P-cluster design.
The M3 generation introduced a new P-cluster design; M3 and M4 use it too (just with 2 fewer P-cores).
We don't know how many slots it is designed for, maybe just 6 cores + AMX (which may take two slots).
If the cluster wasn't laid out for more than 6 P-cores right from the start, the M4 Pro will not see more than 6 P-cores.

MC:
The M3 Pro uses the same MCs and PHYs as the M3 does, just 12 instead of 8.
As I don't think Apple will go back to using a half Max GPU design, there is no need to change the M3 Pro MC layout.
It will receive the same update to LPDDR5X as the M4, providing 180 GB/s bandwidth.
I also think M5 will just get a speed bump and Apple will wait for LPDDR6 before they change the MC layout again.

GPU:
This is the shared logic of the M3 Pro GPU and half the M3 Max GPU:
[Images: Apple M3 Pro GPU - Shared Logic; Apple M3 Max ½ GPU - Shared Logic]
What makes you think Apple might want to go back to the half Max GPU design?
That half Max GPU is designed to function as a quarter of an Ultra GPU.
So where is the benefit for a Pro?
 

johnsonwax

Member
Jun 27, 2024
75
142
66
Its CPU as well now. M3 Max has a much better CPU than M3 Pro. 12P vs 6P
Right, but my point is that you're chasing the long tail of utility to users (I have an M1 Max that rarely sees CPU > 200%), so fewer and fewer users benefit from 12 cores. But if you're chasing, say, audio/video work where those cores really get used, and if Apple is adding AI to their audio products like Logic Pro, will that benefit from increased NPU perf, or is Apple expecting you to decamp your project to Cloud Compute to do that? And if the Cloud Compute processor is just two Max dies welded together, wouldn't they also benefit from the increased NPU?
 

name99

Senior member
Sep 11, 2010
483
364
136
When does Apple start scaling NPU across the line? The point of Pro/Max is mostly GPU perf, not CPU. At some point they'll start looking at NPU scaling across the line. Not sure if M4 got designed in time to care about that for AI.
That's a business decision, not a tech decision.

If the NPU is defined, at a strategic level, as a UI element then maybe never? It's clearly important to Apple to provide as common a UI as possible across the entire platform, and a stronger UI at the high end reduces the value of that commonality. In this view, it's more important to boost the NPU in the Apple watch than to boost it in the Max...

The other alternative is that people are starting to use the NPU for genuine, performance-sensitive work (e.g. as part of their standard Adobe or Premiere type workflows), in which case it makes sense that you can buy a faster workflow. I don't *think* we are there yet (the workflows are starting to exist, but for whatever reasons they mostly still run on GPU). But maybe in a few years?
 
  • Like
Reactions: FlameTail

name99

Senior member
Sep 11, 2010
483
364
136
Well not to that extent 😆 (also 6P+8E is not what anyone’s asking for they shouldn’t do that lol)

The Pro just needs to go back to its former state - 8P, 256-bit RAM, half GPU cores.
Actually more E cores strikes me as more likely than not at some point.
Suppose the most security critical portions of the OS (including calls to crypto libraries) are locked to run on say two E-cores that form their own cluster, with no external code ever running on those cores. This limits the damage possible from things like Spectre; it simply doesn't matter if your CPU can leak info to 3rd party code if no 3rd party code EVER runs on the same CPU as that running "sensitive" code...

My GUESS is that Apple is headed in this direction, essentially returning to the original microkernel vision 30+ years later. This has been a long slow transition, but every year we see big steps (a few years ago networking was essentially moved out of the kernel; now the file system is starting to be moved out).

So I would not be surprised to see at some point (probably not with the M4 generation, but in the next ten years) at least two E clusters, one the "normal" E cluster and one the "OS" E-cluster (which can probably be even smaller than a half-sized 4-core E-cluster, since AMX is probably not necessary, and maybe also not the page compression/decompression hardware).
 
  • Like
Reactions: okoroezenwa

Doug S

Platinum Member
Feb 8, 2020
2,678
4,528
136
What makes you think Apple might want to go back to the half Max GPU design?
That half Max GPU is designed to function as a quarter of an Ultra GPU.
So where is the benefit for a Pro?

If they want to go "cookie cutter" again they'd probably go in the direction of chiplets. You have, say, M5 that's a complete SoC in itself, then have a second "add on" die that adds CPU/GPU/NPU, more SLC, more memory controllers, and some additional I/O. Add 1 for a Pro, 3 for a Max, 7 for an Ultra, or something in that ballpark.

Not saying that's what I think they're going to do, just that it probably makes more sense than designing the Max to be able to be cut down into a Pro since you still end up with the same "only need to tape out two dies" but probably makes Ultra an easier lift as well.
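To make the "base die plus add-on dies" idea concrete, here is a minimal sketch of the die counts per tier. Only the add-on counts (1, 3, 7) come from the post above; the tier table and the function name `total_dies` are made up for illustration, and nothing here reflects an actual Apple product plan.

```python
# Add-on die counts per tier, taken from the "1 for a Pro, 3 for a Max,
# 7 for an Ultra" ballpark in the post. Purely hypothetical.
ADDON_DIES = {"base": 0, "Pro": 1, "Max": 3, "Ultra": 7}


def total_dies(tier: str) -> int:
    """One complete base SoC die plus the tier's add-on dies."""
    return 1 + ADDON_DIES[tier]


for tier in ADDON_DIES:
    print(f"{tier}: 1 base die + {ADDON_DIES[tier]} add-on dies = {total_dies(tier)} dies")
```

Under these assumptions every tier is built from the same two die designs, and the totals come out as 1, 2, 4 and 8 dies.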
 

poke01

Golden Member
Mar 8, 2022
1,952
2,479
106
Looking at leaked codenames for the M4 series: Donan is M4, and Brava is M4 Pro and M4 Max, while this time M4 Ultra has its own codename, Hidra.

This is a first, because the codenames for M4 Pro and Max are grouped together while the M4 Ultra gets a separate codename. I think they might be doing something different for the M4 series.
 
  • Like
Reactions: FlameTail

Doug S

Platinum Member
Feb 8, 2020
2,678
4,528
136
I think that's the price you pay for having a low-power fabric. All laptop SoCs seem to suffer from this no matter the manufacturer (well aside from MTL, but I wouldn't be surprised to see the same in LNL).

Yep the OS scheduler knows about the clusters and will attempt to minimize cross cluster traffic. If you run something big across all cores then you just have to pay the price of latency. I suppose some might wish for a way to ramp up fabric power in the BIOS, and maybe Intel/AMD can give that to you since they offer higher power / lower latency fabrics in higher end desktops.

Probably not worth Apple designing their fabric to be able to operate at higher power for such circumstances since it is really only something that would be desirable on a Studio and a Pro. But who knows, it would be interesting to see the same test run on a Max in a MacBook Pro and then on the same model Max or Ultra in a Mac Studio or Pro (maybe when the M4 family is complete).
 
  • Like
Reactions: CouncilorIrissa

DavidC1

Senior member
Dec 29, 2023
775
1,202
96
Hopefully, M4 Max makes improvements here.
I think that's the price you pay for having a low-power fabric. All laptop SoCs seem to suffer from this no matter the manufacturer (well aside from MTL, but I wouldn't be surprised to see the same in LNL).
While you can turn off cores, fabric can't be turned off until everything else needing the fabric stops requesting data, even trivial ones.

Thus it has a fixed cost, and by reducing it you reduce the minimum floor on power.
 

poke01

Golden Member
Mar 8, 2022
1,952
2,479
106

$1499 - 18GB RAM, 512GB SSD and M3 Pro. Good deal.
 
  • Like
Reactions: Glo.

FlameTail

Diamond Member
Dec 15, 2021
3,709
2,156
106
CPU:
the Pro CPU was 2x 4P-cluster and is now 1x 6P-cluster.
Apple won't go back to a 2x 4P-cluster design.
The M3 generation introduced a new P-cluster design; M3 and M4 use it too (just with 2 fewer P-cores).
We don't know how many slots it is designed for, maybe just 6 cores + AMX (which may take two slots).
If the cluster wasn't laid out for more than 6 P-cores right from the start, the M4 Pro will not see more than 6 P-cores.

MC:
The M3 Pro uses the same MCs and PHYs as the M3 does, just 12 instead of 8.
As I don't think Apple will go back to using a half Max GPU design, there is no need to change the M3 Pro MC layout.
It will receive the same update to LPDDR5X as the M4, providing 180 GB/s bandwidth.
I also think M5 will just get a speed bump and Apple will wait for LPDDR6 before they change the MC layout again.

GPU:
This is the shared logic of the M3 Pro GPU and half the M3 Max GPU:
[Attachment 105647] [Attachment 105649]
What makes you think Apple might want to go back to the half Max GPU design?
That half Max GPU is designed to function as a quarter of an Ultra GPU.
So where is the benefit for a Pro?
So the downgraded 192-bit memory bus of M3 Pro is here to stay?

:sad:
 

name99

Senior member
Sep 11, 2010
483
364
136
Yep the OS scheduler knows about the clusters and will attempt to minimize cross cluster traffic. If you run something big across all cores then you just have to pay the price of latency. I suppose some might wish for a way to ramp up fabric power in the BIOS, and maybe Intel/AMD can give that to you since they offer higher power / lower latency fabrics in higher end desktops.

Probably not worth Apple designing their fabric to be able to operate at higher power for such circumstances since it is really only something that would be desirable on a Studio and a Pro. But who knows, it would be interesting to see the same test run on a Max in a MacBook Pro and then on the same model Max or Ultra in a Mac Studio or Pro (maybe when the M4 family is complete).
What tests do you want to run? The tests that MATTER are those that involve a large number (more than cluster-size) of co-operating threads, and aspects of their performance.

But that's not what's tested! In classic "looking where the light is, not where the keys were lost" fashion, the tests run by the usual benchmarking crowd, while technically accurate (usually...) are not especially USEFUL.
There are obvious factors like "how often do we actually need this thread to wait on that thread" (just doesn't matter how long the wait is if it doesn't happen often) but more important are the less obvious factors.
For example Apple provides remote atomics that handle many synchronization cases without requiring the overhead of locks. But you're not going to see the performance impact of such hardware if you insist that every CPU is an x86 and must be tested like an x86...
Another aspect of this is that Apple provides speculation for common realistic patterns of lock acquisition. But again you're not going to see the real performance impact of this if, instead of testing actual code, what you test is some unrealistic benchmark that simply slams out lock/unlock primitives as fast as possible.
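The point about wait frequency can be put as back-of-the-envelope arithmetic: time lost to synchronization is roughly (waits per second) × (latency per wait). The sketch below uses made-up inputs purely for illustration; the function name and both workloads are hypothetical, not measurements of any real chip.

```python
def sync_overhead_fraction(waits_per_second: float, wait_latency_ns: float) -> float:
    """Rough fraction of each second a thread spends waiting, capped at 1.0."""
    return min(1.0, waits_per_second * wait_latency_ns * 1e-9)


# Hypothetical inputs: the same per-wait latency, very different wait rates.
LATENCY_NS = 150.0
lock_hammering = sync_overhead_fraction(5_000_000, LATENCY_NS)  # lock/unlock microbenchmark
realistic_code = sync_overhead_fraction(10_000, LATENCY_NS)     # occasional synchronization

print(f"lock-hammering benchmark: {lock_hammering:.2%} of time waiting")
print(f"realistic workload:       {realistic_code:.2%} of time waiting")
```

With identical latency, the microbenchmark is dominated by waiting while the realistic case barely notices it, which is why a latency number alone says little about real-world scaling.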

The best we seem to have in terms of real-world tests is GB6 ML, where Apple generally scales a lot better than any x86. That's an imperfect test because you could argue at least part of that comes from Apple's better memory bandwidth compared to x86. But it's the best we have, and at the very least you have to concede that what it shows is that Apple's supposedly slow handling of locks and synchronization doesn't seem to have any real-world effect.
 

smalM

Member
Sep 9, 2019
69
69
91
If they want to go "cookie cutter" again they'd probably go in the direction of chiplets. You have, say, M5 that's a complete SoC in itself, then have a second "add on" die that adds CPU/GPU/NPU, more SLC, more memory controllers, and some additional I/O. Add 1 for a Pro, 3 for a Max, 7 for an Ultra, or something in that ballpark.
This is the path I thought Apple would take and I was surprised they instead made a stand-alone M3 Pro.
But now that the design exists and they have something to build upon, I think the Pro will stay monolithic for a while.

So the downgraded 192-bit memory bus of M3 Pro is here to stay?
Yes, I think so, as the performance impact on the Pro seems to be next to nonexistent for Apple's typical user (Apple: If you want more, buy a Max...). LPDDR5X-7500 now and then maybe LPDDR5X-8400 and that's it until LPDDR6.
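For reference, the bandwidth figures follow from bus width times transfer rate. A minimal sketch, assuming the 192-bit bus mentioned above is 12 memory controllers of 16 bits each (the per-controller width is an assumption that is consistent with "12 instead of 8", not something stated in the thread); the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000


# Assumption: 12 controllers x 16 bits each = 192-bit bus.
BUS_WIDTH_BITS = 12 * 16

for rate in (7500, 8400):
    print(f"LPDDR5X-{rate}: {peak_bandwidth_gb_s(BUS_WIDTH_BITS, rate):.1f} GB/s")
```

This gives 180.0 GB/s for LPDDR5X-7500 and 201.6 GB/s for LPDDR5X-8400 on a 192-bit bus. These are theoretical peaks, not measured throughput.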

But you can always think: what a loser, the M3 Pro was for sure a one-time, never-again ;)
 

poke01

Golden Member
Mar 8, 2022
1,952
2,479
106
From Apple Insider:
More recently, AppleInsider has learned of two codenames for the t8140 - Tahiti and Tupai, suggesting that Apple has developed two variants of the same chip for use in different devices within the iPhone 16 range.
The more powerful version, known under the codename Tahiti, will be used in the iPhone 16 Pro and iPhone 16 Pro Max, while the iPhone 16, iPhone 16 Plus, and iPhone SE 4 are expected to use the version of the chip codenamed Tupai.


I think Pro will have better GPU and NPU.
 

Doug S

Platinum Member
Feb 8, 2020
2,678
4,528
136
From Apple Insider:
More recently, AppleInsider has learned of two codenames for the t8140 - Tahiti and Tupai, suggesting that Apple has developed two variants of the same chip for use in different devices within the iPhone 16 range.
The more powerful version, known under the codename Tahiti, will be used in the iPhone 16 Pro and iPhone 16 Pro Max, while the iPhone 16, iPhone 16 Plus, and iPhone SE 4 are expected to use the version of the chip codenamed Tupai.


I think Pro will have better GPU and NPU.

The interesting question in my mind is what form the different SoCs will take. Will they be the same die with a core binned off here and there, or will they be separate designs? There's a whole range of possibilities then from wider memory bus to different clock rates to capabilities exclusive to the 'Pro' line.

This will be the first hint of how different they plan to make the Pro line from the base line. If I had to bet I'd say same die with stuff binned/fused off, because that's what they've done with Apple Silicon (e.g. the two variants of M4 in iPad Pro), but if they use separate dies that opens up the possibility of reserving newer processes for the Pro SoC in the future (e.g. if the rumors about them using N2 for next year's iPhone turn out to be true). They didn't have a reason to do that in the past because newer processes were always cheaper per transistor, but the per-transistor cost is flattening out, so there's more of a reason to use an older process in the less expensive products (especially the SE).
 
  • Like
Reactions: poke01

johnsonwax

Member
Jun 27, 2024
75
142
66
They've used the same processor in base and pro before and differentiated by camera, screen, etc. I hadn't anticipated a split of the A series, but given the rumors of a thinner phone next year, maybe a lower power variant is necessary. iPhone volumes are large enough to have distinct dies. It's more a question of does the design team have the bandwidth to design an extra one.
 

Doug S

Platinum Member
Feb 8, 2020
2,678
4,528
136
They've used the same processor in base and pro before and differentiated by camera, screen, etc. I hadn't anticipated a split of the A series, but given the rumors of a thinner phone next year, maybe a lower power variant is necessary. iPhone volumes are large enough to have distinct dies. It's more a question of does the design team have the bandwidth to design an extra one.

They wouldn't be designing different cores just mixing and matching blocks a bit differently. Just another floorplan and route. That's easy enough to do, but another mask set costs money, which would be made up by having a smaller/cheaper die.

I don't see the "thin" phone having a lower power CPU; it would have the same CPU and get less battery life. They could shave 40-50% off the battery life and many people (like me) would not be affected at all, though obviously they wouldn't need to lose that much capacity. They'd still have the "fat" phone for people who need the battery life of today's models. Or who knows, maybe there are some improvements in energy density coming that'll make up the difference.
 

LightningZ71

Golden Member
Mar 10, 2017
1,768
2,122
136
Another possibility: they are doing the same basic SoC (with some things removed) on a completely different node at a different foundry. With TSMC recently showing difficulty at the leading edge, Apple could be hedging its bets by making its leading-edge design at TSMC with a NGD ongoing contract at a premium, while having a scaled-down SoC made at a different foundry on a less expensive node for its lower-end devices. It would still offer the same technical abilities, but at reduced performance, allowing a common programming target with different cost and performance profiles.
 

johnsonwax

Member
Jun 27, 2024
75
142
66
Another possibility: they are doing the same basic SoC (with some things removed) on a completely different node at a different foundry. With TSMC recently showing difficulty at the leading edge, Apple could be hedging its bets by making its leading-edge design at TSMC with a NGD ongoing contract at a premium, while having a scaled-down SoC made at a different foundry on a less expensive node for its lower-end devices. It would still offer the same technical abilities, but at reduced performance, allowing a common programming target with different cost and performance profiles.
Boy, that's a LOT of additional work to split A series across foundries. I would think if they wanted to test the waters with a different foundry they'd do it with something like the S series where users don't connect the product so strongly to annual compute advances.
 

LightningZ71

Golden Member
Mar 10, 2017
1,768
2,122
136
"a lot of extra work" = cost
going to a different, far cheaper foundry that is desperate for customers = less cost
Assuming that this only goes in the base iPhone for that generation, it should sell around 3-4 million units per year. If the SoC winds up costing them $10 less per unit than at TSMC, that's $30-40 million worth of costs that can be absorbed and still break even.
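The break-even arithmetic above is just unit volume times per-unit savings. A quick sketch using the post's own figures (3-4 million units, $10 saved per SoC); the function name is made up for illustration.

```python
def break_even_budget_usd(units_per_year: int, savings_per_unit_usd: float) -> float:
    """Extra one-time cost that per-unit savings would cover in one year."""
    return units_per_year * savings_per_unit_usd


# Figures from the post: 3-4 million units per year, $10 saved per SoC.
for units in (3_000_000, 4_000_000):
    budget = break_even_budget_usd(units, 10.0)
    print(f"{units:,} units x $10 = ${budget:,.0f}")
```

That yields $30,000,000 and $40,000,000 per year. The result scales linearly with whatever unit volume is assumed.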
 

johnsonwax

Member
Jun 27, 2024
75
142
66
"a lot of extra work" = cost
going to a different, far cheaper foundry that is desperate for customers = less cost
Assuming that this only goes in the base iPhone for that generation, it should sell around 3-4 million units per year. If the SoC winds up costing them $10 less per unit than at TSMC, that's $30-40 million worth of costs that can be absorbed and still break even.
No, actual extra work. You can't just copy a TSMC design to a different foundry utilizing different tech and paste it in. You're overfocused on cost, which is rarely the driving factor in these decisions in the near term, and not enough on organizational bandwidth. Does Apple's silicon team (which has been badly poached lately) have the bandwidth in their increasing silicon portfolio to coordinate with a new foundry, identify their process, design around that process, and completely redesign and retest an A series processor for that new process? That's not just a new floorplan and route as Doug points out.

I'm not arguing the cost and supply chain redundancy benefits aren't there, I'm arguing that splitting a design to seek that is a HUGE waste of organizational bandwidth, and would make more sense to pursue when that team is already having to do all of that work with TSMC for an S series jump or something like that, and instead do that with Samsung. It would still be a lot of additional work to build those contracts and relationships and so on, but it wouldn't carry the additional cost of redesigning a processor that doesn't need to be redesigned. That's just not how you would broker that. Not to mention, if Apple was doing that, we would have known about it at least 18 months ago for a product shipping to customers in 4 weeks.