Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LP-DDR4
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-effiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach which these chips as they do the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Discussion - Apple Silicon SoC thread

Page 78 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M1 Ultra discussion here:

Discussion - Apple Silicon SoC thread

Page 109 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M2 discussion here:

Discussion - Apple Silicon SoC thread

Page 127 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.264, ProRes

M3 Family discussion here:

Discussion - Apple Silicon SoC thread

Page 215 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M4 Family discussion here:

Discussion - Apple Silicon SoC thread

Page 263 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M5 Family discussion here:

Page 431 - Discussion - Apple Silicon SoC thread

Page 431 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

name99 · Tuesday at 3:05 PM

OMG! OMG!! OMG!!!

Check out https://patents.google.com/patent/US12572625B1 !

This is pretty much the last missing piece in adding to Apple every known good idea for boosting latency core performance (and something I have been pushing for five years).

The bad news is that with this comes the end of an era, much like the end of the 80s through early 2000s, the era of Denard scaling and doubling MHz then GHz with each new CPU.

CPU IPC will continue to increase after adding support for criticality (and fully exploiting it - the next step is to split each scheduling queue into three queues, one a simple FIFO for instructions with all operands present, one a scheduling queue for critical instructions with at at least operand missing [no cost for tracking age and order], one the standard scheduling queue tracking both ops and age so expensive in area/energy; but this gives us 3x the scheduling reach), but from now on out it's endless painful grind, no silver bullets left.

One consequence of this, I expect (after a few years, we still have to mine out all the benefit left from exploiting good ideas that have been implemented but not 100% optimized over the past few years) is a lot more pressure for a cleaned up ARMv10 ISA. ARMv8 was a magnificent beast that has served us really well, but it's not perfect, and when no other 5% or so improvements are available, the 5% improvement that is available from fixing up the various sub-optimalities that have grown in ARMv8 becomes more compelling.
I have ideas for how this might go down but the most interesting non-technical idea is that if ARM is not willing to update as fast as Apple (and to make changes as large as Apple might want to make) we could see a genuine Apple ISA... (And yeah, it's not going to be that pathetic mess that is RISC-V, not even close. Not gonna, happen, Apple is not populated by retards.)

The Hardcard · Tuesday at 3:49 PM

johnsonwax said:
What's the point of mid cores on a device that has only a single foreground process and a bunch of background ones? How many users on these devices (other than the M series iPad Pros) are using processes that scale well with cores? No, these are devices that are effectively quarantining processes to their own cores - 1 for the OS and it's APIs, plus background user processes, 1 for the foreground process, and E cores for the gazillion background processes.

The use cases for phones isn't changing. And the nature of the apps being used by users isn't changing either. They are almost exclusively big single thread dominated apps that scale poorly with additional cores and almost all of them are I/O constrained to the degree that the user gets frustrated.

This is the wrong question though. It is the e core that needs justification.

The entire purpose of the e cores is to is supremely efficient compute. The mid core matches that efficiency with superior IPC and greater frequency flexibility. There is no known advantage to the e core, so why expect to ever see it again.

My take from the base M5 and A19s is that the mid core wasn’t ready for their schedules. But it is here now and the e core no longer has any purpose, any basis, any justification. It is obsolete.

mvprod123 · Tuesday at 4:02 PM

The Hardcard said:
This is the wrong question though. It is the e cores that need justification.

The entire purpose of the e cores is to is supremely efficient compute. The mid core matches that efficiency with superior IPC and greater frequency flexibility. There is no known advantage to the e core, so why expect to ever see it again.

My take from the base M5 and A19s is that the mid core wasn’t ready for their schedules. But it is here now and the e core no longer has any purpose, any basis, any justification. It is obsolete.

If Apple were planning to phase out the use of E-cores in the future, the improvements implemented in the A19/M5 E-cores would not have been made. I still believe that Apple introduced this third type of core solely for use in Pro/Max/Ultra chips.

The Hardcard · Tuesday at 4:27 PM

mvprod123 said:
If Apple were planning to phase out the use of E-cores in the future, the improvements implemented in the A19/M5 E-cores would not have been made. I still believe that Apple introduced this third type of core solely for use in Pro/Max/Ultra chips.

I don’t think that is reasonable. Why would you stop work before you are done using it? The time to stop is after you are done using it. And is that a reason to keep using something when there is clearly a better choice?

Only one thing matters, with what characteristic or under what circumstances is the e core better than the new P core? Without a clear answer to that question, the e core is toast.

mr_roboto · Tuesday at 4:28 PM

johnsonwax said:
No, these are devices that are effectively quarantining processes to their own cores - 1 for the OS and it's APIs, plus background user processes, 1 for the foreground process, and E cores for the gazillion background processes.

Quarantining the OS and user processes to specific cores that they kind of own isn't how anything actually works.

For example, consider the case of an app reading some bytes from a file. In macOS, the call stack involved in getting data back to the app always includes both userspace libraries and kernel filesystem APIs. (NB for those more familiar with Linux: It's always both here because Apple does not maintain a stable kernel ABI. If you're writing application code, unless you really like a lot of pain, the only sane path to the kernel is calling through vendor libraries - Apple's C standard library, etc.)

Does any of the call stack run on a different core than the one that originated the request? Typically not! Usually you'll observe a sequence of calls spanning many different domains which all execute on the same core as the requesting app thread.

Yes this is simplifying and taking away some reasons why things might move between cores anyways, but point is that if your mental model is "one core for the OS, one for app A, one for app B" etc you're just not thinking about the way things actually work. All cores are for the OS. Application software makes API calls all the time, and in the majority of cases, the highest performance way of implementing the API side is to do most or all the work on the same core that made the API call.

(If you doubt that, consider two things. The first: IPC overhead is real and is not your friend. The second: the requesting core is the one which has all the relevant user app data in-cache. On the back end, the responding core has all the data flowing in the opposite direction. Migrating cache lines between cores is relatively fast, but nowhere near as fast (or energy efficient) as making the requesting and responding cores one and the same so data doesn't have to move at all.)

mr_roboto · Tuesday at 4:37 PM

The Hardcard said:
This is the wrong question though. It is the e core that needs justification.

The entire purpose of the e cores is to is supremely efficient compute. The mid core matches that efficiency with superior IPC and greater frequency flexibility.

[citation needed]

So far, everything I've seen suggests that the M5 generation M core is less efficient than the E core. This is expected: much of the M core's additional performance comes from higher clocks, and clock speed costs power and area (higher power library cells, more pipeline stages, etc).

Your posts make it sound like you believe there's a free lunch in CPU design, and there just isn't. The whole reason this thing is a "mid" core (right down to the clusters being designed M0 and M1) is that it's somewhere between the P and E cores on the spectrum of power/performance/area (PPA) tradeoffs.

The reason it makes sense to eschew E cores in Pro and Max chips is that the benefits of ultra low power E cores tend to get lost in the noise of burning lots more power just for having double or quadruple the DRAM memory controller channel count, and all the other power overheads involved in building a much bigger SoC. But lightweight E cores should still matter a lot in the smaller, lightweight SoCs used in phones and tablets and low-end Macs, so I don't expect to see the E cores go away in the next technology generation.

The Hardcard said:
My take from the base M5 and A19s is that the mid core wasn’t ready for their schedules. But it is here now and the e core no longer has any purpose, any basis, any justification. It is obsolete.

Chip development is not an overnight process. The M5 M-core design effort would have been launched something like 2 or 3 years ago, maybe more, along with everything else in the A19/M5/pro/max technology generation. If Apple intends to fully replace E cores with M cores, there's no obvious reason why they couldn't have already done so in this generation.

Doug S · Tuesday at 4:48 PM

name99 said:
My personal suggestion is that Apple will drop the E-core and announce a new HE-core.
E-cores were (handwaving) 1/3 the performance at 1/3 the energy to perform a given task (relative to S core).
The HE (HIGH Efficiency) core can be 1/10 the performance at 1/10 the energy to perform a given task. You could achieve this (and maybe better) by starting with an in-order 3-wide core and throwing a large suite of very carefully chosen optimizations at it.

Maybe you need to review the benchmarks more closely. E cores have long been 1/3 of the performance (it is more like 2/5th of the performance with the latest E core) at 1/10th the power.

So you already have your 1/10th without sacrificing so much performance. But if you want 1/10th of the performance the E core can clock down to 750 MHz or whatever and it'll be 1/10th of the performance - but likely at more along the lines of 1/30th of the power!

Like I said, I could see the 2/2/2 move if Apple considers the A SoC's role in the Neo to be something they take into consideration during design. But they could easily say "even in the most wildly optimistic scenario about Neo sales we're making 20x more profit from iPhone Pros than Neos so Neo is going to get whatever it gets not get something designed for it".

johnsonwax · Tuesday at 4:49 PM

gdansk said:
High QoS background services if it is indeed more energy efficient
Low QoS but foreground dispatch queue
iOS is flexible enough for it to work. Though not necessarily desirable.
But like Doug was saying, A SoC isn't only for iOS anymore.

Well, you're elevating the Neo to the level of the iPhone in terms of SoC design and it's VERY premature to be doing that.

gdansk · Tuesday at 5:05 PM

johnsonwax said:
Well, you're elevating the Neo to the level of the iPhone in terms of SoC design and it's VERY premature to be doing that.

Not sure about that. It's harder for me to explain an entire core class without it showing up in a lot of products.

johnsonwax · Tuesday at 5:11 PM

Doug S said:
Maybe you need to review the benchmarks more closely. E cores have long been 1/3 of the performance (it is more like 2/5th of the performance with the latest E core) at 1/10th the power.

But this is also a function of Apple's verticality.

Apple isn't designing cores to be performant on Windows or Android, they're designing them to be performant on iOS/MacOS, both unixes that are pretty good at relying on a squidcloud of low QoS background services, and which encourage developers to do the same. But they are different in that their schedulers work differently. If you have an iOS media app you have to put the media stream into a high QoS background thread if you want the user to be able to lock the device and still use the stream, because the P cores are going to be shut off when the device is locked and that stream will be marked as a media stream and will likely get it's own E core to ensure that it's performant, possibly even reserving bandwidth. MacOS doesn't work like that and there's no equivalent to that there. And that's not a minor use case for iPhones, it's one of the major use cases and it's why there are more E than P cores on A series because when it's in your pocket, everything needs to happen on E cores to keep the battery life.

It doesn't matter if it's performant in the benchmarks (which it is), it's performant in the power/use profile that they tell their developers to use and desktop/laptops don't have a 'P cores must be disabled' mode to profile around. (This is also why I think that apps that don't use Apple's native controls on AppleTV suck ass so badly because Apple is able to carefully schedule the media download and the controller and the app front end on different A series cores in a way that isn't open to developers - so the non-native controls choke).

So while I think the benchmarks are instructive to whether M series should have a middle core, I don't think they are for iOS because iOS doesn't work like a PC does, by design. And as such, we shouldn't expect lessons for PCs to apply there. Now Apple could see those performance arguments and embrace them but it would need to come with a fair bit of reworking of how iOS schedules, how developers are told to write their apps, and so on. It's a big undertaking, not a drop in 'the compiler will sort it out' matter.

johnsonwax · Tuesday at 5:18 PM

gdansk said:
Not sure about that. It's harder for me to explain an entire core class without it showing up in a lot of products.

No, axiomatically it's not in the same category. We know this because the Neo uses failed yield chips. Not only is the Neo processor not a consideration in the design of the A series cores, the processor isn't even good enough to use in an iPhone. The Neo processor to Apple is literal garbage. That's how they're hitting the price point. Their BOM lists the processor at $0.

Now, the success of the Neo may change that, but that remains to be seen, and it remains to be seen whether the successor is even a pure successor.

And the other A series products like AppleTV are even more biased away from consideration.

gdansk · Tuesday at 5:21 PM

johnsonwax said:
No, axiomatically it's not in the same category. We know this because the Neo uses failed yield chips. Not only is the Neo processor not a consideration in the design of the A series cores, the processor isn't even good enough to use in an iPhone. The Neo processor to Apple is literal garbage. That's how they're hitting the price point. Their BOM lists the processor at $0.

Now, the success of the Neo may change that, but that remains to be seen, and it remains to be seen whether the successor is even a pure successor.

And the other A series products like AppleTV are even more biased away from consideration.

Not what I'm talking about. The A SoCs are their most shipped and so they want to make it better. That such a configuration would happen to help the MBN is simply an added bonus. They will make iOS more capable. Consider what the foldable must do to justify its price. Consider a mass market VR headset (okay that might be a joke) would probably want something cheaper than M6.

Many product categories where any of 2+2+4, 2+2+2 would be beneficial. And they can eat their BoM back, easily.

johnsonwax · Tuesday at 5:41 PM

The Hardcard said:
This is the wrong question though. It is the e core that needs justification.

The entire purpose of the e cores is to is supremely efficient compute. The mid core matches that efficiency with superior IPC and greater frequency flexibility. There is no known advantage to the e core, so why expect to ever see it again.

My take from the base M5 and A19s is that the mid core wasn’t ready for their schedules. But it is here now and the e core no longer has any purpose, any basis, any justification. It is obsolete.

But it doesn't provide superior latency at the same power. With fewer cores you introduce the problem that threads need to be juggled sufficiently well that the user doesn't observe the juggling. (This is why the Mac had and still has a hardware rendered cursor vs Windows software rendered one - because it was nearly impossible early in the GUI era to guarantee performance to the cursor, and even today Windows occasionally struggles with it.)

Why is the iPhone able to not stutter when scrolling and maintain the illusion of direct interaction (keeping window objects under your finger without drifting) and so on and so forth. The iOS windowserver doesn't compete with the main thread of the foreground app - it runs on a different core so that if the P core running the main thread gets bogged down, the windowserver is unaffected. It has its viewframe and effectively guaranteed performance so you can always scroll it freely. The buttons may be unresponsive when tapped, but the interface is not.

Providing the illusion of a RTOS is MUCH easier if you have guaranteed compute budgets across multiple small cores than trying to maintain that by thread juggling in a single one. It's not a question of aggregate compute/aggregate power, it's a question, at least in some cases, of guaranteed compute/guaranteed budget.

We didn't shift to multiple cores solely because it was more efficient in terms of aggregate compute/aggregate power, we did it because we had increased need for that guarantee. When you watch a Mac scrub through 6 synced 4K streams without a hitch, do you think that's easier to do with 6 cores or 5 cores that are 50% more powerful? So long as the 6 cores each have sufficient power to serve the 4K stream, I guarantee it will be more responsive than the 5 cores.

Note, this isn't a problem that the Mac is trying to solve because apart from things like handling 8K streams in the Pro/Max/Ultra lines, there aren't really many if any critical thresholds they are trying to overcome, but on iOS there are because the nature of the product is so different.

The Hardcard · Tuesday at 6:10 PM

mr_roboto said:
[citation needed]

So far, everything I've seen suggests that the M5 generation M core is less efficient than the E core. This is expected: much of the M core's additional performance comes from higher clocks, and clock speed costs power and area (higher power library cells, more pipeline stages, etc).

Your posts make it sound like you believe there's a free lunch in CPU design, and there just isn't. The whole reason this thing is a "mid" core (right down to the clusters being designed M0 and M1) is that it's somewhere between the P and E cores on the spectrum of power/performance/area (PPA) tradeoffs.

The reason it makes sense to eschew E cores in Pro and Max chips is that the benefits of ultra low power E cores tend to get lost in the noise of burning lots more power just for having double or quadruple the DRAM memory controller channel count, and all the other power overheads involved in building a much bigger SoC. But lightweight E cores should still matter a lot in the smaller, lightweight SoCs used in phones and tablets and low-end Macs, so I don't expect to see the E cores go away in the next technology generation.

Chip development is not an overnight process. The M5 M-core design effort would have been launched something like 2 or 3 years ago, maybe more, along with everything else in the A19/M5/pro/max technology generation. If Apple intends to fully replace E cores with M cores, there's no obvious reason why they couldn't have already done so in this generation.

https://twitter.com/x/status/2032583630665691240

This is what I saw, from the post mvprod123 made on page 476. I have no way to verify any of these posts, but that is my basis. Has anyone shown anything different? Maybe the guy was just drawing for no reason, maybe his analysis or his math is in error.

I’ve been hoping eclecticlight.co would get an M5, he has experience controlling the e core frequencies with QoS and doing analysis, but he hasn’t got one yet. But I haven’t seen any such counter claim that the new core is less efficient inside the e cores frequency range, which the critical question.

The Hardcard · Tuesday at 6:15 PM

johnsonwax said:
But it doesn't provide superior latency at the same power. With fewer cores you introduce the problem that threads need to be juggled sufficiently well that the user doesn't observe the juggling. (This is why the Mac had and still has a hardware rendered cursor vs Windows software rendered one - because it was nearly impossible early in the GUI era to guarantee performance to the cursor, and even today Windows occasionally struggles with it.)

Why is the iPhone able to not stutter when scrolling and maintain the illusion of direct interaction (keeping window objects under your finger without drifting) and so on and so forth. The iOS windowserver doesn't compete with the main thread of the foreground app - it runs on a different core so that if the P core running the main thread gets bogged down, the windowserver is unaffected. It has its viewframe and effectively guaranteed performance so you can always scroll it freely. The buttons may be unresponsive when tapped, but the interface is not.

Providing the illusion of a RTOS is MUCH easier if you have guaranteed compute budgets across multiple small cores than trying to maintain that by thread juggling in a single one. It's not a question of aggregate compute/aggregate power, it's a question, at least in some cases, of guaranteed compute/guaranteed budget.

We didn't shift to multiple cores solely because it was more efficient in terms of aggregate compute/aggregate power, we did it because we had increased need for that guarantee. When you watch a Mac scrub through 6 synced 4K streams without a hitch, do you think that's easier to do with 6 cores or 5 cores that are 50% more powerful? So long as the 6 cores each have sufficient power to serve the 4K stream, I guarantee it will be more responsive than the 5 cores.

Note, this isn't a problem that the Mac is trying to solve because apart from things like handling 8K streams in the Pro/Max/Ultra lines, there aren't really many if any critical thresholds they are trying to overcome, but on iOS there are because the nature of the product is so different.

It doesn’t have to have superior latency. The question is if you replace 6 e cores with an equal number of new cores, would it be worse? I think the new core counts won’t be less than the e cores in any SOC.

johnsonwax · Tuesday at 6:25 PM

The Hardcard said:
Only one thing matters, with what characteristic or under what circumstances is the e core better than the new P core? Without a clear answer to that question, the e core is toast.

So, here's a different story. Back in the Firewire/USB war days, Apple backed Firewire despite it sometimes having lower total bandwidth than USB. Apple got a lot of criticism for this, but Firewire had a feature that USB didn't - it was isochronous. If you were downloading from a camera (Apple's pro market) and you wanted to be able to preview a video over that stream at 60fps while also downloading other clips from the camera, how do you guarantee that the preview stream will get enough bandwidth to display at 60fps without the other downloads cutting into it? Well, that was a feature of Firewire - you could tell the controller to guarantee that bandwidth. That was also a critical feature for audio production because it could do the same thing - guarantee bandwidth to multiple audio channels. USB lacked that feature and so you needed massively more bandwidth to get the same effect because success was a stochastic not guaranteed process.

You guys cannot stop thinking in terms of total bandwidth, and Apple doesn't always think in those terms. They think in terms of experience. If you have a windowserver that needs to guarantee complete it's loop in 8ms for a 120hz display, can your scheduler guarantee that thread will be able to complete its loop when it's shared with other threads on the core?

You're acting as though fewer, more powerful cores would be better because in an NxM calculation you get a bigger number, but there's no merit to the windowserver finishing its thread faster than 8ms, and there's an enormous cost to it finishing it in 9ms. You can guarantee the windowserver will complete its loop on time if you dedicate a core to it that is capable of doing that. You cannot guarantee it if you throw it on a core twice as powerful with a bunch of other tasks unless you block them for 8ms, and that may not be acceptable to do if you have other guaranteed processes - steaming audio or video, etc. iPhone has a lot of guaranteed processes - both in and out of Apples control. The windowserver Apple can choose to let lag. It's a choice. The interface hiccups and the illusion is ruined for the user, maybe no big deal. But the camera sensors are a different matter - and almost all the iPhone hardware is designed around that activity. Can you guarantee that it is able to pull the frame off the 3 sensors plus the audio and run them through filters and whatnot (because Apple does a lot of that in the viewfinder, in real time) and not drop frames? The Mac effectively never needs to do this, but iPhone does, and it needs to be able to do it while also streaming to another device if it's being used as a remote, do it in a 3rd party app with whatever it's own compute needs are, be able to run the UI and the windowserver and be responsive to texts and other general iPhone functions.

I do not think that is achieved by throwing all of that into a pot and having the scheduler dole it out, as the Mac or Windows or linux would do where aggregate compute would be the relevant number. I think it's achieved by reserving compute on a core and dedicating the core to a given task (I believe this because of how Apple informs developers to develop their app). As such, half as many twice as powerful cores is not equivalent to twice as many half as powerful ones, provided the less powerful cores are adequate to handling the critical task which has a known maximum compute budget. You will never need to process more data than the sensor can throw off in this scenario. You will never need to process more screen elements than resolution * refresh rate. And so on. The iPhone seems to be more computationally powerful than it seems like it could be because of these slights of hand which you achieve by approaching the computational task from a different perspective. It's not about getting more work done, it about being able to always get the critical work done while doing a good job of hiding the ball with the work it's not doing. In every case that I have used an app on iOS and the same one on MacOS, the iOS one feels better because it budgets its compute in a very different way. This is the downside of general purpose compute in that these are difficult problems to solve if you treat all processes and all threads as merely a share of a larger budget.

Now, if you are arguing for swapping out 4 E cores for 4 M cores, then it would work. But have you benchmarked it where your compute load is unchanged but also your thread allocation is unchanged. You don't get to fully idle half your cores, they all run but they all idle more. Are you sure you're still coming out ahead in terms of power?

The Hardcard · Tuesday at 7:31 PM

mr_roboto said:
Chip development is not an overnight process. The M5 M-core design effort would have been launched something like 2 or 3 years ago, maybe more, along with everything else in the A19/M5/pro/max technology generation. If Apple intends to fully replace E cores with M cores, there's no obvious reason why they couldn't have already done so in this generation.

I forgot this part. The A19s and base M5 was half a year before the M5 Pro and Max. Given A chip and M chip launch cadences, are you saying no meaningful work couldn’t have needed another half a year?

The iPhone had to launch in September. The M5 products could have waited, but the Vision Pro and iPad Pro were already on extended cycles, the Vision Pro severely so.

johnsonwax said:
Now, if you are arguing for swapping out 4 E cores for 4 M cores, then it would work.

That is precisely what I arguing. I think other comments on core counts may have gotten conflated with mine. I am arguing a minimum of core for core replacement. In fact, given that the AI race is turning into a battle for multiple agents combining to compete tasks, CPU compute demand is increasing. Given the position of Apple Silicon in this battle, it is a strong argument for:

1: new P cores instead of e cores in iPhones and base M devices.

2: increasing the iPhone corre count to 2 + 6.

mr_roboto · Tuesday at 8:22 PM

The Hardcard said:
This is what I saw, from the post mvprod123 made on page 476. I have no way to verify any of these posts, but that is my basis. Has anyone shown anything different? Maybe the guy was just drawing for no reason, maybe his analysis or his math is in error.

There's several problems I see here.

If the dots in the graph represent data points, they've extrapolated the shapes of the curves from just two data points each.
GB6 is deliberately bursty, because they want to simulate race-to-idle patterns. That burstiness makes it harder and more error prone to do accurate power measurements.
Powermetrics doesn't directly measure power, it estimates based on performance counters and mathematical modeling. Apple's own documentation says it's not good for comparing across devices (i.e. they're not guaranteeing the modelling is equally accurate everywhere), and I bet that applies even when comparing two different CPU core types in the same device.
macOS doesn't permit userspace control of CPU freq. How did they collect even two points per curve?

Without any raw data or disclosure of methodology I have a hard time accepting the graph at face value, especially since it's showing something that's counter to all known CPU design principles.

mr_roboto · Tuesday at 8:23 PM

The Hardcard said:
I forgot this part. The A19s and base M5 was half a year before the M5 Pro and Max. Given A chip and M chip launch cadences, are you saying no meaningful work couldn’t have needed another half a year?

Apple co-designs whole families of chips from common building blocks. Yes, some of the chips tape out first, but that doesn't mean all work is equally staggered.

But more importantly, planning for all of them starts many years before. Assume the idea you're arguing for is true, and put yourself in Apple SoC architect shoes several years ago, before any of the engineers under you have started work on A19/M5 generation P/M/E cores. You know you're going to have the team design a M core cluster which can replace the E core cluster across the whole product line with no downsides. Why wouldn't you immediately cancel development of the A19/M5 E core, and make sure the M core schedule was in line with the projected A19 tapeout date?

At the end of the day I obviously don't have insider info, but to me it's not at all obvious that Apple will just want to drop the E core. It still has a place.

The Hardcard said:
That is precisely what I arguing. I think other comments on core counts may have gotten conflated with mine. I am arguing a minimum of core for core replacement. In fact, given that the AI race is turning into a battle for multiple agents combining to compete tasks, CPU compute demand is increasing. Given the position of Apple Silicon in this battle, it is a strong argument for:

1: new P cores instead of e cores in iPhones and base M devices.

2: increasing the iPhone corre count to 2 + 6.

Apple's primary resource for on-device inference isn't CPU cores of any type, it's the neural engine. The ANE is profoundly more efficient (both power and area) for this specialized task than general purpose latency-optimized CPU cores can ever be.

jdubs03 · Tuesday at 10:44 PM

Also let’s keep in mind the A19/M5 series e-core is a massive improvement from the A18/M4 series. They gained 25% in performance at the same power draw; using a 6.6% frequency bump.

The Hardcard · Wednesday at 8:19 AM

Hopefully, more people will dig into the metrics of the M5 generation. I was hoping to see die shots and core flow charts by now. Maybe soon. We still don’t know a lot, including the physical size of the new core.

I’ve been very impressed by e core performance on SPEC17. But I still say it only matters if gives better performance per watt in normal operations or if they needed to make the new core a lot bigger. There’ no nostalgia to it. If Apple sees what that guy thinks he sees, the new core takes over.

mr_roboto said:
But more importantly, planning for all of them starts many years before. Assume the idea you're arguing for is true, and put yourself in Apple SoC architect shoes several years ago, before any of the engineers under you have started work on A19/M5 generation P/M/E cores. You know you're going to have the team design a M core cluster which can replace the E core cluster across the whole product line with no downsides. Why wouldn't you immediately cancel development of the A19/M5 E core, and make sure the M core schedule was in line with the projected A19 tapeout date?

Apple has already demonstrated that they will use cores whenever they are ready. The M3 family all launched on the same day and the M3 Max has a different e core than the others. A half year gap is plenty of time for changes.

mr_roboto said:
Apple's primary resource for on-device inference isn't CPU cores of any type, it's the neural engine. The ANE is profoundly more efficient (both power and area) for this specialized task than general purpose latency-optimized CPU cores can ever be.

Agentic AI is about having your inference launch apps and do work on your computer, thus increasing the CPU demands.

jdubs03 · Wednesday at 9:24 AM

The Hardcard said:
Hopefully, more people will dig into the metrics of the M5 generation. I was hoping to see die shots and core flow charts by now. Maybe soon. We still don’t know a lot, including the physical size of the new core.

I’ve been very impressed by e core performance on SPEC17. But I still say it only matters if gives better performance per watt in normal operations or if they needed to make the new core a lot bigger. There’ no nostalgia to it. If Apple sees what that guy thinks he sees, the new core takes over.

E-core similar size.

name99 · Wednesday at 2:10 PM

mr_roboto said:
Quarantining the OS and user processes to specific cores that they kind of own isn't how anything actually works.

Yes it is. Consider for example the Fujitsu A64FX. Each chip consists of 12+1 cores. Each core runs ARMv8+SVE, and the 12 cores execute the app, while the +1 core executes OS-type functionality.
(The actual hardware looks like it's 14 cores per chip, with an allowance for one to die, to maintain yield, while having each chip look identically 12+1 to SW.)

name99 · Wednesday at 2:14 PM

Doug S said:
Maybe you need to review the benchmarks more closely. E cores have long been 1/3 of the performance (it is more like 2/5th of the performance with the latest E core) at 1/10th the power.

So you already have your 1/10th without sacrificing so much performance. But if you want 1/10th of the performance the E core can clock down to 750 MHz or whatever and it'll be 1/10th of the performance - but likely at more along the lines of 1/30th of the power!

Like I said, I could see the 2/2/2 move if Apple considers the A SoC's role in the Neo to be something they take into consideration during design. But they could easily say "even in the most wildly optimistic scenario about Neo sales we're making 20x more profit from iPhone Pros than Neos so Neo is going to get whatever it gets not get something designed for it".

What is the difference between ENERGY and POWER?
What's the relationship between TIME TAKEN FOR A TASK/PERFORMANCE, MEAN POWER DURING TASK EXECUTION, and ENERGY TAKEN FOR THE TASK?
Yeah...

(EVERY FSCKING TIME!
If America collapses over the next 100 years it will be because apparently only .01% of the country learns even the most basic physics.)

name99 · Wednesday at 2:25 PM

johnsonwax said:
But this is also a function of Apple's verticality.

Apple isn't designing cores to be performant on Windows or Android, they're designing them to be performant on iOS/MacOS, both unixes that are pretty good at relying on a squidcloud of low QoS background services, and which encourage developers to do the same. But they are different in that their schedulers work differently. If you have an iOS media app you have to put the media stream into a high QoS background thread if you want the user to be able to lock the device and still use the stream, because the P cores are going to be shut off when the device is locked and that stream will be marked as a media stream and will likely get it's own E core to ensure that it's performant, possibly even reserving bandwidth. MacOS doesn't work like that and there's no equivalent to that there. And that's not a minor use case for iPhones, it's one of the major use cases and it's why there are more E than P cores on A series because when it's in your pocket, everything needs to happen on E cores to keep the battery life.

It doesn't matter if it's performant in the benchmarks (which it is), it's performant in the power/use profile that they tell their developers to use and desktop/laptops don't have a 'P cores must be disabled' mode to profile around. (This is also why I think that apps that don't use Apple's native controls on AppleTV suck ass so badly because Apple is able to carefully schedule the media download and the controller and the app front end on different A series cores in a way that isn't open to developers - so the non-native controls choke).

So while I think the benchmarks are instructive to whether M series should have a middle core, I don't think they are for iOS because iOS doesn't work like a PC does, by design. And as such, we shouldn't expect lessons for PCs to apply there. Now Apple could see those performance arguments and embrace them but it would need to come with a fair bit of reworking of how iOS schedules, how developers are told to write their apps, and so on. It's a big undertaking, not a drop in 'the compiler will sort it out' matter.

All true. To which I would add (2025)
https://patents.google.com/patent/US20250379816A1

The idea of this patent is to move much of the TCP/IP stack off the E-processor and onto the AoP (very low power Always on Processor) for these sorts of video or audio streaming cases, so that you don't pay the cost of waking up the E-processor to handle each packet.

This is part of the motivation for my idea of an HE processor. Sure, you can bend over backwards to move this sort of work to the AoP, but it's incredibly finicky and difficult. Much easier to just provide something with much the same performance/energy profile as the AoP (perhaps 2 or 3x faster) but looking like a "normal" processor and able to execute more generic code.

Discussion Apple Silicon SoC thread

Lifer

Senior member

Senior member

Senior member

Senior member

Junior Member

Junior Member

Diamond Member

Senior member

Diamond Member

Senior member

Senior member

Diamond Member

Senior member

Senior member

Senior member

Senior member

Senior member

Junior Member

Junior Member

Golden Member

Senior member

Golden Member

Senior member

Senior member

Senior member