
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).
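
As a sanity check on the M1 GPU figures in the spec list above, here is a minimal sketch of where the 2.6 teraflops, 82 gigatexels/s and 41 gigapixels/s numbers plausibly come from. The 128 execution units are from the list; the 8 ALUs per execution unit, the roughly 1.278 GHz clock, the 64 texture units and the 32 ROPs are widely reported figures that I am assuming here, not something Apple states in the list.

```python
# Back-of-the-envelope check of the M1 GPU throughput figures.
# ASSUMPTIONS (not from the spec list): 8 ALUs per execution unit,
# a ~1.278 GHz GPU clock, 64 texture units and 32 ROPs.

EXECUTION_UNITS = 128        # from the spec list
ALUS_PER_EU = 8              # assumed
GPU_CLOCK_GHZ = 1.278        # assumed
TEXTURE_UNITS = 64           # assumed
ROPS = 32                    # assumed
FLOPS_PER_FMA = 2            # one fused multiply-add counts as 2 FLOPs


def peak_tflops(eus: int, alus_per_eu: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in teraflops, one FMA per ALU per cycle."""
    return eus * alus_per_eu * FLOPS_PER_FMA * clock_ghz / 1000.0


def peak_rate_g_per_s(units: int, clock_ghz: float) -> float:
    """Peak fill rate in giga-items per second, one item per unit per cycle."""
    return units * clock_ghz


m1_tflops = peak_tflops(EXECUTION_UNITS, ALUS_PER_EU, GPU_CLOCK_GHZ)
m1_gtexels = peak_rate_g_per_s(TEXTURE_UNITS, GPU_CLOCK_GHZ)
m1_gpixels = peak_rate_g_per_s(ROPS, GPU_CLOCK_GHZ)

print(f"{m1_tflops:.2f} TFLOPS, {m1_gtexels:.1f} GT/s, {m1_gpixels:.1f} GP/s")
```

Under those assumptions the three results round to the listed 2.6, 82 and 41, which suggests the quoted numbers are theoretical peaks rather than measured throughput.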

EDIT:

Screen-Shot-2021-10-18-at-1.20.47-PM.jpg

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, ProRes
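
For reference, the 100 GB/s memory bandwidth figure in the M2 list above can be reproduced with simple arithmetic. This is a rough sketch that assumes LPDDR5-6400 on a 128-bit bus for M2 and LPDDR4X-4266 on a 128-bit bus for M1; the transfer rates and bus width are my assumptions, not part of the lists.

```python
# Peak DRAM bandwidth = transfers per second * bytes per transfer.
# ASSUMPTIONS: 128-bit memory bus on both chips, LPDDR5-6400 for M2,
# LPDDR4X-4266 for M1. None of these values appear in the spec lists.

def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


m2_bw = peak_bandwidth_gb_s(6400, 128)   # about 102.4 GB/s
m1_bw = peak_bandwidth_gb_s(4266, 128)   # about 68.3 GB/s

print(f"M2: {m2_bw:.1f} GB/s, M1: {m1_bw:.1f} GB/s")
```

If those assumptions hold, the quoted "100 GB/s" is the rounded theoretical peak, and the M2 gains about 50% over M1 from the memory type alone.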

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
I'll be looking forward to a deep dive on Apple's new core arrangements. It also raises some interesting questions for the M6 that is expected at the end of this year or early next.

It seems a little weird to announce a new display without a new Studio or Pro to go along with it, but maybe those are still in the oven.
 
The new CPU setup completely changes the game!

Now there's a significant gap between M5 and M5 Pro/Max in multithread performance. What does this bode for M6?

Perhaps Apple might swap out the 6 efficiency cores for 6 performance cores?

Perhaps they might bump up the number of Super cores from 4 to 6?

Or both?
 
What I'm wondering is if the "e-core" branding is going out to pasture, and base M6 will end up with a "super core + p-core" configuration. In other words, if this is actually a third microarchitecture family, or a replacement for the e-core family.
 
The new CPU setup completely changes the game!

Now there's a significant gap between M5 and M5 Pro/Max in multithread performance. What does this bode for M6?

Perhaps Apple might swap out the 6 efficiency cores for 6 performance cores?

Perhaps they might bump up the number of Super cores from 4 to 6?

Or both?
I think nothing will change for the regular M6. E-cores are still critical for fanless devices (iPad, MacBook Air).
 
What I'm wondering is if the "e-core" branding is going out to pasture, and base M6 will end up with a "super core + p-core" configuration. In other words, if this is actually a third microarchitecture family, or a replacement for the e-core family.
This is clearly not a replacement for E-cores. Apple now has three types of cores in its IP portfolio.
 
What I'm wondering is if the "e-core" branding is going out to pasture, and base M6 will end up with a "super core + p-core" configuration. In other words, if this is actually a third microarchitecture family, or a replacement for the e-core family.

Lots of unanswered questions, but many will be answered when the first GB6 runs appear over the next few days.

What I'm surprised about is that Apple managed to keep this change away from the rumor mill. No one mentioned "super cores", or that Apple had developed a third core type.

I'm also really curious where the line is drawn between the two chiplets. Specifically what else is on the one with the GPU cores other than those cores themselves. I'm guessing it will carry some of the memory controllers and SLC, as that would make it suitable for Apple to build AI servers using a bunch of those GPU chiplets.
 
So it's basically a rename of cores, the standard P core is now super core and E core is now perf core. Or are the Perf cores something else than rebranded E cores?
Chiplets open the door to truly Frankenstein A/M core hybrids in the future so they are starting to differentiate between A and M P cores and A and M E cores.
 
I think nothing will change for the regular M6. E-cores are still critical for fanless devices (iPad, MacBook Air).
Not necessarily. Apple has made architecture adjustments when they are ready, leaving differences within a generation. Notably, the M3 Max has a different, wider e-core than other M3s.

It could easily be that the only reason the M5 has efficiency cores was that the new core wasn’t ready yet.
 
So here's a theory:
When Intel introduced their P vs E cores, they confused the terminology.
An Intel P core is, like an Apple P core, optimized for performance.
But an Intel E-core is optimized for AREA, not for ENERGY.

The Apple numbers are that an Apple E-core uses about 1/10th the energy to execute the same work, in 3x the time it would take a P-core. (Lots of ifs and buts there, but that's the ballpark.)
An Intel E-core is about 0.5 to 0.65 the performance of an Intel P-core (compare vs. Apple's 0.35). I don't have energy comparison numbers.
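
To make the arithmetic behind those ballpark ratios explicit, here is a small sketch. It only uses the two numbers quoted above (about 1/10 the energy, about 3x the time); the function names are hypothetical and the inputs are rough estimates, not measurements.

```python
# Derive relative power and perf-per-watt from the two ballpark ratios above.
# Inputs are E-core values relative to a P-core doing the same work.

def relative_power(energy_ratio: float, time_ratio: float) -> float:
    """Average power ratio: energy divided by the time it is spent over."""
    return energy_ratio / time_ratio


def relative_performance(time_ratio: float) -> float:
    """Throughput ratio: the same work in N times the time is 1/N the speed."""
    return 1.0 / time_ratio


ENERGY_RATIO = 0.1   # E-core uses ~1/10 the energy for the same work
TIME_RATIO = 3.0     # ...and takes ~3x as long

power = relative_power(ENERGY_RATIO, TIME_RATIO)   # ~0.033, i.e. ~1/30 the power
perf = relative_performance(TIME_RATIO)            # ~0.33 of P-core speed
perf_per_watt = perf / power                       # ~10x better

print(f"power {power:.3f}x, perf {perf:.2f}x, perf/W {perf_per_watt:.1f}x")
```

So on these numbers an Apple E-core draws roughly 1/30th of the P-core's power while delivering about a third of its performance, which is in line with the 0.35 figure mentioned above.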

Apple may have concluded based on experience and seeing how M silicon is used in actual professional systems, that an Intel type solution is a better deal, at least for systems where battery life is irrelevant or less important?
So what we have now is maybe something that's a lot more like Intel's recent designs such as Lunar Lake (4P, 4E) or Arrow Lake (at the high end, 8P+16E). Not as power efficient as the old-school Apple P+E, but a substantial jump performance-wise if you have lots of active threads (e.g. for the all-important Cinebench CPU workload...)
In this lingo, then, we have something like
super ~ Intel P
performance ~ Intel E
efficiency ~ Intel LE

Maybe after seeing how these all play out, the future will give us even more Intel-like designs, with say 2 or 4 efficiency/LE cores added to the die for background work?
Or maybe it goes the other way? Maybe Apple has concluded they now have such good control over DVFS and powering down the unused parts of a core that optimizing for area rather than energy works fine: just run your "performance" core at 1 GHz instead of 3.5 GHz and you get E-level efficiency?

Of course the other weird element is this claim about Fusion Architecture, and I got nuttin'! You know my opinion (shared with Jensen) that chiplets do not buy you anything significant in terms of yield, and cost energy and performance to cross dies. But they do give you optionality. So presumably the optionality now matters to Apple. Maybe with the new Studio things will be clearer? Maybe there's a feeling that the AI market is hot enough that substantial numbers of Ultra equivalents will be sold, but that market wants to pay for say 3x the GPU of a Max but only 1x the CPU, and "Fusion Architecture" is a better way to deliver that?
 
Lots of unanswered questions, but many will be answered when the first GB6 runs appear over the next few days.

What I'm surprised about is that Apple managed to keep this change away from the rumor mill. No one mentioned "super cores", or that Apple had developed a third core type.

I'm also really curious where the line is drawn between the two chiplets. Specifically what else is on the one with the GPU cores other than those cores themselves. I'm guessing it will carry some of the memory controllers and SLC, as that would make it suitable for Apple to build AI servers using a bunch of those GPU chiplets.
I’d bet that all of the memory is on the GPU and that the CPU is just the CPU cores and the ANE. The new cache layout is likely because of the increase in latency and power of chiplet datapaths.

Apple Silicon chips have been de facto GPUs with integrated CPUs; I don’t think that is changing.
 
Of course the other weird element is this claim about Fusion Architecture, and I got nuttin'! You know my opinion (shared with Jensen) that chiplets do not buy you anything significant in terms of yield, and cost energy and performance to cross dies. But they do give you optionality. So presumably the optionality now matters to Apple. Maybe with the new Studio things will be clearer? Maybe there's a feeling that the AI market is hot enough that substantial numbers of Ultra equivalents will be sold, but that market wants to pay for say 3x the GPU of a Max but only 1x the CPU, and "Fusion Architecture" is a better way to deliver that?

Alternatively, maybe they want to decouple the CPU and GPU release cadence? Or they want to be able to build one or the other on an older node for cost reasons?

I'm not sure I quite get it either.
 
Well, that new cache setup is big news, if true.

Edit:

Article mentions this, but no details are given.

"The industry-leading super core was first introduced as performance cores in M5, which also adopts the super core name for all M5-based products — MacBook Air, the 14-inch MacBook Pro, iPad Pro, and Apple Vision Pro. This core is the highest-performance core design with the world’s fastest single-threaded performance, driven in part by increased front-end bandwidth, a new cache hierarchy, and enhanced branch prediction."

You can never be sure what phrases like that mean!
My *guess* is that it confirms the existence of a Trace Cache in the M5, as suggested by patents.
(Trace cache, NOT micro-op cache. This is not like what either Intel or ARM have shipped before.)
 
Snapdragon X Elite had no efficiency cores, and it had pretty good battery life.

super ~ Intel P
performance ~ Intel E
efficiency ~ Intel LE
That doesn't sound right. Intel E and LE are the same uarch, but tweaked differently.

In Apple's case, the performance cores and efficiency cores are separate uarches, if the rumours are to be believed.

A better analogy would be comparing to ARM cores.

Super ~ C1 Ultra
Performance ~ C1 Premium
Efficiency ~ C1 Pro
 
So here's a theory:
When Intel introduced their P vs E cores, they confused the terminology.
An Intel P core is, like an Apple P core, optimized for performance.
But an Intel E-core is optimized for AREA, not for ENERGY.

The Apple numbers are that an Apple E-core uses about 1/10th the energy to execute the same work, in 3x the time it would take a P-core. (Lots of ifs and buts there, but that's the ballpark.)
An Intel E-core is about 0.5 to 0.65 the performance of an Intel P-core (compare vs. Apple's 0.35). I don't have energy comparison numbers.

Apple may have concluded based on experience and seeing how M silicon is used in actual professional systems, that an Intel type solution is a better deal, at least for systems where battery life is irrelevant or less important?
So what we have now is maybe something that's a lot more like Intel's recent designs such as Lunar Lake (4P, 4E) or Arrow Lake (at the high end, 8P+16E). Not as power efficient as the old-school Apple P+E, but a substantial jump performance-wise if you have lots of active threads (e.g. for the all-important Cinebench CPU workload...)
In this lingo, then, we have something like
super ~ Intel P
performance ~ Intel E
efficiency ~ Intel LE

Maybe after seeing how these all play out, the future will give us even more Intel-like designs, with say 2 or 4 efficiency/LE cores added to the die for background work?
Or maybe it goes the other way? Maybe Apple has concluded they now have such good control over DVFS and powering down the unused parts of a core that optimizing for area rather than energy works fine: just run your "performance" core at 1 GHz instead of 3.5 GHz and you get E-level efficiency?

Of course the other weird element is this claim about Fusion Architecture, and I got nuttin'! You know my opinion (shared with Jensen) that chiplets do not buy you anything significant in terms of yield, and cost energy and performance to cross dies. But they do give you optionality. So presumably the optionality now matters to Apple. Maybe with the new Studio things will be clearer? Maybe there's a feeling that the AI market is hot enough that substantial numbers of Ultra equivalents will be sold, but that market wants to pay for say 3x the GPU of a Max but only 1x the CPU, and "Fusion Architecture" is a better way to deliver that?
I think you're reading way too much into it. Apple has P and E cores that are different for their A and M series. Apple is steadily blending the use of A and M series chips in the iPad and now Mac. Non-technical users are going to compare the P and E core count in the MacBook Neo with the P and E core counts in the MacBook Pro (base) and wonder why they should pay so much for the MBP. You and I know that the M series P cores are quite a bit better than the A series, but nontechnical users won't.

And if M series E cores are on par with A series P cores, regardless of architecture and clock and all that, then this is a somewhat reasonable marketing change. Additionally, it could signal that Apple is looking to use the same architecture for those and simply clock the M series efficiency ones down, etc.

I don't think you can glean any technical conclusions from this. This looks entirely like marketing and product segmentation to me.

I think more significantly, Apple has identified that the market for high-core-count compute users is very small - something I've been noting for a while. It doesn't really make sense to put your Max value in more P cores, so they aren't. We keep the memory bandwidth benefits, which is very appreciated, and the die space goes to GPU, which is probably the bigger market, since I'm guessing the high end of the MBP market is video editors rather than developers doing on-device compiles - YouTubers flying someplace interesting, shooting a video and then starting to edit it in the hotel room and on the flight home so they can push it in a timely manner.
 
If the 12 performance cores are in a single cluster, that's surprising.
What's the practical limit on cluster size? As I understand it, ARM DynamIQ supports up to 14 cores in a cluster.
Obviously it depends how much is shared and what's required.
For example if only (on average) one in 20 cycles requires a core to access L2, then you're probably OK in that (usually) one core will not be blocking another core's L2 access.
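
A quick way to put a number on that "one in 20 cycles" example is the sketch below. It is a deliberately crude model: every core is assumed to hit L2 independently with the same per-cycle probability, and the L2 is treated as a single port that serves one request per cycle. Real L2s are banked and pipelined, so treat the result as a pessimistic bound, not a prediction.

```python
# Chance that a core's L2 request collides with a request from another core.
# ASSUMPTIONS: independent accesses, identical per-cycle access probability,
# and a single-ported L2 (no banking), which overstates real contention.

def p_conflict(n_cores: int, p_access: float) -> float:
    """Probability that at least one of the other n-1 cores also accesses
    L2 in the same cycle as a given core's access."""
    return 1.0 - (1.0 - p_access) ** (n_cores - 1)


def mean_requests_per_cycle(n_cores: int, p_access: float) -> float:
    """Expected number of L2 requests arriving per cycle across the cluster."""
    return n_cores * p_access


P_ACCESS = 1 / 20   # the "one in 20 cycles" figure from the post

for cores in (4, 6, 12):
    print(cores,
          round(p_conflict(cores, P_ACCESS), 3),
          round(mean_requests_per_cycle(cores, P_ACCESS), 2))
```

With 12 cores the collision chance under this model comes out at roughly 43%, with 0.6 requests per cycle on average, so a cluster that size would likely lean on banking or multiple ports rather than on accesses simply being rare.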

One thing this 6+12 gives us is presumably two full sized SME units (though one is clocked slightly lower).
6+6+6 would give us 3 SME units. Making those units E-style (1/4 sized, or 1/2 sized) might be too much of a regression from the M4.

OTOH maybe Apple would like 3 full-sized SME units, if it turns out they're getting lit up surprisingly often by some AI code?

SME units are large: This M4 die shot suggests about the size of a P-core. But, with a chiplet, they have more area to play with than before.

So ???
 
Maybe there's a feeling that the AI market is hot enough that substantial numbers of Ultra equivalents will be sold, but that market wants to pay for say 3x the GPU of a Max but only 1x the CPU, and "Fusion Architecture" is a better way to deliver that?
I think most of the Ultra's P cores were idle. People were buying memory bandwidth and GPU. Apple signaled their intentions with the Pro/Max having the same CPU core count, but scaling memory bandwidth and GPU. Now, without unified memory, that's a different calculation because there's a real tradeoff between DaVinci Resolve doing compute on the CPU or shipping it to the GPU and back on an x86, but there isn't on a Mac - you are always better off pushing that to GPU and NPU when they can do that compute. So the CPU cores are less and less critical when software is able to freely choose where to run its compute. Those Photoshop filters that leaned so hard on your SIMD units are now doing generative AI fills and the like, and they don't want to be on the CPU at all.

The same argument will apply to the Studio (maybe even more so). How much of the rendering pipeline in FCP (understand that to Apple the Mac Studio is a dedicated FCP machine) benefits from more CPU cores, and how much benefits from memory bandwidth and GPU cores? I'm guessing the latter is what matters, and what's more, Apple being the developer of FCP can maximize that benefit. So yeah, I wouldn't be the least bit surprised if the Ultra came out with the same 6 Super cores, same P core count, but 1.2 TB/s of memory bandwidth and 80 GPU cores. For a video editor, you have a machine whose main memory bandwidth is closer to the x86 competition's L3 cache speed than it is to their main memory speed. You will be able to scrub the timeline of a 2 TB video file with zero lag, and that's why people buy Mac Studios.

The Mac Pro becomes an even more awkward product.
 