Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,825
1,396
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhone and iPad: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors
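The "up to 100 GB/s" figure follows directly from the commonly reported memory configuration. A quick cross-check, assuming LPDDR5-6400 on a 128-bit bus (widely reported for the M2, though not stated in Apple's own spec sheet):

```python
# Cross-check of the quoted ~100 GB/s memory bandwidth, assuming
# LPDDR5-6400 on a 128-bit bus (assumed, not official Apple specs).
transfers_per_s = 6400e6   # LPDDR5-6400 data rate (transfers/second)
bus_bits = 128             # assumed memory bus width

gb_per_s = transfers_per_s * bus_bits / 8 / 1e9
print(f"{gb_per_s:.1f} GB/s")  # 102.4 GB/s, i.e. the quoted "100 GB/s"
```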

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, HEVC (h.265), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

gdansk

Platinum Member
Feb 8, 2011
2,962
4,493
136
There's not much to talk about with Apple cores. IPC has been largely stagnant since 2020, with clock speed increases coming from improved nodes.
Apple also never presents at ISSCC or Hot Chips, so again there's nothing to discuss except uncertain measurements and best guesses. And no one expects the company in the lead to make a big jump again, especially after they lost engineers to Nuvia, Qualcomm, etc.
 

MarkizSchnitzel

Senior member
Nov 10, 2013
447
69
91
Not really. You should maybe study economics before proclaiming this. Intel is still the #1 chip maker for a reason.

Uh, just because you own an Android phone...

EDIT: even with my 14th gen iPhone, I do not have to charge every day, and my phone is used 8+ hours a day. With that usage it lasts nearly 2 days without needing a recharge.
That is a sample of 1. There is no magic there. In standardized battery tests it's not all that special, even though it has a 60 Hz screen (the base model).
 

Mopetar

Diamond Member
Jan 31, 2011
8,113
6,768
136
Performance uplift is something that needs further qualification. If you achieve an amazing 30% IPC uplift but have a 20% clock regression, it's nowhere near as impressive. The same is true if AMD has that 40% improvement but only achieves it by throwing so much power at the chip that even Intel would blush.

Apple could probably get a 40% uplift if they had a new, better design on a new, better node and they gave it a bigger power budget. Whether all of those stars align is another matter, but I wouldn't want to see them sacrifice on their power efficiency just to get to a 40% number when I'd be more than happy with a 15% gain if they achieved it with even less power than they use now.
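The arithmetic behind that point is simple but worth spelling out: performance is multiplicative in IPC and clock, so a big IPC gain can be mostly eaten by a clock regression. A quick illustration:

```python
# Net speedup is multiplicative: perf = IPC * clock.
def net_uplift(ipc_change, clock_change):
    """Both arguments are fractional changes, e.g. 0.30 for +30%."""
    return (1 + ipc_change) * (1 + clock_change) - 1

# An "amazing" 30% IPC uplift with a 20% clock regression
# nets out to almost nothing:
print(f"{net_uplift(0.30, -0.20):+.0%}")  # prints "+4%"
```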
 

junjie1475

Junior Member
Apr 9, 2024
17
53
51
For Apple or for Zen 5? :p
For Apple it's impossible; for Zen 5 I wouldn't say 100% impossible, since their baseline is not that high, and by playing some tricks with the "standard" of IPC. If you look at the research papers of recent years (the last 10 or so), you will find that they are all focused on tweaking the current architecture by a few percent. We should focus more on absolute performance uplift rather than the percentage; the growth curve is no longer exponential... Just looking at today's processors, the main components, i.e. OoO execution, speculative execution, cache prefetching, etc., are all over 20 years old. That's why people have expanded their performance focus to heterogeneous computing and multi-core architectures. Anyway, it seems I've talked about too much that's out of the scope of this reply. :p
 

junjie1475

Junior Member
Apr 9, 2024
17
53
51
So is Apple going to stagnate in IPC improvements?
Yes, just like everyone else.

Do they need a massive core redesign to progress further? It seems IPC improvements have been single digits for Apple lately.
I think not; they will keep tweaking the current architecture (e.g. enhancing the register file design, better utilization of the functional units, increasing the size of various structures). Ultimately the answer depends on what you mean by "massive core redesign": does a redesigned branch prediction unit or a new cache prefetching algorithm count, or only a completely restructured frontend/backend?
 

Nothingness

Diamond Member
Jul 3, 2013
3,089
2,083
136
Yes, just like everyone else.
You just wrote that it wasn't 100% impossible for Zen5 to improve by 40-50%. That seems contradictory, doesn't it? :)

I think not; they will keep tweaking the current architecture (e.g. enhancing the register file design, better utilization of the functional units, increasing the size of various structures). Ultimately the answer depends on what you mean by "massive core redesign": does a redesigned branch prediction unit or a new cache prefetching algorithm count, or only a completely restructured frontend/backend?
I guess AMD will do the same, or they'd rename Zen to something else. That's why I doubt we'll see a 40-50% IPC improvement (though such a large performance improvement could be possible for selected benchmarks if all AVX units are widened to 512-bit, but the cost would be large and it seems the current AVX-512 implementation is already doing very well).
 

junjie1475

Junior Member
Apr 9, 2024
17
53
51
You just wrote that it wasn't 100% impossible for Zen5 to improve by 40-50%. That seems contradictory, doesn't it? :)
Here I meant that Apple is facing what others will face once they reach a certain point. Also, IPC depends on pipeline depth, so this is more of a trade-off. With a deeper pipeline it's usually easier to achieve higher frequency and thus higher performance, but you can't just keep lengthening the pipeline. Two things come into play here:
1) Power consumption
2) Access latency of buffer/queue structures
That's one of the reasons why Apple has such a massive u-arch and hence very high IPC.
That said, it's not good to directly compare IPC numbers.
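The trade-off described above can be put into a toy model: splitting the logic across more stages raises the clock, but per-stage latch overhead and the growing misprediction flush cost eventually win. All constants below are illustrative assumptions, not measurements of any real core:

```python
# Toy model of the pipeline-depth trade-off: deeper pipelines raise
# frequency (shorter stages), but flush cost grows with depth.
# All constants are illustrative, not data from any real chip.
def perf(depth,
         logic_ps=5000,      # total logic delay to split across stages (assumed)
         latch_ps=90,        # per-stage latch/clock overhead (assumed)
         issue_width=4,      # base CPI = 1/issue_width
         branch_freq=0.20,   # fraction of instructions that are branches
         mispredict=0.05):   # misprediction rate
    freq_ghz = 1000 / (logic_ps / depth + latch_ps)  # 1000 ps = 1 GHz period
    cpi = 1 / issue_width + branch_freq * mispredict * depth
    return freq_ghz / cpi   # relative instructions per nanosecond

# Performance rises with depth, peaks, then declines:
best = max(range(5, 80), key=perf)
for d in (10, 20, best, 70):
    print(f"depth {d:2d}: {perf(d):.2f}")
```

The exact peak depends entirely on the assumed constants; the point is only that the curve is non-monotonic, which is why "just go deeper" stopped working.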
 

Nothingness

Diamond Member
Jul 3, 2013
3,089
2,083
136
Here I meant that Apple is facing what others will face once they reach a certain point. Also, IPC depends on pipeline depth, so this is more of a trade-off. With a deeper pipeline it's usually easier to achieve higher frequency and thus higher performance, but you can't just keep lengthening the pipeline. Two things come into play here:
1) Power consumption
2) Access latency of buffer/queue structures
There are more things that get impacted by pipeline depth. As far as performance goes, the main impacts are data forwarding latency and branch misprediction cost. So increasing pipeline depth should, as you say, be considered with a lot of care.

That's one of the reasons why Apple has such a massive u-arch and hence very high IPC.
That said, it's not good to directly compare IPC numbers.
You're talking about "real" IPC. People here mostly think of performance per clock. And no matter what, AMD won't get an IPC increase of 40-50% (that's why I talked about "performance improvement" for AVX in my previous message, which might be achievable).

To sum up, I don't believe for a second we'll see a 40-50% IPC increase from AMD. And even for performance improvement, my understanding is that people in the know said it was for some benchmarks. And if AMD wants large increases in IPC across many benchmarks they'll have to reduce frequency (that's what Apple did: trade off frequency for IPC). CPU design is an art of balance :)
 

FlameTail

Diamond Member
Dec 15, 2021
3,950
2,376
106
And if AMD wants large increases in IPC across many benchmarks they'll have to reduce frequency (that's what Apple did: trade off frequency for IPC). CPU design is an art of balance :)
That's what is concerning about Apple. They have been relying largely on clock increases to improve performance, which makes it even harder for them to increase IPC.
 

Nothingness

Diamond Member
Jul 3, 2013
3,089
2,083
136
That's what is concerning about Apple. They have been relying largely on clock increases to improve performance, which makes it even harder for them to increase IPC.
And to increase frequency I guess they had to rely on process improvements. They can surely still gain some IPC, but they've long since picked the low-hanging fruit, so the gains should be small. Unless they start a new core design more or less from scratch.
 

junjie1475

Junior Member
Apr 9, 2024
17
53
51
So they didn't increase the pipe stage length?
Both, although there is no reliable way to find the actual pipeline length with software methods. People mostly use branch misprediction penalties to estimate pipeline depth, but recent processors have heavily optimized the latency of branch instructions (by bypassing, for example). In addition, there is really no way to tell the backend length just by looking at the numbers. Also, getting an accurate branch penalty cycle count is very difficult, because even on a random pattern the branch predictor can be correct.
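The estimation approach mentioned above usually works by timing a loop with a learnable branch pattern against the same loop with a random pattern, then dividing the extra cycles by the number of mispredictions. A sketch of the arithmetic, using hypothetical "measured" cycle counts (not data from any real chip) — note the division by the miss rate, since, as said above, the predictor is right about half the time even on random input:

```python
# Sketch of how branch-miss penalties are used to estimate frontend
# pipeline depth. The cycle counts are hypothetical "measurements",
# not data from any real chip.
branches = 1_000_000
cycles_predictable = 2_000_000   # pattern the predictor learns perfectly
cycles_random = 9_000_000        # 50/50 random taken/not-taken pattern
miss_rate_random = 0.5           # predictor still guesses right half the time

extra_cycles = cycles_random - cycles_predictable
penalty = extra_cycles / (branches * miss_rate_random)
print(f"~{penalty:.0f} cycles/miss")  # prints "~14 cycles/miss"
```

The resulting number is only a lower bound on frontend depth, for exactly the reasons given in the post: branch latency optimizations and backend effects are invisible to this method.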
 