Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

junjie1475

Junior Member
Apr 9, 2024
So is Apple going to stagnate in IPC improvements?
Yes, just like everyone else.

Do they need a massive core redesign to progress further? It seems IPC improvements have been single digits for Apple lately.
I think not; they will keep tweaking the current architecture (e.g. enhancing the register file design, better utilizing the functional units, increasing the size of various structures). Ultimately the answer depends on what you mean by "massive core redesign". For instance, does a redesigned branch prediction unit or a new cache prefetching algorithm count? Or only a completely restructured frontend/backend?
 

Nothingness

Platinum Member
Jul 3, 2013
Yes, just like everyone else.
You just wrote that it wasn't 100% impossible for Zen5 to improve by 40-50%. That seems contradictory, doesn't it? :)

I think not; they will keep tweaking the current architecture (e.g. enhancing the register file design, better utilizing the functional units, increasing the size of various structures). Ultimately the answer depends on what you mean by "massive core redesign". For instance, does a redesigned branch prediction unit or a new cache prefetching algorithm count? Or only a completely restructured frontend/backend?
I guess AMD will do the same, or they'd rename Zen to something else. That's why I doubt we'll see a 40-50% IPC improvement (though such a large performance improvement could be possible on selected benchmarks if all AVX units are widened to 512 bits; but the cost would be large, and the current AVX-512 implementation already seems to be doing very well).
 

junjie1475

Junior Member
Apr 9, 2024
You just wrote that it wasn't 100% impossible for Zen5 to improve by 40-50%. That seems contradictory, doesn't it? :)
Here I meant that Apple is facing what others will face once they reach a certain point in the future. Also, IPC depends on pipeline depth, so this is more of a trade-off. With a deeper pipeline it's usually easier to achieve higher frequency and thus higher performance, but you can't just keep extending the pipeline. Two things come into play here:
1) Power consumption
2) Access latency of buffer/queue structures
That's one of the reasons why Apple has such a massive u-arch and hence very high IPC.
That said, it's not good to directly compare IPC numbers.
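To make the trade-off concrete, here's a toy performance model: a deeper pipeline buys frequency but pays a longer misprediction penalty, which eats into effective IPC. Every number below is a made-up illustration, not a measurement of any real core.

```python
# Toy model: perf = frequency x effective IPC.
# A deeper pipeline raises frequency but also lengthens the
# branch-misprediction flush penalty, dragging effective IPC down.

def effective_ipc(base_ipc, branch_frac, miss_rate, penalty_cycles):
    """IPC after charging branch-misprediction stall cycles."""
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_frac * miss_rate * penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)

def perf_gips(freq_ghz, ipc):
    """Billions of instructions per second."""
    return freq_ghz * ipc

# Illustrative wide/shallow core: modest clock, short flush penalty.
wide = perf_gips(3.2, effective_ipc(6.0, 0.2, 0.03, 12))
# Illustrative narrow/deep core: high clock, long flush penalty.
deep = perf_gips(5.5, effective_ipc(4.0, 0.2, 0.03, 20))
print(f"wide/shallow: {wide:.1f} GIPS, narrow/deep: {deep:.1f} GIPS")
```

With these invented parameters the two cores land close to each other despite very different IPC figures, which is the point: neither raw IPC nor raw frequency alone tells you much.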
 

Nothingness

Platinum Member
Jul 3, 2013
Here I meant that Apple is facing what others will face once they reach a certain point in the future. Also, IPC depends on pipeline depth, so this is more of a trade-off. With a deeper pipeline it's usually easier to achieve higher frequency and thus higher performance, but you can't just keep extending the pipeline. Two things come into play here:
1) Power consumption
2) Access latency of buffer/queue structures
There are more things that get impacted by pipeline depth. As far as performance goes, the main impacts are data forwarding latency and branch misprediction cost. So increasing pipeline depth should, as you say, be considered with a lot of care.

That's one of the reasons why Apple has such a massive u-arch and hence very high IPC.
That said, it's not good to directly compare IPC numbers.
You're talking about "real" IPC. People here mostly think of performance per clock. And no matter what, AMD won't get an IPC increase of 40-50% (that's why I talked about "performance improvement" about AVX in my previous message, which might be achievable).

To sum up, I don't believe for a second that we'll see a 40-50% IPC increase for AMD. And even for performance improvement, my understanding is that people in the know said it was for some benchmarks. And if AMD wants large increases in IPC across many benchmarks, they'll have to reduce frequency (that's what Apple did: trade off frequency for IPC). CPU design is an art of balance :)
 

FlameTail

Diamond Member
Dec 15, 2021
And if AMD wants large increases in IPC across many benchmarks, they'll have to reduce frequency (that's what Apple did: trade off frequency for IPC). CPU design is an art of balance :)
That's what is concerning about Apple: they have been largely relying on clock increases to improve performance, which makes it even harder for them to increase IPC going forward.
 

Nothingness

Platinum Member
Jul 3, 2013
That's what is concerning about Apple: they have been largely relying on clock increases to improve performance, which makes it even harder for them to increase IPC going forward.
And to increase frequency I guess they had to rely on process improvements. They surely can still gain some IPC, but they long ago picked the low-hanging fruit, so the gains should be small. Unless they start a new core design more or less from scratch.
 

junjie1475

Junior Member
Apr 9, 2024
So they didn't increase the pipe stage length?
Both, although there is no way to find out the actual pipeline length with software methods. People mostly use branch misprediction penalties to estimate pipeline depth, but recent processors have heavily optimized the latency of branch instructions (by bypassing, for example). In addition, there is really no way to tell the backend length just by looking at the numbers. Also, getting an accurate branch penalty cycle count is very difficult, because even on a random pattern the branch predictor can sometimes be correct.
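For what it's worth, the usual estimation trick reduces to this arithmetic: time a long run of easily predicted branches, time the same run with a random taken/not-taken pattern, and attribute the difference to flushes. The "measured" cycle counts below are hypothetical placeholders, and, as noted above, the predictor can still guess some random branches right, so the assumed 50% miss rate is itself an approximation.

```python
# Estimate the branch-misprediction flush penalty (a rough proxy for
# front-end pipeline depth) from per-branch timings on two patterns.
# The measured numbers here are hypothetical placeholders.

def estimate_penalty(cycles_predictable, cycles_random, miss_rate=0.5):
    """Extra cycles per branch divided by the assumed miss rate.
    A purely random pattern is mispredicted ~half the time, but a
    real predictor may do better, which skews the estimate."""
    return (cycles_random - cycles_predictable) / miss_rate

# Hypothetical: 1.0 cycles/branch predictable, 8.0 cycles/branch random.
print(estimate_penalty(1.0, 8.0))  # → 14.0 cycle flush penalty
```

If the real miss rate on the random pattern is below 0.5, the same measured delta implies an even longer penalty, which is exactly the ambiguity the post describes.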
 

Nothingness

Platinum Member
Jul 3, 2013
Both, although there is no way to find out the actual pipeline length with software methods. People mostly use branch misprediction penalties to estimate pipeline depth, but recent processors have heavily optimized the latency of branch instructions (by bypassing, for example). In addition, there is really no way to tell the backend length just by looking at the numbers. Also, getting an accurate branch penalty cycle count is very difficult, because even on a random pattern the branch predictor can sometimes be correct.
Oh yes. Branch predictors have become incredibly complex. I know several engineers working on bpred in my company; they are very bright and come up with crazy ideas to remove every source of misprediction. There's still a lot of active work in that area, and for good reason: if you don't get instructions in time, you can't do anything at all with all your ALUs and data prefetchers :)
 

junjie1475

Junior Member
Apr 9, 2024
but note some schedulers now dispatch to several units).
The reason why I put the dotted line in the scheduler blocks is that I didn't have enough time to set up all the test patterns and figure out each scheduler's entry count (I only had 2 days). But Maynard Handley pointed out that two schedulers that are side by side can share instructions in certain cases for better load balancing (this has been known since M1).
 

SpudLobby

Senior member
May 18, 2022
Performance uplift is something that needs further qualification. If you achieve an amazing 30% IPC uplift, but have a 20% clock regression it's nowhere near as impressive. The same is true if AMD has that 40% improvement, but it's only achieved by throwing so much power at the chip that even Intel would blush.

Apple could probably get a 40% uplift if they had a new, better design on a new, better node and they gave it a bigger power budget. Whether all of those stars align is another matter, but I wouldn't want to see them sacrifice on their power efficiency just to get to a 40% number when I'd be more than happy with a 15% gain if they achieved it with even less power than they use now.

Yep, same view. Or even 10% iso-power to me at this point since they’ve pushed power upwards too much (even if energy efficiency is still good, on phones those peaks are a bit too high).
 

FlameTail

Diamond Member
Dec 15, 2021
Bombshell Bloomberg report just dropped.
Apple M4 series to arrive in late 2024.


Gurman says that the entire Mac lineup is slated to get the M4 across late 2024 and early 2025
The iMac, low-end 14-inch MacBook Pro, high-end 14-inch MacBook Pro, 16-inch MacBook Pro, and Mac mini machines will be updated with M4 chips first, followed by the 13-inch and 15-inch MacBook Air models in spring 2025, the Mac Studio in mid-2025, and the Mac Pro later in 2025.
Congrats to those who speculated that M4 family will release ~12 months after M3 family. You were right.
Apple is said to be nearing production of the M4 processor, and it is expected to come in at least three main varieties. Chips are codenamed Donan for the low-end, Brava for the mid-tier, and Hidra for the top-end. The Donan chip will be used in the entry-level MacBook Pro, the ‌MacBook Air‌ machines, and the low-end ‌Mac mini‌, and the Brava chips will be used in the higher-end MacBook Pro and the higher-end ‌Mac mini‌.

The Hidra chip is designed for the ‌Mac Pro‌, which suggests it is an "Ultra" or "Extreme" tier chip. As for the ‌Mac Studio‌, Apple is testing versions with an unreleased M3-era chip and a variation of the M4 Brava processor that would presumably be higher tier than the M4 Pro and M4 Max "Brava"
So,
Donan = M4
Brava = M4 Pro / M4 Max (?)
Hidra = M4 Ultra / M4 Extreme (?)
M4 versions of the Mac desktops could support as much as 512 GB of Unified Memory, which would be a marked jump over the current 192 GB limit
But what bus width? 1024 bit (M4 Ultra) or 2048 bit (M4 Extreme)?
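The bus-width question maps straight onto peak-bandwidth arithmetic. A quick sketch; the LPDDR5X-8533 speed grade is my assumption for illustration, only the bus widths come from the post:

```python
# Peak memory bandwidth = bus width in bytes x transfer rate.

def bandwidth_gb_s(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000  # GB/s

# Sanity check against a shipping part: M1 Ultra, 1024-bit LPDDR5-6400.
print(bandwidth_gb_s(1024, 6400))   # → 819.2 (Apple markets this as 800 GB/s)

# The two widths in question, assuming LPDDR5X-8533:
print(bandwidth_gb_s(1024, 8533))   # → 1092.224
print(bandwidth_gb_s(2048, 8533))   # → 2184.448
```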
 

Eug

Lifer
Mar 11, 2000
Congrats to those who speculated that M4 family will release ~12 months after M3 family. You were right.
It's premature to come to that conclusion. Gurman is often correct with release speculation in terms of hardware, but lately he's been repeatedly off with his timing. For example, the OLED iPad Pros have been predicted for several months now and we still don't have them. His new claim is that the iPad Pros are delayed due to OLED manufacturing issues. That may or may not be true, but the bottom line is that the timing predictions haven't panned out.

BTW, the other thing I wonder about is the Mac Studio. Are we expecting an M3 Ultra this June then? And what about the Mac Pro? Then there's the Mac mini, which was never updated with the M3 or M3 Pro, even though those chips have been out for 6 months. The Mac mini was last updated in January 2023, so I'm thinking it may skip the M3 series entirely.

Anyhow, my gut feeling is that the chip cycle will vary, anywhere from 12 to 18 months or so, based on a number of factors, like how fast TSMC can move. As in, there isn't going to be a rigid yearly cycle, at least not with all the chips (e.g. Ultra), but there isn't going to be a rigid 18-month cycle either.
 

mikegg

Golden Member
Jan 30, 2010
Bombshell Bloomberg report just dropped.
Apple M4 series to arrive in late 2024.




Congrats to those who speculated that M4 family will release ~12 months after M3 family. You were right.

So,
Donan = M4
Brava = M4 Pro / M4 Max (?)
Hidra = M4 Ultra / M4 Extreme (?)

But what bus width? 1024 bit (M4 Ultra) or 2048 bit (M4 Extreme)?
I was the first to call the 1 year cadence. It was too obvious.

Unfortunately, many people still can't believe it.

2 of the 3 M generations have closely followed the A series by about a month. The lone off year was a crazy COVID year. Suddenly, people think it's 18 months.

All Apple executive public interviews have hinted at yearly updates.

But nope. Not good enough for many posters here.
 

Doug S

Platinum Member
Feb 8, 2020
People are missing the most interesting detail here, the top memory config of 512 GB.

If that's true it implies several things.

1) Apple is switching from 12Gb DRAMs to 16Gb DRAMs, since you can't make 512 GB with 12Gb parts
2) If they are going to 16Gb DRAMs that likely means LPDDR5X
3) The step up from 192 GB to 512 GB means either they are using denser packages (so the max memory possible for all models goes up) or M4 finally brings the fabled "Extreme"

If they are going LPDDR5X with Apple Silicon I'll bet that's one of the differences between A18 and A18 Pro, with the regular A18 sticking with LPDDR5 for cost reasons. Which would mean for the very first time that the Pro iPhones will be faster. Probably not a whole lot, but measurable at least in tasks requiring a lot of memory bandwidth (unless there is also a clock rate difference, cache size difference, etc. as well)
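The die-density argument in point 1 is just divisibility: assuming the memory is built from identical DRAM dies, the total capacity must be a whole multiple of the die density. A quick check:

```python
# Capacity must be an integer number of identical DRAM dies.

def dies_needed(total_gbyte, die_gbit):
    total_gbit = total_gbyte * 8
    n, rem = divmod(total_gbit, die_gbit)
    return n if rem == 0 else None  # None: not buildable from this die

print(dies_needed(512, 16))  # → 256 (512 GB works with 16Gb dies)
print(dies_needed(512, 12))  # → None (4096 Gbit isn't divisible by 12)
print(dies_needed(192, 12))  # → 128 (the current 192 GB ceiling)
```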
 

FlameTail

Diamond Member
Dec 15, 2021
1) Apple is switching from 12Gb DRAMs to 16Gb DRAMs, since you can't make 512 GB with 12Gb parts
Doesn't the M3 Max already do that? It has 128 GB max RAM.
3) The step up from 192 GB to 512 GB means either they are using denser packages (so the max memory possible for all models goes up) or M4 finally brings the fabled "Extreme"
Aren't they already using the densest packages possible with LPDDR5, for M3 Max?
 

SpudLobby

Senior member
May 18, 2022
People are missing the most interesting detail here, the top memory config of 512 GB.

If that's true it implies several things.

1) Apple is switching from 12Gb DRAMs to 16Gb DRAMs, since you can't make 512 GB with 12Gb parts
2) If they are going to 16Gb DRAMs that likely means LPDDR5X
3) The step up from 192 GB to 512 GB means either they are using denser packages (so the max memory possible for all models goes up) or M4 finally brings the fabled "Extreme"

If they are going LPDDR5X with Apple Silicon I'll bet that's one of the differences between A18 and A18 Pro, with the regular A18 sticking with LPDDR5 for cost reasons. Which would mean for the very first time that the Pro iPhones will be faster.
They did this with the A16 (LPDDR5-6400) vs the A15 (LPDDR4X-4266) in the iPhone 14 Pro and iPhone 14 respectively; the Pros had 50% more memory bandwidth.

This would be the first time, though, that both get the same branded number while the base models get a "new" SoC and the Pros get a Pro version with different RAM.

Probably not a whole lot, but measurable at least in tasks requiring a lot of memory bandwidth (unless there is also a clock rate difference, cache size difference, etc. as well)
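The 50% figure from that A16/A15 comparison falls straight out of the transfer rates, assuming the bus width stayed the same between the two phones:

```python
# At equal bus width, the bandwidth ratio is just the transfer-rate ratio.
lpddr5 = 6400    # MT/s (A16, iPhone 14 Pro)
lpddr4x = 4266   # MT/s (A15, iPhone 14)
gain = (lpddr5 / lpddr4x - 1) * 100
print(f"{gain:.0f}% more bandwidth")  # → 50% more bandwidth
```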
 

Doug S

Platinum Member
Feb 8, 2020
iPhone 16 Pro/16 Pro Max = A18 Pro
iPhone 16/16 Plus = A18 Bionic

This makes perfect sense.

I guess at some point Apple decided the cost of doing two iPhone SoCs per year is made up for by the savings on the lower-end one: using cheaper LPDDR, a cheaper process, and/or a smaller die.

I don't think they will do it the same way every year. This year it seems to make sense to do two different designs on N3E, so they can't save money with a cheaper process. If they both use the same LPDDR (or the controllers can be made compatible with both; I'm not sure about 5/5X compatibility on the controller side, anyone know?), maybe it is the same die with some cores disabled on the cheaper one. Maybe they bin for slightly higher clock speeds on the more expensive one. We don't really know yet.

When they hit the N2 generation (which I think will happen in 2026, not next year like I see analysts claiming), I predict the non-Pro version will remain on the N3 family, because there will be a cost step up for N2. The following year maybe they bring LPDDR6 to the Pro and Apple Silicon versions while the non-Pro iPhone sticks with 5X. I know LPDDR6's biggest fan @Tigerick will claim Apple is going to adopt it before 2027, but I'll go on record as betting heavily against that lol

Apple is blazing some new territory here by splitting the iPhone SoCs; it will be interesting to see how that plays out. My hunch is that they may want to avoid inflationary pressure to raise prices on the base iPhone, so reducing SoC/DRAM cost a bit will help them keep its price steady. The fact that their iPhone sales mix goes more Pro every year tells me they have some pricing freedom on the high end, so keeping the base model price the same gives them cover for adding more features/cost on the higher models.