Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
24,120
1,766
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options are 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

MS_AT

Senior member
Jul 15, 2024
868
1,762
96
M4 Max running macOS running Parallels running Windows on ARM is the fastest Windows laptop in the world.
Well, there are some caveats. It is almost double the price of the Windows tablet. Then you need to factor in the license costs for both Windows and Parallels. Plus, while portable, this tablet is probably not the fastest Windows x64 laptop;)

But anyway, back at the beginning of the year I proposed that our IT department evaluate an M4 MacBook as a replacement for developer machines, using Parallels + Windows, hoping it would be faster than native Meteor Lake for code compilation. The feedback I got after a month was that they were not able to get our toolsets to run. Python was crashing, and in general the whole VM felt unstable.

And so my dreams of experiencing the power of a MacBook for development on Windows came to an end...

(No, I have not tried to debug it myself, and I don't have any more details to share, as I don't know much more. And before anyone suggests dropping Windows: we simply cannot do so.)

Others are probably more fortunate, but geekbench does not tell the whole story;)
 

gdansk

Diamond Member
Feb 8, 2011
4,568
7,681
136
geekbench does not tell the whole story
You're right. The acolytes do exaggerate it, but the laptop competition is a bit weak at present. Apple is basically just as far ahead of all the other ARM vendors too. N1X is nowhere to be seen, and from early benchmarks it is in the same league as Strix Halo. Qualcomm has been silent, but maybe they're the best shot.
 

jpiniero

Lifer
Oct 1, 2010
16,799
7,249
136
This is supposed to be a big camera upgrade cycle with a periscope telephoto. I think night vision is unlikely, if only because it becomes a privacy nightmare for Apple.

Actually, on reading more, I agree that it's more like a thermal camera, like the FLIR ones. Of course, that could be creepy...
 

poke01

Diamond Member
Mar 8, 2022
4,196
5,543
106
Well, there are some caveats. It is almost double the price of the Windows tablet. Then you need to factor in the license costs for both Windows and Parallels. Plus, while portable, this tablet is probably not the fastest Windows x64 laptop;)

But anyway, back at the beginning of the year I proposed that our IT department evaluate an M4 MacBook as a replacement for developer machines, using Parallels + Windows, hoping it would be faster than native Meteor Lake for code compilation. The feedback I got after a month was that they were not able to get our toolsets to run. Python was crashing, and in general the whole VM felt unstable.

And so my dreams of experiencing the power of a MacBook for development on Windows came to an end...

(No, I have not tried to debug it myself, and I don't have any more details to share, as I don't know much more. And before anyone suggests dropping Windows: we simply cannot do so.)

Others are probably more fortunate, but geekbench does not tell the whole story;)
The problem is using Parallels and not a QEMU-based VM like UTM.

Best of all, no stupid subscription.
 

poke01

Diamond Member
Mar 8, 2022
4,196
5,543
106
You're right. The acolytes do exaggerate it, but the laptop competition is a bit weak at present. Apple is basically just as far ahead of all the other ARM vendors too. N1X is nowhere to be seen, and from early benchmarks it is in the same league as Strix Halo. Qualcomm has been silent, but maybe they're the best shot.
Qualcomm will likely show their product in late September.
 

dr1337

Senior member
May 25, 2020
523
807
136
I hope not. After Google's event, I'm put off by cringey live events
Before 2020, Apple's live events were still great, and most of the company's leadership is still the same today.

Google is its own company, and it generally doesn't care about image. Otherwise they wouldn't have restructured and removed "don't be evil" as their motto. That move was genuinely pretty crazy and scary.
 

Doug S

Diamond Member
Feb 8, 2020
3,572
6,305
136
Actually on reading more, I agree that it's more like a thermal camera, like the FLIR ones. Course that could be creepy...

Seems unlikely they could keep a development like that secret - especially at the massive production scale that would be required even if it were Pro Max only. The only way I could see that happening without leaks far in advance would be if Apple were building them itself in the Maxim fab it bought years ago.

While FLIR would be cool, most of its use cases are one-time unless you work in construction or something. FLIR cameras are something you rent, not own, and if you do want to own one, FLIR attachments for iOS/Android phones have been available for years.

FLIR cameras don't have the same "creepy" concerns as IR cameras that can (sorta) see through clothes. If it were an issue, they could have the person-recognition features the phone already has use a visible-light camera alongside the FLIR so it "censors" FLIR data for people.
 
Jul 27, 2020
27,978
19,117
146
And so my dreams of experiencing the power of a MacBook for development on Windows came to an end...
Real world is quite different from the world most Mac aficionados live in.

Oh well. Life goes on while Apple successfully keeps their RDF up and going strong to rake in billions.

I would definitely make Apple hardware work for my purposes if I had to, but I still probably wouldn't pay more than $2000 for it, so my applications would be limited to 32 GB of RAM at most, whereas I could get almost 128 GB of RAM by paying just a bit more for Strix Halo.

Oh well. Maybe some day when I'm not alive anymore.

Don't worry guys if you feel a "presence" while working on your affordable Macs in the future. It could just be me hovering behind your shoulder and being fascinated :p
 

poke01

Diamond Member
Mar 8, 2022
4,196
5,543
106
Wake me up when an M4 Max 36GB laptop becomes available for $2500 and Windows on ARM achieves 90% Steam library compatibility.
Mx Max chips go on sale in Australia whenever a new M-line chip comes out. I think last time there was an M3 Max 48GB/1TB 16" MacBook for ~$2000 USD.

The USA and even the UAE may have similar deals. (Forget Europe, it's not happening there for Apple stuff lol.)
 
Jul 27, 2020
27,978
19,117
146
even the UAE may have similar deals.
Nah. Apple and Nvidia are synonymous with price gouging in the UAE. If you are getting a deal on these, especially on new hardware, be very, very suspicious. Most people and businesses with hardware from these brands will sit on it for years rather than sell it below market price.

One guy even rudely sold his M1 laptop to someone else for $25 more when I had already made a deal with him to come pick up the laptop in the evening. I told him that I had no issue paying $25 more if he really wanted more but I think the issue was that he was a mommy's boy and my voice scared him.

I had a similar issue with a Nintendo 3DS used purchase where the guy kept promising to meet me and then would come up with excuses so I asked a friend with the same nationality as the seller to go with me to him. He hardly spoke to me and my friend closed the deal. I think same issue. Mommy's boy.
 

name99

Senior member
Sep 11, 2010
653
545
136
Not even close. This is primarily about saving power.
From my PDF vol 8:

Suppose you have a scalar CPU that's calculating a long dot product. Essentially the code looks like a loop of C=A*B+C, perhaps with A and B cycling through register 0, 1, 2, 3, ... and some other code sequentially loading these registers 0, 1, 2, 3.
So a consequence of this design is that we're continually reading register C from the register file, adding it into the multiply, then storing C back in the register file. That's not ideal! Now in any decent modern CPU, we'll in fact pull C off the bypass bus as it is being written back to the register file, so we can at least avoid the latency of re-reading from the register file. (It's still a shame we keep writing back the intermediate C result which is immediately overwritten, but that's a problem for later).
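
To make the register traffic concrete, here is a minimal C sketch of that loop (illustrative only, not any particular ISA):

```c
#include <stddef.h>

/* Long dot product: every iteration computes c = a[i]*b[i] + c.
   The register holding c is read, updated, and written back each
   time; the bypass bus hides the re-read latency, but all those
   intermediate write-backs of c still happen. */
float dot(const float *a, const float *b, size_t n) {
    float c = 0.0f;               /* the accumulator, "register C" */
    for (size_t i = 0; i < n; i++)
        c = a[i] * b[i] + c;      /* FMA: read C, update C, write C */
    return c;
}
```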

Switch to the GPU, and imagine we're similarly calculating a whole lot of long dot products in parallel. We'd have a similar code structure of C=A*B+C, only with A and B each 32 elements long and being pointwise multiplied, then accumulated into a 32-wide register C, to calculate 32 dot products in parallel.
Once again we have a situation where we waste time and energy moving the register C back and forth between the execution unit and the operand cache. Can we avoid this?
Suppose we provided a single register of storage attached to each lane of each execution unit (maybe the FP16, FP32 and INT pipelines all have their own separate storage per lane, maybe they share a common 32-bit storage; that's a detail). Now we can define a new instruction that looks something like
local=A*B+local, and avoid all that back and forth movement of C.
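
Here is a rough C model of the difference (the 32-lane arrays and the `local` accumulator are my own stand-ins, not Apple's actual instruction): the first version ships C through the operand cache on every iteration, the second keeps a per-lane accumulator parked at the execution unit and moves C once.

```c
#define LANES 32

/* Before: C round-trips between the operand cache and the FMA
   units on every iteration (three operand reads, one write). */
void fma_roundtrip(float c[LANES], const float *a, const float *b, int iters) {
    for (int k = 0; k < iters; k++)
        for (int l = 0; l < LANES; l++)
            c[l] = a[k * LANES + l] * b[k * LANES + l] + c[l];
}

/* After: each lane owns one word of local storage, so the
   hypothetical local = A*B + local instruction needs only two
   operands from the operand cache per iteration. */
void fma_local(float c[LANES], const float *a, const float *b, int iters) {
    float local[LANES] = {0.0f};    /* per-lane accumulator */
    for (int k = 0; k < iters; k++)
        for (int l = 0; l < LANES; l++)
            local[l] += a[k * LANES + l] * b[k * LANES + l];
    for (int l = 0; l < LANES; l++)
        c[l] += local[l];           /* C written back exactly once */
}
```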

This is essentially what the (2024) Matrix Multiplier Caching patent (https://patents.google.com/patent/US20250103292A1) is about.
The more common task we care about is matrix multiplication, not exactly dot products, and you should recall that the Apple GPU provides a matrix multiply instruction that handles the multiplication of something like two side-by-side 4*4 submatrices with a 4*4 submatrix to give two 4*4 results, achieved by packing these matrices in appropriate order into the 32 lanes of the A and B registers, and then shuffling the values around as we pass the registers repeatedly through the FMAC execution units.
This is nice if we want to calculate a 4*4 matrix product, but usually we have larger matrices, so we have to repeat this operation. At which point (you can work through the algebra if you like, or can just imagine it in your head as operating like dot products, only with the "basic units" being 4*4 matrices that are multiplied together and accumulated) we are back to our earlier dot product explanation.
As long as we have one register of local storage available to each lane, we can keep accumulating our matrix multiplication while avoiding the latency and energy costs of frequent movement of the C values.
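
A hedged sketch of that tiling in plain C (ignoring the lane packing and shuffles the real instruction does, and assuming n is a multiple of the tile size): the inner loop multiplies 4*4 tiles along the shared dimension and accumulates into a tile-sized local buffer, so the C tile moves only once.

```c
#define T 4  /* tile size */

/* acc += a_tile * b_tile: the 4*4 "basic unit" that the GPU's
   matrix multiply instruction handles in packed form. */
static void tile_fma(float acc[T][T], const float a[T][T], const float b[T][T]) {
    for (int i = 0; i < T; i++)
        for (int j = 0; j < T; j++)
            for (int k = 0; k < T; k++)
                acc[i][j] += a[i][k] * b[k][j];
}

/* One output tile of an n*n row-major matmul: walk the shared
   dimension, accumulating in acc (the per-lane local storage),
   and write the C tile back exactly once at the end. */
void matmul_tile(float *c, const float *a, const float *b, int n, int ti, int tj) {
    float acc[T][T] = {{0.0f}};
    for (int tk = 0; tk < n / T; tk++) {
        float at[T][T], bt[T][T];
        for (int i = 0; i < T; i++)
            for (int j = 0; j < T; j++) {
                at[i][j] = a[(ti * T + i) * n + (tk * T + j)];
                bt[i][j] = b[(tk * T + i) * n + (tj * T + j)];
            }
        tile_fma(acc, at, bt);
    }
    for (int i = 0; i < T; i++)
        for (int j = 0; j < T; j++)
            c[(ti * T + i) * n + (tj * T + j)] += acc[i][j];
}
```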

Interestingly this also avoids use of the third track of the operand cache into the execution units, which in turn means that we don't have to expand the bandwidth of the operand cache too much to get interesting consequences. Suppose the operand cache today can supply three operands per cycle, and suppose we boosted it to supply four operands per cycle. This would in turn allow us to supply two operands to each of two pipelines, meaning that, for example, we could now run two large matrix multiplies in parallel without as much expansion of the hardware as earlier seemed necessary... Of course this still requires two of the appropriate execution pipelines, but it mostly solves what looked like the hardest part of the problem of making GPU execution wider.
 

Doug S

Diamond Member
Feb 8, 2020
3,572
6,305
136
So I've seen a much better explanation for the event invite than "Apple is adding FLIR": that it has to do with the vapor chamber cooling - probably they'll be showing off some nifty FLIR photos of the 16 vs the 17 to demonstrate the difference it makes in reducing hot spots.

Now I suppose there's a 1% chance they pull a Steve after showing them off and say "by the way, those images were prepared using the new FLIR capability in the iPhone 17 Pro Max" to thunderous applause. I just don't think FLIR is desired by nearly enough people to justify making it a standard part of a mass market phone.
 

Eug

Lifer
Mar 11, 2000
24,120
1,766
126
DigiTimes says Apple has locked down nearly half of TSMC’s initial N2 production. In 2025 that will be 45,000-50,000 wafers per month (or >20,000 per month for Apple), at $30,000 per wafer. Qualcomm is the #2 volume customer. N2 output is projected to double in 2026.
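
Back-of-the-envelope: >20,000 wafers a month at $30,000 each is over $600 million per month, or north of $7 billion a year, just for wafers.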


That’s a lot of money!
 

name99

Senior member
Sep 11, 2010
653
545
136
So I've seen a much better explanation for the event invite than "Apple is adding FLIR": that it has to do with the vapor chamber cooling - probably they'll be showing off some nifty FLIR photos of the 16 vs the 17 to demonstrate the difference it makes in reducing hot spots.

Now I suppose there's a 1% chance they pull a Steve after showing them off and say "by the way, those images were prepared using the new FLIR capability in the iPhone 17 Pro Max" to thunderous applause. I just don't think FLIR is desired by nearly enough people to justify making it a standard part of a mass market phone.
Is a vapor chamber a slam dunk?
If we consider the generic concept of using a phase change to absorb heat, you obviously also have solid-liquid transitions, and even solid-solid transitions.
While (usually, but not always) liquid to gas has higher latent heat than solid to liquid or solid to solid, it also comes with the hassles of (again usually) a substantial change in volume.

I know nothing about how this space is playing out in other phones or even on PCs, but I could imagine Apple prioritizing other issues than raw "amount of heat that can be absorbed" leading to their use of a phase change material but not specifically a vapor chamber.