Discussion Apple Silicon SoC thread

Page 47

Eug

Lifer
Mar 11, 2000
23,598
1,015
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
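
As a sanity check, the 2.6 TFLOPS figure follows from the execution-unit count above, assuming (these figures are not in the spec list) 8 FP32 ALUs per execution unit, an FMA counted as two FLOPs, and a GPU clock of about 1.278 GHz:

```python
# Hedged back-of-envelope check of the M1 GPU's quoted 2.6 TFLOPS.
# ALUs-per-EU and the clock are assumptions, not from the spec list.
eus = 128                # execution units (from the spec list)
alus_per_eu = 8          # assumed FP32 ALUs per EU -> 1024 ALUs total
flops_per_alu = 2        # a fused multiply-add counts as 2 FLOPs
clock_hz = 1.278e9       # assumed GPU clock

tflops = eus * alus_per_eu * flops_per_alu * clock_hz / 1e12
print(f"{tflops:.2f} TFLOPS")  # 2.62 TFLOPS, matching the quoted 2.6
```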

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

[Screenshot: M1 Pro and M1 Max configurations]

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes

M3 Family discussion here:

 

Doug S

Platinum Member
Feb 8, 2020
2,280
3,538
136
Better place to ask this here I suppose - has anyone managed to benchmark the NPU portion of the M1?

Apple put a lot of area and resources into upping its performance for A14 & M1, so they presumably have a considerable end goal in mind.

Apple provided the only information that matters - it was 5 trillion operations per second, now it is 11 trillion operations per second. Keep in mind that NPUs are not like CPUs, they are very simple so that TOPS figure is all you really need. All its abilities come from whatever software is run on it, you can't really get "uarch improvements" like you can with a CPU.

So far, all we know about the NPU's goal is that it provides "AI" stuff like Siri without using the cloud, for improved privacy. That can't be all of it - the original NPU was 0.5 TOPS, so it has already scaled by 22x in the few short years since Apple added it, so we'll have to wait and see what else they do with it.
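
The scaling claim works out as stated; a quick check using the figures quoted in this post (the 0.5 TOPS figure for the first Neural Engine is the post's own):

```python
# NPU throughput figures quoted above, and the scaling they imply.
npu_tops = {"first gen": 0.5, "previous gen": 5.0, "A14/M1": 11.0}

gen_jump = npu_tops["A14/M1"] / npu_tops["previous gen"]  # one generation
total = npu_tops["A14/M1"] / npu_tops["first gen"]        # since the start
print(f"{gen_jump:.1f}x generational, {total:.0f}x overall")  # 2.2x, 22x
```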

Maybe they are assuming developers will come up with some uses for it too, hopefully having that as a baseline that will be present in every Mac at that level or higher performance will lead to some interesting stuff down the road.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Maybe they are assuming developers will come up with some uses for it too, hopefully having that as a baseline that will be present in every Mac at that level or higher performance will lead to some interesting stuff down the road.

Thanks - useful.

I guess I was wondering a bit how it does Vs say a GPU/CPU combination at running - or even training - a neural net.

It obviously won't compete with A100 :) Shouldn't the tight integration between the elements of the SoC logically help out a good bit?

An M1 based laptop might even be a good machine for testing ideas out on before deploying into a vast workstation/cloud etc.
 

name99

Senior member
Sep 11, 2010
404
303
136
Apple provided the only information that matters - it was 5 trillion operations per second, now it is 11 trillion operations per second. Keep in mind that NPUs are not like CPUs, they are very simple so that TOPS figure is all you really need. All its abilities come from whatever software is run on it, you can't really get "uarch improvements" like you can with a CPU.

This (no "uarch improvements") is far from true. Improvements in this vein include
- special-casing 0s and ±1s in the MACs
- support for narrower than 8-bit weights
- exactly how you structure your "cache" system (ie how often you have to reload weights, and how computed results can flow back into the system for recurrent networks)
- (And of course the wins from the infamous UMA that we've already heard so much about -- misunderstood, dismissed, flat out lied about, from the usual crowd)

At some point we are also going to see a recapitulation of GPU evolution. This will likely include branches and synchronization primitives, if they aren't there already.
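
The first item in that list, special-casing zeros in the MACs, is easy to illustrate. A minimal sketch (plain Python, nothing like real NPU hardware): a zero weight contributes nothing to a dot product, so a sparsity-aware MAC array can gate those operations off and save energy without changing the result.

```python
def mac_dense(weights, activations):
    """Baseline MAC: every weight costs one multiply-accumulate."""
    acc, ops = 0, 0
    for w, a in zip(weights, activations):
        acc += w * a
        ops += 1
    return acc, ops

def mac_zero_skipping(weights, activations):
    """Special-case zero weights: same result, fewer effective ops."""
    acc, ops = 0, 0
    for w, a in zip(weights, activations):
        if w != 0:
            acc += w * a
            ops += 1
    return acc, ops

w = [0, 3, 0, 0, -1, 2, 0, 1]   # pruned networks are often mostly zeros
a = [5, 1, 7, 2, 4, 1, 9, 3]
print(mac_dense(w, a))          # (4, 8): correct sum, 8 ops
print(mac_zero_skipping(w, a))  # (4, 4): same sum, half the ops
```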
 

Mopetar

Diamond Member
Jan 31, 2011
7,862
6,051
136
As for what Intel chips they'll use, presumably the latest ones assuming Apple can get them in sufficient quantity. They obviously have nothing to fear from performance testing going head to head against Intel's best.

I honestly wish they'd just use AMD chips to bridge the gap. I might be interested in buying a 16-core iMac with a Zen 3 in it, but why would I want to buy another x86 Mac if it's not even using the best x86 chips or giving me a good reason (i.e., tons of cores) to take it over an SoC that can beat the Intel parts in a lot of tasks due to dedicated hardware?

Unless they still have some contract with Intel to be the sole supplier of Apple's x86 chips, I don't see any reason to keep them around.
 

name99

Senior member
Sep 11, 2010
404
303
136
Thanks - useful.

I guess I was wondering a bit how it does Vs say a GPU/CPU combination at running - or even training - a neural net.

It obviously won't compete with A100 :) Shouldn't the tight integration between the elements of the SoC logically help out a good bit?

An M1 based laptop might even be a good machine for testing ideas out on before deploying into a vast workstation/cloud etc.

Apple is comfortable with training on their devices right now...
Sure it's not at nV hundreds of watts level - yet...

It is likely that at least some of the training is being done using AMX on the CPU, and this being Apple and the design being what it is, likely that training is making use of "accelerator level parallelism", ie perform some of the operations on the AMX part of the CPU while other training operations occur simultaneously on the NPU. Once again uniform address space, coherency across all devices, powerful sync/atomic primitives, and large SLC mean this should be both fairly easy to program (ie compared to doing the same thing on other platforms) and reasonably performant (perhaps no more than one to two hundred cycles from making a change on the CPU to having it appear on the NPU or vice versa?)
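
The "accelerator-level parallelism" idea can be sketched in software terms. This is a toy illustration only: plain Python threads stand in for the AMX unit and the NPU, and the function names are made up for this sketch, not Apple's API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_amx(batch):
    # stand-in for matrix math dispatched to the CPU's AMX unit
    return sum(x * x for x in batch)

def run_on_npu(batch):
    # stand-in for convolution work dispatched to the NPU
    return sum(x + 1 for x in batch)

# Partition one training step's work and run both halves concurrently;
# on the real SoC, the uniform address space and coherency are what
# make this kind of hand-off between units cheap.
data = list(range(1000))
with ThreadPoolExecutor(max_workers=2) as pool:
    amx_part = pool.submit(run_on_amx, data[:500])
    npu_part = pool.submit(run_on_npu, data[500:])
    total = amx_part.result() + npu_part.result()
print(total)
```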

As for what's the point? The big win at the end of all this is language.
Ever better translation. Better writing assistants (grammar, spelling). Semantic search (ie "search what I mean, not the literal words").
Basically a PC that every year moves from being a really dumb secretary to an ever smarter secretary.

Of course this is all something of a gamble. No-one is sure the extent to which the existing techniques (adequate, not great) can be improved. No-one is sure that HW optimized for existing techniques will be a good fit for five years from now.
That's what defines ENGINEERING-led companies -- they take these huge gambles on something that may work, but who can be sure. IBM in the 60s (with S360) but then they got colonized by the finance parasites. Intel in the 70s through early 2000s, but then they got colonized by the finance parasites. Qualcomm for a few years after they productized CDMA but then...

Enjoy Apple's engineering-driven management right now. It won't last 😢. We're probably safe as long as Tim is in charge; maybe even as long as Jony is in charge of his division.
But in the end the parasites always win...
 

nxre

Member
Nov 19, 2020
60
103
66
For Zen 2 on N7 this is completely wrong and a common(ly repeated) mistake, AMD did use the high density cells as well. Considering the talks about Zen 3 using the same improvements as the Zen 2 XT chips that likely applies to Zen 3 too.
Thanks, I stand corrected.
I assumed that because the A12 is 6.9 billion transistors in around ~83 mm² while a Zen 2 chiplet is around the same size for 3.9 billion transistors, they would be using different cells.
If Zen 3 on TSMC HD cells can achieve 5 GHz clock speeds, that's impressive.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
On the topic of node advantages, I think Apple and AMD don't use the same cells even on the same node. AFAIK Apple uses the high-density cells while AMD uses the high-performance ones, so even iso-node comparisons would be complicated.

I think the more pertinent point is that, apples-to-apples, a given node will take more time to mature enough to run at ~5 GHz (Zen 3) than at ~3 GHz (Apple/Qualcomm). Besides Apple, Snapdragon SoCs have also adopted cutting-edge TSMC nodes more quickly than AMD CPUs despite fairly low margins per die.
 

moinmoin

Diamond Member
Jun 1, 2017
4,959
7,686
136
Thanks, I stand corrected.
I assumed that because the A12 is 6.9 billion transistors in around ~83 mm² while a Zen 2 chiplet is around the same size for 3.9 billion transistors, they would be using different cells.
If Zen 3 on TSMC HD cells can achieve 5 GHz clock speeds, that's impressive.
My guess is that AMD is using the HD 6-track library to be able to selectively make areas more power efficient. While with the HP 7.5T library you can't go denser later, the opposite is still possible: use 6T basically as the grid but spread the cells out more to achieve high frequencies. This would explain how Zen 2 has significantly fewer transistors in the same area than the A12 even though both use the same node.
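
Putting numbers on the density gap, using the approximate figures quoted earlier in the thread (both areas are rough):

```python
# A12: ~6.9B transistors in ~83 mm²; Zen 2 chiplet: ~3.9B in roughly
# the same area (figures as quoted in the thread, both approximate).
a12_density = 6.9e9 / 83       # transistors per mm²
zen2_density = 3.9e9 / 83

print(f"A12:   {a12_density / 1e6:.0f} MTr/mm²")    # ~83
print(f"Zen 2: {zen2_density / 1e6:.0f} MTr/mm²")   # ~47
print(f"ratio: {a12_density / zen2_density:.2f}x")  # ~1.77x denser
```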
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
The A12X/A12Z is a 4+4 design just like the M1, so they clearly wouldn't have had to compromise on the number of Firestorm cores if it was built on N7, just on the L2/SLC sizing. From the A12 generation to the A14 generation the biggest transistor increase was in the NPU, which doubled from 8 to 16 cores; on N7 they presumably would have stuck with 8.

The M1 is a full 6 billion transistors more than the A12Z. The enlarged NPU takes up nowhere near that much. AT pointed out that the Lightning core in the A13 was just under 30% larger than the HP core in the A12. The Firestorm core, according to AT, has widened vector capabilities, significant improvements in its FP resources, and has even more massive "L1" caches than Lightning. The Lightning cores were ~2.6 mm² in the A13. The Firestorm would easily have been north of 3 mm² each, making them take up at least 50% more space than the HP cluster in the A12Z. And that's without the expanded caches. This also doesn't take into account that the efficiency cores have also been improved and expanded through two generations as well.

Implementing M1 on N7 would have required either a much larger die, or significant resource sacrifices throughout the chip. It certainly would not fit in the current price/performance bracket that it fits in at present.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
The M1 is a full 6 billion transistors more than the A12Z. The enlarged NPU takes up nowhere near that much. AT pointed out that the Lightning core in the A13 was just under 30% larger than the HP core in the A12. The Firestorm core, according to AT, has widened vector capabilities, significant improvements in its FP resources, and has even more massive "L1" caches than Lightning. The Lightning cores were ~2.6 mm² in the A13. The Firestorm would easily have been north of 3 mm² each, making them take up at least 50% more space than the HP cluster in the A12Z. And that's without the expanded caches. This also doesn't take into account that the efficiency cores have also been improved and expanded through two generations as well.

Implementing M1 on N7 would have required either a much larger die, or significant resource sacrifices throughout the chip. It certainly would not fit in the current price/performance bracket that it fits in at present.

Or a smaller NPU and smaller GPU, a bit less cache, or a slightly larger die.

I think we can be very certain that they wouldn't have dropped below 4 performance cores even if they were still on 7nm.

It's a ridiculous argument, and a rather pointless hypothetical.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
What Apple has done here isn't a miracle. It's been obvious for quite a long time that the way forward is to go wider. At the end of the day, throughput is what matters most. Not IPC, not clocks, but how much you can get done in a given amount of time. Higher clocks have been bled dry and then some, so that's out. That only leaves getting more out of the cycles you have. By keeping clocks down around the silicon's sweet spot they get great energy savings, and by going so wide they get a lot of throughput.

I'm not so sure about this. With all the fanfare (especially from Andrei F.) surrounding Apple's ultra wide designs, I think people are being premature in assuming that wider is the way to go, but that could be because of their novelty more than anything. Zen 3 is a four wide design that is still able to achieve incredible performance relative to the eight wide M1 despite being on a bigger node, and using a chiplet design. Granted it does manage to do this with a much higher power draw, but it was designed to achieve high clock speeds and be used across various platforms where power draw isn't as restricted, unlike the M1 which was designed to be as power efficient as possible.

x86 will have to go wider as well at some point, though because of the variable length instructions that is vastly more difficult.

Intel and AMD have shown no inclination to go to ultra-wide designs. I think they prefer their CPUs to be able to achieve 5 GHz clock speeds, and considering how Intel has managed to keep its CPUs relevant for many years by virtue of a sheer clock speed advantage despite using a very old microarchitecture, I can't say they're wrong to do that.
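
The decode-width point above can be made concrete. With fixed-length instructions, every decoder slot knows its start byte up front; with variable lengths, each start depends on the previous instruction's length, so boundaries must be found serially or speculated. A toy sketch:

```python
def fixed_length_starts(width=4, n=8):
    # Fixed-length ISA (e.g. 4-byte ARM): instruction i starts at i*width,
    # so n decoders can all begin in parallel.
    return [i * width for i in range(n)]

def variable_length_starts(lengths, n=8):
    # Variable-length ISA (e.g. x86, 1-15 bytes): the start of instruction
    # i depends on the lengths of all instructions before it.
    starts, pos = [], 0
    for length in lengths[:n]:
        starts.append(pos)
        pos += length
    return starts

print(fixed_length_starts(width=4, n=4))          # [0, 4, 8, 12]
print(variable_length_starts([1, 3, 2, 6], n=4))  # [0, 1, 4, 6]
```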

Love Apple or hate them. But at least appreciate that they are pushing the boundaries further and hopefully pushing both x86 and other ARM builders forward as well. It's a good product that is past due, and I for one am glad to see it.

I agree. I personally don't care for Apple, but I have to admit they do push boundaries and move the industry forward in many ways.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
This guy compares the 5600x against the Apple M1 using a few native applications and the results aren't as favorable to the M1 as you'd think.

How to construct a tilted test 101:

He's a Linux fan, and he compares a full-power 6-core desktop against a low-power 4-core mobile processor, pulling out the old Phoronix ("Let's prove Linux is best") benchmarks, and only multi-threaded ones at that.

No one would/should expect the M1 to win in such a tilted test.
 

thunng8

Member
Jan 8, 2013
152
61
101
This guy compares the 5600x against the Apple M1 using a few native applications and the results aren't as favorable to the M1 as you'd think.

Lots of Phoronix tests in there. It was generally agreed a few pages back that these tests have heavy optimisations for x86.

This is ok, as if you do run those applications, then an x86 chip would run them faster (but to be honest, looking at the suite, most of the applications are unknown to me).
 

Mopetar

Diamond Member
Jan 31, 2011
7,862
6,051
136
Lots of Phoronix tests in there. It was generally agreed a few pages back that these tests have heavy optimisations for x86.

This is ok, as if you do run those applications, then an x86 chip would run them faster (but to be honest, looking at the suite, most of the applications are unknown to me).

The funniest part about all of this is that most of these Macs will go to the kind of people for whom a heavy load is having both Facebook and Twitter tabs going at the same time. The real benchmark for the majority of users is how low the idle power consumption is.
 

DrMrLordX

Lifer
Apr 27, 2000
21,651
10,871
136
No one would/should expect the M1 to win in such a tilted test.

The guy's reason for the video is in the first minute:

"I will say, in these benchmarks, Mac silicon is the best Mac they've ever made for performance; however, it is not the best CPU on the market".

Also look at the section of the video about Geekbench 5. He doesn't really trust the scores. I don't either. GB5 makes it look like the M1 is actually faster than a 5600x.

Even if some of the Phoronix MT tests are recompiled to support M1 fully, I think the 5600X would still win most of those benches, as would a 4900H and a next-gen 5800u (when it comes out).

The M1 is a great SoC. It's being over-sold as the best SoC/CPU on the market. It isn't, and people should stop assuming that it is.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
The guy's reason for the video is in the first minute:

"I will say, in these benchmarks, Mac silicon is the best Mac they've ever made for performance; however, it is not the best CPU on the market".

Also look at the section of the video about Geekbench 5. He doesn't really trust the scores. I don't either. GB5 makes it look like the M1 is actually faster than a 5600x.

Even if some of the Phoronix MT tests are recompiled to support M1 fully, I think the 5600X would still win most of those benches, as would a 4900H and a next-gen 5800u (when it comes out).

The M1 is a great SoC. It's being over-sold as the best SoC/CPU on the market. It isn't, and people should stop assuming that it is.

Yes, he is clear up front. He doesn't like the results, so he tilts the test to get the results he (and apparently you) like better.

Also note that he didn't even seem to notice that the 5600X scored higher in GB MT before he set off to "debunk" the M1 "win" that only existed in his head.

He only "debunked" a strawman that he created.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
How to construct a tilted test 101:

He's a Linux fan, and he compares a full-power 6-core desktop against a low-power 4-core mobile processor, pulling out the old Phoronix ("Let's prove Linux is best") benchmarks, and only multi-threaded ones at that.

I thought the M1 was an 8 core CPU?

And a large percentage of modern desktop applications are optimized for multithreaded CPUs, ie browsers. Even mobile APUs are multicore these days, so nothing wrong with predominantly multithreaded benchmarks.

No one would/should expect the M1 to win in such a tilted test.

But it's fine when Geekbench scores which favor mobile CPUs show the M1 in a positive light :cool:
 

IvanKaramazov

Member
Jun 29, 2020
56
102
66
But it's fine when Geekbench scores which favor mobile CPUs show the M1 in a positive light :cool:

Geekbench has issues and arguably shouldn't be used as the primary benchmark by as many sites, but I've never seen compelling evidence that it actually "favors mobile CPUs", despite constant claims to that effect. If you read the many, extended discussions on it by Torvalds and everyone at Real World Tech, for example, there is much debate about whether the selected workloads and means of testing are really representative of meaningful, real-world workloads, but no one is decrying it as a mobile-friendly benchmark, or arguing that it unfairly favors Apple.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
I thought the M1 was an 8 core CPU?
The M1 is 4 big cores + 4 little cores (Firestorm + Icestorm; the Firestorm cores use more power and are faster, while the Icestorm cores are the energy-efficient ones that also take up less die area than the "large" cores).

The AMD Ryzen 5600X is a 65 W, 6-core, 12-thread Zen 3 CPU (i.e., faster than any laptop Zen 2 chip, which max out at 45 W: both more power and a newer architecture than any shipping AMD laptop part).

The fact that a 65 W 6-core (12-thread) CPU is competing with a 15 W 4+4-core CPU is more a compliment to the 15 W mobile chip.

With the 4 benchmarks he did besides Geekbench (located here; it is in his YouTube comments):

The AMD part is 20 to 45% faster. Honestly, I would hope a desktop chip that can use 4x+ the power is faster, especially since it is also a November silicon release and not something that is 6, 12, or 18 months old.

The M1 is an "ultrabook" chip; it should not be in the same class as a desktop chip with 4x the power budget, based on the history of ultrabook chips from 2008 to now. (The first MacBook Air is why I use 2008; Intel launched the ultrabook initiative in 2012, based on the success of the 2010 MacBook Air, the first model to have an SSD in all configurations, while the 2008 model only had SSDs in some of them.)

This is a "big deal."
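
The perf-per-watt framing can be made explicit with the post's own numbers. A hedged sketch (65 W and ~15 W are TDP-class figures, not measured draw):

```python
# If the 65 W 5600X is 20-45% faster than the ~15 W M1, normalize by power.
m1_watts, amd_watts = 15, 65

for speedup in (1.20, 1.45):
    perf_per_watt_ratio = (1.0 / m1_watts) / (speedup / amd_watts)
    print(f"AMD {speedup:.2f}x faster -> M1 at {perf_per_watt_ratio:.1f}x perf/W")
```

On these figures the M1 comes out around 3.0x to 3.6x ahead in performance per watt, which is the "compliment to the mobile chip" point in numbers.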
 

thunng8

Member
Jan 8, 2013
152
61
101
I thought the M1 was an 8 core CPU?

And a large percentage of modern desktop applications are optimized for multithreaded CPUs, ie browsers. Even mobile APUs are multicore these days, so nothing wrong with predominantly multithreaded benchmarks.



But it's fine when Geekbench scores which favor mobile CPUs show the M1 in a positive light :cool:
Talking about browsers, the M1 mops the floor with everything else in every single browser benchmark out there. Even real-world impressions of page load, scrolling, and running multiple tabs show a major uplift in performance. Originally people thought it was because Safari was so well optimized, but guess what? Chrome was just compiled for the M1 and shows similarly superior scores.


And you cannot dismiss all the other real-world benchmarks out there. Lightroom and Premiere Pro running under Rosetta in many cases outperform the best x86 processors out there.

There have been so many examples of commercial software running faster on the M1. We aren't even considering battery life in the equation either: approximately 2x more battery life compared to Ryzen or Tiger Lake.

Sure, it might not run Phoronix as fast, but there is still a lot of time for FOSS to optimize for the ARM architecture, as it has been doing for x86 over many, many years.
Geekbench has issues and arguably shouldn't be used as the primary benchmark by as many sites, but I've never seen compelling evidence that it actually "favors mobile CPUs", despite constant claims to that effect. If you read the many, extended discussions on it by Torvalds and everyone at Real World Tech, for example, there is much debate about whether the selected workloads and means of testing are really representative of meaningful, real-world workloads, but no one is decrying it as a mobile-friendly benchmark, or arguing that it unfairly favors Apple.
Yes, he is grasping at straws trying to debunk a benchmark. If he doesn't like Geekbench, why not use SPEC CPU? It shows a remarkable correlation with Geekbench, and at a per-core level the M1 is very close to the top-of-the-range 5950X while using several times less power.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Geekbench has issues and arguably shouldn't be used as the primary benchmark by as many sites, but I've never seen compelling evidence that it actually "favors mobile CPUs", despite constant claims to that effect. If you read the many, extended discussions on it by Torvalds and everyone at Real World Tech, for example, there is much debate about whether the selected workloads and means of testing are really representative of meaningful, real-world workloads, but no one is decrying it as a mobile-friendly benchmark, or arguing that it unfairly favors Apple.

By "favor mobile CPUs", I mean the benchmarks themselves are very short, so as not to cause throttling.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
By "favor mobile CPUs", I mean the benchmarks themselves are very short, so as not to cause throttling.

But mobile CPUs throttle because they are placed in a chassis that cannot dissipate heat well (y'know, a smartphone), not because of any flaw in the chips themselves. Given that the M1 with active cooling doesn't throttle anyway, I don't see how that favors it vs x86.
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
This guy compares the 5600x against the Apple M1 using a few native applications and the results aren't as favorable to the M1 as you'd think.

I like and respect Phoronix. But I don't see the point of running a server suite that pits a low-end consumer laptop part against a part drawing about 8 times the wattage, in benchmarks designed around unlimited power draw.