Discussion Apple Silicon SoC thread

Page 47

Eug

Lifer
Mar 11, 2000
23,598
1,015
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
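
As a sanity check, the 2.6 TFLOPS figure follows from the execution-unit count above, assuming (these figures are not in the spec list) 8 FP32 ALUs per execution unit, an FMA counted as two FLOPs, and a GPU clock of about 1.278 GHz:

```python
# Hedged back-of-envelope check of the M1 GPU's quoted 2.6 TFLOPS.
# ALUs-per-EU and the clock are assumptions, not from the spec list.
eus = 128                # execution units (from the spec list)
alus_per_eu = 8          # assumed FP32 ALUs per EU -> 1024 ALUs total
flops_per_alu = 2        # a fused multiply-add counts as 2 FLOPs
clock_hz = 1.278e9       # assumed GPU clock

tflops = eus * alus_per_eu * flops_per_alu * clock_hz / 1e12
print(f"{tflops:.2f} TFLOPS")  # 2.62 TFLOPS, matching the quoted 2.6
```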

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

[Screenshot: M1 Pro and M1 Max configurations]

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes

M3 Family discussion here:

 

Doug S

Platinum Member
Feb 8, 2020
2,280
3,538
136
Better place to ask this here I suppose - has anyone managed to benchmark the NPU portion of the M1?

Apple put a lot of area and resources into upping its performance for A14 & M1, so they presumably have a considerable end goal in mind.

Apple provided the only information that matters - it was 5 trillion operations per second, now it is 11 trillion operations per second. Keep in mind that NPUs are not like CPUs, they are very simple so that TOPS figure is all you really need. All its abilities come from whatever software is run on it, you can't really get "uarch improvements" like you can with a CPU.

So far, all we know about the NPU's goal is that it provides "AI" stuff like Siri without using the cloud, for improved privacy. That can't be all of it - the original NPU was 0.5 TOPS, so it has already scaled by 22x in the few short years since Apple added it, so we'll have to wait and see what else they do with it.
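
The scaling claim works out as stated; a quick check using the figures quoted in this post (the 0.5 TOPS figure for the first Neural Engine is the post's own):

```python
# NPU throughput figures quoted above, and the scaling they imply.
npu_tops = {"first gen": 0.5, "previous gen": 5.0, "A14/M1": 11.0}

gen_jump = npu_tops["A14/M1"] / npu_tops["previous gen"]  # one generation
total = npu_tops["A14/M1"] / npu_tops["first gen"]        # since the start
print(f"{gen_jump:.1f}x generational, {total:.0f}x overall")  # 2.2x, 22x
```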

Maybe they are assuming developers will come up with some uses for it too, hopefully having that as a baseline that will be present in every Mac at that level or higher performance will lead to some interesting stuff down the road.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Maybe they are assuming developers will come up with some uses for it too, hopefully having that as a baseline that will be present in every Mac at that level or higher performance will lead to some interesting stuff down the road.

Thanks - useful.

I guess I was wondering a bit how it does Vs say a GPU/CPU combination at running - or even training - a neural net.

It obviously won't compete with A100 :) Shouldn't the tight integration between the elements of the SoC logically help out a good bit?

An M1 based laptop might even be a good machine for testing ideas out on before deploying into a vast workstation/cloud etc.
 

name99

Senior member
Sep 11, 2010
404
303
136
Apple provided the only information that matters - it was 5 trillion operations per second, now it is 11 trillion operations per second. Keep in mind that NPUs are not like CPUs, they are very simple so that TOPS figure is all you really need. All its abilities come from whatever software is run on it, you can't really get "uarch improvements" like you can with a CPU.

This (no "uarch improvements") is far from true. Improvements in this vein include
- special-casing 0s and ±1s in the MACs
- support for narrower than 8-bit weights
- exactly how you structure your "cache" system (ie how often you have to reload weights, and how computed results can flow back into the system for recurrent networks)
- (And of course the wins from the infamous UMA that we've already heard so much about -- misunderstood, dismissed, flat out lied about, from the usual crowd)

At some point we are also going to see a recapitulation of GPU evolution. This will likely include branches and synchronization primitives, if they aren't there already.
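
The first item in that list, special-casing zeros in the MACs, is easy to illustrate. A minimal sketch (plain Python, nothing like real NPU hardware): a zero weight contributes nothing to a dot product, so a sparsity-aware MAC array can gate those operations off and save energy without changing the result.

```python
def mac_dense(weights, activations):
    """Baseline MAC: every weight costs one multiply-accumulate."""
    acc, ops = 0, 0
    for w, a in zip(weights, activations):
        acc += w * a
        ops += 1
    return acc, ops

def mac_zero_skipping(weights, activations):
    """Special-case zero weights: same result, fewer effective ops."""
    acc, ops = 0, 0
    for w, a in zip(weights, activations):
        if w != 0:
            acc += w * a
            ops += 1
    return acc, ops

w = [0, 3, 0, 0, -1, 2, 0, 1]   # pruned networks are often mostly zeros
a = [5, 1, 7, 2, 4, 1, 9, 3]
print(mac_dense(w, a))          # (4, 8): correct sum, 8 ops
print(mac_zero_skipping(w, a))  # (4, 4): same sum, half the ops
```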
 

Mopetar

Diamond Member
Jan 31, 2011
7,862
6,051
136
As for what Intel chips they'll use, presumably the latest ones assuming Apple can get them in sufficient quantity. They obviously have nothing to fear from performance testing going head to head against Intel's best.

I honestly wish they'd just use AMD chips to bridge the gap. I might be interested in buying a 16-core iMac with a Zen 3 in it, but why would I want to buy another x86 Mac if it's not even using the best x86 chips or giving me a good reason (i.e., tons of cores) to take it over an SoC that can beat the Intel parts in a lot of tasks due to dedicated hardware?

Unless they still have some contract with Intel to be the sole supplier of Apple's x86 chips, I don't see any reason to keep them around.
 

name99

Senior member
Sep 11, 2010
404
303
136
Thanks - useful.

I guess I was wondering a bit how it does Vs say a GPU/CPU combination at running - or even training - a neural net.

It obviously won't compete with A100 :) Shouldn't the tight integration between the elements of the SoC logically help out a good bit?

An M1 based laptop might even be a good machine for testing ideas out on before deploying into a vast workstation/cloud etc.

Apple is comfortable with training on their devices right now...
Sure it's not at nV hundreds of watts level - yet...

It is likely that at least some of the training is being done using AMX on the CPU, and this being Apple and the design being what it is, likely that training is making use of "accelerator level parallelism", ie perform some of the operations on the AMX part of the CPU while other training operations occur simultaneously on the NPU. Once again uniform address space, coherency across all devices, powerful sync/atomic primitives, and large SLC mean this should be both fairly easy to program (ie compared to doing the same thing on other platforms) and reasonably performant (perhaps no more than one to two hundred cycles from making a change on the CPU to having it appear on the NPU or vice versa?)
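
The "accelerator-level parallelism" idea can be sketched in software terms. This is a toy illustration only: plain Python threads stand in for the AMX unit and the NPU, and the function names are made up for this sketch, not Apple's API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_amx(batch):
    # stand-in for matrix math dispatched to the CPU's AMX unit
    return sum(x * x for x in batch)

def run_on_npu(batch):
    # stand-in for convolution work dispatched to the NPU
    return sum(x + 1 for x in batch)

# Partition one training step's work and run both halves concurrently;
# on the real SoC, the uniform address space and coherency are what
# make this kind of hand-off between units cheap.
data = list(range(1000))
with ThreadPoolExecutor(max_workers=2) as pool:
    amx_part = pool.submit(run_on_amx, data[:500])
    npu_part = pool.submit(run_on_npu, data[500:])
    total = amx_part.result() + npu_part.result()
print(total)
```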

As for what's the point? The big win at the end of all this is language.
Ever better translation. Better writing assistants (grammar, spelling). Semantic search (ie "search what I mean, not the literal words").
Basically a PC that every year moves from being a really dumb secretary to an ever smarter secretary.

Of course this is all something of a gamble. No-one is sure the extent to which the existing techniques (adequate, not great) can be improved. No-one is sure that HW optimized for existing techniques will be a good fit for five years from now.
That's what defines ENGINEERING-led companies -- they take these huge gambles on something that may work, but who can be sure. IBM in the 60s (with S360) but then they got colonized by the finance parasites. Intel in the 70s through early 2000s, but then they got colonized by the finance parasites. Qualcomm for a few years after they productized CDMA but then...

Enjoy Apple's engineering-driven management right now. It won't last 😢. We're probably safe as long as Tim is in charge; maybe even as long as Jony is in charge of his division.
But in the end the parasites always win...
 

nxre

Member
Nov 19, 2020
60
103
66
For Zen 2 on N7 this is completely wrong and a common(ly repeated) mistake, AMD did use the high density cells as well. Considering the talks about Zen 3 using the same improvements as the Zen 2 XT chips that likely applies to Zen 3 too.
Thanks, I stand corrected.
I assumed that because the A12 is 6.9 billion transistors in around ~83 mm² while a Zen 2 chiplet is around the same size for 3.9 billion transistors, they would be using different cells.
If Zen 3 on TSMC HD cells can achieve 5 GHz clock speeds, that's impressive.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
On the topic of node advantages, I think Apple and AMD don't use the same cells even on the same node. AFAIK Apple uses the high-density cells while AMD uses the high-performance ones, so even iso-node comparisons would be complicated.

I think the more pertinent point is that, apples-to-apples, a given node will take more time to mature enough to run at ~5 GHz (Zen 3) than at ~3 GHz (Apple/Qualcomm). Besides Apple, Snapdragon SoCs have also adopted cutting-edge TSMC nodes more quickly than AMD CPUs despite fairly low margins per die.
 

moinmoin

Diamond Member
Jun 1, 2017
4,959
7,686
136
Thanks, I stand corrected.
I assumed that because the A12 is 6.9 billion transistors in around ~83 mm² while a Zen 2 chiplet is around the same size for 3.9 billion transistors, they would be using different cells.
If Zen 3 on TSMC HD cells can achieve 5 GHz clock speeds, that's impressive.
My guess is that AMD is using the HD 6-track library to be able to selectively make areas more power efficient. While with the HP 7.5T library you can't go denser later, the opposite is still possible: use 6T basically as the grid but spread the cells out more to achieve high frequencies. This would explain how Zen 2 has significantly fewer transistors in the same area than the A12 even though both use the same node.
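
Putting numbers on the density gap, using the approximate figures quoted earlier in the thread (both areas are rough):

```python
# A12: ~6.9B transistors in ~83 mm²; Zen 2 chiplet: ~3.9B in roughly
# the same area (figures as quoted in the thread, both approximate).
a12_density = 6.9e9 / 83       # transistors per mm²
zen2_density = 3.9e9 / 83

print(f"A12:   {a12_density / 1e6:.0f} MTr/mm²")    # ~83
print(f"Zen 2: {zen2_density / 1e6:.0f} MTr/mm²")   # ~47
print(f"ratio: {a12_density / zen2_density:.2f}x")  # ~1.77x denser
```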
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
The A12X/A12Z is a 4+4 design just like the M1, so they clearly wouldn't have had to compromise on the number of Firestorm cores if it was built on N7, just on the L2/SLC sizing. From the A12 generation to the A14 generation the biggest transistor increase was in the NPU, which doubled from 8 to 16 cores; on N7 they presumably would have stuck with 8.

The M1 is a full 6 billion transistors more than the A12Z. The enlarged NPU takes up nowhere near that much. AT pointed out that the Lightning core in the A13 was just under 30% larger than the HP core in the A12. The Firestorm core, according to AT, has widened vector capabilities, significant improvements in its FP resources, and has even more massive "L1" caches than Lightning. The Lightning cores were ~2.6 mm² in the A13. The Firestorm would easily have been north of 3 mm² each, making them take up at least 50% more space than the HP cluster in the A12Z. And that's without the expanded caches. This also doesn't take into account that the efficiency cores have also been improved and expanded through two generations as well.

Implementing M1 on N7 would have required either a much larger die, or significant resource sacrifices throughout the chip. It certainly would not fit in the current price/performance bracket that it fits in at present.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
The M1 is a full 6 billion transistors more than the A12Z. The enlarged NPU takes up nowhere near that much. AT pointed out that the Lightning core in the A13 was just under 30% larger than the HP core in the A12. The Firestorm core, according to AT, has widened vector capabilities, significant improvements in its FP resources, and has even more massive "L1" caches than Lightning. The Lightning cores were ~2.6 mm² in the A13. The Firestorm would easily have been north of 3 mm² each, making them take up at least 50% more space than the HP cluster in the A12Z. And that's without the expanded caches. This also doesn't take into account that the efficiency cores have also been improved and expanded through two generations as well.

Implementing M1 on N7 would have required either a much larger die, or significant resource sacrifices throughout the chip. It certainly would not fit in the current price/performance bracket that it fits in at present.

Or a smaller NPU and smaller GPU, a bit less cache, or a slightly larger die.

I think we can be very certain that they wouldn't have dropped below 4 performance cores even if they were still on 7nm.

It's a ridiculous argument, and a rather pointless hypothetical.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
What Apple has done here isn't a miracle. It's been obvious for quite a long time that the way forward is to go wider. At the end of the day, throughput is what matters most. Not IPC, not clocks, but how much you can get done in a given amount of time. Higher clocks have been bled dry and then some, so that's out. That only leaves getting more out of the cycles you have. By keeping clocks down around the silicon's sweet spot they get great energy savings, and by going so wide they get a lot of throughput.

I'm not so sure about this. With all the fanfare (especially from Andrei F.) surrounding Apple's ultra wide designs, I think people are being premature in assuming that wider is the way to go, but that could be because of their novelty more than anything. Zen 3 is a four wide design that is still able to achieve incredible performance relative to the eight wide M1 despite being on a bigger node, and using a chiplet design. Granted it does manage to do this with a much higher power draw, but it was designed to achieve high clock speeds and be used across various platforms where power draw isn't as restricted, unlike the M1 which was designed to be as power efficient as possible.

x86 will have to go wider as well at some point, though because of the variable length instructions that is vastly more difficult.

Intel and AMD have shown no inclination to go to ultra-wide designs. I think they prefer their CPUs to be able to achieve 5 GHz clock speeds, and considering how Intel has managed to keep its CPUs relevant for many years by virtue of a sheer clock speed advantage despite using a very old microarchitecture, I can't say they're wrong to do that.
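
The decode-width point above can be made concrete. With fixed-length instructions, every decoder slot knows its start byte up front; with variable lengths, each start depends on the previous instruction's length, so boundaries must be found serially or speculated. A toy sketch:

```python
def fixed_length_starts(width=4, n=8):
    # Fixed-length ISA (e.g. 4-byte ARM): instruction i starts at i*width,
    # so n decoders can all begin in parallel.
    return [i * width for i in range(n)]

def variable_length_starts(lengths, n=8):
    # Variable-length ISA (e.g. x86, 1-15 bytes): the start of instruction
    # i depends on the lengths of all instructions before it.
    starts, pos = [], 0
    for length in lengths[:n]:
        starts.append(pos)
        pos += length
    return starts

print(fixed_length_starts(width=4, n=4))          # [0, 4, 8, 12]
print(variable_length_starts([1, 3, 2, 6], n=4))  # [0, 1, 4, 6]
```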

Love Apple or hate them. But at least appreciate that they are pushing the boundaries further and hopefully pushing both x86 and other ARM builders forward as well. It's a good product that is past due, and I for one am glad to see it.

I agree. I personally don't care for Apple, but I have to admit they do push boundaries and move the industry forward in many ways.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
This guy compares the 5600x against the Apple M1 using a few native applications and the results aren't as favorable to the M1 as you'd think.

How to construct a tilted test 101:

He's a Linux fan, and he compares a full-power 6-core desktop against a low-power 4-core mobile processor, pulling out the old Phoronix ("Let's prove Linux is best") benchmarks, and only multi-threaded ones at that.

No one would/should expect the M1 to win in such a tilted test.
 

thunng8

Member
Jan 8, 2013
152
61
101
This guy compares the 5600x against the Apple M1 using a few native applications and the results aren't as favorable to the M1 as you'd think.

Lots of Phoronix tests in there. It was generally agreed a few pages back that these tests have heavy optimisations for x86.

This is ok, as if you do run those applications, then an x86 chip would run them faster (but to be honest, looking at the suite, most of the applications are unknown to me).
 

Mopetar

Diamond Member
Jan 31, 2011
7,862
6,051
136
Lots of Phoronix tests in there. It was generally agreed a few pages back that these tests have heavy optimisations for x86.

This is ok, as if you do run those applications, then an x86 chip would run them faster (but to be honest, looking at the suite, most of the applications are unknown to me).

The funniest part about all of this is that most of these Macs will go to the kind of people for whom a heavy load is having both Facebook and Twitter tabs going at the same time. The real benchmark for the majority of users is how low the idle power consumption is.
 

DrMrLordX

Lifer
Apr 27, 2000
21,651
10,871
136
No one would/should expect the M1 to win in such a tilted test.

The guy's reason for the video is in the first minute:

"I will say, in these benchmarks, Mac silicon is the best Mac they've ever made for performance; however, it is not the best CPU on the market".

Also look at the section of the video about Geekbench 5. He doesn't really trust the scores. I don't either. GB5 makes it look like the M1 is actually faster than a 5600x.

Even if some of the Phoronix MT tests are recompiled to support M1 fully, I think the 5600X would still win most of those benches, as would a 4900H and a next-gen 5800u (when it comes out).

The M1 is a great SoC. It's being over-sold as the best SoC/CPU on the market. It isn't, and people should stop assuming that it is.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
The guy's reason for the video is in the first minute:

"I will say, in these benchmarks, Mac silicon is the best Mac they've ever made for performance; however, it is not the best CPU on the market".

Also look at the section of the video about Geekbench 5. He doesn't really trust the scores. I don't either. GB5 makes it look like the M1 is actually faster than a 5600x.

Even if some of the Phoronix MT tests are recompiled to support M1 fully, I think the 5600X would still win most of those benches, as would a 4900H and a next-gen 5800u (when it comes out).

The M1 is a great SoC. It's being over-sold as the best SoC/CPU on the market. It isn't, and people should stop assuming that it is.

Yes, he is clear up front. He doesn't like the results, so he tilts the test to get the results he (and apparently you) like better.

Also note that he didn't even seem to notice that the 5600X scored higher in GB MT before he set off to "debunk" the M1 "win" that only existed in his head.

He only "debunked" a strawman that he created.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
How to construct a tilted test 101:

He's a Linux fan, and he compares a full-power 6-core desktop against a low-power 4-core mobile processor, pulling out the old Phoronix ("Let's prove Linux is best") benchmarks, and only multi-threaded ones at that.

I thought the M1 was an 8 core CPU?

And a large percentage of modern desktop applications are optimized for multithreaded CPUs, ie browsers. Even mobile APUs are multicore these days, so nothing wrong with predominantly multithreaded benchmarks.

No one would/should expect the M1 to win in such a tilted test.

But it's fine when Geekbench scores which favor mobile CPUs show the M1 in a positive light :cool:
 

IvanKaramazov

Member
Jun 29, 2020
56
102
66
But it's fine when Geekbench scores which favor mobile CPUs show the M1 in a positive light :cool:

Geekbench has issues and arguably shouldn't be used as the primary benchmark by as many sites, but I've never seen compelling evidence that it actually "favors mobile CPUs", despite constant claims to that effect. If you read the many, extended discussions on it by Torvalds and everyone at Real World Tech, for example, there is much debate about whether the selected workloads and means of testing are really representative of meaningful, real-world workloads, but no one is decrying it as a mobile-friendly benchmark, or arguing that it unfairly favors Apple.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
I thought the M1 was an 8 core CPU?
The M1 is 4 big cores + 4 little cores (Firestorm + Icestorm; the Firestorm cores use more power and are faster, while the Icestorm cores are the energy-efficient ones that also take up less die area than the "large" cores).

The AMD Ryzen 5600X is a 65 W, 6-core, 12-thread Zen 3 CPU (i.e., faster than any laptop Zen 2 chip, which max out at 45 W: both more power and a newer architecture than any shipping AMD laptop part).

The fact that a 65 W 6-core (12-thread) CPU is competing with a 15 W 4+4-core CPU is more a compliment to the 15 W mobile chip.

With the 4 benchmarks he did besides Geekbench (located here; it is in his YouTube comments):

The AMD part is 20 to 45% faster. Honestly, I would hope a desktop chip that can use 4x+ the power is faster, especially since it is also a November silicon release and not something that is 6, 12, or 18 months old.

The M1 is an "ultrabook" chip; it should not be in the same class as a desktop chip with 4x the power budget, based on the history of ultrabook chips from 2008 to now. (The first MacBook Air is why I use 2008; Intel launched the ultrabook initiative in 2012, based on the success of the 2010 MacBook Air, the first model to have an SSD in all configurations, while the 2008 model only had SSDs in some of them.)

This is a "big deal."
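
The perf-per-watt framing can be made explicit with the post's own numbers. A hedged sketch (65 W and ~15 W are TDP-class figures, not measured draw):

```python
# If the 65 W 5600X is 20-45% faster than the ~15 W M1, normalize by power.
m1_watts, amd_watts = 15, 65

for speedup in (1.20, 1.45):
    perf_per_watt_ratio = (1.0 / m1_watts) / (speedup / amd_watts)
    print(f"AMD {speedup:.2f}x faster -> M1 at {perf_per_watt_ratio:.1f}x perf/W")
```

On these figures the M1 comes out around 3.0x to 3.6x ahead in performance per watt, which is the "compliment to the mobile chip" point in numbers.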
 

thunng8

Member
Jan 8, 2013
152
61
101
I thought the M1 was an 8 core CPU?

And a large percentage of modern desktop applications are optimized for multithreaded CPUs, ie browsers. Even mobile APUs are multicore these days, so nothing wrong with predominantly multithreaded benchmarks.



But it's fine when Geekbench scores which favor mobile CPUs show the M1 in a positive light :cool:
Talking about browsers, the M1 mops the floor with everything else in every single browser benchmark out there. Even real-world impressions of page load, scrolling, and running multiple tabs show a major uplift in performance. Originally people thought it was because Safari was so well optimized, but guess what? Chrome was just compiled for the M1 and shows similarly superior scores.


And you cannot dismiss all the other real-world benchmarks out there. Lightroom and Premiere Pro running under Rosetta in many cases outperform the best x86 processors out there.

There have been so many examples of commercial software running faster on the M1. We aren't even considering battery life in the equation either: approximately 2x more battery life compared to Ryzen or Tiger Lake.

Sure, it might not run Phoronix as fast, but there is still a lot of time for FOSS to optimize for the ARM architecture, as it has been doing for x86 over many, many years.
Geekbench has issues and arguably shouldn't be used as the primary benchmark by as many sites, but I've never seen compelling evidence that it actually "favors mobile CPUs", despite constant claims to that effect. If you read the many, extended discussions on it by Torvalds and everyone at Real World Tech, for example, there is much debate about whether the selected workloads and means of testing are really representative of meaningful, real-world workloads, but no one is decrying it as a mobile-friendly benchmark, or arguing that it unfairly favors Apple.
Yes, he is grasping at straws trying to debunk a benchmark. If he doesn't like Geekbench, why not use SPEC CPU? It shows a remarkable correlation with Geekbench, and at a per-core level the M1 is very close to the top-of-the-range 5950X while using several times less power.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Geekbench has issues and arguably shouldn't be used as the primary benchmark by as many sites, but I've never seen compelling evidence that it actually "favors mobile CPUs", despite constant claims to that effect. If you read the many, extended discussions on it by Torvalds and everyone at Real World Tech, for example, there is much debate about whether the selected workloads and means of testing are really representative of meaningful, real-world workloads, but no one is decrying it as a mobile-friendly benchmark, or arguing that it unfairly favors Apple.

By "favor mobile CPUs", I mean the benchmarks themselves are very short, so as not to cause throttling.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
By "favor mobile CPUs", I mean the benchmarks themselves are very short, so as not to cause throttling.

But mobile CPUs throttle because they are placed in a chassis that cannot dissipate heat well (y'know, a smartphone), not because of any flaw in the chips themselves. Given that the M1 with active cooling doesn't throttle anyway, I don't see how that favors it vs x86.
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
This guy compares the 5600x against the Apple M1 using a few native applications and the results aren't as favorable to the M1 as you'd think.

I like and respect Phoronix. But I don't see the point of running a server suite that pits a low-end consumer laptop part against a part drawing about 8 times the wattage, in benchmarks designed around unlimited power draw.