Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
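
As a rough sanity check, the GPU figures above can be related to each other with simple arithmetic. This is only a sketch: the 8 ALUs per execution unit, the ~1.278 GHz GPU clock, and counting one fused multiply-add as 2 FLOPs are assumptions that are not stated in the post. The execution unit count, thread count, and texel/pixel rates come from the list.

```python
# Rough sanity check of the M1 GPU figures listed above.
# ASSUMPTIONS (not in the post): 8 ALUs per execution unit, a GPU clock of
# about 1.278 GHz, and 2 FLOPs per ALU per cycle (one fused multiply-add).

EXECUTION_UNITS = 128        # from the post
MAX_THREADS = 24576          # from the post
GIGATEXELS_PER_S = 82        # from the post
GIGAPIXELS_PER_S = 41        # from the post

ALUS_PER_EU = 8              # assumption
GPU_CLOCK_GHZ = 1.278        # assumption
FLOPS_PER_ALU_PER_CYCLE = 2  # assumption


def threads_per_eu(threads: int, eus: int) -> float:
    """Concurrent threads each execution unit would have to hold."""
    return threads / eus


def peak_tflops(eus: int, alus_per_eu: int, flops_per_cycle: int,
                clock_ghz: float) -> float:
    """Theoretical peak throughput in TFLOPS."""
    gflops = eus * alus_per_eu * flops_per_cycle * clock_ghz
    return gflops / 1000.0


print(f"threads per EU: {threads_per_eu(MAX_THREADS, EXECUTION_UNITS):.0f}")
print(f"peak throughput: "
      f"{peak_tflops(EXECUTION_UNITS, ALUS_PER_EU, FLOPS_PER_ALU_PER_CYCLE, GPU_CLOCK_GHZ):.2f} TFLOPS")
print(f"texels per pixel: {GIGATEXELS_PER_S / GIGAPIXELS_PER_S:.1f}")
```

Under those assumptions the peak figure lands close to the quoted 2.6 Teraflops, so the listed numbers are at least consistent with each other.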

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

Page 78 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

M2
Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, ProRes

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

M5 Family discussion here:

Page 431 - Discussion - Apple Silicon SoC thread

Eug · Nov 11, 2020

NeoLuxembourg said:
I was no fan of the ARM transition and the release yesterday did not help.

We will order a unit for testing, but compared to the 6-core 32GB Mac minis we have at work, this is not that impressive.

I hope the improved ST performance makes up for the 16GB limitation.

As mentioned, the 6-core Mac mini has not yet been replaced, and is still for sale. This entry level M1 Mac mini replaces the entry level 3.6 GHz quad-core Core i3-8100B Mac mini.

The main drawback with the new M1 Mac mini is the fewer ports and the loss of user RAM upgradability with the 16 GB limit. (For the replaced MacBook Air and Pro the 16 GB limit has not changed though. Those entry level models were also limited to 16 GB.)

Qwertilot · Nov 11, 2020

beginner99 said:
Interesting but I will never invest in a closed system.

On that note, the big gains most likely come from better performance rather than less power. Yes, they list better battery life, but not much better given they are comparing Ice Lake based Macs to their new M1 Macs. Ice Lake, in contrast to Tiger Lake, wasn't clearly better in terms of battery life than 14nm products.

Given all the fixed elements of the power draw on a laptop - like the screen - getting that level of extra battery life purely from swapping the CPU is honestly fairly amazing.

The very integrated nature of it all - the RAM packaging etc - must be giving them some quite considerable power draw advantages. I don't think anyone else is even really thinking of doing that?

beginner99 said:
Ultimately Apple is coming from mobiles into laptops while Intel/AMD are coming from servers into laptops. E.g. Apple's chip is geared to consumer use while x86 chips are geared for server use. Huge, wide cores with excellent ST performance make sense for consumer products. But how far will they scale?
My point being, as Apple "creeps" up the device range to higher power, their advantage will diminish. Is it a coincidence we aren't seeing any compilation benchmarks? All the video/image stuff could be accelerated by a custom chip/core/DSP on the SoC, like QuickSync on Intel's side.

Huge, wide cores scale very easily if you use more of them and let them take more power?

Yes, they can offload some extra things - mainly to the GPU/dedicated machine learning cores I would think. That's something they've earned by having things like unified memory and enough software control to let them actually use that properly. Obviously far from a new dream, but AMD never had the money or leverage to make it work.

insertcarehere · Nov 11, 2020

Leeea said:
One of the odd things about Anandtech's tests is they run on the default memory profile, i.e. they do not enable the XMP profile. This effectively takes the 3600 MHz memory you would see on that Ryzen, and sets it to 2400 MHz. A 33% nerf. With the Ryzen series, that also nerfs the Infinity Fabric, or internal CPU communication. The Intel chip is supposed to run at 4267 or 3200, but running it at 2400 nerfs it by 44% and 25% respectively.

Normally not that big of a deal. However, in Spec2006, which is somewhat of a memory-focused benchmark, that is likely to affect things.

The Zen 3/Zen 2/10900K test setups were running at their manufacturers' official maximum supported DRAM frequency, so how does that effectively "set" them to DDR4-2400? On top of that, the tested A14 is most likely using a 64-bit memory bus (à la A12/A13), which puts it at a bandwidth deficit to the 128-bit (dual-channel) memory bus that the desktop test benches would have.
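
For anyone who wants to check the memory numbers in this exchange, here is a minimal sketch. The first helper reproduces the percentage drops quoted above for memory running at 2400 instead of its rated speed. The second uses the usual peak-bandwidth formula (bus width in bytes times transfer rate) to show what halving the bus width does. The 3200 MT/s rate in the bandwidth example is only an illustrative value picked to isolate the bus-width effect, not a claim about what the A14 or any test bench actually ran.

```python
# Sketch: memory-speed reductions and peak bandwidth versus bus width.

def percent_reduction(rated: float, actual: float) -> float:
    """How much slower 'actual' is than 'rated', in percent."""
    return (1.0 - actual / rated) * 100.0


def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer * MT/s / 1000."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


# Rated speeds mentioned in the quoted post, all compared against 2400.
for rated in (3600, 4267, 3200):
    print(f"{rated} -> 2400: {percent_reduction(rated, 2400):.1f}% lower")

# Same transfer rate on both buses (illustrative value, an assumption).
ILLUSTRATIVE_RATE_MTS = 3200
narrow_bus = peak_bandwidth_gbs(64, ILLUSTRATIVE_RATE_MTS)
wide_bus = peak_bandwidth_gbs(128, ILLUSTRATIVE_RATE_MTS)
print(f"64-bit: {narrow_bus:.1f} GB/s, 128-bit: {wide_bus:.1f} GB/s")
```

At equal transfer rates the 64-bit bus has exactly half the peak bandwidth of the 128-bit one, which is the deficit being described.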

Leeea said:
It is interesting to note that Spec2006 also seems biased to large cache sizes. I would point out the official spec result for Intel's 5775C. The 5775C has one of the largest CPU caches of all time, was very power efficient, and turned out a score of 66.3. In 2015. With Anandtech's test above, it seems the i9-10900k only scored 59 in 2020. Rather odd result, eh?

http://spec.cs.miami.edu/cpu2006/results/res2015q4/cpu2006-20151116-38058.pdf

Your link points to a result using Intel-specific compilers, which Anandtech doesn't use for the benches (look at that libquantum test for why). The 5775C looks a lot more normal with Anandtech's methodology.

At any rate, that also puts the A14 (8MB shared L2, no L3) on the back foot compared to Zen 3 (32MB shared L3), Zen 2 (16MB shared L3), and 10900K (20MB shared L3), given that in a single-threaded test the latter chips can use all the L3 for 1 core.

dacostafilipe · Nov 11, 2020

Qwertilot said:
They're not replacing those models yet. That's for their next wave. This is entry level and cheap.

That's why I need to test those. If I need a new Mac mini tomorrow, I have to know if having a 16GB would be okay or if I have to still get the Intel version.

As we mostly do development, we have zero use for all the special stuff on the SoC (AI, GPU, ...). ST performance is the most important metric, but as we can have multiple VMs running at the same time, RAM and core count are also important factors. I need to check compatibility with our dev tools and how well they perform, especially the VM configurations, as our servers all run x86 ...

Eug · Nov 11, 2020

NeoLuxembourg said:
That's why I need to test those. If I need a new Mac mini tomorrow, I have to know if having a 16GB would be okay or if I have to still get the Intel version.

As we mostly do development, we have zero use for all the special stuff on the SoC (AI, GPU, ...). ST performance is the most important metric, but as we can have multiple VMs running at the same time, RAM and core count are also important factors. I need to check compatibility with our dev tools and how well they perform, especially the VM configurations, as our servers all run x86 ...

You can’t run x86 VMs on M1 models.

Entropyq3 · Nov 11, 2020

Eug said:
Does someone have a list of differences between what we think M1 is, vs. what we think A14X will be?

I think it will be exactly the same chip.
They will simply pull out more I/O in the Macs. (M1 supports PCIe4, I haven’t given up hope the Mini will allow user replaceable SSDs.)

Eug · Nov 11, 2020

Entropyq3 said:
I think it will be exactly the same chip.
They will simply pull out more I/O in the Macs. (M1 supports PCIe4, I haven’t given up hope the Mini will allow user replaceable SSDs.)

Are you saying you think the hypothetical A14X has the same integrated I/O controller as M1 but just that the iPad Pros will not utilize it to the same extent?

Or do you think A14X won’t exist then and they’ll just call it M1 in the iPad Pros too?

I had been guessing that M1 and A14X would have the same cores but slightly different designs, but I have no idea if this would make sense from a business perspective.

beginner99 · Nov 11, 2020

Qwertilot said:
Huge, wide cores scale very easily if you use more of them and let them take more power?

But it's not efficient if most of the core is unused for many workloads. Fine if you only have 4 of them, an issue if there are 32 or 64. Processing web requests doesn't need an ultra-wide beefy core. You are better off having 2 smaller ones at higher clock speeds.

jpiniero · Nov 11, 2020

Entropyq3 said:
I think it will be exactly the same chip.
They will simply pull out more I/O in the Macs. (M1 supports PCIe4, I haven’t given up hope the Mini will allow user replaceable SSDs.)

It's definitely soldered.

Gideon · Nov 11, 2020

beginner99 said:
But it's not efficient if most of the core is unused for many workloads. Fine if you only have 4 of them, an issue if there are 32 or 64. Processing web requests doesn't need an ultra-wide beefy core. You are better off having 2 smaller ones at higher clock speeds.

This is only true if those smaller cores at higher clocks do not burn more power for a similar amount of requests processed.

If you are not die-area or power limited on your wide cores (e.g. you can add as many as you need in total) and their power/performance curve is more efficient, you can just run them at lower clocks for the same performance (but less power).

Die area can of course become a major concern, stopping you from adding a competitive number of cores, but with manufacturing processes improving, the area used by I/O and system-wide stuff tends to dwarf the area used for cores on modern chips (just compare Rome's I/O die to the compute chiplets, which are also mostly cache).
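
The "same performance at lower clocks, but less power" argument can be illustrated with a toy model. It assumes dynamic power scales roughly as capacitance times voltage squared times frequency, and that voltage has to rise about linearly with clock in the range of interest. Every number below (relative IPC, relative switched capacitance, clocks, volts per GHz) is made up for illustration and does not describe any real core.

```python
# Toy model: a wide core at a low clock vs a narrow core at a high clock.
# ASSUMPTIONS: P ~ C * V^2 * f, and V rises linearly with f. All values
# are hypothetical and chosen only to illustrate the shape of the argument.
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    ipc: float          # relative instructions per clock (hypothetical)
    capacitance: float  # relative switched capacitance (hypothetical)


def performance(core: Core, clock_ghz: float) -> float:
    """Relative throughput: IPC times clock."""
    return core.ipc * clock_ghz


def clock_for_performance(core: Core, target_perf: float) -> float:
    """Clock this core needs to reach a given relative throughput."""
    return target_perf / core.ipc


def dynamic_power(core: Core, clock_ghz: float,
                  volts_per_ghz: float = 0.3) -> float:
    """Relative dynamic power under the crude V-proportional-to-f assumption."""
    voltage = volts_per_ghz * clock_ghz
    return core.capacitance * voltage ** 2 * clock_ghz


narrow = Core("narrow", ipc=1.0, capacitance=1.0)
wide = Core("wide", ipc=1.6, capacitance=1.8)  # pays for width in capacitance

narrow_clock = 4.8
target = performance(narrow, narrow_clock)
wide_clock = clock_for_performance(wide, target)

for core, clock in ((narrow, narrow_clock), (wide, wide_clock)):
    print(f"{core.name}: {clock:.2f} GHz, perf {performance(core, clock):.2f}, "
          f"power {dynamic_power(core, clock):.2f}")
```

With these made-up inputs the wide core matches the narrow core's throughput at a much lower clock and lower power. The result flips if the width costs much more capacitance than assumed or if voltage cannot drop further, which is exactly the "only true if" condition above.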

Qwertilot · Nov 11, 2020

They've also got those (not very!) little cores to help out for some things like that. Something else where their control over the software side of things helps a lot.

Entropyq3 · Nov 11, 2020

Eug said:
Are you saying you think the hypothetical A14X has the same integrated I/O controller as M1 but just that the iPad Pros will not utilize it to the same extent?

Or do you think A14X won’t exist then and they’ll just call it M1 in the iPad Pros too?

The first.
Given the total cost for a 5nm design of this complexity, and how little die area needs to be spent for the extra I/O capabilities of the Mini and laptops, I can’t imagine that making separate chips for the iPad Pros and the Macs makes financial sense. The larger production volumes reduce costs and allow for binning options that include the iPad Pros as a tier. It’s a win all around.

Entropyq3 · Nov 11, 2020

jpiniero said:
It's definitely soldered.

Just crush my hopes, will ya!
Seriously, while I’m inclined to believe this, do you have a source?

dacostafilipe · Nov 11, 2020

Eug said:
You can’t run x86 VMs on M1 models.

Not talking about running x86 VMs, but ARM VMs.

That's why I need to reconfigure the VMs, check software stack compatibility and adapt the workflow.

Heartbreaker · Nov 11, 2020

JDG1980 said:
The vast majority of laptops sold are low-end junk, so this is believable.

Sure, but 98% are not low end junk. I have no doubt these new "low end" Macs will be better than most decent mid range laptops as well.

And this is just Apple's new low end. Wait till we see mid range and high end next year. Those will be monsters.

You also have to think about how Apple has full end-to-end stack control of both HW and SW here, and they integrated everything specialized into the silicon as well. This is going to efficiently offload tasks seamlessly to the most capable specialized units in the SoC.

Apple's problem going forward might be that these low end systems could serve nearly everyone, and it will be hard to up-sell beyond the ego driven buyers.

Excelsior · Nov 11, 2020

amrnuke said:
I for one think this is going to be a watershed moment and will surely be one of the biggest highlights for Tim Cook's reign, no matter what comes. This will formally erase that blurry line between phones and computers.

As for the performance, using just SPEC and GB5 it would be easy to get excited. This will be a stellar laptop and entry level desktop Mac for most users. Undoubtedly, it'll handle word processing, light to medium spreadsheet work, presentations, browsing, and so on with ease....

I've been following Apple closely for over two decades, including going to a Macworld Expo. I remember the big switch to OS X and to Intel. I think this move is huge, and I'm no fanboy. I've built Intel systems, currently run an AMD desktop and the only newish Apple product I have right now is a 7th gen iPad.

But I can't wait to see the benchmarks for this...I would potentially consider an Apple laptop again just based on this release.

jpiniero · Nov 11, 2020

Entropyq3 said:
Just crush my hopes, will ya!
Seriously, while I’m inclined to believe this, do you have a source?

All of Apple's laptops have been soldered RAM & storage for a couple years now. With USB4, you might get decent performance out of external storage eventually.

amrnuke · Nov 11, 2020

Excelsior said:
I've been following Apple closely for over two decades, including going to a Macworld Expo. I remember the big switch to OS X and to Intel. I think this move is huge, and I'm no fanboy. I've built Intel systems, currently run an AMD desktop and the only newish Apple product I have right now is a 7th gen iPad.

But I can't wait to see the benchmarks for this...I would potentially consider an Apple laptop again just based on this release.

One thing that will be very interesting to me going forward is not just "can they pass AMD in raw SPEC" but "can they do so scaled out". It's one thing to win in SPECint2006 and SPECfp2006 and SPEC 2017 1T, but quite another to have that scale out well to multithreaded workloads. That's where I see things being interesting. If they've found a way to essentially quadruple (or more) the ST performance metrics out to multithreaded workloads (and I mean that we really need to look beyond 4 cores, to 6 or 8 big cores), then this is huge. If not (and I anticipate they'll have some more scaling issues than AMD or Intel), then at least they're on incredibly solid footing.

We also have some issues here, in that almost all of Apple's recent Axx performance increase has come from clock speeds and leveraging TSMC processes. Apple have a huge lead in SPEC per GHz and power consumption but AMD are making more gains year over year exclusive of the node advancements. So in nearly all respects the gap is narrowing. Even in efficiency. 5600X beats 3600 by quite a bit in compute workloads but draws 76W peak power vs the 3600's 88W. So that's really interesting.

Anyway, this is an M1 thread, and I only mention Zen 3 because I think it's the leading efficiency and performance comparison.

Heartbreaker · Nov 11, 2020

amrnuke said:
One thing that will be very interesting to me going forward is not just "can they pass AMD in raw SPEC" but "can they do so scaled out". It's one thing to win in SPECint2006 and SPECfp2006 and SPEC 2017 1T, but quite another to have that scale out well to multithreaded workloads. That's where I see things being interesting. If they've found a way to essentially quadruple (or more) the ST performance metrics out to multithreaded workloads (and I mean that we really need to look beyond 4 cores, to 6 or 8 big cores), then this is huge. If not (and I anticipate they'll have some more scaling issues than AMD or Intel), then at least they're on incredibly solid footing.

IMO, Apple's issue with MT workload comparisons will be the lack of SMT. SMT gives AMD/Intel ~30% extra performance out of MT workloads. So 4 cores can provide over 500% MT scaling over ST performance with SMT. 4 Apple cores without SMT would obviously be limited to 400% scaling or less.

But people overrate how much MT performance most users need. Cinebench is a benchmark for 3D rendering, something >95% of home users don't do at all. Video compression has dedicated HW in the SoC, and you quickly run out of use cases for home computer MT workloads.

So these new parts for low end home Macs will likely over-serve most of that market.

We probably have a long wait for the ARM based Mac Pros, that will show what Apple has for the serious MT capability.

amosliu137 · Nov 11, 2020

Someone tested Cinebench R23. Compared with the 9750H (test by “我用第三人称”), the Apple chip is very impressive.

Heartbreaker · Nov 11, 2020

amosliu137 said:
Someone tested Cinebench R23. Compared with the 9750H (test by “我用第三人称”), the Apple chip is very impressive.

I am not impressed.

It's OK if that's an MBA with its limited power budget. But disappointing if that is the new Mac Mini with more power to use, especially with all of Apple's talk of massive performance increases.

What will be nice is we can finally get away from using Geekbench as a CPU bench...

AmericanLocomotive · Nov 11, 2020

Eug said:
As mentioned, Apple doesn't care about Renoir here, and quite frankly, I'd say the vast majority of their target market doesn't care either. In fact, I'd say 99% of them don't even know what Renoir is. It would be stupid for Apple to focus on that.

But once again, Apple made many, many comparisons to the laptop market as a whole. So that means they should be making direct comparisons to Renoir Windows laptops. But they are not.

They're using qualified statements like "98% of laptops sold this year" or "The latest competing mobile processor". Well gee, the vast majority of normal laptops are $400 dual-core Pentium units. What about the MILLIONS of Chromebooks with super slow Celerons that have been purchased by school districts this year? Additionally, the "latest" competing mobile processors would be from Intel, which still aren't remotely competitive with Renoir MT performance, or MT perf/watt in TDP constrained scenarios.

Don't get me wrong, I'm sure this chip will be blazing fast, but it's definitely not the world-ender some are making it out to be. If it was, Apple would be confidently saying as much.

guidryp said:
Sure, but 98% are not low end junk. I have no doubt these new "low end" Macs will be better than most decent mid range laptops as well.

You need to look carefully how they qualified that statement. They didn't say "Faster than 98% of all laptops available for sale", they said "Faster than 98% of all laptops sold in the past year". If HP, Dell, and Acer have sold 25 million Chromebooks, 10 million Celeron and Pentium Laptops, and 1 million Tiger Lake and Renoir systems, that 98% wouldn't be very hard to achieve at all.
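
To put a number on that hypothetical sales mix, here is a quick sketch. The unit counts are the illustrative ones from the paragraph above, not real sales data, and the segment names are just labels invented for the example.

```python
# Sketch: what share of units sold falls outside the fast segment,
# using the hypothetical sales mix from the post (millions of units).

def share_outside(units_by_segment: dict, fast_segments: set) -> float:
    """Percent of all units sold that are NOT in the fast segments."""
    total = sum(units_by_segment.values())
    slow = sum(units for segment, units in units_by_segment.items()
               if segment not in fast_segments)
    return slow / total * 100.0


hypothetical_sales_millions = {
    "chromebooks": 25,
    "celeron_pentium_laptops": 10,
    "tiger_lake_and_renoir": 1,
}

share = share_outside(hypothetical_sales_millions, {"tiger_lake_and_renoir"})
print(f"{share:.1f}% of units sold are outside the fast segment")
```

With that mix, a chip only has to beat the Chromebooks and Celeron/Pentium machines to be faster than roughly 97% of laptops sold, which is the point being made about how the claim is qualified.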

amosliu137 said:
Someone tested Cinebench R23. Compared with the 9750H (test by “我用第三人称”), the Apple chip is very impressive.

If that is real, it sort of confirms my suspicions. R23 scores are about 2.56x higher than R20 scores. A 4700U power limited to 10 W will score around 1600 R20 MT points: https://docs.google.com/spreadsheets/d/1svDb5U2xtju1_sn1pB4hLzeJX3QySIOwf8w2vKyJvD8/edit#gid=0 That would result in an R23 score of roughly 4100, which compares very well to that Apple chip.

Edit: Looks like it may be an A12Z chip from a Transition Development Kit. So I suspect the M1 will have quite a bit of performance uplift compared to it. But the Transition Development Kit isn't a 10 W TDP constrained MacBook Air either...
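
The R20 to R23 estimate above is a single multiplication, sketched here for convenience. The 2.56x factor and the 1600-point R20 score are the figures from the post; treating the ratio as constant across chips is an assumption, not something Maxon documents.

```python
# Sketch: estimate a Cinebench R23 MT score from an R20 MT score.
# ASSUMPTION: a constant R23/R20 ratio of about 2.56, as stated in the post.

R23_PER_R20 = 2.56


def estimate_r23(r20_score: float, ratio: float = R23_PER_R20) -> float:
    """Scale an R20 score by the assumed R23/R20 ratio."""
    return r20_score * ratio


estimated = estimate_r23(1600)  # 4700U limited to 10 W, per the post
print(f"estimated R23 MT score: {estimated:.0f}")
```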

insertcarehere · Nov 11, 2020

guidryp said:
IMO, Apple's issue with MT workload comparisons will be the lack of SMT. SMT gives AMD/Intel ~30% extra performance out of MT workloads. So 4 cores can provide over 500% MT scaling over ST performance with SMT. 4 Apple cores without SMT would obviously be limited to 400% scaling or less.

Apple has their own version of quasi-SMT using those super efficient Icestorm cores that can add to MT total throughput, with 25-30% of the big cores' performance at <= 10% power draw, with the added bonus of not needing to load the big cores if they're not needed for a task.
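
A back-of-the-envelope sketch of the two scaling claims in this exchange, using only the figures quoted: about 30% SMT uplift per core, and small cores worth 25-30% of a big core. It assumes perfectly linear scaling with no memory, thermal, or scheduling limits, which real workloads will not achieve.

```python
# Sketch: ideal MT scaling relative to one big core running one thread.
# ASSUMPTION: perfectly linear scaling, so these are upper bounds only.

def smt_scaling(cores: int, smt_uplift: float) -> float:
    """Each core delivers (1 + uplift) of single-thread throughput."""
    return cores * (1.0 + smt_uplift)


def big_little_scaling(big: int, little: int, little_rel_perf: float) -> float:
    """Big cores count as 1.0 each, little cores as a fraction of a big core."""
    return big + little * little_rel_perf


print(f"4 cores + SMT (~30%):     {smt_scaling(4, 0.30):.1f}x")
print(f"4 big only, no SMT:       {big_little_scaling(4, 0, 0.0):.1f}x")
print(f"4 big + 4 little (25%):   {big_little_scaling(4, 4, 0.25):.1f}x")
print(f"4 big + 4 little (30%):   {big_little_scaling(4, 4, 0.30):.1f}x")
```

On these idealized numbers, four big cores plus four small ones land in the same range as four SMT cores, which is why the small cores are being described as a rough substitute for SMT.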

Eug · Nov 11, 2020

amosliu137 said:
Someone tested Cinebench R23. Compared with the 9750H (test by “我用第三人称”), the Apple chip is very impressive.

Interesting. I didn't know Cinebench R23 had gone native. It turns out the press release for that only came out an hour ago.

How good is Cinebench at determining clock speeds? It says 2.5 GHz single-core and 2.3 GHz multi-core.

amrnuke · Nov 11, 2020

guidryp said:
IMO, Apple's issue with MT workload comparisons will be the lack of SMT. SMT gives AMD/Intel ~30% extra performance out of MT workloads. So 4 cores can provide over 500% MT scaling over ST performance with SMT. 4 Apple cores without SMT would obviously be limited to 400% scaling or less.

But people overrate how much MT performance most users need. Cinebench is a benchmark for 3D rendering, something >95% of home users don't do at all. Video compression has dedicated HW in the SoC, and you quickly run out of use cases for home computer MT workloads.

So these new parts for low end home Macs will likely over-serve most of that market.

We probably have a long wait for the ARM based Mac Pros, that will show what Apple has for the serious MT capability.

Agree, no one is rendering on a MacBook Air. But Photoshop, Illustrator, Final Cut, Lightroom, etc all leverage more cores and such apps were heavily featured in Apple's presentation. So I think it's at least reasonable to evaluate whether these single-core claims pan out to multi-core real world usage situations.

As for what's really required, let's be honest, most people aren't going to use even the 13" MBP for the above tasks. For most Western world users, their smartphone is sufficient. But some people like having a proper keyboard, I don't blame them. In that case, honestly, I'm not sure what major benefit the MBA with M1 would have over an iPad + Smart Keyboard for most users. If people are buying a MacBook Air or MBP 13" in the hopes of having a device with an actual keyboard and trackpad, then they'd probably be better-served by an iPad $329 + Smart Keyboard $159 and pocket the $500 difference.

As for evaluating MT capabilities, we should be able to compare a Renoir 4300GE to the M1 - the M1 has 4 big and 4 little, which is somewhat analogous to AMD's 4 big + SMT setup.
