Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
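
As a rough sanity check, the GPU figures above can be related to each other. The sketch below assumes 8 FP32 ALUs per execution unit and 2 FLOPs per ALU per clock (one fused multiply-add); neither number appears in the list above, so treat the implied clock as an estimate only.

```python
# Back-of-the-envelope check of the M1 GPU figures listed above.
# ASSUMPTIONS (not from the spec list): 8 FP32 ALUs per execution unit,
# and 2 FLOPs per ALU per clock (one fused multiply-add).
EXECUTION_UNITS = 128
CONCURRENT_THREADS = 24576
PEAK_TFLOPS = 2.6
GIGATEXELS_PER_S = 82
GIGAPIXELS_PER_S = 41

ALUS_PER_EU = 8          # assumed
FLOPS_PER_ALU_CLOCK = 2  # assumed (FMA counts as two operations)

threads_per_eu = CONCURRENT_THREADS // EXECUTION_UNITS
total_alus = EXECUTION_UNITS * ALUS_PER_EU
# Clock that would be needed to reach the quoted peak under the assumptions.
implied_clock_ghz = PEAK_TFLOPS * 1e12 / (total_alus * FLOPS_PER_ALU_CLOCK) / 1e9
texels_per_pixel = GIGATEXELS_PER_S / GIGAPIXELS_PER_S

print(f"threads per EU:   {threads_per_eu}")
print(f"total FP32 ALUs:  {total_alus}")
print(f"implied clock:    {implied_clock_ghz:.2f} GHz")
print(f"texel/pixel rate: {texels_per_pixel:.1f}")
```

Under those assumptions the quoted 2.6 Teraflops works out to a GPU clock of a bit under 1.3 GHz, and the texel rate is exactly twice the pixel rate.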

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes
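
The 100 GB/s figure above is consistent with a simple peak-bandwidth calculation. The sketch below assumes LPDDR5 at 6400 MT/s on a 128-bit bus for M2, and LPDDR4X at 4266 MT/s on a 128-bit bus for M1. The data rates and bus width are assumptions for illustration and do not come from the list above.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal): transfers/s times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# ASSUMED memory configurations (data rate in MT/s, bus width in bits).
m2_bandwidth = peak_bandwidth_gb_s(6400, 128)  # LPDDR5, assumed
m1_bandwidth = peak_bandwidth_gb_s(4266, 128)  # LPDDR4X, assumed

print(f"M2: {m2_bandwidth:.1f} GB/s")
print(f"M1: {m1_bandwidth:.1f} GB/s")
print(f"M2 / M1: {m2_bandwidth / m1_bandwidth:.2f}x")
```

With those inputs the M2 result rounds to the quoted 100 GB/s, roughly 1.5x the M1 figure.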

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

M5 Family discussion here:

Page 431 - Discussion - Apple Silicon SoC thread

BorisTheBlade82 · Nov 2, 2021

DrMrLordX said:
It's about what the CPU can do unaided by dedicated hardware, period.

While I do fully agree with your concern, is there any indication that the M1 family only does well in workloads where it has dedicated hardware? I mean I consider SPEC to be a very wide set of workloads. And there are no obvious bumps. While M1 seems to have some small weakness WRT CB23, that basically just means that it is "only" on par with TGL in ST. That won't be enough for ADL but otherwise it is not too shabby.

DrMrLordX · Nov 2, 2021

BorisTheBlade82 said:
While I do fully agree with your concern, is there any indication that the M1 family only does well in workloads where it has dedicated hardware?

That's what I wish to discover for myself and (ideally) for the benefit of the community as a whole. Cinebench R23 indicates that it might be the case, but I'm given the assurance that CBR23 might not be properly optimized for M1/M1 Pro/M1 Max (probably something to do with half-arsed NEON support). At least in the case of software that can be compiled from source, you can have some reasonable expectation as to which ISA extensions are properly supported by the benchmark.

Doug S · Nov 2, 2021

Geegeeoh said:
I have over 20 TB of h264 video. I'm transcoding to h265 to save space.
Sadly it looks like hardware acceleration won't help a bit.

I don't think many have this need.

Even if you had hardware acceleration why would you care? I assume there is no pressing timetable for this, so whether it takes an hour, a day or a month is irrelevant to you since you don't have to sit around twiddling your thumbs waiting for it to finish.

And in another few years you'll do the same thing transcoding to h.266

Hitman928 · Nov 2, 2021

igor_kavinski said:
M1-max Benchmarks - OpenBenchmarking.org

Unfortunately, I can't figure out how to do a comparison with other CPUs. Help, please?

Here's a link with some AMD and Intel desktop results.

https://openbenchmarking.org/result/2011040-FI-AMDRYZEN976&sgm=1&swl=1&ppd_Q29yZSBpOSAxMDkwMEs=549&rmm=Ryzen+5+2600%2CRyzen+5+2600X&ppd_Unl6ZW4gOSAzOTAwWFQ=455&ppd_Unl6ZW4gOSAzOTUwWA=709&hgv=Ryzen+9+5900X%2CRyzen+9+5950X&ppd_Unl6ZW4gOSA1OTAwWA=549&ppd_Unl6ZW4gOSA1OTUwWA=799&sor

Looks like the tests that were run on the M1 are pretty much single threaded/very lightly threaded. The M1 wins most of them against CML/Zen 3 but Zen 3 does take the lead in a couple. Be very careful though, as the person who ran the M1 tests is unknown, the compilers used for the M1 and the x86 CPUs are very different (GCC 9.3 for x86 vs Apple's version of Clang 13 for M1), and there isn't much information on how the tests were compiled/run for the M1.

Carfax83 · Nov 2, 2021

Nothingness said:
I'd say browsing is also very demanding. Code size is much larger than encoder and uses many things encoding software usually don't (e.g. JIT). And it's much more used than encoding so its performance matters more to end users

Yeah browsing is also a good one as well. Browsers have definitely become more and more sophisticated and complex over the years, taking full advantage of multicore/multithreaded CPUs and hardware acceleration.

Abwx · Nov 2, 2021

Notebookcheck (NBC) tested the laptop: 67W from the wall with a 68W adapter and 82W with a 120W one, so with the downsized PSU the difference is probably drained from the battery.

Apple MacBook Pro 14 2021 M1 Pro Laptop in Review: How much "Pro" do you get with the base model?

Notebookcheck reviews the brand-new Apple MacBook Pro 14 with the M1 Pro CPU, 16 GB RAM, Liquid Retina XDR-Display with 120 Hz and improved speakers.

www.notebookcheck.net

Eug · Nov 2, 2021

Doug S said:
Even if you had hardware acceleration why would you care? I assume there is no pressing timetable for this, so whether it takes an hour, a day or a month is irrelevant to you since you don't have to sit around twiddling your thumbs waiting for it to finish.

And in another few years you'll do the same thing transcoding to h.266

For me, multitasking performance, power utilization, battery life, and fan noise.

That's why I stuck with my 2010 iMac until 2017. I refused to get any Mac without hardware h.265 HEVC acceleration. That didn't arrive until Kaby Lake. Same with my MacBook. It was even worse for the laptop since I was stuck on a 2009 Core 2 Duo MacBook Pro until 2017.

However, I don't need ProRes acceleration, and probably won't need it for the foreseeable future (next few years)... but that's just me.

For those doing editing with ProRes (which is a lot of video oriented people), ProRes acceleration isn't just about encoding speed. As I've mentioned before, it's about editing smoothness. If you're dealing with multiple streams of 4K or 8K ProRes footage, you're usually not gonna get smooth scrubbing unless you have a 16-core Xeon or something, and maybe you might even need a $2500 Afterburner card... or else you could just get an M1 Pro/Max laptop with hardware ProRes acceleration. Furthermore, with hardware acceleration, depending upon the project you might even be able to render in the background while continuing to do your other work.

EDIT:

BTW, since I'm talking about my Kaby Lake Macs, I'll give you some practical performance measures.

Sony Nature 4Kp60 10-bit HDR h.265 video clip. I think it was around 77 Mbps. Note that Apple didn't build in hardware h.265 accelerated video playback into OS X until 10.13 High Sierra.

Core i7-7700K iMac without hardware h.265 10-bit decode (OS X 10.12 Sierra) - Cannot play back clip cleanly, and the fans are at maximum. Cannot multitask. This machine scores about 4800 in Geekbench 5 multi-core.

Core i5-7600 iMac with hardware h.265 10-bit decode (OS X 10.13 High Sierra) - Clean playback, with about 10% CPU usage. Machine is silent. Multitasking is very smooth. This machine scores about 3800 in Geekbench 5 multi-core.

Core m3-7Y32 12" MacBook with hardware h.265 10-bit decode (OS X 10.13 High Sierra) - Clean playback, with about 25% CPU usage. Machine is well... fanless. Multitasking is OK. This machine scores about 1600 in Geekbench 5 multi-core.

---

I did make one miscalculation with my 2017 Mac purchases though. I assumed in 2017 that we'd get 4K Netflix and iTunes support by 2018 on any Mac that was Kaby Lake or later. That didn't happen. It turns out Apple requires T2 for that, and that didn't show up in the iMac until 2020. (On Windows, only Kaby Lake is required.) So I guess that's fine, as I'm glad I didn't wait for that feature, or I would have been waiting an awfully long time. As for the 12" MacBook, the last model was 2017 anyway.

biostud · Nov 3, 2021

To balance a SoC:

When Apple decided to do their M1 series, they could more or less put whatever they wanted into the SoC: performance cores, efficiency cores, GPU units, HW accelerators, etc. To me it seems like they have done an excellent job of balancing the transistor budget between CPU, GPU, dedicated HW acceleration units, and "the rest".

Obviously dedicated HW acceleration units might become obsolete down the road, but so will an "old CPU". But what it can do now for running multiple video streams in ProRes compared to 1 or 2 extra CPU cores, is worth much more.

Does anyone know the "silicon cost" of HW accelerators vs CPU cores?

DrMrLordX · Nov 3, 2021

biostud said:
Does anyone know the "silicon cost" of HW accelerators vs CPU cores?

You would need a proper study of a die shot of the core in question to guess at how much of the area was committed to dedicated hw.

biostud · Nov 3, 2021

DrMrLordX said:
You would need a proper study of a die shot of the core in question to guess at how much of the area was committed to dedicated hw.

I know. I just think to qualify the discussion about HW acceleration, you need to know how much die area is dedicated to it.

DrMrLordX · Nov 3, 2021

biostud said:
I know. I just think to qualify the discussion about HW acceleration, you need to know how much die area is dedicated to it.

That would certainly put into perspective the issue of how much die area is committed to hw acceleration versus, you know, something else. Cache, extra cores, whatever.

Eug · Nov 3, 2021

I’m curious as to what M2 will be. Will it include ProRes acceleration? The answer to that is undoubtedly yes, since A15 already includes ProRes acceleration. I too am interested to know how much die space it takes up, but I figure it can’t be that much since this iPhone SoC already has it.

BTW, while I personally won’t likely need ProRes any time soon, at this point I think anyone wanting to do anything at all with ProRes should stay away from M1 Macs if they can. M1 Pro and M1 Max yes, but M1 no. So that means 14” or 16” MacBook Pro, or wait for 2022 to get an M1 Pro/Max Mac mini or 27” iMac, or else get an M2 machine when it comes out. (Or of course, if a hardcore professional, get a 2022 Mac Pro.)

dr1337 · Nov 3, 2021

Doug S said:
Even if you had hardware acceleration why would you care? I assume there is no pressing timetable for this, so whether it takes an hour, a day or a month is irrelevant to you since you don't have to sit around twiddling your thumbs waiting for it to finish.

Lol you sound like the kind of person who's used an fx4300 for a decade and still likes the performance. Seriously, if you're gonna make that argument, then you're also saying there's no reason for the m1 max to exist.

Frankly I think we all want more computer performance. Just because someone has a workload that they can wait on to finish, doesn't mean that's anywhere remotely ideal. Not to mention some people actually have to pay for their own electricity...

Roland00Address · Nov 3, 2021

biostud said:
Does anyone know the "silicon cost" of HW accelerators vs CPU cores?

It is not something that can be described in a simple variable or a simple line.

For example, what is silicon cost? Are we talking passive silicon such as die size, or active silicon in use? What I mean by that is we cannot have 100% of the physical silicon in use at one time, both because of thermals and because of the power that would have to be delivered over physical wires to all the transistors. Thus, while hardware accelerators take up die space just like CPU cores, GPU cores, cache, and dozens of other things…

Hardware accelerators are actually more energy efficient in performance per watt. That can allow you to have more of the chip active at a time, or to run at a higher frequency, with the hardware accelerators active on one side of the chip and the general-purpose CPU cores active elsewhere. The physical location of these components helps determine the frequency at which both parts can run, for both thermal and power reasons.

In sum, this stuff gets real complicated real quick. It is much like real-world architecture: when you are designing a building there is physics involved, but an architect also thinks like an art student, and not just for aesthetic reasons, because how you design the support columns and the foundations can determine the rest of the building. This is why both AMD and Intel are now using machine learning to try dozens of different placements of the micro components and project the results. Human hands and minds then choose what they think is the best placement of the different types of silicon, such as CPU, GPU, and hardware accelerator cores. This has both increased physical density and improved performance and efficiency.

Doug S · Nov 3, 2021

dr1337 said:
Lol you sound like the kind of person who's used an fx4300 for a decade and still likes the performance. Seriously, if you're gonna make that argument, then you're also saying there's no reason for the m1 max to exist.

Frankly I think we all want more computer performance. Just because someone has a workload that they can wait on to finish, doesn't mean that's anywhere remotely ideal. Not to mention some people actually have to pay for their own electricity...

What he's describing is a one time workload. He converts his 20 TB of MPEG4 to HEVC, then he doesn't need to do it again. That's why it doesn't really matter how long it takes. It would be a different matter if he said he has a new 20 TB of stuff to convert every week.

Geegeeoh · Nov 3, 2021

20TB, not a bit more or less. Once and never ever again.

/sarcasm off

jamescox · Nov 3, 2021

DrMrLordX said:
It's about what the CPU can do unaided by dedicated hardware, period. It's about, what can your system do with a new workload you weren't planning to run when you bought the machine, or a workload that the manufacturer decided not to support with dedicated hw?

You really don't know? People have been building encode boxes around here (and on other enthusiast forums and A/V forums) for decades now. Media PCs are expected to carry out those tasks at numerous different quality levels and/or resolutions that may not be supported by dedicated hw.

That sounds like an attempt at “future proofing” and, in my experience, that almost never actually works. One might be faster now, but when you look back on it a few years from now, the performance differences are likely going to be very small compared to modern hardware. I would look at encoding benchmarks myself to get an idea of where the processor stands, but I am not a video professional so the ProRes acceleration is irrelevant to me. For someone who actually uses ProRes, having hardware acceleration for it is going to be the deciding factor. There generally isn’t really any comparison between hardware accelerated and purely software solutions.

MacBooks are expensive, high end machines. People usually use them for quite a long time. If you are really concerned about some unknown application rendering everything before it obsolete, then you probably should just buy a cheap machine and upgrade every 2 years or something like that rather than buying an expensive, professional oriented machine hoping it is somehow “future proof”.

biostud · Nov 4, 2021

Isn't the main reason for hardware acceleration smooth playback (and power savings) in real-time when editing on a laptop? Sure it is also nice when encoding, but if your job requires lots of encoding, you probably have some kind of render farm?

arandomguy · Nov 4, 2021

The content creation economy is more and more distributed now and exists at multiple levels of engagement.

Access and being accustomed to technology also means more people are getting into content creation in general without any understanding or interest in the actual technical aspects behind it (akin to the difference between driving a car and knowing how to fix a car). Plenty of people just want to create on their laptop and do so.

I actually think tech oriented enthusiasts are often out of touch with how people use technology outside of that circle.

Mopetar · Nov 4, 2021

biostud said:
Isn't the main reason for hardware acceleration smooth playback (and power savings) in real-time when editing on a laptop? Sure it is also nice when encoding, but if your job requires lots of encoding, you probably have some kind of render farm?

Maybe if you're part of a larger company, but people who work freelance or are part of a small firm probably don't. If the machines handling the rendering are a few years old (because who's replacing their render farm on any kind of annual basis?) a new laptop with dedicated hardware might not be all that far off in terms of performance.

Although smooth editing feels nice, quicker turnaround is probably a bigger deal because it means you can be more productive in a given day. Most editors aren't dealing with the insane kind of blockbuster films that have all kinds of CGI and other special effects that necessitate handing it off to a render farm to get done in a timely manner in the first place.

nxre · Nov 4, 2021

Some fun comparisons based on Anandtech's data

Alder Lake p-core is ~10% faster than Firestorm in SpecInt at 5x the power.
Alder Lake e-core is ~40% slower than Firestorm in SpecInt at 2x the power.

Seems like Apple really made the right choice by designing their own silicon, the efficiency gap is going nowhere.

dmens · Nov 4, 2021

nxre said:
Some fun comparisons based on Anandtech's data

Alder Lake p-core is ~10% faster than Firestorm in SpecInt at 5x the power.
Alder Lake e-core is ~40% slower than Firestorm in SpecInt at 2x the power.

Seems like Apple really made the right choice by designing their own silicon, the efficiency gap is going nowhere.

Heh, I said the new atoms will suck down 5 watts, looks like I was off by a factor of 2... in the correct direction. The efficiency gap is already increasing. Try comparing the Apple Icestorm core vs the new Atom for extra laughs.

jeanlain · Nov 4, 2021

nxre said:
Some fun comparisons based on Anandtech's data

Alder Lake p-core is ~10% faster than Firestorm in SpecInt at 5x the power.
Alder Lake e-core is ~40% slower than Firestorm in SpecInt at 2x the power.

Interesting. Is it based on these numbers?

In SPEC, in terms of package power, the P-cores averaged 25.3W in the integer suite and 29.2W in the FP suite, in contrast to respectively 10.7W and 11.5W for the E-cores, both under single-threaded scenarios. Idle package power ran in at 1.9W.

nxre · Nov 4, 2021

jeanlain said:
Interesting. Is it based on these numbers?

Yes, I used those numbers.
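
To make the arithmetic behind those ratios explicit, here is a rough sketch. It takes the package-power figures quoted above (25.3W and 10.7W in the integer suite) and nxre's approximate performance and power ratios at face value. The Firestorm power it prints is only what those ratios imply, not a measured number.

```python
# Ratios claimed in the thread (approximate), relative to Firestorm = 1.0.
# package_w is the single-threaded SPECint package power quoted above.
claims = {
    "Alder Lake P-core": {"rel_perf": 1.10, "rel_power": 5.0, "package_w": 25.3},
    "Alder Lake E-core": {"rel_perf": 0.60, "rel_power": 2.0, "package_w": 10.7},
}


def efficiency_gap(rel_perf: float, rel_power: float) -> float:
    """How many times more perf/W Firestorm delivers, given ratios vs Firestorm."""
    return rel_power / rel_perf


results = {}
for name, c in claims.items():
    gap = efficiency_gap(c["rel_perf"], c["rel_power"])
    # Firestorm power implied by the claimed power ratio (not a measurement).
    implied_firestorm_w = c["package_w"] / c["rel_power"]
    results[name] = (gap, implied_firestorm_w)
    print(f"{name}: Firestorm ~{gap:.1f}x perf/W, "
          f"implied Firestorm power ~{implied_firestorm_w:.1f} W")
```

Taken at face value, the claims put Firestorm at roughly 4.5x the perf/W of the P-core and about 3.3x that of the E-core, with both ratios implying a Firestorm figure a little above 5W.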

StinkyPinky · Nov 4, 2021

nxre said:
Some fun comparisons based on Anandtech's data

Alder Lake p-core is ~10% faster than Firestorm in SpecInt at 5x the power.
Alder Lake e-core is ~40% slower than Firestorm in SpecInt at 2x the power.

Seems like Apple really made the right choice by designing their own silicon, the efficiency gap is going nowhere.

Gotta wonder how they are going to compete in the mobile space with those numbers. Desktop space not so important for power, especially as we don't know how the M1 will scale up.

Of course some would say that they are different platforms and they only need to beat AMD, but I disagree. There is enough crossover to say they are in direct competition.
