Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
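Apple's GPU figures hang together under two assumptions that are not in the spec list: that each of the 128 execution units has 8 FP32 ALUs (1024 in total), and that a fused multiply-add counts as 2 FLOPs. A minimal Python sketch that backs out the implied GPU clock and per-clock rates from the numbers above:

```python
# Back out implied per-clock figures from Apple's published M1 GPU numbers.
# ASSUMPTIONS (not from Apple's spec list): 8 FP32 ALUs per execution unit,
# and one fused multiply-add (FMA) per ALU per clock, counted as 2 FLOPs.
EXECUTION_UNITS = 128
ALUS_PER_EU = 8          # assumption
FLOPS_PER_ALU_CLOCK = 2  # FMA = multiply + add

TFLOPS = 2.6
GIGATEXELS_PER_S = 82
GIGAPIXELS_PER_S = 41
MAX_THREADS = 24576

alus = EXECUTION_UNITS * ALUS_PER_EU
implied_clock_ghz = TFLOPS * 1e12 / (alus * FLOPS_PER_ALU_CLOCK) / 1e9
texels_per_clock = GIGATEXELS_PER_S / implied_clock_ghz
pixels_per_clock = GIGAPIXELS_PER_S / implied_clock_ghz
threads_per_eu = MAX_THREADS // EXECUTION_UNITS

print(f"ALUs: {alus}")
print(f"implied clock: {implied_clock_ghz:.2f} GHz")
print(f"texels/clock: {texels_per_clock:.1f}, pixels/clock: {pixels_per_clock:.1f}")
print(f"threads per EU: {threads_per_eu}")
```

Under these assumptions the numbers land on an implied clock of about 1.27 GHz and roughly 64 texels and 32 pixels per clock for the whole 8-core GPU (about 8 and 4 per core), with 192 threads per execution unit. The slight overshoot above 64/32 comes from Apple's rounded 2.6 TFLOPS figure.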

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264 and HEVC, plus ProRes
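The 100 GB/s figure is consistent with a 128-bit LPDDR5-6400 interface. The bus width and data rates below are assumptions (they are not stated above), as is LPDDR4X-4266 on a 128-bit bus for the M1. A sketch of the peak-bandwidth arithmetic:

```python
def peak_bandwidth_gbs(data_rate_mtps: float, bus_width_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    data_rate_mtps: millions of transfers per second (e.g. 6400 for LPDDR5-6400)
    bus_width_bits: total width of the memory interface in bits
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mtps * 1e6 * bytes_per_transfer / 1e9

# ASSUMED configurations: 128-bit interface on both chips.
m2 = peak_bandwidth_gbs(6400, 128)  # LPDDR5-6400
m1 = peak_bandwidth_gbs(4266, 128)  # LPDDR4X-4266
print(f"M2: {m2:.1f} GB/s, M1: {m1:.1f} GB/s")
```

This gives about 102.4 GB/s for the M2, matching Apple's rounded 100 GB/s, versus about 68.3 GB/s for the M1. These are theoretical peaks; sustained bandwidth will be lower.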

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

Gideon · Nov 12, 2020

senttoschool said:
The market for pure Windows development is now very small. No software really only works for Windows nowadays. This isn't the early 2000s. Most applications have moved to the cloud, to the browser, or to mobile. Pure Windows apps are only for ultra niche professions.

Thus, if you're doing apps on the browser, mobile, or cloud, an ARM Mac is generally fine. I'm sure Docker support will be added very soon.

Yeah, it will run, but problems will arise with stacks that can't be switched to ARM Linux:

ARM Mac: Why I'm Worried About Virtualization

bmalehorn.com

It won't be the end of the world and eventually it will go away (as ARM Linux machines gain prevalence) but it will take time.

biostud · Nov 12, 2020

senttoschool said:
Whoever reviews the M1 chip, I hope they will incorporate benchmarks for things like machine learning, HDR video processing, cryptography acceleration, and storage controller.

The M1 dedicates a lot of silicon to non-CPU and non-GPU accelerators. It would be unfair to only measure CPU and GPU tasks.

Yes, if you do a review of the platform. Since this is an SoC, it's basically a 4 big + 4 little + accelerators configuration, and there is no x86 equivalent. Both Intel and AMD support hardware video encoding on the GPU/CPU, so that could be compared to the M1's video encoding accelerators. In software that can benefit from the M1's accelerators, and in software that can't benefit from more than 4 cores, it seems like a very powerful chip. In software that is heavily threaded and can't use the accelerators, I suspect a Zen 3-based 5xxxU will be more powerful. And that leaves Intel.....

biostud · Nov 12, 2020

Also this is a 5nm 16B chip vs 7nm 9.8B chip (AMD 4800U). If it wasn't a lot faster something would be very wrong with the design.

Qwertilot · Nov 12, 2020

That transistor count does of course include the iGPU and a whole range of other blocks.

Vaguely interesting to wonder if a lot of laptop/even desktop (?!) designs from other people are ever going to go this way.

Choice and flexibility is nice, but there do seem to be some very large advantages to be had.
(Tangential to how amazingly good Apple's CPU cores are)

IntelUser2000 · Nov 12, 2020

IvanKaramazov said:
Unless I'm misunderstanding something, it's actually a bit faster MT than the Ryzen 9 4900HS, their higher-watt laptop part. The 4800U is a good bit slower than that.

Those scores are wrong.

If you search for it, the top results for the 4800U are much higher. Since Geekbench is a user-submitted benchmark, especially for laptops, you need to use the top scores.

7200-7400 Multi is expected for 4800U.

darkswordsman17 · Nov 12, 2020

Qwertilot said:
That transistor count does of course include the iGPU and a whole range of other blocks.

Vaguely interesting to wonder if a lot of laptop/even desktop (?!) designs from other people are ever going to go this way.

Choice and flexibility is nice, but there do seem to be some very large advantages to be had.
(Tangential to how amazingly good Apple's CPU cores are)

So does a modern APU. It doesn't have everything Apple's does, but it's not that different. The main thing missing is the AI-focused processing.

I think Intel is moving to integrate more, and I think they're even banking their future on it, by looking to leverage multiple chips from different processes in a single package, enabling mixing and matching for different markets/customers. Don't be surprised if Apple goes that route as well in the future, even if just to scale up for higher-end products like the Mac Pro. AMD likely will, especially with the Xilinx deal. Nvidia will, and I think many others already are (isn't adding AI pieces already part of the ARM roadmap?).

dacostafilipe · Nov 12, 2020

biostud said:
Also this is a 5nm 16B chip vs 7nm 9.8B chip (AMD 4800U). If it wasn't a lot faster something would be very wrong with the design.

And the special RAM configuration really should help with latency, not something we will see on desktop in the near future because of upgradability.

IntelUser2000 · Nov 12, 2020

NeoLuxembourg said:
And the special RAM configuration really should help with latency, not something we will see on desktop in the near future because of upgradability.

I doubt it's there for latency. It's similar to PoP DRAM configs on mobile chips. It's for saving board space.

The A14 gets stellar performance using PoP DRAM. The way M1 is packaged might allow larger DRAM configurations to be available than on typical PoP packages.

biostud said:
Also this is a 5nm 16B chip vs 7nm 9.8B chip (AMD 4800U). If it wasn't a lot faster something would be very wrong with the design.

Arguing what compiler version it uses or saying it has more transistors is silly considering the form factor and TDP differences.

Come on guys! I like x86 but what they have is far beyond what the x86 camp has. Look at the core block diagram!

Take a look at generational improvements for Zen and Core and see how long it'll take to get there:

AMD Zen 3 Ryzen Deep Dive Review: 5950X, 5900X, 5800X and 5600X Tested

www.anandtech.com

Intel Sunny Cove Core To Deliver A Major Improvement In Single-Thread Performance, Bigger Improvements To Follow

Intel disclosed additional details about their upcoming Sunny Cove core, claiming a large improvement in single-thread performance.

fuse.wikichip.org

Core THAT wide won't be seen from Intel/AMD late next year with Golden Cove and Zen 4. Maybe with Redwood Cove and Zen 5.

And it does this within a Y-class 10W TDP envelope. Intel's Y-class chips are 20-30% behind U-class chips even in single thread.

I remember Apple A7. Bay Trail came out a week prior and was heralded by Intel as beating all mobile competitors. And it was true. Then the Apple A7 came and beat Bay Trail. It signaled something symbolic - that Intel's best effort during their peak won't be enough.

Intel ended up giving away Bay Trail, and its 14nm shrink Cherry Trail, nearly for free for 2 years before giving up entirely.

Entropyq3 · Nov 12, 2020

name99 said:
That's how this plays out in the real world; people who have no interest in belonging to the PC vs Apple tribe, they just have a job to do and want the tool that does it well.
Gamers are NEVER going to be on board because for most of them half the fun is building their own PC and dicking around with overclocking. That's not the experience Apple is selling and never will be, so there's zero point in Apple trying to appeal to them, or in them complaining that Apple doesn't do what they want.

Nitpick on a pet peeve - you're not describing gamers, you are describing PC-tech enthusiasts. While the groups are not completely distinct, neither do they overlap all that much.
Gamers are unhappy with the relatively limited selection of quality titles under MacOS.
PC-tech enthusiasts are unhappy because Apple hardware doesn't invite tinkering. (And they are not the target audience for Apple's presentations.)

It's important to keep the two groups distinct in one's thinking, because the number of people who play games is vastly greater than the number of people who want to muck around in the bowels of their hardware. (Although today's wimpy overclockers don't have to break out the soldering irons; they flip a switch in an app (or BIOS if they are hardcore ;-)) and gain all of 4% performance, since PC hardware IHVs are already tuning their products within inches of their limits.)

dacostafilipe · Nov 12, 2020

IntelUser2000 said:
I doubt it's there for latency. It's similar to PoP DRAM configs on mobile chips. It's for saving board space.

Yes, but having the DRAM on the same substrate will still have latency benefits, which in turn will help performance.

IntelUser2000 said:
Come on guys! I like x86 but what they have is far beyond what the x86 camp has. Look at the core block diagram!

It has nothing to do with ISA. The SoC performance looks great, but it's on 5nm with a lot of transistors, you can't deny that!

IntelUser2000 · Nov 12, 2020

NeoLuxembourg said:
Yes, but having the DRAM on the same substrate will still have latency benefits, which in turn will help performance.

The DRAM latency is far better on AMD/Intel than on Apple. If the implementation is equivalent, yes, bringing it closer can yield benefits. Remember how the IMC on Athlon 64 brought 20-25% out of the 30% gains in ST, while doing the same with Nehalem brought 5-10%? That's because with Core 2, Intel was already starting from a better memory controller and prefetching.

Also, bringing it closer won't always mean it's faster. The pre-Silvermont Atoms had the memory controller on die but performed the same as the chipset MC parts. It has to be designed to take advantage of it.

It has nothing to do with ISA. The SoC performance looks great, but it's on 5nm with a lot of transistors, you can't deny that!

Yes I know. That's why I said x86 camp not x86. The two vendors are hopeless.

AmericanLocomotive · Nov 12, 2020

amosliu137 said:
The A14 uses clang 11.0 in iOS to get 1600. Intel in Mac uses clang 11.0 too. So clang 12.0 is not magic here.

I'm having trouble finding results for Geekbench 5, but looking at Geekbench 4 results, there are pretty significant performance differences between the same Apple system running Windows through Bootcamp and running OSX.

If someone with Bootcamp could run GB5 under both OSX and Windows, that'd be pretty useful to make a comparison.

biostud said:
Also this is a 5nm 16B chip vs 7nm 9.8B chip (AMD 4800U). If it wasn't a lot faster something would be very wrong with the design.

Yup, that's another thing a lot of people seem to be overlooking. This is a massive chip - packing almost as many transistors as a 2080Ti, and over 2.5x the transistor count of single CCD Ryzen 3000 series.

That's gotta be a big reason why AMD and Intel haven't made their cores super wide. That requires transistors, and that costs money. That's fine when you're selling $1500 MacBooks, not fine when you're trying to sell $500 entry-level computers.

Heartbreaker · Nov 12, 2020

AmericanLocomotive said:
Yup, that's another thing a lot of people seem to be overlooking. This is a massive chip - packing almost as many transistors as a 2080Ti, and over 2.5x the transistor count of single CCD Ryzen 3000 series.

Why would you compare M1 transistor count, to a single chiplet 3000 series, unless, as I am suspecting given your recent ranting about the lack of Renoir comparisons, your motivation is primarily AMD advocacy?

You need to factor that the M1 is a full SoC, that has a very large percentage of transistors devoted to functions beyond CPU cores, that will obviously be lacking in a single Chiplet AMD Ryzen 3000, which doesn't even have GPU cores.

A large portion of the die is in the GPU, claimed to be the most powerful iGPU in a PC part, and another large portion is in the Neural engine. Then there will be other functions pulled in like security and SSD controllers.

While the cores are almost certainly the biggest in class, simply comparing total transistors, and making pronouncements without factoring the extra SoC functionality, seems disingenuous or ignorant.

shady28 · Nov 12, 2020

IntelUser2000 said:
Those scores are wrong.

If you search for it, the top results for the 4800U are much higher. Since Geekbench is a user-submitted benchmark, especially for laptops, you need to use the top scores.

7200-7400 Multi is expected for 4800U.

Keep in mind the M1 has 4 slow cores and 4 fast cores. Direct comparisons to 8-core "conventional" chips are only good as a frame of reference. Compared to the 4-core Tiger Lake 1185G7, for example, M1 is just a bit faster in multi-thread (10-15%) and marginally faster in single-thread (<10%). Of course, M1 does it at 2/3 the clock speed.

Cache<->memory bandwidth, latency and so on factor in under real use. I have a suspicion that will be the M1's weak spot, especially given how much work Apple did on the packaging/chip layout related to memory.

Tuna-Fish · Nov 12, 2020

IntelUser2000 said:
Core THAT wide won't be seen from Intel/AMD late next year with Golden Cove and Zen 4. Maybe with Redwood Cove and Zen 5.

I don't think there will be a core truly that wide from x86 even then. Nor should there necessarily be, maintaining higher clocks might well be a better way to provide the performance.

The one big advantage ARM, especially 64-bit ARM, has over x86 is that growing decode width grows its power and complexity linearly. For ARM, 8-wide decode is ~ twice as power-hungry and large as a 4-wide decode. For x86, increasing decode width grows the complexity and power use of the decoders way faster than linearly, mainly because they have to build in massive muxes to line up instructions, because instructions are wildly variable-width. This means that the ideal width of the machine from an engineering standpoint is much wider for an ARM machine than it is for x86. The width of M1 isn't free; they pay for it in many ways, including clock speed.
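The boundary problem described above can be sketched in Python. With fixed-width instructions (AArch64 uses 4 bytes), every decoder knows its start offset up front; with variable-width instructions, each start depends on the previous instruction's length. The `toy_length` encoding below is entirely hypothetical and far simpler than real x86, whose instructions run from 1 to 15 bytes; `speculative_starts` illustrates one well-known style of hardware workaround, not any specific vendor's decoder.

```python
from typing import Callable, List

LengthFn = Callable[[bytes, int], int]

def fixed_width_starts(code: bytes, width: int = 4) -> List[int]:
    # Fixed width: instruction i starts at i * width, so N decoders
    # can each pick up their slot independently and in parallel.
    return list(range(0, len(code) - width + 1, width))

def toy_length(code: bytes, pos: int) -> int:
    # HYPOTHETICAL encoding: the low 2 bits of the first byte give a
    # length of 1-4 bytes. Real x86 length depends on prefixes, opcode,
    # ModRM/SIB bytes and immediates.
    return (code[pos] & 0b11) + 1

def variable_width_starts(code: bytes, length_of: LengthFn) -> List[int]:
    # Variable width: each start depends on the length of the previous
    # instruction, so the boundaries form a serial dependency chain.
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += length_of(code, pos)
    return starts

def speculative_starts(code: bytes, length_of: LengthFn) -> List[int]:
    # Workaround sketch: compute a candidate length at EVERY byte offset
    # (in hardware, in parallel; most results are discarded), then select
    # the real boundaries. Selection is still a chain through the table;
    # doing it quickly for a wide decoder is where the extra mux logic goes.
    lengths = [length_of(code, i) for i in range(len(code))]
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += lengths[pos]
    return starts
```

For example, with `code = bytes([0b01, 0x00, 0b11, 0x00, 0x00, 0x00, 0x00])`, both variable-width functions return `[0, 2, 6]`, while the speculative version did length work at all seven byte offsets to get there.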

IvanKaramazov · Nov 12, 2020

IntelUser2000 said:
Those scores are wrong.

If you search for it, the top results for the 4800U are much higher. Since Geekbench is a user-submitted benchmark, especially for laptops, you need to use the top scores.

7200-7400 Multi is expected for 4800U.

Responding to this because I suspect I'm misunderstanding how best to read these results. I thought that precisely because it's a user-submitted benchmark, the more solid place to look for typical scores was the Processor Benchmarks listing, which is what I linked. I believe those are normalized and averaged across all submissions, with outliers excluded. Searching for the 4800U does indeed show some scores above 7000, but they appear to be the rare exception rather than the rule. Obviously individual chips always show idiosyncrasies in one direction or the other.

Or is the idea that the higher scores at normal frequency are best, as all the rest are scored where users are running the benchmark with 12 Chrome tabs open and a video playing? Because that actually does make sense, now that I type it.
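The two readings of user-submitted results discussed here (a typical score versus the best scores at normal clocks) can be made concrete with a small helper. This is a hypothetical sketch, not how Geekbench's Processor Benchmarks page actually computes its figures; `top_fraction` is an illustrative parameter.

```python
import statistics
from typing import Dict, Sequence

def summarize_scores(scores: Sequence[float], top_fraction: float = 0.1) -> Dict[str, float]:
    """Contrast a 'typical' score with a 'best-case' score.

    median:   robust middle of all submissions, dragged down by background
              load, power-saving modes, thermal limits, etc.
    top_mean: mean of the highest `top_fraction` of submissions, closer to
              what the chip does on an idle, well-configured system (but
              also more exposed to overclocked or mislabeled entries).
    """
    if not scores:
        raise ValueError("need at least one score")
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return {
        "median": statistics.median(ranked),
        "top_mean": statistics.mean(ranked[:k]),
    }
```

The design choice is the trade-off the posts circle around: the median resists outliers in both directions, while the top slice better reflects an unloaded laptop at normal frequency.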

insertcarehere · Nov 12, 2020

AmericanLocomotive said:
Yup, that's another thing a lot of people seem to be overlooking. This is a massive chip - packing almost as many transistors as a 2080Ti, and over 2.5x the transistor count of single CCD Ryzen 3000 series.

That's gotta be a big reason why AMD and Intel haven't made their cores super wide. That requires transistors, and that costs money.

Transistors themselves don't cost money; the die space to house them does. The fact that the Apple SoCs allow such high transistor densities (~16B transistors in ~120 mm², ~25% smaller than Renoir with ~60% more transistors) is a net positive, likely helped by the lower clock speeds of the Apple cores, which should allow denser transistor layouts while still being adequate for heat dissipation.
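The ratios quoted here can be checked with a few lines of arithmetic. The M1 figures and Renoir's 9.8B transistor count come from this thread; the 156 mm² Renoir die area is an assumption (the commonly cited figure), not something stated in the post.

```python
def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    # Millions of transistors per square millimetre.
    return transistors_billion * 1000 / area_mm2

M1 = (16.0, 120.0)      # ~16B transistors, ~120 mm^2 (from the post)
RENOIR = (9.8, 156.0)   # 9.8B from the thread; 156 mm^2 is an ASSUMPTION

m1_density = density_mtr_per_mm2(*M1)
renoir_density = density_mtr_per_mm2(*RENOIR)

area_reduction = 1 - M1[1] / RENOIR[1]       # how much smaller the M1 die is
transistor_increase = M1[0] / RENOIR[0] - 1  # how many more transistors it has

print(f"M1: {m1_density:.0f} MTr/mm^2, Renoir: {renoir_density:.0f} MTr/mm^2")
print(f"density ratio: {m1_density / renoir_density:.2f}x")
print(f"die area: {area_reduction:.0%} smaller, transistors: {transistor_increase:.0%} more")
```

Under that assumption the M1 comes out about 23% smaller with about 63% more transistors, roughly 2.1x Renoir's density, in line with the post's ~25%/~60% estimates.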

beginner99 · Nov 12, 2020

AmericanLocomotive said:
Yup, that's another thing a lot of people seem to be overlooking. This is a massive chip - packing almost as many transistors as a 2080Ti, and over 2.5x the transistor count of single CCD Ryzen 3000 series.

That's gotta be a big reason why AMD and Intel haven't made their cores super wide. That requires transistors, and that costs money. That's fine when you're selling $1500 MacBooks, not fine when you're trying to sell $500 entry-level computers.

Fully agree, and I have written the same thing in the past. Apple also gets its efficiency at the cost of die space, which, as you say, is fine if the cheapest product it goes into is the $999 MacBook Air.

On top of that, I repeat my previous statement that Apple comes from the consumer world, where a few wide cores have obvious benefits, and is now scaling up. x86 comes from the server world, where it is far easier to make use of many cores, and is scaling its cores down. It's clear Apple will have a significant advantage in consumer usage with their SoCs.

But again, it's irrelevant for me as it's a completely locked platform.

gdansk · Nov 12, 2020

I'm amazed they'll actually have these in stock. ETA for a MacBook Air says late November. Everything else cool this fall has been perpetually out of stock.

Doug S · Nov 12, 2020

guidryp said:
Cinebench has both scores, and the picture shown before had both.

Single: 987
Multi: 4530

MT Scaling Ratio: 4.59x - This is less scaling than Intel/AMD 4 Core with SMT, which is usually a bit over 5x.

So it's all about the performance core count. The efficiency cores don't add much.

If that's the locked Devkit with 2.5GHz, then I guess it will depend how fast they can run M1.

Any info about clock speeds for M1?

According to Andrei the "efficiency cores" in the A14/M1 are roughly equivalent to the big core in Apple's A9 (i.e. iPhone 6S generation) and it looks like the best GB5 scores for that were around 550 or 1/3 the performance of the A14/M1 big core. So even if they could scale perfectly they're looking at about 5.3x best case.
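The scaling argument in this exchange is simple arithmetic: the observed multi/single ratio versus an ideal ratio in which every performance core counts as 1 and every efficiency core counts as its relative single-thread throughput. One caveat: the 987/4530 scores are Cinebench, while the ~1/3 efficiency-core ratio is a Geekbench 5 estimate, so the comparison mixes benchmarks. A sketch:

```python
def mt_scaling(single: float, multi: float) -> float:
    # Observed multi-thread / single-thread ratio.
    return multi / single

def ideal_scaling(p_cores: int, e_cores: int, e_to_p_ratio: float) -> float:
    # Perfect scaling: each performance core adds 1.0x, each efficiency
    # core adds its throughput relative to a performance core. Ignores
    # shared-cache contention, memory bandwidth and all-core clock drops.
    return p_cores + e_cores * e_to_p_ratio

observed = mt_scaling(987, 4530)    # Cinebench figures quoted above
ideal = ideal_scaling(4, 4, 1 / 3)  # E core ~1/3 of a P core (estimate above)
print(f"observed {observed:.2f}x, ideal {ideal:.2f}x, "
      f"efficiency {observed / ideal:.0%}")
```

That works out to about 4.59x observed against a 5.33x ceiling, i.e. roughly 86% of ideal scaling.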

moinmoin · Nov 12, 2020

If there's one company that knows how to handle big hardware launches it's Apple. iPhone launches are proof enough of this imo.

defferoo · Nov 12, 2020

gdansk said:
I'm amazed they'll actually have these in stock. ETA for a MacBook Air says late November. Everything else cool this fall has been perpetually out of stock.

The volume for these is quite low compared to iPhones, so it's probably a piece of cake for Apple to have enough stock.

Carfax83 · Nov 12, 2020

Tuna-Fish said:
I don't think there will be a core truly that wide from x86 even then. Nor should there necessarily be, maintaining higher clocks might well be a better way to provide the performance.

A lot of people always seem to downplay clock speed, acting like it's not important. Correct me if I'm wrong, but IPC x frequency = performance. So clock speed is an integral component to get good performance.

The 10900K competed very well with the 3900x and 3950x when it released despite having a very old architecture and less IPC, primarily due to having extremely high clock speeds and much lower memory latency.

Of course Zen 3 put an end to that. I hope that when the M1 releases, we have something better to compare it with than Geekbench and Spec2006.
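The identity performance = IPC × frequency can also be run backwards to estimate relative IPC from relative performance and relative clock. A minimal sketch using the rough Tiger Lake comparison earlier in the thread (M1 under 10% faster single-threaded at about 2/3 of the 1185G7's clock); the inputs are those forum estimates, not measurements.

```python
def relative_ipc(relative_perf: float, relative_clock: float) -> float:
    # performance = IPC * frequency  =>  IPC = performance / frequency
    # Both arguments are ratios (chip A / chip B), so the result is too.
    return relative_perf / relative_clock

# Upper bound from the thread's estimates: <=1.10x performance at 2/3 the clock.
print(f"M1 vs 1185G7 IPC: up to ~{relative_ipc(1.10, 2 / 3):.2f}x")
```

That gives up to roughly 1.65x the IPC. The same relation covers the 10900K case: a lower-IPC core can keep up if its clock is proportionally higher.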

The one big advantage ARM, especially 64-bit ARM, has over x86 is that growing decode width grows its power and complexity linearly. For ARM, 8-wide decode is ~ twice as power-hungry and large as a 4-wide decode. For x86, increasing decode width grows the complexity and power use of the decoders way faster than linearly, mainly because they have to build in massive muxes to line up instructions, because instructions are wildly variable-width. This means that the ideal width of the machine from an engineering standpoint is much wider for an ARM machine than it is for x86. The width of M1 isn't free; they pay for it in many ways, including clock speed.

Doesn't x86-64 solve the decode width problem by using micro-op caches?

gdansk · Nov 12, 2020

defferoo said:
The volume for these is quite low compared to iPhones, so it's probably a piece of cake for Apple to have enough stock.

No doubt but it's nice to see ETAs amid the lack of information from Sony, Microsoft, Nvidia, and AMD.

KeithP · Nov 12, 2020

Affinity has released M1-native versions of their designer apps. Native performance looks very promising…

To sum it up, M1 makes our apps run faster, smoother and feel more responsive than ever before (we’ve already even seen speed increases of over 3x faster running on the new MacBook Air). It’s definitely a big step forward for Mac, and we can’t wait to see how the rest of the Mac range develops with Apple silicon in the future.

So, at least with the MacBook Air, it definitely seems like a big jump. The only question on the Air will be how much the lack of active cooling affects performance.

-KeithP
