Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,586
1,000
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

Screen-Shot-2021-10-18-at-1.20.47-PM.jpg

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:

 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
for reference

single core R23 on a 4700u

my idle : 3.x watts package power consumption ( ~ 5-10% constant cpu usage)
1.x of that being CPU cores

R23 single thread:
Score : 1151
pinned via affinity to 1 core
single core clock 4.2ghz
12.x watts package power consumption
9.x watts power consumption on said core

R23 multi thread (8C/8T)
Score: 5851
Core clocks: 3.0ghz all cores
TDP limit 17watts
17.x watts package power consumption
2.x watts per core power consumption

will do again with higher performance mode on


R23 multi thread (8C/8T)
Score: 6606
Core clocks: 3.3ghz all cores
TDP limit 25watts
25.x watts package power consumption
3.0-3.2 watts per core power consumption
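
To put the two multi-thread runs above side by side, here is a rough sketch of the points-per-watt arithmetic. It assumes the package really draws the stated TDP limit (17 W and 25 W); the exact "17.x" and "25.x" readings are not given, so treat the results as approximate.

```python
# Cinebench R23 multi-thread results quoted above for the 4700U.
# Assumption: package power equals the stated TDP limit; the real
# readings were "17.x" and "25.x" watts, so these are estimates.
runs = {
    "17 W": {"score": 5851, "watts": 17.0},
    "25 W": {"score": 6606, "watts": 25.0},
}

def points_per_watt(run):
    """Score divided by package power."""
    return run["score"] / run["watts"]

efficiency = {name: points_per_watt(run) for name, run in runs.items()}

# Relative gain in score and in power going from the 17 W to the 25 W mode.
score_gain = runs["25 W"]["score"] / runs["17 W"]["score"] - 1
power_gain = runs["25 W"]["watts"] / runs["17 W"]["watts"] - 1

for name, value in efficiency.items():
    print(f"{name}: {value:.0f} points per watt")
print(f"score gain: {score_gain:.1%}, power gain: {power_gain:.1%}")
```

With these inputs the 25 W mode buys roughly 13% more score for roughly 47% more power.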
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Who is this "we" you are speaking for? I certainly don't believe a Mac Pro with 16 performance cores is the best case.

I don't think outsiders have any idea at all, what kind of Topology Apple will use for an ARM based Mac Pro/iMac Pro, but I would expect them to abandon efficiency cores, since the benefit from the die space used will largely be wasted on a non mobile, high performance workstation.

As far as 64 Core AMD, outperforming the iMac Pro in number crunching, that is already the case today, when Mac Pro maxes out with a 28 core Xeon. I don't think it's an issue for the target Audience.

What Apple will need to show is outperforming the 28 Core Xeon Mac Pros. Which I expect they will do.

I am very excited to see what the ARM Mac Pro will look like. So many unknowns on topology of both the CPU and GPU portions.

So, to be clear, what you, and a few other posters are proposing, is that Apple corp, with absolutely no history of doing anything like it in the past, is generating a large, high core count, processor package that is either hugely monolithic or a large collection of smaller dies and some sort of I/O mechanism, that has at least a competitive collection of I/O capabilities and memory bandwidth with what should be the next HCC Xeon that would be deployed in the Mac Pro? And, that Apple can justify the truly massive amount of R&D cost to its shareholders for such a product when they have no internal use for such a product (that we know of) and have nowhere near the number of sold systems in the past decade in that product segment to justify amortizing those costs across?

I realize that Apple must be the second coming of silicon jebus or something, but this is beyond the pale even for the collection of ARM and Apple evangelists that are polluting this thread.

It is far, FAR more likely that Apple will continue what they have been doing for the last two decades, iterating their existing chip designs a bit more each time. It is wishful thinking that they will produce anything that is much more than twice the total die area of the M1 for their next chip. N5 is a new node. Period. It has a massive cost per wafer as we have already seen in these forums. Just the cost per chip for something at twice the die area of M1 will make it massively expensive, even at yields of 90%+. That iteration looks a lot like an 8+8 setup, or something very similar. There's no sense in making a larger chip than that as it makes even the low end Mac Pros extremely expensive if they are to be feature complete with the higher end ones, save for core count or achieved frequency. So, how do they achieve the throughput needed for a high end Mac Pro without a massive chip? They have to glue them together. That's not exactly exotic in the industry at present and makes perfect sense for what they need. The glue logic needed should be doable for them.
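
A rough sketch of the die-area point above, not a real cost model. It uses the common dies-per-wafer approximation for a 300 mm wafer and applies the 90% yield mentioned in the post to both die sizes. The 120 mm² M1 die area is an assumed round number for illustration only, and wafer price is left out because only the ratio matters here.

```python
import math

WAFER_DIAMETER_MM = 300.0   # standard wafer size
ASSUMED_YIELD = 0.90        # the "90%+" figure from the post, applied to both dies
M1_AREA_MM2 = 120.0         # assumption: illustrative round number, not a measured value

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return whole - edge_loss

def good_dies(die_area_mm2, yield_fraction=ASSUMED_YIELD):
    """Gross dies scaled by a flat yield fraction."""
    return gross_dies_per_wafer(die_area_mm2) * yield_fraction

small = good_dies(M1_AREA_MM2)
large = good_dies(2 * M1_AREA_MM2)

# Cost per good die is wafer cost / good dies, so the cost ratio is the
# inverse of the good-die ratio and does not depend on the wafer price.
cost_ratio = small / large
print(f"good dies: {small:.0f} vs {large:.0f}, cost ratio {cost_ratio:.2f}x")
```

Under these assumptions, doubling the area slightly more than doubles the cost per good die, and that is before any drop in yield on the larger die, which would widen the gap.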

Apple didn't get to be one of the richest companies on Earth by making poor money decisions. Dumping massive piles of money into a low volume processor design that couldn't hope to pay for its R&D costs at ten times the previous version's sales numbers is definitely a stupid move.
 

jeanlain

Member
Oct 26, 2020
149
122
86
Yeah, I tried doing this as well, extrapolating potential Cinebench scores based on the Geekbench delta between A12Z and M1. It's not a stupid way to guess, as if you compare GB and Cinebench for A12Z and the Intel chips they track pretty well. A (wildly) speculative, extrapolated 1537 would put the M1 faster than the 15w Tiger Lake ST, and in the ballpark of the 28w Tiger Lake ST. Which sounds about right. The same math on the MT scores would put the M1 just shy of 7000, while the 28w TGL sits around 6000. Again, sounds likely.
First cinebench result from the new MacBook pro : 7508
 

IvanKaramazov

Member
Jun 29, 2020
56
102
66
The 4800U gives a lower multicore geekbench score than the M1. It would appear that cinebench is less favourable to Apple SoCs.
Yep, ST Cinebench slots it right at the TGL 28w 1165G7 and a bit lower than the 1185G7, so same story there.

I suppose the next step to knowing how to evaluate this is people testing the actual package and core power draw during the test. i.e. is the M1 running that at 15w, 20w, 25w...

EDIT - If it wasn't @never_released I wouldn't give it credit, but he's claiming 11.0.1 overclocks the cores slightly. The Macs are shipping on 11.0. Wouldn't expect that to make a huge difference though, even if he's right.
 

name99

Senior member
Sep 11, 2010
404
303
136
Yep, ST Cinebench slots it right at the TGL 28w 1165G7 and a bit lower than the 1185G7, so same story there.

I suppose the next step to knowing how to evaluate this is people testing the actual package and core power draw during the test. i.e. is the M1 running that at 15w, 20w, 25w...

EDIT - If it wasn't @never_released I wouldn't give it credit, but he's claiming 11.0.1 overclocks the cores slightly. The Macs are shipping on 11.0. Wouldn't expect that to make a huge difference though, even if he's right.

Ultimately this isn't surprising.
We saw that the performance controller for the A14 shipped out in sub-optimal form and was fixed (at least to some extent) in 14.2
It would not be weird for the M1 to follow that same trajectory, shipping out with a suboptimal (but guaranteed safe) performance controller, and at least one tweak (maybe more later) as the chip is characterized and pushed to extremes in Apple labs.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,226
5,228
136
So, to be clear, what you, and a few other posters are proposing, is that Apple corp, with absolutely no history of doing anything like it in the past, is generating a large, high core count, processor package that is either hugely monolithic or a large collection of smaller dies and some sort of I/O mechanism, that has at least a competitive collection of I/O capabilities and memory bandwidth with what should be the next HCC Xeon that would be deployed in the Mac Pro? And, that Apple can justify the truly massive amount of R&D cost to its shareholders for such a product when they have no internal use for such a product (that we know of) and have nowhere near the number of sold systems in the past decade in that product segment to justify amortizing those costs across?

There is nothing novel about putting more CPU cores in a package, nor in adding a wider bus, that requires significant R&D for a company of Apple's size.

That is trivial compared to pushing the edge of state-of-the-art core design.

The only real issue is the tape out and masking costs vs the volumes expected. This would tend to favor a chiplet design reused over many years to amortize that die for the lower volume of Mac Pro and iMac Pro.

Which is why it will be interesting to see what Apple does for the Pro lineup.

But if Apple doesn't offer something exceeding 28 core Xeon performance, then they are regressing, which seems unlikely.
 

name99

Senior member
Sep 11, 2010
404
303
136
So, to be clear, what you, and a few other posters are proposing, is that Apple corp, with absolutely no history of doing anything like it in the past, is generating a large, high core count, processor package that is either hugely monolithic or a large collection of smaller dies and some sort of I/O mechanism, that has at least a competitive collection of I/O capabilities and memory bandwidth with what should be the next HCC Xeon that would be deployed in the Mac Pro? And, that Apple can justify the truly massive amount of R&D cost to its shareholders for such a product when they have no internal use for such a product (that we know of) and have nowhere near the number of sold systems in the past decade in that product segment to justify amortizing those costs across?

I realize that Apple must be the second coming of silicon jebus or something, but this is beyond the pale even for the collection of ARM and Apple evangelists that are polluting this thread.

It is far, FAR more likely that Apple will continue what they have been doing for the last two decades, iterating their existing chip designs a bit more each time. It is wishful thinking that they will produce anything that is much more than twice the total die area of the M1 for their next chip. N5 is a new node. Period. It has a massive cost per wafer as we have already seen in these forums. Just the cost per chip for something at twice the die area of M1 will make it massively expensive, even at yields of 90%+. That iteration looks a lot like an 8+8 setup, or something very similar. There's no sense in making a larger chip than that as it makes even the low end Mac Pros extremely expensive if they are to be feature complete with the higher end ones, save for core count or achieved frequency. So, how do they achieve the throughput needed for a high end Mac Pro without a massive chip? They have to glue them together. That's not exactly exotic in the industry at present and makes perfect sense for what they need. The glue logic needed should be doable for them.

Apple didn't get to be one of the richest companies on Earth by making poor money decisions. Dumping massive piles of money into a low volume processor design that couldn't hope to pay for its R&D costs at ten times the previous version's sales numbers is definitely a stupid move.

That's exactly what we're proposing.
And the fact that you find it unlikely shows that you haven't been paying attention.
This was perhaps (barely) excusable in 2007. In 2020???


And I think you misjudge the economics of new processes:
"It has a massive cost per wafer as we have already seen in these forums. "
The general consensus among those who study these things (like Scotten Jones) is that the original report that claimed this massive jump in 5nm wafer costs is very wrong. Correct direction, sure, but wildly off in the almost 2x jump. Of course since Scotten makes his living by selling these sorts of numbers, he won't give us what he considers to be a realistic cost :-(

The A13 was budgeted at around $64 (again by those who model these things). If we bump it by 50% (this is considered aggressive when you look at the iPhone 12 BOM) we're at $90.
So we can take an M1 at maybe $135 or so, about 1.5x (quibble about the price of the RAM if you insist on wasting time).

The actual silicon cost, for devices starting at $5K, just does not seem that high. Double the area (not a perfect match -- you can toss one set of the media stuff, ISP, security and a bunch of other stuff; but you probably want to bump the large CPU's to let's say at least 16) and you're still at a cost of $300; can slap two in the highest end Mac Pro's.

But this is still somewhat unrealistic. The large SoC's will be based on the A15, and everything we know (Apple history, why these devices are fast, Kirin 9000) suggests transistor density will rise by 25% or so, meaning prices fall 25%. Then there is the question of how aggressively Apple want to start down the new path of chiplets.
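
The back-of-envelope chain in the paragraphs above can be written out as follows. All dollar figures are the post's own rough estimates, not real BOM data, and the variable names are just for illustration.

```python
# All inputs are the rough estimates from the post, not measured costs.
A14_COST = 90.0       # post's figure for the A14-class SoC (A13 at ~$64, bumped)
M1_SCALE = 1.5        # post: M1 at about 1.5x that figure
AREA_SCALE = 2.0      # hypothetical Mac Pro die at double the M1 area
DENSITY_GAIN = 0.25   # post: ~25% higher transistor density on the next design

m1_cost = A14_COST * M1_SCALE
big_die_cost = m1_cost * AREA_SCALE

# The post's shorthand: 25% more density means 25% lower cost.
shorthand_cost = big_die_cost * (1 - DENSITY_GAIN)

# Stricter reading: the same transistors need 1 / 1.25 = 80% of the area,
# so cost falls by 20% if cost simply tracks area.
area_scaled_cost = big_die_cost / (1 + DENSITY_GAIN)

print(f"M1 estimate: ${m1_cost:.0f}")
print(f"double-area die: ${big_die_cost:.0f}")
print(f"with density gain: ${shorthand_cost:.1f} to ${area_scaled_cost:.1f}")
```

Either reading of the density gain keeps the doubled die under the roughly $300 the post allows for, assuming cost scales linearly with area.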

Bottom line is, I see zero reason for your argument. As the simplest rejoinder, why can Apple not do exactly what AMD did? Common stuff in the middle, chiplets surrounding it. Solution for everything from high end Mac Pro and Apple internal servers down to single chiplets for the iMac and MBP?
There are many ways one can imagine splitting the functionality between chiplets, and arguing about this right now is not really productive. But as a generic concept, it sets the floor on what is feasible at a low budget. And if Apple are willing to pay more than a low budget for whatever reason...

You seem to imagine that the only market for these cores is annual Mac Pro sales. I say that's misjudged in multiple ways.
First there will be Apple internal uses (data warehouses next year, cars in a few years?)
Second these chips don't need to be updated every year. We already see that the iPad SoCs are updated every two years or so and that's worked out fine. It wouldn't surprise me if the A15 "Xeon equivalents" last two years, are replaced fixing whatever was considered most immediately problematic in the design, and then get updated every three years or so.
Even at this slow pace, there are various ways that Apple can bump these pro models occasionally, should they want to generate a little excitement (as we saw with the A12X to A12Z iPad bump).
 

name99

Senior member
Sep 11, 2010
404
303
136
Those Cinebench r23 results put it at approximately 18.5% faster ppc versus the 12Z score and 40% faster ppc versus Zen3.

Even that Zen3 ppc is misleading unless you are using single-threaded zen3. Most of the "single core" Cinebench results are single core but SMT.

I've no interest in fighting this, but from an understanding point of view, if you want to compare like with like as much as possible at a *throughput* level, then you should scale up the M1 ST result by about 25% to account for a companion small core, since (more or less) Apple's solution for "lightweight low area extra throughput" is efficiency cores, not SMT.
You can see from the MT result that an efficiency core is worth around 25% of a large core.
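
As a rough check of the 25% figure, here is the arithmetic under two loud assumptions: the single-thread score is the speculative 1537 extrapolation posted earlier in the thread (not a measured result), and the 7508 multi-thread score scales perfectly across four big cores plus four small cores.

```python
# Assumptions: ST score is the thread's speculative extrapolation, and
# MT scaling is ideal (no throttling, no shared-resource contention).
ST_SCORE = 1537      # speculative single-thread estimate from earlier in the thread
MT_SCORE = 7508      # first reported multi-thread result for the MacBook Pro
BIG_CORES = 4
SMALL_CORES = 4

# MT = ST * (BIG_CORES + SMALL_CORES * small_core_worth); solve for the worth.
big_core_equivalents = MT_SCORE / ST_SCORE
small_core_worth = (big_core_equivalents - BIG_CORES) / SMALL_CORES

# Throughput-style comparison: one big core plus one companion small core.
scaled_st = ST_SCORE * (1 + small_core_worth)

print(f"small core is worth about {small_core_worth:.0%} of a big core")
print(f"ST score scaled for a companion small core: {scaled_st:.0f}")
```

With these inputs the small core comes out a little over 20% of a big core, in the same ballpark as the ~25% quoted.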
 

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,792
136
Even that Zen3 ppc is misleading unless you are using single-threaded zen3. Most of the "single core" Cinebench results are single core but SMT.

I've no interest in fighting this, but from an understanding point of view, if you want to compare like with like as much as possible at a *throughput* level, then you should scale up the M1 ST result by about 25% to account for a companion small core, since (more or less) Apple's solution for "lightweight low area extra throughput" is efficiency cores, not SMT.
You can see from the MT result that an efficiency core is worth around 25% of a large core.

I did my calculation on single core simulation only. MT (SMT or not) doesn't come into play at all.
 

name99

Senior member
Sep 11, 2010
404
303
136
On the other hand we have this:

Of course that was Cinebench R15, but it was also only a year ago.
Honestly nothing about the "traditional" x86 world, including how they do benchmarking, makes any sense to me. Did Cinebench make some massive change in the past year for how they handle SMT?

If we compare the MT to ST numbers for x86, the 6-core CPUs (best MT match to M1) get about 5x their ST scores. I took that as meaning the base unit was a single CPU (using HT, which is what most of the results on their "single CPU" page indicate)
and they were getting sub-optimal scaling when using all 6 cores.

I guess it could be that we are seeing 1 thread vs 12 SMT threads, and the falloff (not just 5/6, but 5/6* ~1/1.25) so about 2/3 is thermal throttling?
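
The 2/3 figure above can be reproduced as follows, assuming SMT adds roughly 25% throughput per core (the post's ~1.25 factor) and that the 6-core parts score about 5x their single-thread result.

```python
from fractions import Fraction

CORES = 6
OBSERVED_MT_OVER_ST = Fraction(5)   # post: 6-core parts get about 5x their ST score
SMT_UPLIFT = Fraction(5, 4)         # assumption from the post: ~1.25x per core with SMT

# Ideal MT/ST ratio if every core ran two threads at full single-thread clocks.
ideal_mt_over_st = CORES * SMT_UPLIFT

# Fraction of that ideal actually achieved; the shortfall is what the post
# tentatively attributes to thermal throttling.
achieved = OBSERVED_MT_OVER_ST / ideal_mt_over_st

print(f"ideal: {float(ideal_mt_over_st)}x, achieved fraction: {achieved}")
```

The ideal is 7.5x, so 5x observed is 2/3 of it, which is the same number as 5/6 times 1/1.25.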
 

name99

Senior member
Sep 11, 2010
404
303
136
On the other hand we have this:

Of course that was Cinebench R15, but it was also only a year ago.
Honestly nothing about the "traditional" x86 world, including how they do benchmarking, makes any sense to me. Did Cinebench make some massive change in the past year for how they handle SMT?

If we compare the MT to ST numbers for x86, the 6-core CPUs (best MT match to M1) get about 5x their ST scores. I took that as meaning the base unit was a single CPU (using HT, which is what most of the results on their "single CPU" page indicate)
and they were getting sub-optimal scaling when using all 6 cores.

I guess it could be that we are seeing 1 thread vs 12 SMT threads, and the falloff (not just 5/6, but 5/6* ~1/1.25) so about 2/3 is thermal throttling?

OK, I understand.
- What Cinebench calls Single Core numbers really are single threaded numbers. Using the term core here, then badging most of the entries with HT, REALLY confuses the issue.
- The low quality scaling (2/3 as I said) really is a consequence of thermal throttling, it really is that bad, that you're losing 1/3 of your potential throughput.
Will be interesting to see how bad the throttling is on a passively cooled M1.
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
One has to trawl through about 28 pages of GB results to find that 1716 score, which is paired with an anomalously low MT score for some reason. It's 8 pages at the moment to the first score above 1600. The theoretically faster 1185G7 is hardly shipping anywhere and only has 3 pages of results. Very few are above 1600 and the absolute highest reported is 1607.

The highest score currently reported for the new MBP is 1740, and the absolute lowest yet is 1559. And that is the sole score below 1600. And of course that leaves MT entirely out of the discussion, as well as the much faster GPU, and the likelihood that M1 is achieving its results at literally half the power of 28w TGL. Perhaps when a wider variety of metrics are available to compare, TGL will be competitive with M1. But based purely on what we know so far it doesn't seem likely.

The key phrase in my post was "This is a Linux platform though."

Linux is clearly scoring higher on GeekBench than the Windows OS'. There are quite a few Linux results for the Dell 9310. So OS, library optimization, compiler, are playing a part here. That was the meaning.

Here's another Linux result:

Capture.JPG
 

Antey

Member
Jul 4, 2019
105
153
116
He won the Internet today.

Scores are not revolutionary, even for mobile. Market disruption event not found.

if those M1 chips were AMD's and not Apple's i'm sure most of the people here would be calling them revolutionary. scores are super competitive, tdp and power consumption is super competitive (idle power consumption even more so). i'm not satisfied though, i want a deep dive review, all kind of benchmarks, power consumption tests, gaming? everything.
 

Eug

Lifer
Mar 11, 2000
23,586
1,000
126
He won the Internet today.

Scores are not revolutionary, even for mobile. Market disruption event not found.
What?! I’m not sure if you’re joking or not but those numbers are excellent for the target estimated TDP* class.

But yes, he did win the geek internet today. Very popular tweet.

*I say “estimated TDP” because we don’t actually know what the real effective TDP is.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,226
5,228
136
Scores are not revolutionary, even for mobile. Market disruption event not found.

I sense a lot of sour grapes for what is an amazing SoC.

The only CPUs with better ST performance are Zen 3 Desktop machines, and that is vs a low power notebook.

And that is the tip of the iceberg because while Zen 3 is just a desktop CPU, this SoC does everything from Graphics to ML, with super fast SSD controller, and probably a kitchen sink in there as well. ;)

There isn't even what I would consider a close second, when considering well rounded and powerful SoC for laptops.

If this was something that was available for Windows laptops, I bet a lot of the naysayers would be queuing up. But since it's not in their chosen ecosystem we get a lot of sour grapes.
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
I sense a lot of sour grapes for what is an amazing SoC.

The only CPUs with better ST performance are Zen 3 Desktop machines, and that is vs a low power notebook.

And that is the tip of the iceberg because while Zen 3 is just a desktop CPU, this SoC does everything from Graphics to ML, with super fast SSD controller, and probably a kitchen sink in there as well. ;)

There isn't even what I would consider a close second, when considering well rounded and powerful SoC for laptops.

If this was something that was available for Windows laptops, I bet a lot of the naysayers would be queuing up. But since it's not in their chosen ecosystem we get a lot of sour grapes.

No you got me wrong, look at my earlier posts in this thread. I thought this might actually be revolutionary, as in "skip the willow cove and zen 3 go straight to what AMD and Intel will give you in 2022 or 2023" kind of leap. But the benchmarks show it is not that.

Look, a Tiger Lake can get 1650-1720 GB single thread running Linux on a Dell 9310. M1 is basically a peer to Willow Cove and Zen 3. We can dicker back and forth on details but if M1 were a generation ahead of these platforms I'd expect a solid undeniable +20-30% better than those platforms in *something* and at least a tie in everything else. This ain't it.
 

name99

Senior member
Sep 11, 2010
404
303
136
I sense a lot of sour grapes for what is an amazing SoC.

The only CPUs with better ST performance are Zen 3 Desktop machines, and that is vs a low power notebook.

And that is the tip of the iceberg because while Zen 3 is just a desktop CPU, this SoC does everything from Graphics to ML, with super fast SSD controller, and probably a kitchen sink in there as well. ;)

There isn't even what I would consider a close second, when considering well rounded and powerful SoC for laptops.

If this was something that was available for Windows laptops, I bet a lot of the naysayers would be queuing up. But since it's not in their chosen ecosystem we get a lot of sour grapes.
394.jpg


Soon you'll be as mad as I am by this behavior!