Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,587
1,001
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops (rough sanity check below)
82 Gigatexels/s
41 Gigapixels/s

16-core neural engine
Secure Enclave
USB 4
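
A rough back-of-the-envelope check of the 2.6 TFLOPS GPU figure above. This is only a sketch: the 16 EUs per GPU core follow from the 128-EU total in the spec, but the 8 FP32 ALUs per EU and the ~1.28 GHz clock are assumptions, not Apple-published numbers.

```python
# Sanity check of the M1 GPU throughput figure quoted above.
# Assumptions (not published by Apple): 8 FP32 ALUs per EU, FMA counted as
# 2 FLOPs, GPU clock ~1.28 GHz.
gpu_cores = 8
eus_per_core = 16      # 8 cores x 16 EUs = 128 execution units, matching the spec
alus_per_eu = 8        # assumed FP32 lanes per EU
clock_ghz = 1.28       # assumed clock

total_eus = gpu_cores * eus_per_core
gflops = total_eus * alus_per_eu * 2 * clock_ghz
print(f"{total_eus} EUs, ~{gflops / 1000:.2f} TFLOPS")  # ~2.62 TFLOPS, close to the 2.6 quoted
```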

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes

M3 Family discussion here:

 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
2.5 - 3 damn instructions per clock. In a mixed FP workload (that is not FMA), i.e. either scalar or 32-bit vector with mixed instructions, AMD gets amazing throughput. It is usually not as bad for Intel due to other core resource limitations, but the point is that 2014-era hardware can't compete with 2020 hardware.

That's a good point, but as Hitman928 already said, the 2700X is based on the Zen+ core, which has 128-bit paths.

And the Apple A14 is even wider in execution ports, which is more relevant for GB5-type workloads, and if a native CB20 comes out for ARM, it will shine there as well.

I'll definitely be waiting for this one. FINALLY, SOMETHING OTHER THAN SPEC2006 and Geekbench! :D This will go a long way in determining whether wider, lower clocked architectures are more worthy of pursuit compared to higher clocked ones.
 
  • Like
Reactions: Tlh97

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
That is not the way Apple thinks. They have said this hundreds of times, but people gonna believe what they want to believe...

Apple does not see its products as competing with each other; they complement each other. The ideal Apple customer (and most of them are ideal, as close as their budget will allow) does not ask "should I get a watch OR a phone"; they get both, because the two do slightly different jobs. They do not ask "should I get an iPad OR a MacBook"; again, they get both and use each as best suits the job at hand.

I am sure Apple would love it if you bought an Apple Watch, and an iPhone, and an iPad, and a MacBook, and an iMac, and a HomePod, etc...

But that wasn't the point.

Apple is extremely unlikely to start selling $800 MacBook Airs. That is what I was arguing against.

If you want an Apple computer for $800, it's going to be an iPad, not a MacBook.

This is about the people who think $800+ is too much for a MacBook. The answer obviously isn't to buy an iPad and a MacBook that costs $800+. It's an iPad or nothing, because Apple is extremely unlikely to price the MBA at $800.
 
  • Like
Reactions: Tlh97

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
This will go a long way in determining whether wider, lower clocked architectures are more worthy of pursuit compared to higher clocked ones.

This has always been a trade-off. Look at Nvidia vs AMD. Nvidia has traditionally gone wider with slower clocks; AMD traditionally goes narrower with higher clocks. That ball has been bounced between the two for a long time. Nvidia has traditionally led, but AMD has had its time in the light as well. Rumor has it AMD will be back on top in a few days.

I think the big thing, though, is that a narrower, higher-clocked unit has a smaller physical size, so it is much cheaper to produce: both more chips per wafer and a lower probability of bad chips.
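
To make the wafer-economics point concrete, here is a minimal sketch using the classic dies-per-wafer approximation and a simple Poisson yield model. The die sizes, wafer diameter, and defect density are illustrative assumptions only, not figures for any real product or process:

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Rough dies-per-wafer approximation (ignores scribe lines and edge exclusion)."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r ** 2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_cm2=0.1):
    """Fraction of dies expected to be defect-free (simple Poisson yield model)."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

for area in (80, 160):  # illustrative die sizes only, not real products
    n, y = dies_per_wafer(area), poisson_yield(area)
    print(f"{area} mm^2: ~{n} dies/wafer, ~{y:.0%} yield, ~{n * y:.0f} good dies")
```

With these toy numbers the 80 mm^2 die ends up with roughly twice as many good dies per wafer as the 160 mm^2 one, which is the cost argument being made here.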
 
  • Like
Reactions: Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
Go and ask for a low-end Ferrari in a respectable shop, see how fast the door finds you.

That was my point. There really isn't a "low-end" Ferrari unless you compare their least expensive model to their most expensive products. The low-end Ferrari is probably faster than 98% of other cars as well.

x86 won't die, that's not the claim. What will happen is that it becomes ever more relegated to biches, like IBM's z machines.

I realize what you meant to type, but I laughed at this way harder than I probably should have.
 
  • Like
Reactions: Tlh97 and Leeea

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
x86 won't die, that's not the claim. What will happen is that it becomes ever more relegated to niches, like IBM's z machines. Important to people in those niches, but not the ecosystem you build upon when you start something new.

That is fantasy talk.

x86 was not always on top; there is a long list of CPUs over the years that were superior to x86 when they came out:
DEC Alpha
IBM's POWER series
the SPARC series
the Motorola 68000
the Amigas
etc.

Each of those systems had a time when it straight up beat x86, in some cases blowing the doors off of x86. Yet x86 was never relegated to a niche, or even to minority market share.

----------------------

So here we have Apple's newest creation, built on an expensive cutting-edge 5 nm process with a large, expensive die. Even if it blew the doors off of x86, it would just be repeating the history of large RISC CPUs.
 
  • Haha
  • Like
Reactions: Tlh97 and dmens

name99

Senior member
Sep 11, 2010
404
303
136
That's what an A14Z would be, in theory; the M1 maps onto a hypothetical A14Z almost 1 to 1.

The A12Z was 4 big cores, 4 little cores, 8 GPU cores, and could go up to 16 GB of RAM in the Big Sur developer-transition Mac mini.
The M1 is 4 big cores, 4 little cores, 8 GPU cores, and can go up to 16 GB of RAM in the various MacBook Air, MacBook Pro, and Mac mini configurations.

Pretty much we are seeing what a die shrink gets you, plus the difference between the Firestorm/Icestorm cores and the older Vortex/Tempest ones. Branding this new chip M1 instead of A14Z is merely "branding and marketing."

M1 probably contains a little more IO than would an A14X/Z targeting an iPad.
But that's the only difference I can think of, and I can't imagine it's important enough (i.e. consumes enough area) for Apple to bother with a variant die.
 

name99

Senior member
Sep 11, 2010
404
303
136
I am sure Apple would love it if you bought an Apple Watch, and an iPhone, and an iPad, and a MacBook, and an iMac, and a HomePod, etc...

But that wasn't the point.

Apple is extremely unlikely to start selling $800 MacBook Airs. That is what I was arguing against.

If you want an Apple computer for $800, it's going to be an iPad, not a MacBook.

This is about the people who think $800+ is too much for a MacBook. The answer obviously isn't to buy an iPad and a MacBook that costs $800+. It's an iPad or nothing, because Apple is extremely unlikely to price the MBA at $800.

But why are you so insistent that Apple wouldn't sell an $800 MacBook?
That's the point where your argument falls apart. You insist that Apple is driven by traditional market segmentation and fear of self-cannibalism even when people who have studied the company for thirty years and worked there for ten years keep telling you that's not how Apple thinks.
 

name99

Senior member
Sep 11, 2010
404
303
136
That is fantasy talk.

x86 was not always on top; there is a long list of CPUs over the years that were superior to x86 when they came out:
DEC Alpha
IBM's POWER series
the SPARC series
the Motorola 68000
the Amigas
etc.

Each of those systems had a time when it straight up beat x86, in some cases blowing the doors off of x86. Yet x86 was never relegated to a niche, or even to minority market share.

----------------------

So here we have Apple's newest creation, built on an expensive cutting-edge 5 nm process with a large, expensive die. Even if it blew the doors off of x86, it would just be repeating the history of large RISC CPUs.

And what's the generally accepted reason that Intel beat them? Lower price, higher volume, virtuous circle.
That's the circle that's ending now, even as it's expanding for ARM...
 

name99

Senior member
Sep 11, 2010
404
303
136
This has always been a trade-off. Look at Nvidia vs AMD. Nvidia has traditionally gone wider with slower clocks; AMD traditionally goes narrower with higher clocks. That ball has been bounced between the two for a long time. Nvidia has traditionally led, but AMD has had its time in the light as well. Rumor has it AMD will be back on top in a few days.

I think the big thing, though, is that a narrower, higher-clocked unit has a smaller physical size, so it is much cheaper to produce: both more chips per wafer and a lower probability of bad chips.

I can't speak to GPUs, but this is nonsense for cores, meaning Intel's and AMD's cores. I'm not interested in looking up the numbers again, but it was absolutely true the last time I checked, comparing Tiger Lake (on a supposedly 7nm-equivalent process), Zen 2, and the A13.
If you're interested in scoring points you can probably find some stupid way of cutting up the chip (must include L2 but must ignore L3, or whatever) where you can get Intel/AMD to come out ahead, but an honest observer would say that they're not much different in size.

Yes, Intel and AMD use fewer transistors, but they use larger transistors and cells to hit those frequencies.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
I can't speak to GPUs, but this is nonsense for cores, meaning Intel's and AMD's cores. I'm not interested in looking up the numbers again, but it was absolutely true the last time I checked, comparing Tiger Lake (on a supposedly 7nm-equivalent process), Zen 2, and the A13.
If you're interested in scoring points you can probably find some stupid way of cutting up the chip (must include L2 but must ignore L3, or whatever) where you can get Intel/AMD to come out ahead, but an honest observer would say that they're not much different in size.

Yes, Intel and AMD use fewer transistors, but they use larger transistors and cells to hit those frequencies.

AnandTech themselves did an analysis of the A13 die shot: the two performance cores + 8 MB shared L2 (no L3 on chip) come to ~9 mm^2, while AMD has stated that a 4-core complex in Zen 2 (with 16 MB L3) is 31.3 mm^2, both on 7nm-class process nodes, with Tiger Lake cores supposedly larger than Zen 2's. People in this thread keep talking about how a "wide" design like this comes with a die-space penalty, but I don't see it here.

Sans L2, the "wide" A13 Lightning core with L1 is 2.61 mm^2, while the "narrow" Zen 2/Zen 3 cores with L1 are 2.83/3.24 mm^2 respectively, on equivalent processes.
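
To line the quoted figures up, here is a quick normalization of area per core plus its share of the block's shared cache, using only the numbers cited in this post; dividing the shared cache evenly across cores is a simplifying assumption, not a measured split:

```python
# Area per core including an even share of the block's shared cache (rough normalization).
a13_block_mm2, a13_cores = 9.0, 2     # A13: 2 big cores + 8 MB shared L2 (AnandTech die shot)
zen2_ccx_mm2, zen2_cores = 31.3, 4    # Zen 2 CCX: 4 cores + 16 MB L3 (AMD's own figure)

print(f"A13 big core + share of L2: {a13_block_mm2 / a13_cores:.1f} mm^2")  # ~4.5 mm^2
print(f"Zen 2 core + share of L3:   {zen2_ccx_mm2 / zen2_cores:.1f} mm^2")  # ~7.8 mm^2

# Core-only (with L1) figures quoted above, for reference:
print("Core + L1 only: A13 2.61 mm^2 vs Zen 2 2.83 mm^2 vs Zen 3 3.24 mm^2")
```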
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
AnandTech themselves did an analysis of the A13 die shot: the two performance cores + 8 MB shared L2 (no L3 on chip) come to ~9 mm^2, while AMD has stated that a 4-core complex in Zen 2 (with 16 MB L3) is 31.3 mm^2, both on 7nm-class process nodes, with Tiger Lake cores supposedly larger than Zen 2's. People in this thread keep talking about how a "wide" design like this comes with a die-space penalty, but I don't see it here.

That's because with ARM you can have a wider design and still end up with a similar gate count; x86 is just that awful. Jim Keller commented on this as well, back when he was working on K12.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
AnandTech themselves did an analysis of the A13 die shot: the two performance cores + 8 MB shared L2 (no L3 on chip) come to ~9 mm^2, while AMD has stated that a 4-core complex in Zen 2 (with 16 MB L3) is 31.3 mm^2, both on 7nm-class process nodes, with Tiger Lake cores supposedly larger than Zen 2's. People in this thread keep talking about how a "wide" design like this comes with a die-space penalty, but I don't see it here.
It's extremely difficult to compare an A13 with a Zen 2 core when you start including cache. Because of the structuring of the A13's cache, you have:
- The A13's 2-core complex, including all L1$ and L2$ but not the SLC
- Zen 2's 4-core complex, where you include L1$, L2$, and L3$

However, the SLC on the A13 acts as a last-level cache (like Zen 2's L3$), and while its latency is higher than Zen 2's L3$ latency (largely a design decision, probably, since the A13 already features an 8 MB L2$), it is still lower than Zen 2's system memory latency.

You can see things get fuzzy.

So it's a little disingenuous to exclude the SLC but include the L3$. And it's similarly disingenuous to exclude Zen 2's L3$ but include the A13's massive L2$. Where we could probably all agree is on just including the core + L1I/D$, and that calculation has already been done. I just think it's important to compare apples to apples, and you seem to be using a bad example for your interpretations.

(Zen 2 core area incl L1$ = 2.83 mm^2; A13 core area incl L1$ = 2.07 mm^2)

A Zen 2 core + L2$ + 4 MB slice of L3$ has 475M transistors in an area of 7.83 mm^2, for a density of ~60M xtors/mm^2.
The A13 has 8.5B transistors in an area of 98 mm^2 for the whole SoC, for a density of ~86M xtors/mm^2.

The cores and caches tend to be more transistor-dense than the rest of the SoC, and the explanation I've heard for Zen 2's lower density is that higher-clocked chips don't lend themselves to maximizing transistor density as well as lower-clocked chips do. So if anything, narrower/faster x86 cores need lower density, while wide designs use more transistors but can pack them tighter.
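
The density arithmetic above, reproduced as a quick sketch using only the transistor counts and areas quoted in this post (note that the two regions are not like-for-like: one is a core block, the other a whole SoC):

```python
# Transistor density from the figures quoted above.
zen2_xtors, zen2_area_mm2 = 475e6, 7.83   # Zen 2 core + L2 + 4 MB L3 slice
a13_xtors, a13_area_mm2 = 8.5e9, 98.0     # whole A13 SoC, not just a core block

print(f"Zen 2 core block: ~{zen2_xtors / zen2_area_mm2 / 1e6:.0f}M xtors/mm^2")  # ~61M
print(f"A13 whole SoC:    ~{a13_xtors / a13_area_mm2 / 1e6:.0f}M xtors/mm^2")    # ~87M
```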
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
That's because with ARM you can have a wider design and still end up with a similar gate count; x86 is just that awful. Jim Keller commented on this as well, back when he was working on K12.

I am not saying anything about power consumption or how late it is.

Just in terms of core size and performance they can get pretty damn close:

On TSMC's 7nm+ process, the A76 cores in the Kirin 990 are 1.19 mm^2; let's call it 1.15 mm^2. On Intel's 10nm process, the Tremont core in Lakefield is 0.88 mm^2 (courtesy of WikiChip).

The Huawei Mate 30 Pro gets a 750-770 GB5 single-thread score. The Pentium Silver N6000, which uses the Tremont architecture, gets 720-740 on Windows, and Linux boosts ST scores by a few %, meaning they are roughly at parity.
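
As a rough way to read those numbers together, here is a points-per-mm^2 comparison using the midpoints of the GB5 ranges quoted above; the two cores are on different foundries' nodes and the scores come from different OSes, so treat this as indicative only:

```python
# GB5 single-thread points per mm^2, using only the figures quoted above.
a76_area_mm2, a76_gb5 = 1.19, (750 + 770) / 2          # Cortex-A76 in Kirin 990, TSMC 7nm+
tremont_area_mm2, tremont_gb5 = 0.88, (720 + 740) / 2  # Tremont in Lakefield, Intel 10nm

print(f"Cortex-A76: ~{a76_gb5 / a76_area_mm2:.0f} GB5 pts/mm^2")        # ~639
print(f"Tremont:    ~{tremont_gb5 / tremont_area_mm2:.0f} GB5 pts/mm^2")  # ~830
```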
 

MarkizSchnitzel

Senior member
Nov 10, 2013
403
31
91
@Eug
Can you name a few use cases from your work where an iPad would be the preferable form factor?
I am not imaginative enough to think of any myself. My work is mostly CAD and Excel, with a lot of multitasking. I can't imagine a workflow where a tablet form factor could come anywhere close. By tablet, I mean iOS.

As soon as you have to use an app + email + a chat app + a browser, multitasking and switching windows is so incredibly much slower. Not to mention managing files, possibly from network locations.

I just can't imagine any task being done more efficiently without a pointer, a file manager, and robust multitasking.
 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
@Eug
Can you name a few use cases from your work where an iPad would be the preferable form factor?
I am not imaginative enough to think of any myself. My work is mostly CAD and Excel, with a lot of multitasking. I can't imagine a workflow where a tablet form factor could come anywhere close. By tablet, I mean iOS.
I don't do CAD.

Excel up until recently sucked on the iPad because there was no proper mouse support. Now Office includes proper mouse support.

Note that I have an 18 mm key-spaced, full-sized Smart Keyboard for this iPad. I'm much less productive using the on-screen keyboard.

As soon as you have to use an app + email + a chat app + a browser, multitasking and switching windows is so incredibly much slower. Not to mention managing files, possibly from network locations.
These days I don't usually need to manage many files, and most of the time they're in the cloud. However, if a file is local, AirDrop works between Apple clients, whether it's a Mac, an iPhone, or an iPad. External storage is not ideal on my 10.5" Lightning iPad, but it is much easier on the USB-C iPads.

Email + Chat + Browser works great. Browser supports many tabs, and if you want you can put iMessage and Safari next to each other, although there is more room on a 12.9” iPad than there is on a 10.5”.

 
  • Like
Reactions: coercitiv

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I think the big thing, though, is that a narrower, higher-clocked unit has a smaller physical size, so it is much cheaper to produce: both more chips per wafer and a lower probability of bad chips.

One also has to wonder whether there are other benefits to pursuing designs with higher clock speeds, as both Intel and AMD seem hellbent on doing so.

I remember reading a long time ago somewhere that clock speed was the linchpin in Intel's design philosophy due to how it enhanced other aspects of the CPU, like SIMD and SMT. Given how aggressively they tried to ramp up the megahertz with the P4, Core, and Core i7 series, there might be some truth to this.

And of course from a marketing perspective, a 5 GHz CPU sounds much more impressive than a 3 GHz one.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
It's extremely difficult to compare an A13 with a Zen 2 core when you start including cache. Because of the structuring of the A13's cache, you have:
- The A13's 2-core complex, including all L1$ and L2$ but not the SLC
- Zen 2's 4-core complex, where you include L1$, L2$, and L3$

However, the SLC on the A13 acts as a last-level cache (like Zen 2's L3$), and while its latency is higher than Zen 2's L3$ latency (largely a design decision, probably, since the A13 already features an 8 MB L2$), it is still lower than Zen 2's system memory latency.

You can see things get fuzzy.

So it's a little disingenuous to exclude the SLC but include the L3$. And it's similarly disingenuous to exclude Zen 2's L3$ but include the A13's massive L2$. Where we could probably all agree is on just including the core + L1I/D$, and that calculation has already been done. I just think it's important to compare apples to apples, and you seem to be using a bad example for your interpretations.

(Zen 2 core area incl L1$ = 2.83 mm^2; A13 core area incl L1$ = 2.07 mm^2)

A Zen 2 core + L2$ + 4 MB slice of L3$ has 475M transistors in an area of 7.83 mm^2, for a density of ~60M xtors/mm^2.
The A13 has 8.5B transistors in an area of 98 mm^2 for the whole SoC, for a density of ~86M xtors/mm^2.

The cores and caches tend to be more transistor-dense than the rest of the SoC, and the explanation I've heard for Zen 2's lower density is that higher-clocked chips don't lend themselves to maximizing transistor density as well as lower-clocked chips do. So if anything, narrower/faster x86 cores need lower density, while wide designs use more transistors but can pack them tighter.

I agree that including/excluding caches can make measurements very fuzzy either way, but with all these comparisons there's still no evidence that the lower-clocked, wider Apple cores take more space than the higher-clocked, narrower Zen 2/3 cores on equivalent process, and ultimately, die space is what TSMC charges for, not transistor count.
 

iwulff

Junior Member
Jun 3, 2017
24
7
81
The Ryzen processors are aimed more towards server-like workloads, and this is reflected in the Geekbench score details. The A14 is aimed at consumers and does everything to speed things up by using its neural co-processors and accelerators and by keeping memory close by. Performance of the individual cores is good enough to catch up on anything that can't be accelerated this way, and Apple will most likely keep looking for ways to enhance this along the way or in future iterations. Still, it's weird to see the A14 doing so well in clang (which seems to be the exception), for example, and AMD is probably scoring really high on AES thanks to its own accelerator in this regard (as far as I know they use one?).

In my opinion, this benchmark is getting 'fluked' by AMD on AES (and probably some other parts) and by Apple on various workloads through how its architecture works. Is that a bad thing? Not necessarily, but I can understand why an architecture that's quite different (more accelerators, neural engine, on-package RAM, etc.) is hard to compare to more traditional setups like those of Intel and AMD. The M1 is a smarter and a 'dumber' design at the same time, and I assume it's very specific about how it handles a given workload, much more so than a traditional design, and sometimes it needs to fall back on raw performance, which is very good and good enough in most cases but not as good as Zen 3's, for example. That's also the reason why Apple won't use this for their servers without some big changes to the arch. Their iGPU is something that's unclear to me in regard to performance, but it's most likely tuned for compute, and gaming will perform just reasonably. A bit like the older AMD Radeon setups.

I will have to see more benchmarks and real-world performance results to really form an opinion of how good the M1 is. But anything that can't be handled by Apple with an accelerator/neural engine/on-package memory option is probably going to be slower than a Zen 3, though still more energy efficient. Perhaps this is the future of processors?

I'm impressed by the M1's max boost score with such a wide arch and such seemingly high perf/watt. I do know that when downclocking a bit and optimizing vcore and other settings for both Intel and AMD, perf/watt increases by a lot, but it seems they still don't get close to the M1. The process node is of course critical in this regard, and hand in hand with a strong, smart, energy-efficient architecture you have a deadly setup. I'm sure AMD and Intel will show a lot of improvement when going to 5nm together with the current focus on energy efficiency, but AMD is behind Apple by at least 1-2 years, and who knows how far behind Intel is. And even when they do deliver, it probably won't be as energy efficient.
 
  • Like
Reactions: Schmide

yuri69

Senior member
Jul 16, 2013
389
624
136
A layman question:

Apple seemingly aims for the wide-core approach:
  • the OoO buffer is almost twice (!) the size of Tiger/Ice Lake's, which is already seen as "a large one"
  • the 128 KiB L1 caches are four (!) times the size of Zen 3's
  • the number of ALUs, load/store ports, decoders, etc. is also way bigger than Intel's/AMD's
All this with very low power consumption at ~3 GHz. What is the secret sauce here?

Adding OoO structures, cache, ports, etc. is easy on paper but *always* blows up complexity, power and die area. The mighty ARM ISA can't magically fix this, right? So what is it?

Are the rest of the silicon industry's greatest minds just dumb? Or does the sheer amount of Apple's $$$ just work?
 

tempestglen

Member
Dec 5, 2012
81
16
71
A layman question:

Apple seemingly aims for the wide-core approach:
  • the OoO buffer is almost twice (!) the size of Tiger/Ice Lake's, which is already seen as "a large one"
  • the 128 KiB L1 caches are four (!) times the size of Zen 3's
  • the number of ALUs, load/store ports, decoders, etc. is also way bigger than Intel's/AMD's
All this with very low power consumption at ~3 GHz. What is the secret sauce here?

Adding OoO structures, cache, ports, etc. is easy on paper but *always* blows up complexity, power and die area. The mighty ARM ISA can't magically fix this, right? So what is it?

Are the rest of the silicon industry's greatest minds just dumb? Or does the sheer amount of Apple's $$$ just work?

When DEC nearly went bankrupt in the late 1990s and was eventually acquired by Compaq in 1998, the best RISC & SoC design team in the world, DEC's Palo Alto branch in California, was led by Dan Dobberpuhl, who had supervised the famous Alpha and StrongARM designs and was one of the five most important experts in all of DEC.

Dan Dobberpuhl and his team (around 150 people) didn't join Intel or AMD as other colleagues did; instead, SiByte and later P.A. Semi were started up. In 2005 their masterpiece, the PA6T processor, demonstrated 100% of the integer and 50% of the floating-point performance of an Intel Core at ONLY 10% of the power consumption! In 2008, when Apple acquired P.A. Semi, the best RISC & SoC team belonged to Apple, and that's why the A series/M1 have been so outstanding. BTW, Jim Keller was under Dan's guidance from as early as the 1990s (at DEC), then at SiByte/P.A. Semi.

So I may answer your question: Intel & AMD never got the best RISC/SoC design team, which originated at DEC and finally ended up at Apple.

I hope Dan Dobberpuhl is happy in heaven, seeing his legacy blow people's minds.

https://www.realworldtech.com/pa-semi/6/
https://en.wikipedia.org/wiki/AmigaOne_X1000

(Photo: Dan Dobberpuhl and Jim Keller)
 
  • Like
Reactions: Tlh97 and Qwertilot