Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,712
1,245
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

Screen-Shot-2021-10-18-at-1.20.47-PM.jpg

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, ProRes

M3 Family discussion here:


M4 Family discussion here:

 

soresu

Platinum Member
Dec 19, 2014
2,739
1,944
136
Nuvia core isn't consumer to begin with.
Still a little hazy on that.

PR way back implied that Nuvia wanted to design core X, but Qualcomm, after acquiring them, wanted core Y due to a difference in priorities, a la Nuvia originally targeting servers with Phoenix.

The question is, exactly how much change happened in the concept phase before they went full on into implementation - or was the entire point about differences just PR blarg to reassure customers that Nuvia's core would be suitable to purpose?
 

SarahKerrigan

Senior member
Oct 12, 2014
417
709
136
Is SME even enabled on v9-A Snapdragon SoCs?

Or any other consumer v9-A hardware for that matter.

Afaik Snapdragons enable no SVE, for reasons that are not immediately obvious to me (but variable-length vector machines are a little more of a pain to work with than commonly believed, IME.) I don't think SME even exists in the licensable cores right now.
 

lopri

Elite Member
Jul 27, 2002
13,211
597
126
I put together a quick and dirty clock-normalized comparison spreadsheet between M4 and A17:

View attachment 98626

I'm using https://browser.geekbench.com/v6/cpu/6010114 for the iPhone data. (Also it only just occurred to me that I could have compared against M3 instead of A17. I am a very stupid lady at times. Still, broad conclusions should still apply.)
Then again, this makes me think this is another instance of a US company taking credit for TSMC's achievement...
 

gdansk

Platinum Member
Feb 8, 2011
2,224
2,847
136
Then again, this makes me think this is another instance of a US company taking credit for TSMC's achievement...
When you're Apple's scale you get to do that. Dual stack OLED was first developed by and is manufactured by the Koreans. Yet if remembered at all it'll be as Apple's Tandem OLED. That's just how it goes.
 

poke01

Senior member
Mar 8, 2022
902
948
106
When you're Apple's scale you get to do that. Dual stack OLED was first developed by and is manufactured by the Koreans. Yet if remembered at all it'll be as Apple's Tandem OLED. That's just how it goes.
Samsung could have implemented it last year in their tabs but didn’t. I presume that's because they can’t yet make larger panels for the 12” and 14” tablets.

LG made the 13” Tandem OLED panel for the iPad and Samsung for the 11”.
 

poke01

Senior member
Mar 8, 2022
902
948
106
Is SME even enabled on v9-A Snapdragon SoCs?

Or any other consumer v9-A hardware for that matter.
Nope, neither MediaTek nor Qualcomm has SME enabled.

It’s funny though. There was discourse on Twitter saying that Apple was taking a long time to implement ARMv9 in their cores, but they were the first ones to implement SME.

I do believe that matrix ops will become more important in the future and Apple has plans to use their chips for AI servers.
 

mikegg

Golden Member
Jan 30, 2010
1,813
445
136
I think at this point I have to say GB6 is no longer a good benchmark.

It should have been called GB7 if something as important as SME was added. It skews the results.

I'm waiting for more workloads but nice to see Apple include SME.
Scores are still valid when compared to x86 chips because GB already has AVX support.
 

soresu

Platinum Member
Dec 19, 2014
2,739
1,944
136
I do believe that matrix ops will become more important in the future
Given the whole AI kick that the consumer electronics industry has been on, it does seem especially strange for MediaTek to have disabled SME, when they don't have the excuse of gimping their current SoCs to make future ones with custom CPU cores seem equal or better.

I wonder if there is a power draw penalty involved even when it is not in use.
 

soresu

Platinum Member
Dec 19, 2014
2,739
1,944
136
Scores are still valid when compared to x86 chips because GB already has AVX support.
Not sure that AVX is directly comparable to SME, even though it has AI/ML type instructions.

Intel's AMX sounds more like a direct competitor to SME. I think I saw somewhere that it might be included in Zen 5, which wouldn't be a great surprise given how long it has been around.
 

roger_k

Member
Sep 23, 2021
74
143
76
Not sure that AVX is directly comparable to SME, even though it has AI/ML type instructions.

Intel's AMX sounds more like a direct competitor to SME. I think I saw somewhere that it might be included in Zen 5, which wouldn't be a great surprise given how long it has been around.

SME is not just for matrix multiplication, it also includes instructions for manipulating vectors. Apple's implementation uses 512-bit vectors, similar to AVX512. And while AVX512 undoubtedly has more features, the vector arithmetic and scatter/gather functionality offered by SME is sufficient to cover a large chunk of HPC. On the same topic, matrix multiplication is not only useful for ML. It also makes a great linear algebra accelerator for scientific and engineering applications.
 

soresu

Platinum Member
Dec 19, 2014
2,739
1,944
136
What do you mean by "enabled"? No other ARM CPU currently has the relevant functional block, so there is nothing to enable.
Are you saying that Cortex A5xx, 7xx and X cores do not even have it in their baseline IP despite it being part of the v9-A spec?

..... Huh, I was sure it would at least be in Neoverse V2 or V3, but I can't find any reference to it at all - which you would assume would be there if they wanted it advertised as a part of the feature set.
 

roger_k

Member
Sep 23, 2021
74
143
76
On the topic of IPC improvements in GB6, they are not zero. There are some quite noticeable improvements in several GB6 subtests if one compares multiple results. GB6 scores have a very high variance, which makes interpreting individual results very difficult. But we start seeing patterns if we sample from different results. Here is M4 iPad vs M3 MacBook Air, normalized by frequency. I'd say only 5 out of 15 GB6 subtests show no evidence of IPC improvements. Most of the other tests show modest 3-5% IPC improvements over the M3, and a few show around 10%. I'd also like to know what's happening with the Background Blur test.



1715497928565.png

Edit: make the plot more readable
 

roger_k

Member
Sep 23, 2021
74
143
76
Are you saying that Cortex A5xx, 7xx and X cores do not even have it in their baseline IP despite it being part of the v9-A spec?

I was also surprised. In retrospect however, it kind of makes sense. Designing a coprocessor like that is not easy. Apple has been working on it for at least 6 years (first matrix accelerators shipped on A13). In fact, I wouldn't be surprised if SME/streaming SVE was added specifically for Apple. It was always kind of suspicious that SME chose the outer product matmul ISA model, and that the instructions enabled by the streaming SVE always closely matched the functionality of Apple AMX.
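
For anyone unsure what "outer product matmul" refers to, here is a minimal Python sketch of the idea. It is only an illustration of the math, not of how SME or Apple AMX is actually programmed, and the function names are made up for this example. A matrix product can be built either one output element at a time (a dot product per element) or by accumulating one rank-1 outer product per step into a whole tile of outputs; the second form is the model being discussed.

```python
def matmul_inner_product(a, b):
    """Classic formulation: each output element is one dot product."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[r][i] * b[i][c] for i in range(k)) for c in range(n)]
            for r in range(m)]


def matmul_outer_product(a, b):
    """Same result, built as a running sum of rank-1 outer products."""
    m, k, n = len(a), len(b), len(b[0])
    # 'acc' stands in for an accumulator tile that holds the whole output block.
    acc = [[0] * n for _ in range(m)]
    for i in range(k):
        col = [a[r][i] for r in range(m)]  # column i of a
        row = b[i]                         # row i of b
        # One step: every accumulator element gets col[r] * row[c] added to it.
        for r in range(m):
            for c in range(n):
                acc[r][c] += col[r] * row[c]
    return acc


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_outer_product(a, b) == matmul_inner_product(a, b))
```

The point of the outer-product form is that each step takes just two vectors as input and updates m*n accumulators at once, which is why it suits a wide matrix unit with a large accumulator array.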
 

Eug

Lifer
Mar 11, 2000
23,712
1,245
126
On the topic of IPC improvements in GB6, they are not zero. There are some quite noticeable improvements in several GB6 subtests if one compares multiple results. GB6 scores have a very high variance, which makes interpreting individual results very difficult. But we start seeing patterns if we sample from different results. Here is M4 iPad vs M3 MacBook Air, normalized by frequency. I'd say only 5 out of 15 GB6 subtests show no evidence of IPC improvements. Most of the other tests show modest 3-5% IPC improvements over the M3, and a few show around 10%. I'd also like to know what's happening with the Background Blur test.

[Attachment 98786: plot of frequency-normalized GB6 subtest ratios, M4 iPad vs M3 MacBook Air]
Computed by extracting GB6 results across multiple tests, normalizing them by frequency (norm = runtime * clock), and calculating resampled normalized ratios.
Data: 48 GB6 entries for the M4 iPad and 34 GB6 entries for the M3 MacBook Air (only results with an SC score > 3000 chosen).
1% outliers removed from both sides; Object Detection subtest omitted.
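
A rough Python sketch of how a comparison like the one described in that caption could be reproduced. This is not the script behind the plot: the function names, the placeholder runtimes and clocks, and the choice to trim the resampled ratios (rather than the raw inputs) are all assumptions made for illustration.

```python
import random
import statistics


def normalize(runtimes_s, clock_ghz):
    """norm = runtime * clock, i.e. a quantity proportional to cycles spent."""
    return [t * clock_ghz for t in runtimes_s]


def trim(values, percent=1):
    """Drop 'percent' of the values from each end (assumed reading of '1% outliers')."""
    ordered = sorted(values)
    cut = len(ordered) * percent // 100
    return ordered[cut:len(ordered) - cut]


def resampled_ratios(baseline_norm, candidate_norm, draws=2000, seed=0):
    """Pair random baseline and candidate results and take their ratio."""
    rng = random.Random(seed)
    return [rng.choice(baseline_norm) / rng.choice(candidate_norm)
            for _ in range(draws)]


def ipc_gain_estimate(baseline_runs, baseline_clock, candidate_runs, candidate_clock):
    """Median ratio of normalized runtimes; above 1.0 means fewer cycles on the candidate."""
    ratios = resampled_ratios(normalize(baseline_runs, baseline_clock),
                              normalize(candidate_runs, candidate_clock))
    return statistics.median(trim(ratios))


# Placeholder numbers only, NOT real Geekbench data or real clock speeds.
old_runs = [10.2, 10.0, 10.5, 9.9]  # seconds for one subtest on the older chip
new_runs = [8.9, 9.1, 8.8, 9.0]     # seconds for the same subtest on the newer chip
print(ipc_gain_estimate(old_runs, 4.0, new_runs, 4.4))
```

Resampling pairs instead of comparing two single runs is what keeps the high run-to-run variance of GB6 from dominating the estimate.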


Nice!
 

soresu

Platinum Member
Dec 19, 2014
2,739
1,944
136
In fact, I wouldn't be surprised if SME/streaming SVE was added specifically for Apple
Dunno about SME, but SVE was developed first as an ARM/academic research effort, and then later developed with Fujitsu into the SVE1 instruction set for the A64FX CPU core in the Fugaku supercomputer.

The research instruction set was called Argon (like Neon or Helium):

Advanced SIMD: Extending the Reach of Contemporary SIMD Architectures

Here is a link to the original publication on IEEE, and a link to the full PDF.
 

soresu

Platinum Member
Dec 19, 2014
2,739
1,944
136
It was always kind of suspicious that SME chose the outer product matmul ISA model, and that the instructions enabled by the streaming SVE always closely matched the functionality of Apple AMX
What does "outer product matmul ISA" mean?

Do you mean that SME was designed as a separate ISA from the main CPU core?