Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,587
1,001
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
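
As a rough sanity check on the GPU figures above, the quoted numbers can be divided out per core and per execution unit. This is only a sketch: the 8 FP32 lanes per execution unit and the 2 FLOPs per lane per clock (one fused multiply-add) are assumptions, not part of the spec list, and the implied clock is simply what falls out of them.

```python
# Figures quoted in the M1 spec list above.
gpu_cores = 8
execution_units = 128
concurrent_threads = 24576
teraflops = 2.6

# Assumptions (not from the spec list): 8 FP32 lanes per execution unit,
# and 2 FLOPs per lane per clock (one fused multiply-add).
LANES_PER_EU = 8
FLOPS_PER_LANE_PER_CLOCK = 2

eus_per_core = execution_units // gpu_cores
threads_per_eu = concurrent_threads // execution_units
flops_per_clock = execution_units * LANES_PER_EU * FLOPS_PER_LANE_PER_CLOCK
implied_clock_ghz = teraflops * 1e12 / flops_per_clock / 1e9

print(f"{eus_per_core} EUs per core, {threads_per_eu} threads per EU")
print(f"implied GPU clock: {implied_clock_ghz:.2f} GHz")
```

Under those assumptions this works out to 16 EUs per core, 192 threads per EU, and an implied clock of roughly 1.27 GHz.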

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:

 

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,856
136
Is it Apple Silicon that is not optimized for Handbrake, or Handbrake that is not optimized for Apple Silicon?

There is a lot of software that is only optimized for x86, and for Mac basically gets "it compiled without errors and seems to run, ship it!" treatment.

TBH someone would have to examine how the ARM builds perform on non-Apple hardware to make that judgment. Preferably to see if NEON is fully-supported by any available build.
 

Doug S

Platinum Member
Feb 8, 2020
2,269
3,521
136
TBH someone would have to examine how the ARM builds perform on non-Apple hardware to make that judgment. Preferably to see if NEON is fully-supported by any available build.

Even if NEON is "supported" there's a big difference between someone spending a few days hand optimizing SIMD code and someone just slapping something together that produces a correct result. Or worse, using a cross compiler to translate x86 SIMD instructions to NEON without any regard for whether what was optimal scheduling on x86 is optimal scheduling on ARM.

Whether NEON is "fully supported" in something doesn't mean squat. Especially since compilers can generate at least some SIMD code, so if it uses NEON it might still have been written on a minimum effort basis.

Now it is open source so nothing stops some Mac expert from deciding to have a look and improve on it. But they're only going to bother if Handbrake is a big portion of their daily workflow, not because a few people like to use it as a benchmark.
 

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,856
136
Even if NEON is "supported" there's a big difference between someone spending a few days hand optimizing SIMD code and someone just slapping something together that produces a correct result. Or worse, using a cross compiler to translate x86 SIMD instructions to NEON without any regard for whether what was optimal scheduling on x86 is optimal scheduling on ARM.

Hence why it should be tested on non-Apple hardware. Unless you want to do a code audit yourself.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
The focus should be between an AMD 6800U and an Intel i7-1260P, both of which the M2 GPU beat. The Tiger Lake i5 only scored that high because of the RTX 3050.

The i7-1260P is a 28W CPU and should rather be compared to the M2 Pro, not the base M2. The i7-1260U is the counterpart to the M2.
 

poke01

Senior member
Mar 8, 2022
741
725
106
Apple makes fair comparisons to other CPUs in the same weight class difficult due to charging almost twice the price.

No, that's not true.
I chose the Dell because it is the most comparable premium Windows ultrabook with a 12th gen P-series CPU.

Dell XPS 13 Plus with i7-1260P, 512 GB SSD, 16 GB RAM - $1,549
MacBook Air with M2 (8-core CPU + 10-core GPU), 512 GB SSD, 16 GB RAM - $1,699

Difference - $150

Keep in mind that the standard Dell comes with a 1200p display, while the M2 Air comes with a 1664p screen.
Also, the M2 Air comes with more ports, including a 3.5mm jack, which the Dell omits.
The battery life and GPU are much better on the M2 Air.

source: https://www.dell.com/en-us/shop/laptops/new-xps-13-plus/spd/xps-13-9320-laptop/xn9320cto030s
source: https://www.apple.com/shop/buy-mac/...m2-chip-with-8-core-cpu-and-10-core-gpu-512gb

So does Apple really charge twice the price of their competitor on the end package?
No
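
For anyone who wants to check the arithmetic, here is a quick sketch using only the two configured prices listed above (the variable names are just for illustration):

```python
# Configured prices quoted above (USD), both with 16 GB RAM and a 512 GB SSD.
dell_xps_13_plus = 1549
m2_macbook_air = 1699

difference = m2_macbook_air - dell_xps_13_plus
ratio = m2_macbook_air / dell_xps_13_plus

print(f"difference: ${difference}")
print(f"price ratio: {ratio:.2f}x")
```

That is a premium of roughly 10% at these configurations, which is a long way from twice the price.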

 

poke01

Senior member
Mar 8, 2022
741
725
106
Dell isn't the only OEM selling i7-1260P laptops. A better deal can be found with some patience and perseverance. And these will be much cheaper after 6 months to a year. The M2, not so much.
The M1 MacBook Air can be found for around $800-$850 on Amazon or at Costco when on sale.
Refurbished M1 Airs are even cheaper on Amazon.
Similarly, the M2 Air will go on sale after 6-12 months. The 14"/16" MacBook Pros already go on sale on Amazon and at Costco.

I know there are cheaper i7-1260P laptops, but if you want a 500-nit screen, a great metal build, and great speakers and trackpad, those Windows laptops will be around the same price as the M2 Air right now.
 

repoman27

Senior member
Dec 17, 2018
342
488
136
The closest Intel comparisons to the M2 at the moment are the Alder Lake-U 9/15W chips. Intel's recommended customer prices for those parts are:

Intel i7-1250U / 1260U / 1255U / 1265U, 9W / 15W, 10-core CPU, 6-core GPU: $426
Intel i5-1230U / 1240U / 1235U / 1245U, 9W / 15W, 10-core CPU, 5-core GPU: $309

You have to back the price of the M2 out of the device price, which is pretty straightforward to do. If you go by Apple's full retail pricing, they're only charging ~$200 for the 8-core CPU / 10-core GPU, and ~$100 for the 8-core CPU / 8-core GPU versions. That's less than half of Intel's RCP for ADL-U.
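
A small sketch of that comparison, taking the rough ~$200 and ~$100 estimates above at face value. Pairing the higher M2 bin with the i7 tier and the lower bin with the i5 tier is an assumption made for illustration, and the M2 figures are approximations, not published prices.

```python
# Intel recommended customer prices quoted above (USD).
intel_rcp = {"i7 ADL-U": 426, "i5 ADL-U": 309}

# Rough M2 prices backed out of Apple's retail pricing, as estimated above.
m2_estimate = {"8-core CPU / 10-core GPU": 200, "8-core CPU / 8-core GPU": 100}

# Assumed pairing: top M2 bin vs. i7, lower M2 bin vs. i5.
pairs = [
    ("8-core CPU / 10-core GPU", "i7 ADL-U"),
    ("8-core CPU / 8-core GPU", "i5 ADL-U"),
]

fractions = {}
for m2_bin, intel_part in pairs:
    fractions[m2_bin] = m2_estimate[m2_bin] / intel_rcp[intel_part]
    print(f"M2 {m2_bin}: {fractions[m2_bin]:.0%} of {intel_part} RCP")
```

Both fractions come out below one half, which is consistent with the "less than half" claim.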

The M2 will likely find its way into the same devices as the M1, which include the $699 Mac mini and $599 iPad Air. Those products, brand new at retail, have seen discounts that bring the prices down to $569.99 and $549.99 respectively. Let me know when Alder Lake-U devices hit those price points.

Apple spends a lot more than other OEMs on many of their components, and they have significantly higher hardware gross margins. For the most part, that's OK, because they generally make really nice stuff, and even the entry level models include almost every feature present at the top of the line. The biggest issue with Apple's pricing, which unfortunately there is no getting around, is their markup on commodity DRAM and NAND flash memory. If you can't stomach paying $25 / GB for DRAM and $500-$800 / TB for NAND, then you better look elsewhere.
 
Jul 27, 2020
16,339
10,351
106
The biggest issue with Apple's pricing, which unfortunately there is no getting around, is their markup on commodity DRAM and NAND flash memory. If you can't stomach paying $25 / GB for DRAM and $500-$800 / TB for NAND, then you better look elsewhere.
Very true. Their strategy seems to be like Britain's WW2 wartime rationing: the DRAM and storage must be saved for those who can really afford it. In reality, they are going for the jugular. They know many people will not be OK with the minimum, so those people will have no choice but to pay more. I don't understand how anyone can think this is not evil corporate exploitation.
 

Mopetar

Diamond Member
Jan 31, 2011
7,848
6,013
136
If you spec out a top-end PC from most reputable companies, it's going to be comparable in price to an equivalent Mac. Apple just doesn't have much in the way of mid-range products (the cheapest notebook they sell is the previous-generation MacBook Air for $999), and they don't even bother with the low end, leaving those price points for their high-end tablets.

You can find some craptop that's got a combination of parts that make it at least 80% of a Mac in terms of average performance for 50% (or less!) of the price, but the build quality is going to be awful and it's going to be filled with bloatware and have a lot of other annoyances that you'll need to spend time dealing with.
 

repoman27

Senior member
Dec 17, 2018
342
488
136
Someone beat me to it and posted a link to this in the "Intel current and future Lakes & Rapids thread" already, but seeing as it's one of the best shots of the M1 Ultra to date, I figured I'd drop it here as well.

[Image: shot of the M1 Ultra]


And as a bonus, this was recently posted:

[Image]


edit: source @techanalye1.
 

Ajay

Lifer
Jan 8, 2001
15,468
7,872
136
Is it Apple Silicon that is not optimized for Handbrake, or Handbrake that is not optimized for Apple Silicon?

There is a lot of software that is only optimized for x86, and for Mac basically gets "it compiled without errors and seems to run, ship it!" treatment.
Software gets optimized for hardware in my opinion, not the other way around. Having worked in firmware development, it's absolutely true there.
 

Doug S

Platinum Member
Feb 8, 2020
2,269
3,521
136
Software gets optimized for hardware in my opinion, not the other way around. Having worked in firmware development, it's absolutely true there.

I'm not sure what you're getting at here. Plenty of software is NEVER optimized, or given only the bare minimum of effort. The more widely used hardware is, the more likely someone will consider it worth their time or money to optimize a given piece of software for it. There's little chance anyone would waste time optimizing Handbrake for RISC-V, for instance, because no one is using RISC-V to run it.

Hardware is not optimized with a particular piece of software in mind, but with particular classes of software or functions of software it absolutely is. Countless thousands if not millions of man hours have gone into optimizing hardware from the CPU to the memory subsystem to the storage to maximize performance on that hardware for relational databases in general, and Oracle in particular, for instance.

It's a two-way street. The hardware is fixed once it is produced, so all you can do is optimize your software to run as well as it can within the limitations of that hardware. But hardware designers absolutely do run benchmarks on simulators to see what effect certain changes may have. Some stuff is obvious: you don't need to test what happens if you increase clock speed or make the cache bigger.

If, however, you make L2 smaller but increase associativity and reduce latency, that's going to make some things faster and some things slower, so whether it makes sense depends on how the software you care about performs. If you made such a change and found it makes databases run faster but games run slower, then it is a good thing to do if you have a big server market and no gaming market (e.g. IBM POWER), but perhaps not so much if you have no server market and a big gaming market (e.g. Apple).
 
Jul 27, 2020
16,339
10,351
106
Countless thousands if not millions of man hours have gone into optimizing hardware from the CPU to the memory subsystem to the storage to maximize performance on that hardware for relational databases in general, and Oracle in particular, for instance.
I suppose Bulldozer is an extreme example of this. It was designed on the assumption that multicore workloads would soon become the norm. Unfortunately, it sacrificed too much in other areas to achieve that goal. I guess Zen, and particularly Zen 3, is probably the most well-balanced x86 architecture.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Even if NEON is "supported" there's a big difference between someone spending a few days hand optimizing SIMD code and someone just slapping something together that produces a correct result. Or worse, using a cross compiler to translate x86 SIMD instructions to NEON without any regard for whether what was optimal scheduling on x86 is optimal scheduling on ARM.

Agreed. Just giving an example: In case of the Intel Embree library (used by Cinebench for instance) it is just a static mapper of each AVX/SSE intrinsic to a set of NEON intrinsics.
 

Doug S

Platinum Member
Feb 8, 2020
2,269
3,521
136
Agreed. Just giving an example: In case of the Intel Embree library (used by Cinebench for instance) it is just a static mapper of each AVX/SSE intrinsic to a set of NEON intrinsics.

Sadly there are probably a lot of such cases, though I would expect over time as Apple Silicon becomes dominant in the installed base of Macs that people will pay more attention to optimizing for it.

There's also a question of whether it is even worthwhile to bother with NEON optimizations, as Apple may implement SVE2. There's also the Accelerate library, which uses Apple's AMX instructions for matmul-type work, and I recently saw a link to some intriguing patents that suggest Apple may blaze its own path with something that greatly improves upon SVE2's flexibility (which they may also hide behind an API, like they did with AMX, to allow big generational changes without breaking software).

While having all those options would be nice, the uncertainty around which way they go is more than enough to cause developers to decide it is better to wait until there's more clarity. I will say that if Apple implements SVE2 it would really be too bad they didn't do it with A14/M1, to make that the lowest common denominator instead of NEON. But I guess everyone suffers from this, i.e. you have to go all the way back to SSE4 if you want to support the lion's share of the x86 installed base.
 

MadRat

Lifer
Oct 14, 1999
11,910
238
106
So Apple uses roughly 175% of the silicon compared to AMD, and about 190% compared to Intel, per CPU? Seems like AMD and Intel are also on inferior manufacturing processes, so there is that. I'm less impressed by the performance metric results now.
 

pakotlar

Senior member
Aug 22, 2003
731
187
116
So Apple uses roughly 175% of the silicon compared to AMD, and about 190% compared to Intel, per CPU? Seems like AMD and Intel are also on inferior manufacturing processes, so there is that. I'm less impressed by the performance metric results now.

Yeah, so the reason is to drive IPC (wide execution, enough low latency cache to keep those large cores fed) and have enough cores to allow for low voltage and clocks in MT workloads. It begins to become impressive when you factor in that performance is close to Intel/AMD but draws significantly less power, and importantly doesn’t require boosting voltage/clocks for lightly threaded workloads to approach their boosted/PL2 performance, so remains sustainable on battery.

But certainly what they’re doing isn’t magical, and AMD/Intel will be able to match if they throw 100B transistors at the problem.
 

poke01

Senior member
Mar 8, 2022
741
725
106
So Apple uses roughly 175% of the silicon compared to AMD, and about 190% compared to Intel, per CPU? Seems like AMD and Intel are also on inferior manufacturing processes, so there is that. I'm less impressed by the performance metric results now.
Can you tell me which AMD and Intel CPUs you compared the M2 to?
 

repoman27

Senior member
Dec 17, 2018
342
488
136
So Apple uses roughly 175% of the silicon compared to AMD, and about 190% compared to Intel, per CPU? Seems like AMD and Intel are also on inferior manufacturing processes, so there is that. I'm less impressed by the performance metric results now.
Ummm...

AMD Ryzen 7 6800U (Rembrandt 8+0+12)
210 mm²
13.1 B transistors
TSMC N6

Intel Core i7-1260P (Alder Lake 6+8+6, Alder Point PCH-P)
217 mm², 54 mm²
??? transistors
Intel 7, Intel 14nm

Apple M2 (Staten 4+4+10)
148 mm²
20 B transistors
TSMC N5P

The M2 die area is only 70% of Rembrandt, and 55% of Alder Lake + Alder Point. Way more transistors, but hey, sucks to not be on the best available process.
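
The ratios can be reproduced from the figures listed above with a short sketch (the dictionary layout and names are just for illustration; Intel's transistor count is left out because it is not listed):

```python
# Die areas (mm^2) and transistor counts (billions) as listed above.
chips = {
    "Ryzen 7 6800U": {"area": 210, "transistors_b": 13.1},
    # CPU die plus PCH die; transistor count is not listed.
    "Core i7-1260P + PCH": {"area": 217 + 54, "transistors_b": None},
    "Apple M2": {"area": 148, "transistors_b": 20},
}

m2_area = chips["Apple M2"]["area"]
area_ratio = {}  # M2 die area as a fraction of each chip's area
density = {}     # millions of transistors per mm^2, where known

for name, chip in chips.items():
    area_ratio[name] = m2_area / chip["area"]
    if chip["transistors_b"] is not None:
        density[name] = chip["transistors_b"] * 1000 / chip["area"]

for name in chips:
    line = f"{name}: M2 is {area_ratio[name]:.0%} of its area"
    if name in density:
        line += f", {density[name]:.0f} MTr/mm^2"
    print(line)
```

On these numbers the M2 packs a bit more than twice as many transistors per mm² as Rembrandt, which is how it ends up with more transistors on a smaller die.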