Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
24,114
1,760
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).
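
A quick back-of-the-envelope check of the GPU figures above; the ~1.28 GHz clock and 8 ALUs per execution unit are assumptions on my part, not Apple-published numbers:

```swift
// Back-of-the-envelope check of the M1 GPU figures above.
// Assumptions (not Apple-published): ~1.278 GHz GPU clock, 8 FP32 ALUs per
// execution unit, and 2 FLOPs per ALU per clock (fused multiply-add).
let executionUnits = 128.0   // 16 per GPU core x 8 cores
let alusPerEU = 8.0          // assumed
let clockGHz = 1.278         // assumed

let gflops = executionUnits * alusPerEU * 2.0 * clockGHz
print("FP32 throughput ≈ \(gflops / 1000) TFLOPS")   // ≈ 2.6 TFLOPS

// The quoted fill rates imply roughly 64 texels and 32 pixels per clock
// at the same assumed frequency.
print("Texels per clock ≈ \(82.0 / clockGHz)")   // ≈ 64
print("Pixels per clock ≈ \(41.0 / clockGHz)")   // ≈ 32
```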

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265/HEVC, and ProRes
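
The M2's headline bandwidth and GPU numbers can be sanity-checked the same way; the LPDDR5-6400 speed and ~1.4 GHz GPU clock below are assumptions, not Apple specs:

```swift
// Sanity check of the M2 figures above.
// Assumptions (not Apple-published): LPDDR5-6400 on a 128-bit bus,
// ~1.4 GHz GPU clock, 128 FP32 ALUs per GPU core.
let busWidthBits = 128.0
let transfersPerSec = 6.4e9                        // LPDDR5-6400, assumed
let bandwidthGBs = (busWidthBits / 8.0) * transfersPerSec / 1e9
print("Memory bandwidth ≈ \(bandwidthGBs) GB/s")   // ≈ 102.4 GB/s, i.e. "up to 100 GB/s"

let gpuCores = 10.0
let alusPerCore = 128.0                            // assumed
let clockGHz = 1.398                               // assumed
let tflops = gpuCores * alusPerCore * 2.0 * clockGHz / 1000.0
print("FP32 throughput ≈ \(tflops) TFLOPS")        // ≈ 3.6 TFLOPS
```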

M3 Family discussion here:


M4 Family discussion here:

 

Mopetar

Diamond Member
Jan 31, 2011
8,486
7,723
136
Will AI ever get good enough to just write all the code and replace humans? No. Because you still need a human to tell it what is needed.


"I have AI skills. I'm good at dealing with AI. Can't you understand that? What the hell is wrong with you people?"

Let me know when you make a Jump to Conclusions mat.
 

GC2:CS

Member
Jul 6, 2018
35
21
81

The E core is thus 10-15x less power than the P core. The P core is up to 10 W in a few mm^2, passively cooled. Is there any limit before the engineers throw in the towel and say it's not worth going much higher (in a phone)?

Which past P core is the E core equal to?
The A12, possibly?

Apple doubled 16-bit floating point math rates in the GPU

Is it really as easy as that to double the FP16 ALUs and keep efficiency the same or even higher? Any GPU architects here?

Can we assume that the A17 on first-gen N3 was some kind of hiccup?
 
  • Like
Reactions: Tlh97

Doug S

Diamond Member
Feb 8, 2020
3,567
6,301
136
It's nice that the E cores got such a big bump without increasing power, but based on Geekerwan's SPEC results and the posted Steel Nomad bench, they've bumped the power in the P cores and GPU again. That's not sustainable. The vapor chamber is only spreading the heat around; it doesn't make the phone radiate heat any better, and based on the Engadget review that's had the expected effect of making the entire case hot - at least I'd call it "hot" if you have to warn someone before handing it to them to avoid surprise, even if it isn't "burn your skin" hot.

Just like Apple has a slider for the charging limit, maybe they need a slider for external temperature to control when your phone starts to throttle. I'd rather my phone throttle a bit more and stay cool in my hands; others may be fine letting it get even hotter than it does. I remember past iPhones never getting above what I'd consider "slightly warm", but my 16 Pro Max can almost reach "uncomfortably warm" at the hot spot if it's pushed. The thing is, that only happens for me in a buggy app or on a poorly coded web site. I'm not running stuff that pushes my phone to its limits. So I'd crank that throttle down to where it would only get "slightly warm" like the good old days.

The design of the Air might work better in this regard. At least the way I hold my phone, I'm not touching the camera bump, and from what I've read they put the SoC under the camera bump in the Air. So that's what should be getting hot (since it has no vapor chamber and the titanium case doesn't conduct heat as well), which is fine by me since I don't touch it. I've been thinking about whether I'll buy an Air next year when I do my biennial upgrade; that may be a point in its favor.
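
A rough sketch of what that throttle slider idea could look like as a control policy; this is entirely hypothetical, none of these types or APIs exist:

```swift
// Purely hypothetical sketch of the "throttle slider" idea above: the user picks
// a case-temperature ceiling and the governor scales the SoC power budget as the
// measured skin temperature approaches it. None of these types or APIs exist.
struct ThrottlePolicy {
    var userCeilingC: Double        // set by the slider, e.g. 38 for "slightly warm"
    let hardLimitC = 45.0           // the existing, non-negotiable firmware limit

    /// Returns a 0...1 multiplier applied to the SoC power budget.
    func powerScale(skinTempC: Double) -> Double {
        let ceiling = min(userCeilingC, hardLimitC)
        if skinTempC <= ceiling - 3.0 { return 1.0 }   // well below the ceiling: full speed
        if skinTempC >= ceiling { return 0.3 }         // at or over the ceiling: throttle hard
        // Linear ramp over the last 3 °C before the user's ceiling.
        return 1.0 - 0.7 * (skinTempC - (ceiling - 3.0)) / 3.0
    }
}

let conservative = ThrottlePolicy(userCeilingC: 38.0)
print(conservative.powerScale(skinTempC: 36.5))   // ≈ 0.65, partially throttled
```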
 
  • Like
Reactions: Tlh97

DZero

Golden Member
Jun 20, 2024
1,623
629
96
Makes me think about what tier that E core is in - it's really powerful. It reminds me of the legendary Intel Pentium M, which was the basis of the Core uArch.

PS: Note that I'm asking about x86 counterparts; on the ARM side it's clearly on par with, if not beating, the A7XX or C1 Pro stock cores.
 

The Hardcard

Senior member
Oct 19, 2021
328
411
106
Is it really as easy as that to double the FP16 ALUs and keep efficiency the same or even higher? Any GPU architects here?

Can we assume that the A17 on first-gen N3 was some kind of hiccup?

It appears that Apple's new ALUs can run 3 FP16 operations concurrently, and at a higher clock speed.
 

DZero

Golden Member
Jun 20, 2024
1,623
629
96
I saw this on Nanoreview.


Really interesting.
So yeah, Apple is great GPU-wise, but it remains to be seen how it performs in games.

CPU-wise, I see the P core being maxed out, but E core-wise, Apple still has cards to play.
 
  • Like
Reactions: Tlh97

name99

Senior member
Sep 11, 2010
652
545
136
All the money ultimately comes from consumers. Whether it is washed through a corporation on the way doesn't matter; it ultimately has to be used in a product that consumers (or the government that consumers fund via taxes) buy. Yes, you're right that businesses don't "pay" in ads, so what would incentivize them to be the primary/significant funding source for AI? Reducing costs. How would AI reduce costs? By reducing headcount.

Now you can do some simple math based on the ~$11 trillion in wages paid in the US (yes, there's the rest of the world, but let's just look at the US) and figure out how much you think AI needs to slice off to make its nut from selling services to businesses. Whatever percentage of that $11 trillion you take, you can probably triple it to get the percentage of people who will lose their jobs (or not be hired, if it happens slowly enough that they cut via attrition), because the people AI can replace will be those at the lower rungs of the wage scale.

People make "productivity" arguments about AI all the time, but don't consider what productivity is in economic terms. It is increasing revenue/profit/GDP (depending on where you're measuring it) per unit of human labor. That can come from my employer taking away the shovel I was using to dig ditches and giving me a backhoe, or from a smart guy somewhere coming up with a new way of doing things so my employer no longer needs ditches - and therefore no longer needs me and I lose my job. AI is the equivalent of a backhoe for digging ditches more quickly, it is not and will not be the equivalent of the smart guy coming up with a new way of doing things that avoids the need for ditches. So one can argue productivity all they want, but in terms of what AI can deliver it means business needing fewer employees to do the same thing.

So back to the $11 trillion: if you're only taking 1% of that $11 trillion, that's not enough to even fund the Oracle deal, let alone everything else OpenAI and whoever else is ultimately a "winner" is spending money on, but that's already 3% higher unemployment. In order to feed itself from the business world, AI would have to give us double-digit unemployment. That's going to reduce consumer demand for the very products and services that AI is making more "productive".
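
Spelling out the arithmetic in that paragraph; the 1% share and 3x multiplier are the post's own assumptions, not data:

```swift
// Rough restatement of the arithmetic above (all of these are the post's
// assumptions, not hard data): ~$11T/yr in US wages, AI revenue from businesses
// has to come out of payroll savings, and the displaced workers sit at the low
// end of the wage scale, so each dollar saved cuts roughly 3x the
// average-wage headcount.
let totalWagesUSD = 11.0e12
let aiRevenueShare = 0.01          // AI captures 1% of total wages
let lowWageMultiplier = 3.0        // low earners -> more heads cut per dollar saved

let aiRevenueUSD = totalWagesUSD * aiRevenueShare
let extraUnemployment = aiRevenueShare * lowWageMultiplier
print("AI revenue ≈ $\(aiRevenueUSD / 1e9)B per year")              // ≈ $110B/yr
print("Implied extra unemployment ≈ \(extraUnemployment * 100)%")   // ≈ 3%
```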

Here's where you hear the argument about how every time there have been advances, new types of jobs have appeared. They have, and they probably will again, but they don't appear right away. The industrial revolution was a very slow-moving process. Computerization happened more quickly, and the internet more quickly than that. AI is moving far more quickly than similar changes have in the past, but the processes that drive the creation of new types of jobs don't. If new jobs aren't appearing at the same rate they're being destroyed, you don't just end up with a spike in unemployment and then a recovery, but structural unemployment that lasts much longer. And that has societal costs.
You seem to be engaged in a totally different argument than I am.
I'm not saying anything about jobs; I have no idea how that will play out.
I'm saying what I said, that there are ways to extract money from these investments that go far beyond serving ads to consumers.

And if you want to claim that the numbers don't work out, at the *very least* you can't compare a stock (the more or less one-time cost of setting up these companies and their data centers) with a flow (ANNUAL GDP or ANNUAL wages or whatever)...
 

poke01

Diamond Member
Mar 8, 2022
4,195
5,542
106
I'll say this: the Apple10 GPU architecture is better than RDNA4 and Blackwell. Let's see how Xe3 does in Panther Lake.
 

name99

Senior member
Sep 11, 2010
652
545
136
What's with the monster A19 Pro FP16 score?

View attachment 130421

And what does this demo test? It's handily beating M3 and M4.

View attachment 130422

EDIT:

Auto-translate says: "GPU Light Tracking Test (Magic Demo)"
Apple has apparently changed, per core quadrant, from

1 FP16 pipe and
1 FP32 pipe

to

1 FP16 pipe,
1 "mixed" pipe (which is basically FP16*FP16 -> either FP32 or FP16), and
1 FP32 pipe (which can, under the right circumstances, also execute FP16; this is new to A19).

HOWEVER other infrastructure, most importantly the register cache, has not been changed much.
My assumption is that
- A18 could execute one FP16 and one FP32 in the same cycle. Theoretically interesting but practically useless, because not much code uses both precisions at the same time.

- A19 VECTOR code (ie "traditional GPU code") can execute two FP16 instructions in the same cycle (more or less what we see above)

- A19 "special" code (which I assume is matrix multiply) can probably execute three FP16 instructions in the same cycle. The specific details of matrix multiply code (and a tweak in the A19) mean that only two src registers per op are required rather than three, so the register cache can send out three sets of two registers to three pipelines. But you'll probably only see this boost on matrix multiply code (and possibly only after a recompile).
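
A toy model of that guess, just to make the issue-width arithmetic concrete; the pipe counts and register-cache numbers are the speculation above, not anything Apple has documented:

```swift
// Toy model of the speculation above (an inference, not Apple documentation).
// Each core quadrant has some number of pipes that can accept an FP16 op, and
// the register cache can deliver a fixed number of source registers per cycle;
// FP16 issue width is whichever limit bites first.
struct Quadrant {
    let fp16CapablePipes: Int    // pipes that can take an FP16 op this cycle
    let srcRegsPerCycle: Int     // source registers the register cache can deliver per cycle
}

/// Peak FP16 ops issued per cycle for ops needing `srcRegsPerOp` source registers.
func fp16PerCycle(_ q: Quadrant, srcRegsPerOp: Int) -> Int {
    min(q.fp16CapablePipes, q.srcRegsPerCycle / srcRegsPerOp)
}

// A18: only the dedicated FP16 pipe takes FP16; the register cache is assumed
// to feed three sets of two registers (six operands) per cycle on both chips.
let a18 = Quadrant(fp16CapablePipes: 1, srcRegsPerCycle: 6)
// A19: FP16 pipe + "mixed" pipe + FP32 pipe can all take FP16 in the right case.
let a19 = Quadrant(fp16CapablePipes: 3, srcRegsPerCycle: 6)

print(fp16PerCycle(a18, srcRegsPerOp: 3))   // 1 -> A18 vector FMA
print(fp16PerCycle(a19, srcRegsPerOp: 3))   // 2 -> A19 vector FMA (roughly what benchmarks show)
print(fp16PerCycle(a19, srcRegsPerOp: 2))   // 3 -> A19 matrix-multiply-style ops
```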
 

mvprod123

Senior member
Jun 22, 2024
398
453
96
Is it really as easy as that to double the FP16 ALUs and keep efficiency the same or even higher? Any GPU architects here?

Can we assume that the A17 on first-gen N3 was some kind of hiccup?
Geekerwan claims that the second gen dynamic caching architecture played a crucial role in enhancing graphics performance. In particular, this contributed to better utilisation of RT cores.