Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,587
1,001
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
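
For what it's worth, the 2.6 teraflops figure above follows directly from the 128 execution units if you assume the commonly cited 8 FP32 ALUs per EU, one FMA (2 FLOPs) per ALU per clock, and a GPU clock around 1.28 GHz. None of those three numbers are in Apple's spec sheet, so treat this as a back-of-the-envelope sketch:

Code:
# Rough sanity check of the 2.6 TFLOPS figure for the 8-core M1 GPU.
# Assumed (not from Apple's specs): 8 FP32 ALUs per EU, one FMA (2 FLOPs)
# per ALU per clock, ~1.278 GHz GPU clock.
gpu_cores = 8
eus_per_core = 16            # 128 EUs total / 8 cores
alus_per_eu = 8              # assumed
flops_per_alu_per_clock = 2  # one fused multiply-add = 2 FLOPs
clock_ghz = 1.278            # assumed

total_alus = gpu_cores * eus_per_core * alus_per_eu              # 1024
tflops = total_alus * flops_per_alu_per_clock * clock_ghz / 1000
print(f"{total_alus} ALUs -> {tflops:.2f} TFLOPS")               # ~2.62 TFLOPS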

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

[Screenshot of the M1 Pro / M1 Max lineup from Apple's October 18, 2021 announcement]

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2 (second-generation 5 nm)
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), and ProRes

M3 Family discussion here:

 

repoman27

Senior member
Dec 17, 2018
342
488
136
IIRC, this only works when swapping in another Mac Studio M.2 SSD of the same size! Which is super weird. TB 3/4 run at 40 Gbps (5,000 MB/s), so it's pretty fast, just kind of ugly having all these peripherals plugged in.
Just to reiterate, those are not M.2 slots. No Mac has *ever* shipped with an M.2 slot. They are proprietary Apple NAND flash memory module slots (ANS2M?). Swapping a pair of modules from one Mac Studio to another should work fine. Trying to combine modules from two different Mac Studios to "upgrade" one of them probably won't. Once again, Hector Martin goes into some depth on this:

Thunderbolt 3/4 links may be 40 Gbit/s, but Intel is still the sole provider for device silicon, and their current crop of controllers is limited to PCIe Gen3 x4 bandwidth. That limits the PCIe throughput for any single attached device to ~2700 MB/s after accounting for protocol overhead. Not exactly shabby from a performance standpoint, but as you say, so unsightly when hanging off of your sleek Mac Studio.
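
To put rough numbers on that (the payload efficiency factor is an assumption covering PCIe packet framing plus Thunderbolt tunneling overhead, not a figure from Intel):

Code:
# Why a 40 Gbit/s Thunderbolt 3/4 port still tops out around ~2,700 MB/s
# for a single PCIe device: the controller's back end is PCIe Gen3 x4.
lanes = 4
gts_per_lane = 8.0            # PCIe Gen3: 8 GT/s per lane
encoding = 128 / 130          # 128b/130b line coding
payload_efficiency = 0.70     # assumed: TLP/DLLP framing + tunneling overhead

raw_mbps = lanes * gts_per_lane * encoding * 1000 / 8   # ~3,938 MB/s on the wire
usable_mbps = raw_mbps * payload_efficiency
print(f"raw {raw_mbps:.0f} MB/s -> usable ~{usable_mbps:.0f} MB/s")   # ~2,750 MB/s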
 

repoman27

Senior member
Dec 17, 2018
342
488
136
To jump in, there was an Apple Cinema Display from years ago that is in the same niche, and it turns out it has a similar power supply.
"Maximum power: 250W (LED Cinema Display while charging MacBook Pro)" from https://support.apple.com/kb/SP597?locale=en_US although it doesn't state the specific power supply design point; a healthy 20% extra should be a minimum, though.
The PSU was 250 W, power consumption while on was 93 W, and it could also provide up to 85 W to a MacBook via MagSafe. 250 W / 178 W ≈ 1.40, i.e. 40% over-provisioning, or a 29% margin.

The port power budget for the Studio Display is higher (109.5-145 W), but the power-on consumption is way lower at just 30.7 W. 285 W / 175.7 W ≈ 1.62, i.e. 62% over-provisioning, or a 38% margin.
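
Writing that arithmetic out (the 285 W Studio Display supply is the figure implied by the numbers above):

Code:
# Over-provisioning and margin figures quoted above, written out.
def psu_headroom(psu_w, load_w):
    over_provisioning = psu_w / load_w - 1   # how much bigger the supply is than the load
    margin = (psu_w - load_w) / psu_w        # unused fraction of the supply
    return over_provisioning, margin

# LED Cinema Display: 93 W panel draw + 85 W MagSafe charging, 250 W supply
print(psu_headroom(250, 93 + 85))        # ~(0.40, 0.29)

# Studio Display: 30.7 W panel draw + 145 W max port budget, 285 W supply
print(psu_headroom(285, 30.7 + 145))     # ~(0.62, 0.38)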

Anywho, it was enough to raise eyebrows for me at least.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
IIRC, this only works when swapping another Mac Studio M.2 SSD of the same size! Which is super weird.

iFixit tried putting both modules from two separate computers into one and couldn't get that working, but there is likely some simple, still-unknown step required to make two modules work together. Perhaps the pairs of drives meant to work together have Primary/Secondary identifiers of some sort, and they were in essence trying to get dual primaries working together.

iFixit didn't have different sizes to try (I would ignore the guy in that other video, who seemed out of his element).

Still very early days right now.
 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
iFixit tried putting both modules from two separate computers into one and couldn't get that working, but there is likely some simple, still-unknown step required to make two modules work together. Perhaps the pairs of drives meant to work together have Primary/Secondary identifiers of some sort, and they were in essence trying to get dual primaries working together.
They probably forgot to jump the Cable Select pins. ;)
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
Thunderbolt 3/4 links may be 40 Gbit/s, but Intel is still the sole provider for device silicon, and their current crop of controllers is limited to PCIe Gen3 x4 bandwidth. That limits the PCIe throughput for any single attached device to ~2700 MB/s after accounting for protocol overhead
Geez, that sucks. TB3 has been out for a while. Oh, TB4 looks to change that:

Wikipedia said:
Thunderbolt 4 was announced at CES 2020[83] and the final specification was released in July 2020.[84] The key differences between Thunderbolt 4 and Thunderbolt 3 are[85] a minimum bandwidth requirement of 32 Gbit/s for PCIe link, support for dual 4K displays (DisplayPort 1.4),[86] and Intel VT-d-based direct memory access protection to prevent physical DMA attacks.

That should be PCIe Gen3 x4 (32 Gbit/s), or 4,000 MB/s theoretical (minus overhead, etc.), but no: maximum storage speeds are around 3,000 MB/s. Three GB/s is very fast, but so much for being able to add Gen4 NVMe drives later on. Well, there is hope. Intel leaked, and then deleted, info on Thunderbolt 5: 80 Gbps using PAM-3 signalling. So, someday....
 

repoman27

Senior member
Dec 17, 2018
342
488
136
Some more photos from テカナリエ清水 @techanalye1:


Intel i4004 from 1971 next to Apple M1 Max.


This one is almost too much of a tease... We do get to see the Marvell Aquantia AQC113 6-speed 10 GbE controller at least.

edit: one more—

M1 Pro on left, M1 Max on right.
 
Jul 27, 2020
16,329
10,342
106
The OS is fine. Apple just prioritizes lower clock and fan speeds over maximum raw performance.

[Attached screenshot: rendering benchmark results]

Being plugged in only makes a slight difference of a few seconds in this rendering benchmark. Also, other sites have found very little benefit from the High Power Mode option available in macOS. It seems there could be some truth to macOS being unpolished in areas that Apple doesn't deem important for the average user. Or Apple just doesn't want to push high thermals and risk suddenly having too many RMAs.
 

Doug S

Platinum Member
Feb 8, 2020
2,266
3,516
136
So Charlie had an intriguing tweet suggesting that the Mac Studio is using ECC DRAM. I assume by "count the chips" he's talking about the number of chips in the LPDDR5 packages, not the number of LPDDR5 packages themselves.

Can this be true, and reviewers just haven't noticed yet? Third-party software probably wouldn't pick up on this, since it would need to know to (and how to) look for it. Presumably Apple's own software would, but where would you look for that?

 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
So Charlie had an intriguing tweet suggesting that the Mac Studio is using ECC DRAM. I assume by "count the chips" he's talking about the number of chips in the LPDDR5 packages, not the number of LPDDR5 packages themselves.

Can this be true, and reviewers just haven't noticed yet? Third-party software probably wouldn't pick up on this, since it would need to know to (and how to) look for it. Presumably Apple's own software would, but where would you look for that?

Apple System Report should be checked.
 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
So Charlie had an intriguing tweet suggesting that the Mac Studio is using ECC DRAM. I assume by "count the chips" he's talking about the number of chips in the LPDDR5 packages, not the number of LPDDR5 packages themselves.

Can this be true, and reviewers just haven't noticed yet? Third-party software probably wouldn't pick up on this, since it would need to know to (and how to) look for it. Presumably Apple's own software would, but where would you look for that?

Apple System Report should be checked.
1. Apple System Report for the Mac Studio makes no mention of ECC at all, not even an "ECC: Disabled" entry. For my Intel Macs with non-ECC RAM, whether the RAM is removable or soldered, System Report says "ECC: Disabled". I've checked several different Intel Macs now. This is on macOS 12.3 Monterey, the same macOS version the Mac Studio runs.

2. Mac Studio's tech specs make no mention of ECC. The specs for the Mac Pro specifically mention ECC.

3. The base model Mac Studio is priced on the low side for an ECC-equipped Mac, but this is not a solid argument since Mac Pros of the past with ECC have been in the same price range (not accounting for inflation).
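
As an aside on point 1, the same check can be done from Terminal instead of the System Information app. A minimal sketch using the standard system_profiler memory data type (the exact fields shown vary by Mac):

Code:
import subprocess

# Dump the Memory section of System Information and look for an ECC field.
# On Intel Macs this prints something like ["ECC: Disabled"]; on the
# Mac Studio no ECC line appears at all.
out = subprocess.run(["system_profiler", "SPMemoryDataType"],
                     capture_output=True, text=True).stdout
ecc_lines = [line.strip() for line in out.splitlines() if "ECC" in line]
print(ecc_lines if ecc_lines else "no ECC field reported")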
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,228
5,228
136
Different types of ECC exist for different purposes.

Traditional enterprise ECC uses extra parity bits sent to the CPU to correct many kinds of memory errors.

Relatively new in [LP]DDR5, there is on-package ECC that corrects errors inside the memory package before the data is sent to the CPU.

The former may still be expected in enterprise, even if you have the latter. The latter probably won't show up in any kind of check for ECC because it's completely internal to the memory device.

I haven't looked into it far enough, but I suspect the DDR5 ECC was introduced because the memory is being pushed so hard that it's more likely to create internal errors on its own.
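
For the traditional flavor, the number of extra bits comes from the classic Hamming SECDED calculation. A minimal sketch (these are the theoretical minimums; real modules round up to a whole extra x8/x16 DRAM device per channel):

Code:
def secded_check_bits(data_bits):
    # Single-error-correcting Hamming code needs r check bits such that
    # 2**r >= data_bits + r + 1, plus one extra parity bit for
    # double-error detection (SECDED).
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1

for k in (16, 32, 64, 128):
    print(k, "data bits ->", secded_check_bits(k), "check bits")
# 64 data bits -> 8 check bits, i.e. the familiar 72-bit word on ECC DIMMs.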
 

Doug S

Platinum Member
Feb 8, 2020
2,266
3,516
136
Different types of ECC exist for different purposes.

Traditional enterprise ECC uses extra parity bits sent to the CPU to correct many kinds of memory errors.

Relatively new in [LP]DDR5, there is on-package ECC that corrects errors inside the memory package before the data is sent to the CPU.

The former may still be expected in enterprise, even if you have the latter. The latter probably won't show up in any kind of check for ECC because it's completely internal to the memory device.

I haven't looked into it far enough, but I suspect the DDR5 ECC was introduced because the memory is being pushed so hard that it's more likely to create internal errors on its own.


LPDDR5's "link ECC" feature would not affect the number of dies in a package, so that wouldn't have anything to do with Charlie's comment.

If the Studio's Apple System Report doesn't mention ECC as Eug reports, then either Charlie was wrong about this and there is no ECC, or ECC is present and ASR needs to be updated on Apple Silicon Macs, or ECC is present but isn't enabled / operational.
 

repoman27

Senior member
Dec 17, 2018
342
488
136
LPDDR can do in-line ECC, link ECC, or both, none of which would affect the number of chips used. DDR5's on-die ECC doesn't require extra chips because it's built into the dies themselves. Traditional side-band ECC for DDR is the only kind that requires extra chips. Methinks Charlie might be talking out his behind.

 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
M1 startup log refers to ECC.


Now, almost 7 seconds after the kernel started to boot, the remaining 7 cores are started up.

10.281418 AppleFireStormErrorHandler AppleARM64ErrorHandler: will not panic on correctible ECC errors

Firestorm are the four high-performance CPU cores in the M1.
 

repoman27

Senior member
Dec 17, 2018
342
488
136
M1 startup log refers to ECC.


Now, almost 7 seconds after the kernel started to boot, the remaining 7 cores are started up.

10.281418 AppleFireStormErrorHandler AppleARM64ErrorHandler: will not panic on correctible ECC errors

Firestorm are the four high-performance CPU cores in the M1.
And that's when booting the regular M1, not Pro, Max, or Ultra, but it isn't necessarily referring to DRAM. I have ioreg output for the M1, M1 Pro, and M1 Max, and the only reference to ECC is:
"l2-ecc-correctable-panic" = <01000000>
So perhaps the L2$ is protected by ECC?

It would be hilarious if, while everyone on the internet was explaining that the M1 Macs were so much more efficient with their memory architecture that nobody really needed more than 8 GB, those Macs were at the same time incurring the overhead of in-line and/or link ECC.
 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
And that's when booting the regular M1, not Pro, Max, or Ultra, but it isn't necessarily referring to DRAM. I have ioreg output for the M1, M1 Pro, and M1 Max, and the only reference to ECC is:

So perhaps the L2$ is protected by ECC?
Makes sense*. The author does not address this in the actual article, but if you read the comments, it appears he indirectly speculates this is for the cache.

*As I have mentioned before, I am not an engineer, so this type of stuff is above my pay grade.
 

Doug S

Platinum Member
Feb 8, 2020
2,266
3,516
136
LPDDR can do in-line ECC, link ECC, or both, none of which would affect the number of chips used. DDR5's on-die ECC doesn't require extra chips because it's built into the dies themselves. Traditional side-band ECC for DDR is the only kind that requires extra chips. Methinks Charlie might be talking out his behind.



Synopsys only says that the inline ECC scheme is typically used for LPDDR because sideband ECC is very expensive: LPDDR's small 16-bit channels would impose a lot of overhead. They leave out that LPDDR's inline ECC scheme is probably at least as expensive, since that feature would appear to require a different die that would be produced in low volume, or zero volume for something as relatively new as LPDDR5.

Apple uses LPDDR5 to achieve a much wider memory interface, with LPDDR5 packages that are 128 bits wide (i.e., there is one package for every 128 bits of memory controller width). So for Apple, traditional sideband ECC would be the preferred solution, since they wouldn't take the "narrow channel" overhead hit or have to special-order parts using different dies for a very low-volume (when measured against the smartphone world that consumes most LPDDR5 output) Mac.

Apple's LPDDR5 packages presumably contain 8 x16 chips to achieve the 128-bit width: 8 Gb chips for the 8 GB package and 16 Gb chips for the 16 GB package. If they added a 9th chip to get to 144 bits wide, they could do ECC on two independent 64-bit channels using byte mode on the 9th chip. Sure, Apple's memory controllers have to talk to LPDDR5 chips over 16-bit channels, but no one is making them do ECC calculations on individual channels; that can easily be done at 64-bit granularity.
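
Restating that hypothetical layout in numbers (these widths are the presumptions above, not anything Apple has published):

Code:
x16_dies = 8
package_width = x16_dies * 16        # 128 data bits per package

# Hypothetical 9th x16 die run in byte mode: 8 check bits for each
# 64-bit half of the package.
ecc_bits_per_64 = 8
ecc_channels = package_width // 64   # 2 independent ECC channels
print(package_width, "data bits ->", ecc_channels, "x",
      64 + ecc_bits_per_64, "bit ECC channels")   # 128 data bits -> 2 x 72 bit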

Now, none of this proves Charlie right; I'm only asserting that IF Apple is doing ECC on the Studio, they are definitely not going to use that kludgey inline ECC scheme.
 

repoman27

Senior member
Dec 17, 2018
342
488
136
Side-band requires an 8-bit ECC code word for each channel, even when the channel is only 16-bit as in the case of LPDDR, which is why it's utterly impractical. DDR with 64-bit channels used 72-bit DIMMs for ECC. DDR5 with two 32-bit channels per DIMM requires 80-bit (2x 40-bit) DIMMs for ECC.

In-line or in-band isn't exactly the kludge you think it is and does not require special dies. Intel's implementation for Tiger Lake platforms works for standard DDR4 and LPDDR4/4X/5 and results in overhead of 1/32 of total DRAM size. It is also implemented on Alder Lake, but currently only supported under Chrome OS.
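
The overhead arithmetic behind those two paragraphs, written out:

Code:
# Side-band ECC adds 8 check bits per channel no matter how narrow the
# channel is, while the in-line scheme carves out 1/32 of DRAM capacity.
for channel_bits in (64, 32, 16):
    print(f"side-band, {channel_bits}-bit channel: {8 / channel_bits:.1%} extra bits")
print(f"in-line ECC: {1 / 32:.1%} of total DRAM capacity")
# 12.5%, 25.0%, 50.0% extra bits vs. ~3.1% of capacity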

I highly doubt Apple is doing ECC of any kind with the M1 family of SoCs because it's not a sensible tradeoff for personal computers / workstations that can only have between 8 and 128 GB of DRAM. But when it comes to LPDDR, nobody is doing side-band ECC.
 

Doug S

Platinum Member
Feb 8, 2020
2,266
3,516
136
Side-band requires an 8-bit ECC code word for each channel, even when the channel is only 16-bit as in the case of LPDDR, which is why it's utterly impractical. DDR with 64-bit channels used 72-bit DIMMs for ECC. DDR5 with two 32-bit channels per DIMM requires 80-bit (2x 40-bit) DIMMs for ECC.

In-line or in-band isn't exactly the kludge you think it is and does not require special dies. Intel's implementation for Tiger Lake platforms works for standard DDR4 and LPDDR4/4X/5 and results in overhead of 1/32 of total DRAM size. It is also implemented on Alder Lake, but currently only supported under Chrome OS.

I highly doubt Apple is doing ECC of any kind with the M1 family of SoCs because it's not a sensible tradeoff for personal computers / workstations that can only have between 8 and 128 GB of DRAM. But when it comes to LPDDR, nobody is doing side-band ECC.


Apple does not have to do ECC calculations on 16 bit wide channels just because that's LPDDR5's basic unit. They could choose to treat a gang of 4 or even 8 16 bit channels as a single unit for ECC calculations. Who is going to tell them they can't? They would still have JEDEC compliant LPDDR5 memory controllers and use JEDEC compliant LPDDR5; there's no one to enforce how they must calculate ECC on the result.
 

repoman27

Senior member
Dec 17, 2018
342
488
136
Apple does not have to do ECC calculations on 16 bit wide channels just because that's LPDDR5's basic unit. They could choose to treat a gang of 4 or even 8 16 bit channels as a single unit for ECC calculations. Who is going to tell them they can't? They would still have JEDEC compliant LPDDR5 memory controllers and use JEDEC compliant LPDDR5; there's no one to enforce how they must calculate ECC on the result.
At a certain point, the laws of physics will tell them they can't.

The ECC calculations do have to be done per channel; that is the only way it works. Yes, you can run LPDDR dies in parallel to create wider channels, but you're sharing the command/address lines when you do that and increasing the loading on them. Five dies for a single-rank 80-bit channel with side-band ECC might be possible from a loading perspective, but you would run into other issues by creating an implementation like that. Power and performance would both take a pretty serious hit, and you would give up most of the benefits of LPDDR by attempting to use it like conventional DDR. Also, the problem of how you build the actual package (die stacking and wire bonding) becomes non-trivial.

This document pertains to LPDDR4, so it assumes dual-channel dies, but it still does a pretty good job of describing the different ways you can connect multiple LPDDR dies and highlighting the pros and cons of each method:

I think we can agree that Apple is one of the top purchasers of LPDDR DRAM, and they obviously have no problem using non-JEDEC standard package sizes and ball layouts. However, I'm pretty sure the DRAM they buy is still bog-standard. Photos of the M1 Max show that Apple is using LPDDR5 packages from both SK hynix and Samsung. These packages have to be something that those suppliers can actually produce given their respective capabilities from DRAM parts that they currently manufacture in volume.
 

Doug S

Platinum Member
Feb 8, 2020
2,266
3,516
136
At a certain point, the laws of physics will tell them they can't.

The ECC calculations do have to be done per channel; that is the only way it works. Yes, you can run LPDDR dies in parallel to create wider channels, but you're sharing the command/address lines when you do that and increasing the loading on them. Five dies for a single-rank 80-bit channel with side-band ECC might be possible from a loading perspective, but you would run into other issues by creating an implementation like that. Power and performance would both take a pretty serious hit, and you would give up most of the benefits of LPDDR by attempting to use it like conventional DDR. Also, the problem of how you build the actual package (die stacking and wire bonding) becomes non-trivial.

This document pertains to LPDDR4, so it assumes dual-channel dies, but it still does a pretty good job of describing the different ways you can connect multiple LPDDR dies and highlighting the pros and cons of each method:

I think we can agree that Apple is one of the top purchasers of LPDDR DRAM, and they obviously have no problem using non-JEDEC standard package sizes and ball layouts. However, I'm pretty sure the DRAM they buy is still bog-standard. Photos of the M1 Max show that Apple is using LPDDR5 packages from both SK hynix and Samsung. These packages have to be something that those suppliers can actually produce given their respective capabilities from DRAM parts that they currently manufacture in volume.


You don't need to do anything special. If you issue a 64-bit load across four 16-bit channels, you operate them independently to point at the desired bytes. Why do you think you need to "share" them? If you want to load a 64-bit value on an M1, how do you think that works? You don't really believe that 8 consecutive bytes will be stored on a single LPDDR5 channel, do you? They will be interleaved 16 bits at a time across multiple channels: at the very least 64 bits across 4 LPDDR5 channels, but if I had to bet, I'd say 128 bits across 8 channels is far more likely given the layout of the M1.
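
A toy model of that interleaving (the channel count and the simple round-robin mapping are assumptions for illustration, not Apple's actual address hashing):

Code:
CHANNELS = 8   # assumed: 128 bits of controller width as 8 x 16-bit channels

def channels_touched(byte_addr, length_bytes):
    # Map each consecutive 16-bit slice of an access onto a channel,
    # round-robin. Purely illustrative.
    first_slice = byte_addr // 2
    return [(first_slice + i) % CHANNELS for i in range(length_bytes // 2)]

print(channels_touched(0, 8))    # 64-bit load    -> channels [0, 1, 2, 3]
print(channels_touched(8, 8))    # next 64 bits   -> channels [4, 5, 6, 7]
print(channels_touched(0, 16))   # 128-bit access -> all 8 channels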