> You don't need to do anything special. If you issue a 64 bit load across four 16 bit channels, you operate them independently to point to the desired bytes. Why do you think you need to "share" then? If you want to load a 64 bit value on an M1, how do you think that works? You don't really believe that 8 consecutive bytes will be stored on a single LPDDR5 channel, do you? They will be interleaved 16 bits at a time across multiple channels - at the very least 64 bits across 4 LPDDR5 channels, but if I had to bet I'd say 128 bits across 8 channels is far more likely given the layout of the M1.

I was referring to electrical loading, not as in load/store. The DRAM dies all have to be connected electrically to the memory controller in order to function.
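For concreteness, here is a toy model of the two interleave granularities being argued about: a 2 B granule (the "16 bits at a time" claim) versus a burst-sized 64 B granule. The granule values, the 8-channel count, and the simple modulo mapping are assumptions for illustration, not Apple's documented behavior.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy address-to-channel decode. The real M1 mapping is not public:
 * a 2 B granule models "interleaved 16 bits at a time"; a 64 B granule
 * models burst-sized interleave. Both are illustrative assumptions. */
static unsigned channel_of(uint64_t paddr, unsigned n_channels,
                           unsigned granule_bytes)
{
    return (unsigned)((paddr / granule_bytes) % n_channels);
}

int main(void)
{
    const unsigned chans = 8;          /* 8 x16 channels, as speculated */
    const uint64_t base = 0x1000;      /* start of an aligned 8-byte load */
    const unsigned granules[] = { 2, 64 };

    for (int g = 0; g < 2; g++) {
        printf("granule %2u B: bytes 0x%llx..+7 hit channels:",
               granules[g], (unsigned long long)base);
        for (uint64_t b = base; b < base + 8; b++)
            printf(" %u", channel_of(b, chans, granules[g]));
        printf("\n");
    }
    return 0;
}
```

With a 2 B granule the 8-byte load spans four channels; with a 64 B granule it stays on one, which is exactly the point of contention in the next post.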
The cache line size for the M1 is 128 B, and the minimum burst size for an LPDDR5 channel is 32 beats, which works out to 64 B for a 16-bit channel or 128 B for a 32-bit channel. So 8 consecutive bytes absolutely will be read from / written to a single channel, because it would be inefficient to do otherwise. This also suggests that Apple probably isn't using a channel width greater than 32 bits, because the minimum burst would exceed the cache line size.
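The arithmetic behind that claim, as a quick sketch (the BL32 minimum and the 128 B line are from the post above; the widths are just the cases under discussion):

```c
#include <stdio.h>

/* Bytes moved by one minimum-length burst: beats * (width / 8). */
int main(void)
{
    const unsigned beats = 32;     /* minimum burst length (BL32) */
    const unsigned line  = 128;    /* M1 cache line in bytes      */

    for (unsigned width = 16; width <= 64; width *= 2) {
        unsigned bytes = beats * width / 8;
        printf("x%-2u channel: BL%u = %3u B -> %s\n", width, beats, bytes,
               bytes <  line ? "half a line" :
               bytes == line ? "exactly one line" : "overflows the line");
    }
    return 0;
}
```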
With in-line ECC you have to issue a separate overhead read or write command for the ECC code, but if you've done things correctly, that code is being read from or written to a page that is already open. If you used a separate channel for ECC, how would that possibly work? If you had multiple independent channels that all had to read or write ECC code to a separate shared channel in order to complete a transfer, it would be a disaster. Even if you could cleverly sort out the addressing, you'd still be in a near constant state of page-miss and write/read transition. The latency would be appalling. ECC has to be implemented at the physical channel level.
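To illustrate the "already open page" point: a controller doing in-line ECC can reserve a slice of every DRAM row for the ECC words protecting the rest of that row, so the extra ECC command is just another column access to the open row. The layout below (1 KB rows, one ECC byte per eight data bytes) is invented for the sketch and doesn't describe any shipping controller.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy in-line ECC placement: the last 1/8th of each 1 KB row holds the
 * ECC bytes for the other 7/8ths, so data and ECC share a row. */
#define ROW_BYTES  1024
#define DATA_BYTES (ROW_BYTES * 7 / 8)          /* 896 B of data per row */

typedef struct { uint32_t row, col; } dram_addr_t;

static dram_addr_t data_addr(uint64_t byte)
{
    dram_addr_t a = { (uint32_t)(byte / DATA_BYTES),
                      (uint32_t)(byte % DATA_BYTES) };
    return a;
}

static dram_addr_t ecc_addr(uint64_t byte)      /* 1 ECC byte per 8 data */
{
    dram_addr_t a = data_addr(byte);
    a.col = DATA_BYTES + a.col / 8;   /* same row: page is already open */
    return a;
}

int main(void)
{
    dram_addr_t d = data_addr(4096), e = ecc_addr(4096);
    printf("data: row %u col %u\n", d.row, d.col);
    printf("ecc:  row %u col %u\n", e.row, e.col);
    return 0;
}
```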
> Apple does not have to do ECC calculations on 16 bit wide channels just because that's LPDDR5's basic unit. They could choose to treat a gang of 4 or even 8 16 bit channels as a single unit for ECC calculations. Who is going to tell them they can't? They would still have JEDEC compliant LPDDR5 memory controllers and use JEDEC compliant LPDDR5; there's no one to enforce how they must calculate ECC on the result.

Isn't the ECC usually baked into the hardware for the RAM? The odds of a one-off error occurring during transmission are vanishingly small.
It doesn't make sense to do error checking or correction on the chip itself because it adds the extra overhead of transmitting the parity bits. If you're using non-ECC memory you need to develop software to simulate it, which isn't going to be nearly as effective as just using ECC RAM.
ECC RAM usually has an extra memory chip on the DIMMs over non-ECC memory. The extra chip is what stores the parity bits needed to detect or correct the errors. That's why I'm assuming Charlie was saying to count the chips.
> I don't understand your objection electrical loading wise. The M1 has one LPDDR5 package for every 128 bits of width, i.e. one LPDDR5 package is connected to eight LPDDR5 channels. Most likely, the package has 8 x16 LPDDR5 chips, so each controller is connected to exactly one chip. What I had described added a 9th controller, and a 9th chip in the package. How is that going to create problems with electrical loading? Everything is still connected one to one.

If you run each x16 interface independently, there is no issue with loading. You were talking about how Apple could treat several x16 interfaces as a single wider interface. Electrically, that means connecting the data lines in parallel and the CA lines in series. I believe you were thinking that Apple could somehow do that logically while keeping the channels independent electrically, but that wouldn't work in practice for the reasons I pointed out. So instead I went down the path of considering what might actually be possible as far as running LPDDR dies in parallel to do side-band ECC.
> Thanks for the pointer about the burst length in LPDDR5. I was trying to ignore cache lines, hoping to simplify things by talking about a single load at a time rather than a whole cache line. But I didn't realize the burst length for LPDDR was so long - I assumed it was shorter to save power, but I suppose it makes sense to have longer bursts to allow for filling a 64 byte cache line from a single x16 device with one command.

The minimum burst doesn't need to fill the cache line, but it would be pretty inefficient if it were to overflow it. The reason for increasing prefetch/burst length is to keep the data bus full. The DRAM cores are only operating at 100-266 MHz and need to fill a 6400 Mbit/s bus. A 32n prefetch will get you there, but you can also use multiple bank groups to keep the bus full and not have the minimum burst exceed the cache line size. LPDDR5X apparently ditched 8 Bank mode in favor of Bank Group mode, so I'm assuming that's actually the preferred way of doing things. But if you drop down to BL16, you're relying on interleave between the bank groups to keep the bus full, which makes the timings a bit trickier.
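A quick sketch of the prefetch arithmetic in that post (the 6400 Mbit/s pin rate and the 100-266 MHz core range are the figures quoted above):

```c
#include <stdio.h>

/* Core clock the DRAM array must sustain for a given prefetch depth to
 * keep a 6400 Mbit/s pin busy: core rate = pin rate / prefetch. */
int main(void)
{
    const double pin_mbps = 6400.0;
    const unsigned prefetch[] = { 16, 32 };

    for (int i = 0; i < 2; i++)
        printf("%2un prefetch -> core must run %.0f MHz\n",
               prefetch[i], pin_mbps / prefetch[i]);
    /* 32n lands at 200 MHz, inside the stated core range; 16n would
     * need 400 MHz, hence BL16 leaning on bank-group interleave. */
    return 0;
}
```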
So after looking into it, according to Synopsys, at speeds above 3200 Mbps you can either use bank group mode with burst lengths of 16 or 32, or eight bank mode with a burst length of 32. So you'd have a choice: either feed a 128 byte cache line out of four parallel devices (64 bits) using bank group mode and BL16, or out of two parallel devices (32 bits) using BL32. The former would seem to be the desirable choice, since you'd reduce the time required to operate on a full cache line, unless there is some gotcha with bank group mode? So what I had described is feasible using bank group mode and BL16.
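Putting numbers on those two options (the device counts and burst lengths come from the post above; an x16 device moves 2 B per beat; none of this is confirmed for the M1):

```c
#include <stdio.h>

/* Two ways to move a 128 B cache line: x16 devices * burst * 2 B/beat. */
int main(void)
{
    struct { const char *mode; unsigned devices, bl; } opt[] = {
        { "bank group mode, 64-bit gang", 4, 16 },
        { "BL32, 32-bit gang           ", 2, 32 },
    };
    for (int i = 0; i < 2; i++)
        printf("%s: %u devices * BL%u * 2 B = %u B\n",
               opt[i].mode, opt[i].devices, opt[i].bl,
               opt[i].devices * opt[i].bl * 2);
    return 0;
}
```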
> But you are 100% correct that carrying the ECC on a separate channel would be a problem if done as I describe.

Yes. You can either do side-band ECC or in-line ECC. Side-band is not practical for LPDDR; in-line is the preferred solution.
> Isn't the ECC usually baked into the hardware for the RAM? The odds of a one-off error occurring during transmission are vanishingly small.

ECC is traditionally implemented as a side-band solution for DDR SDRAM. You add additional DRAM devices in parallel to store the Hamming code used for ECC. For a 64-bit DDR channel you add an x8 DRAM, which widens the channel to 72 bits. Every time the memory controller performs a read or write transaction, it also reads or writes the corresponding ECC code word, which it can use upon read completion to correct any single-bit error or detect double-bit errors (SECDED). The ECC calculations are performed by the memory controller, but the ECC code words are stored in a separate DRAM. Side-band ECC provides end-to-end protection against single-bit errors.
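To make the SECDED mechanics concrete, here is a toy Hamming encoder/checker over 8 data bits. A real side-band controller runs the same construction over 64 data bits with 8 check bits (a 72-bit codeword, matching the x8 DRAM above); this is an illustrative sketch, not any particular controller's implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy SECDED Hamming code: 8 data bits -> 13-bit codeword.  Check bits
 * sit at positions 1, 2, 4, 8; data fills positions 3,5,6,7,9,10,11,12;
 * bit 0 holds overall parity, which upgrades SEC to SECDED. */
static uint16_t secded_encode(uint8_t data)
{
    uint16_t cw = 0;
    /* scatter data bits into the non-power-of-two positions */
    for (int pos = 3, bit = 0; pos <= 12; pos++) {
        if ((pos & (pos - 1)) == 0) continue;        /* skip 4 and 8 */
        if (data >> bit++ & 1) cw |= 1u << pos;
    }
    /* check bit p covers every position whose index has bit p set */
    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 3; pos <= 12; pos++)
            if ((pos & p) && (cw >> pos & 1)) parity ^= 1;
        cw |= (uint16_t)parity << p;
    }
    /* overall parity over positions 1..12 goes into bit 0 */
    int par = 0;
    for (int pos = 1; pos <= 12; pos++) par ^= cw >> pos & 1;
    return cw | par;
}

/* returns 0 = clean, 1 = single-bit error corrected, -1 = double error */
static int secded_check(uint16_t *cw)
{
    int syn = 0, par = 0;   /* syndrome = XOR of positions of set bits */
    for (int pos = 0; pos <= 12; pos++)
        if (*cw >> pos & 1) { syn ^= pos; par ^= 1; }
    if (!syn && !par) return 0;
    if (par) { *cw ^= 1u << syn; return 1; }  /* syn==0: parity bit hit */
    return -1;     /* nonzero syndrome with even parity: uncorrectable */
}

int main(void)
{
    uint16_t cw = secded_encode(0xA5);
    cw ^= 1u << 6;                            /* inject one bit error  */
    printf("single error -> %d (1 = corrected)\n", secded_check(&cw));
    cw ^= (1u << 3) | (1u << 9);              /* inject two bit errors */
    printf("double error -> %d (-1 = detected)\n", secded_check(&cw));
    return 0;
}
```

Flipping any one codeword bit is corrected via the syndrome; flipping two is flagged as uncorrectable, which is the SECDED guarantee described above.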
The "Mac store" is not used much, almost all applications are sold and installed the regular way, so Apple doesn't care about that.Let's hope Apple doesn't decide they are getting unwanted attention because every Apple device not connected to their store is not in line with their base strategy of making mountains of cash. They may not be able to stop current owners but they could further harden future CPU's to alternative OS booting.
> Reviewers like Max keep claiming that Mac Studio is software constrained, but the reasoning is rather dubious.

Well, I think the only concrete example he shows is Apple's own Final Cut Pro - so it does stand to reason that some other software packages may need some tweaks to take advantage of the Max/Ultra's increased resources. Other software that's lagging either hasn't been re-compiled for Apple's native ARM uarch, or doesn't support Metal, at least from what I've seen. It's kind of like the PPC -> x86 transition; it takes time. I haven't looked it up, but if Xcode supported cross compiling, that would make the transition much easier.
One difference though is that everyone says the Mac Studio is always quiet, whereas the MacBook Pro in some reviews will ramp up the fans under continued full load.
> Well, I think the only concrete example he shows is Apple's own Final Cut Pro - so it does stand to reason that some other software packages may need some tweaks to take advantage of the Max/Ultra's increased resources. Other software that's lagging either hasn't been re-compiled for Apple's native ARM uarch, or doesn't support Metal, at least from what I've seen. It's kind of like the PPC -> x86 transition; it takes time. I haven't looked it up, but if Xcode supported cross compiling, that would make the transition much easier.

Sorry, I wasn't clear. What I mean is that, from my understanding of his reviews, he's implying that there could be a software/OS limiter on the SoC, preventing it from maxing out performance. Sort of pre-emptive throttling, if you will. For example, he sees the clock speed listed at 3.0x GHz in one test, and suggests there is a limitation there because it's not 3.2 GHz, perhaps to maintain low temps.
> Sorry, I wasn't clear. What I mean is that, from my understanding of his reviews, he's implying that there could be a software/OS limiter on the SoC, preventing it from maxing out performance. Sort of pre-emptive throttling, if you will. For example, he sees the clock speed listed at 3.0x GHz in one test, and suggests there is a limitation there because it's not 3.2 GHz, perhaps to maintain low temps.

I understand, thanks. If anything, Apple could be holding things back a bit in firmware, but for what reason I don't know. Apple may just be playing it 'safe' with M1 Max voltages to max out yields or, as Doug suggested, maybe Apple just wanted the Studio to be super quiet. Whatever the case, my thinking is that Apple won't make any changes till the M2s come along. I don't think there is any incentive for Apple to bump up performance by 10% with some later firmware release. Oh well - this has been fun.
Personally, I think he's just jumping to conclusions. If you look at other comparative reviews elsewhere, you'll see that the M1 Max in the MacBook Pro performs pretty much exactly the same as the M1 Max in the Mac Studio. However, the difference is that the Mac Studio is silent, whereas the MacBook Pro sometimes is not. Not loud, but not silent.
You'd think that if there were truly such throttling going on, it'd kick in earlier on the MacBook Pro to keep it quiet.
> Whatever the case, my thinking is that Apple won't make any changes till the M2s come along.

Agreed.
> Agreed.

Hmm, I thought Apple was done with M1 product releases. I would think the next Mac Mini would have an M2 chip - maybe I just don't understand Apple's comments. Problem for me with the Mini is the 16GB limitation. I would want to be able to spool up a VM with Win10 if I needed. Also, the graphics performance, even on an M1 Pro, is less than I'd like for future gaming (not that I game a ton, but I do still enjoy gaming).
In any case, personally I just want a Mac mini with more ports, but something which isn't as chonky and pricey as the Mac Studio. I don't think we'll get it with the M2, but there's still hope for a mythical M1 Pro or M2 Pro Mac mini.
> Hmm, I thought Apple was done with M1 product releases. I would think the next Mac Mini would have an M2 chip - maybe I just don't understand Apple's comments. Problem for me with the Mini is the 16GB limitation. I would want to be able to spool up a VM with Win10 if I needed. Also, the graphics performance, even on an M1 Pro, is less than I'd like for future gaming (not that I game a ton, but I do still enjoy gaming).

Apple said it was done with new M1 "chips", M1-Ultra being the last. So there is no M1-Extreme coming for the Mac Pro.
> Apple said it was done with new M1 "chips", M1-Ultra being the last. So there is no M1-Extreme coming for the Mac Pro.

Also, while they did say the iMac 27" is dead, some pundits are still trying to convince us (or convince themselves?) a new "iMac Pro" is coming eventually, along with a new high end Mac mini. And Ming-Chi Kuo is claiming the new updated MacBook Air will use M1 again, although Mark Gurman says it will be M2.
Nothing stopping them from releasing more models based on M1.
> If they were going to make the M1 Pro an option on the Mini, they would have done so by now.

Probably, so more likely they will fill that gap with an M2 Mini. The lowest-end M2 chip will likely perform slightly better than the M1 Mini, improve the port situation a bit, and allow for 32 GB of RAM.
Ticking most of the M1-Pro boxes people want in an M1-Pro Mini, but with a smaller, less expensive to produce chip than an M1-Pro.
M1 Mini stays in production for the low end, M2 Mini is midrange, and Studio is high end.
> If it was a software bug we would expect it to be fixed fairly quickly, but I suppose it is possible such a bug could be present in the M1 itself.

Apple did have a lot of turnover in its processor design teams around the time the M series was under development. Anyway, we won't know unless somebody pries the info out of Apple's cold dead hands.
> Assuming the Qualcomm exclusive ends soon and Microsoft can officially support Macs running Windows/ARM in a VM, that worry disappears.

Oh, forgot about that. Was wondering why Parallels was still in beta and VMware Fusion was AWOL for ARM Macs.
> Sorry, I wasn't clear. What I mean is that, from my understanding of his reviews, he's implying that there could be a software/OS limiter on the SoC, preventing it from maxing out performance. Sort of pre-emptive throttling, if you will. For example, he sees the clock speed listed at 3.0x GHz in one test, and suggests there is a limitation there because it's not 3.2 GHz, perhaps to maintain low temps.

Is this a single-threaded benchmark or something? Because as Andrei pointed out in his original review, the Firestorm performance cores in the M1 Max top out at 3036 MHz when you have all four cores in the same cluster active. 3228 MHz is the max for a single core per cluster, and 3132 MHz for two cores. I realize "Apple" + "throttling" is a perpetually trending combination, but this would seem to be the expected behavior, no?