Question Apple Silicon M series thread

Page 124 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Eug

Lifer
Mar 11, 2000
23,250
730
126
Is this a single-threaded benchmark or something? Because as Andrei pointed out in his original review, the Firestorm performance cores in the M1 Max top out at 3036 MHz when you have all four cores in the same cluster active. 3228 MHz is the max for a single core per cluster, and 3132 MHz for two cores. I realize "Apple" + "throttling" is a perpetually trending combination, but this would seem to be the expected behavior, no?
Sounds spot on. That was multi-core IIRC, so normal behaviour.
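For reference, the reported per-cluster caps can be written as a tiny lookup. This is only an illustration of the figures quoted above (the table itself lives in Apple's firmware, and the 3-core value is assumed to match the 4-core one, which is how Andrei's review described it):

```python
# Firestorm P-cluster clock caps on M1 Max, per the figures quoted above.
# Firmware-managed; this lookup only illustrates the reported behavior.
# The 3-core value is assumed equal to the 4-core one.
P_CLUSTER_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}

def max_p_core_mhz(active_cores_in_cluster: int) -> int:
    """Reported max P-core clock for a given number of active cores in one cluster."""
    if not 1 <= active_cores_in_cluster <= 4:
        raise ValueError("a Firestorm cluster has 1-4 cores")
    return P_CLUSTER_MHZ[active_cores_in_cluster]
```

So a "multi-core" benchmark keeping all four P-cores of a cluster busy should report ~3.0 GHz, not the 3.2 GHz single-core figure.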
 

Doug S

Golden Member
Feb 8, 2020
1,110
1,655
106
Is this a single-threaded benchmark or something? Because as Andrei pointed out in his original review, the Firestorm performance cores in the M1 Max top out at 3036 MHz when you have all four cores in the same cluster active. 3228 MHz is the max for a single core per cluster, and 3132 MHz for two cores. I realize "Apple" + "throttling" is a perpetually trending combination, but this would seem to be the expected behavior, no?
That clock limiting behavior made little sense in the M1 Mini, and it makes even less sense in the M1 Max / Ultra Studio. Neither is power limited since they don't run on battery, and both have way over the top cooling systems (the Mini because it inherited a system built to cool so-called "65W" Intel CPUs).

The list of possible reasons I can think of is pretty short, can anyone add to it:

1) that behavior is managed by the SoC and they didn't bother making it tweakable via software for the 1.0 version
2) the power vias to the performance core cluster can't supply enough current to operate all cores at full speed
3) they would have fewer "working" M1s if they required all cores to function at 3.2 GHz at default voltage instead of just one (i.e. it is somewhat binning related)
 

igor_kavinski

Platinum Member
Jul 27, 2020
2,536
1,292
96
The list of possible reasons I can think of is pretty short, can anyone add to it:

1) that behavior is managed by the SoC and they didn't bother making it tweakable via software for the 1.0 version
2) the power vias to the performance core cluster can't supply enough current to operate all cores at full speed
3) they would have fewer "working" M1s if they required all cores to function at 3.2 GHz at default voltage instead of just one (i.e. it is somewhat binning related)
It's Apple. They will reserve it for later so they can get existing users to upgrade in a year or two. They are only generous when they get more money.
 

Ajay

Lifer
Jan 8, 2001
11,218
5,033
136
Is this a single-threaded benchmark or something? Because as Andrei pointed out in his original review, the Firestorm performance cores in the M1 Max top out at 3036 MHz when you have all four cores in the same cluster active. 3228 MHz is the max for a single core per cluster, and 3132 MHz for two cores. I realize "Apple" + "throttling" is a perpetually trending combination, but this would seem to be the expected behavior, no?
I suppose it looks that way to some because it doesn’t appear that the Max is hitting any thermal or power limit in the Mac Studio. I’m becoming of the mind that the power monitoring software being used in these YouTube videos isn’t working correctly.
 

poke01

Member
Mar 8, 2022
27
12
41
It's Apple. They will reserve it for later so they can get existing users to upgrade in a year or two. They are only generous when they get more money.
What!!! Intel, AMD, Nvidia and Qualcomm are all saints and never do this. Only Apple sucks all your money.

I am more inclined to believe it has nothing to do with what Doug S said, but rather with how Apple made this architecture. It is mobile-first, and power saving and efficiency are important. The more workstation-focused Mac Pro should be different, as the codename for its SoC is 6500.

For reference, the M1 Pro and M1 Max are 6001 and 6002 respectively.
 

jeanlain

Member
Oct 26, 2020
129
101
76
I’m becoming of the mind that the power monitoring software being used in these YouTube videos isn’t working correctly.
I think it's based on Apple's own powermetrics command. Andrei F. compared its output to power measured at the wall on the laptops and found consistent results IIRC (i.e., wall power being slightly higher in most cases).
 

Doug S

Golden Member
Feb 8, 2020
1,110
1,655
106
What!!! Intel, AMD, Nvidia and Qualcomm are all saints and never do this. Only Apple sucks all your money.

I am more inclined to believe it has nothing to do with what Doug S said, but rather with how Apple made this architecture. It is mobile-first, and power saving and efficiency are important. The more workstation-focused Mac Pro should be different, as the codename for its SoC is 6500.

For reference, the M1 Pro and M1 Max are 6001 and 6002 respectively.

The Mac Pro will be based on M2, so comparing code names with M1 is pointless.

There's no way Apple will do a different SoC for the Mac Pro. The volume is too low to amortize the NRE of a whole design team doing a separate SoC.

Best case you could hope for is that Apple takes the M2 Max and does another version of it using HPC cells and TSMC's other tweaks, which based on their claims would gain 15-20% more performance (though likely at 2-3x the power). If the NRE is just a tapeout and some tweaks for timing closure but not a complete redesign, it might be doable depending on how many Studios and Mac Pros Apple thinks it can ship per year.

There's zero evidence Apple will do that though, other than the overengineered cooling in the Studio.
 
  • Like
Reactions: oak8292

moinmoin

Diamond Member
Jun 1, 2017
3,303
4,537
136
That clock limiting behavior made little sense in the M1 Mini, it makes even less sense in the M1 Max / Ultra Studio.
It's not that long ago that Intel chips had hard coded turbo tables like this. Not sure why people expect Apple to have completely free form boosting akin to AMD's PBO going with their very first line of homemade laptop/desktop chips when even Intel hasn't reached that point yet. Apple's chips so far have been highly optimized for the specific iPhone form factor where such flexibility just isn't necessary, and all the M1 variants obviously still build upon that.
 

repoman27

Senior member
Dec 17, 2018
219
275
106
Revisiting the subject of the NAND modules, I grabbed some ioreg output from a base model M1 Max Mac Studio. Here is the entry for the AppleANS3NVMeController object:

Code:
AppleANS3NVMeController  <class AppleANS3NVMeController, id 0x10000033c, registered, matched, active, busy 0 (73 ms), retain 20>
{
  "IOMatchedAtBoot" = Yes
  "IOPolledInterface" = "IONVMeControllerPolledAdapter is not serializable"
  "IOMinimumSaturationByteCount" = 8388608
  "IOMinimumSegmentAlignmentByteCount" = 4096
  "IOMaximumByteCountWrite" = 1048576
  "Physical Interconnect" = "Apple Fabric"
  "Physical Interconnect Location" = "Internal"
  "Vendor Name" = "Apple"
  "Serial Number" = "XXXXXXXXXXXXXXXX"
  "IOMaximumSegmentByteCountWrite" = 4096
  "IOMaximumByteCountRead" = 1048576
  "Model Number" = "APPLE SSD AP0512R"
  "IOPropertyMatch" = {"role"="ANS2"}
  "AppleNANDStatus" = "Ready"
  "IOCommandPoolSize" = 253
  "Chipset Name" = "SSD Controller"
  "IOPersonalityPublisher" = "com.apple.iokit.IONVMeFamily"
  "IOPowerManagement" = {"DevicePowerState"=1,"CurrentPowerState"=1,"CapabilityFlags"=32768,"MaxPowerState"=1}
  "Firmware Revision" = "387.100."
  "NVMe Revision Supported" = "1.10"
  "CFBundleIdentifier" = "com.apple.iokit.IONVMeFamily"
  "IOMaximumSegmentCountWrite" = 256
  "IOProviderClass" = "RTBuddyService"
  "IOReportLegendPublic" = Yes
  "IOMaximumSegmentByteCountRead" = 4096
  "IOClass" = "AppleANS3NVMeController"
  "CFBundleIdentifierKernel" = "com.apple.iokit.IONVMeFamily"
  "IOPlatformPanicAction" = 0
  "IOMaximumSegmentCountRead" = 256
  "DeviceOpenedByEventSystem" = Yes
  "IOReportLegend" = ({"IOReportChannels"=((5644784279684675442,8590065666,"NVMe Power States")),"IOReportGroupName"="NVMe","IOReportChannelInfo"={"IOReportChannelUnit"=72058115876454424}})
  "IOMatchCategory" = "IODefaultMatchCategory"
  "Controller Characteristics" = {"default-bits-per-cell"=3,"firmware-version"="387.100.","controller-unique-id"="XXXXXXXXXXXXXXXX    ","capacity"=512000000000,"pages-per-block-mlc"=1152,"pages-in-read-verify"=384,"sec-per-full-band-slc"=52224,"pages-per-block0"=0,"cell-type"=3,"bytes-per-sec-meta"=16,"Preferred IO Size"=1048576,"program-scheme"=0,"bus-to-msp"=(0,0,1,1,2,2,3,3),"num-dip"=34,"nand-marketing-name"="itlc_3d_g4_2p_256               ","package_blocks_at_EOL"=31110,"sec-per-full-band"=156672,"cau-per-die"=2,"page-size"=16384,"pages-per-block-slc"=384,"sec-per-page"=4,"nand-device-desc"=3248925,"num-bus"=8,"block-pairing-scheme"=0,"chip-id"="S5E","Encryption Type"="AES-XTS","vendor-name"="Toshiba         ","blocks-per-cau"=974,"dies-per-bus"=(3,2,2,2,2,2,2,2),"msp-version"="2.8.10.2.0      ","manufacturer-id"=<983c98b3f6e30000>}
  "IOProbeScore" = 300000
}
Despite the NAND being on modules for the Mac Studio, the SSD model number and pretty much all of the other parameters are exactly the same as they are on the M1 Pro and Max MacBook Pros (ugh, these names). Hector Martin referred to the M1's embedded SSD controller as "ANS2", but it looks like this may actually be an iteration thereof called "ANS3". And the "S5E", which he described as Apple's "raw NAND controller/bridge", seems to also be referred to here as an "msp" or memory signal processor. This is something I guess I never realized was going on with Apple's in-house designed SSDs and goes back to their acquisition of Anobit in 2012 (Flash Memory Summit 2010 presentation on Memory Signal Processing). So MSP is yet another differentiating factor for Apple's SSDs. However, I'm not convinced that Hector Martin is correct in his assessment of PCI Express being used for the signaling between the NAND packages and SoC. I don't see traces typical of high-speed differential signaling on the PCBs or the necessary PHYs anywhere on the M1 die layouts. I still believe an ONFI style NAND interface is being used for the signaling between the packages and SoC.

A quick look at the Controller Characteristics property shows us why you can't just plug two lower capacity modules into the two slots of a Mac Studio and successfully combine them into a single SSD. There are a total of 8 busses or channels ("num-bus"=8) connecting the dies to four MSPs ("bus-to-msp"=(0,0,1,1,2,2,3,3)). But when you look at how the dies are connected to the busses ("dies-per-bus"=(3,2,2,2,2,2,2,2)) you see that there's actually an extra die in one of the packages, for a total of 17 dies. Each die has two concurrently addressable units or planes ("cau-per-die"=2), leading to a total of 34 planes which are referred to here as "dip", possibly for "dies in parallel" ("num-dip"=34). Taking that number along with several parameters that appear to be common across at least the last few generations of Apple SSDs:

"cell-type"=3, "default-bits-per-cell"=3 — this is a TLC NAND drive with SLC cache
"blocks-per-cau"=974 — 974 blocks per plane
"pages-per-block-slc"=384, "pages-per-block-mlc"=1152 — 384 pages per block as SLC or 1152 pages per block as TLC
"sec-per-page"=4 — 4 sectors per page

And we arrive at the number of sectors per full band, which in this case is 156,672 for TLC ("sec-per-full-band"=156672) and 52,224 for SLC ("sec-per-full-band-slc"=52224). Apple doesn't sell SSD models with 34 dies / 68 planes or provide firmware that could support such configurations. I'm pretty sure the supported configurations go something like:

"Model Number" = "APPLE SSD AP0512R"
"capacity"=512000000000
"dies-per-bus"=(3,2,2,2,2,2,2,2)
"num-dip"=34
"sec-per-full-band-slc"=52224, "sec-per-full-band"=156672

"Model Number" = "APPLE SSD AP1024R"
"capacity"=1024000000000
"dies-per-bus"=(5,4,4,4,4,4,4,4)
"num-dip"=66
"sec-per-full-band-slc"=101376, "sec-per-full-band"=304128

"Model Number" = "APPLE SSD AP2048R"
"capacity"=2048000000000
"dies-per-bus"=(8,8,8,8,8,8,8,8)
"num-dip"=128
"sec-per-full-band-slc"=196608, "sec-per-full-band"=589824
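The band sizes in these configurations follow directly from the per-plane geometry described above, so they're easy to sanity-check. A quick sketch, using only parameter names taken from the ioreg dump (a "band" here is assumed to stripe one block across every plane in parallel):

```python
# Sanity-check the "sec-per-full-band" values from the ioreg Controller
# Characteristics: sectors per band = num-dip * pages-per-block * sec-per-page.
def sec_per_full_band(num_dip, pages_per_block, sec_per_page=4):
    return num_dip * pages_per_block * sec_per_page

# model: num-dip (sum of the dies-per-bus tuple, times 2 planes per die)
configs = {"AP0512R": 34, "AP1024R": 66, "AP2048R": 128}

for model, dip in configs.items():
    tlc = sec_per_full_band(dip, 1152)  # pages-per-block-mlc
    slc = sec_per_full_band(dip, 384)   # pages-per-block-slc
    print(model, "SLC band:", slc, "TLC band:", tlc)
```

For the 512 GB module this reproduces 52,224 (SLC) and 156,672 (TLC) exactly, and the 1 TB and 2 TB figures fall out the same way, which is why two mismatched lower-capacity modules can't simply be merged into one drive.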

I don't currently have ioreg data for the 4 and 8 TB models, but you might be able to combine two 2 TB modules to make a working 4 TB drive. However, I'm not sure that anyone in their right mind would ever do such a thing, because buying a second Mac Studio for $2600 to achieve what amounts to a $600 upgrade doesn't make any sense. So despite the narrative that Apple was locking their SSDs to screw over customers and prevent "upgrading", what they were actually doing was giving customers purchasing lower capacity drives an extra NAND die so that they could have better performance, endurance, and reliability (more dies in parallel, larger SLC cache, more spare area, possibility of surviving a partial or full die failure).

And just to touch on another tidbit that can be gleaned from the Controller Characteristics property, "vendor-name"="Toshiba" and "nand-marketing-name"="itlc_3d_g4_2p_256" would indicate that we're looking at 256 Gbit 4th generation 96-layer 3D TLC BiCS NAND flash dies from Kioxia (formerly Toshiba). All of the ≤ 1 TB Q series SSDs in M1 Macs and ≤ 2 TB R series SSDs in M1 Pro/Max/Ultra Macs that I've seen thus far use those same dies.
 
Last edited:

MadRat

Lifer
Oct 14, 1999
11,767
117
106
It's not that long ago that Intel chips had hard coded turbo tables like this. Not sure why people expect Apple to have completely free form boosting akin to AMD's PBO going with their very first line of homemade laptop/desktop chips when even Intel hasn't reached that point yet. Apple's chips so far have been highly optimized for the specific iPhone form factor where such flexibility just isn't necessary, and all the M1 variants obviously still build upon that.
That behavior is baked into system settings, not the OS, so an OS change increasing performance seems unlikely. It could be as simple as differences in the scheduler; we could be seeing products segmented by scheduler limitations.
 

igor_kavinski

Platinum Member
Jul 27, 2020
2,536
1,292
96

Doug S

Golden Member
Feb 8, 2020
1,110
1,655
106
If they manage even 10K NAND write-erase cycles with MSP, that's a huge competitive advantage for them.

I'm skeptical that it's much of a real-world advantage. Sure, the specs for SSD write life would look better, but is write life really an issue for the higher end SSDs people would put into a higher end PC that would compete with Mac Studio? Typically those higher end SSDs provide increased write life when compared to consumer models by having more reserved space, so maybe there is a cost advantage for Apple.

SSD write life is sort of like smartphone battery life. If you don't have enough to last for the life of your PC under your usage profile that's bad. Similarly if you have less than a day's battery life for your phone under your usage profile that's bad. In both cases if you have enough for that plus sufficient buffer so you aren't nervously looking at the battery percentage towards the end of the day or getting SMART warnings about your SSD media wearout dropping below 20%, anything beyond that is mostly useless.

People worry too much about write life, because of problems with early models of SSD that were mostly due to people not understanding the limitations of that early technology (i.e. trying to use it for something with extreme write rates like a database's online redo logs) or operating systems that weren't updated to mesh well with the differences in how SSDs work vs HDDs leading to issues like write amplification.

It requires someone with a usage that includes very high write rates, and going cheap and buying a consumer level SSD for their professional level usage. Most people severely overestimate the volume of writes they do. If they look at the actual SMART values for media wearout after say six months of normal use and multiply by how long they plan to keep that PC in service, almost everyone finds they have no worries about wearing out their SSD. And SMART's media wearout uses the manufacturer specs, which are in most cases wildly conservative - at least based on testing Storage Review did a few years ago.
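That projection is trivial to do yourself. A minimal sketch, with placeholder inputs (the wearout attribute name and scale vary by vendor; read your own values with a tool such as smartctl):

```python
# Linearly extrapolate SSD media wearout over a planned service life from an
# observed SMART "percentage used" reading. Inputs below are placeholders.
def projected_wear_pct(observed_wear_pct, months_observed, planned_months):
    """Projected percent of rated endurance consumed at end of service life."""
    return observed_wear_pct * planned_months / months_observed

# e.g. 2% used after 6 months of normal use, PC kept for 5 years (60 months):
print(projected_wear_pct(2.0, 6, 60))  # -> 20.0 percent of rated endurance
```

At that rate the drive retires with 80% of its (already conservative) rated endurance unused, which is the point being made above.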
 

igor_kavinski

Platinum Member
Jul 27, 2020
2,536
1,292
96
All good points, but the price of an 8GB/256GB MacBook Air would get me an x86 laptop with a 1 TB SSD. The endurance of the Apple SSD has to be at least two to three times better for me to consider it (M1 is already a desirable CPU). So it's good to know that Apple is giving something extra and not just using some cheap off-the-shelf NAND + flash controller in their expensive devices.
 

Doug S

Golden Member
Feb 8, 2020
1,110
1,655
106
All good points, but the price of an 8GB/256GB MacBook Air would get me an x86 laptop with a 1 TB SSD. The endurance of the Apple SSD has to be at least two to three times better for me to consider it (M1 is already a desirable CPU). So it's good to know that Apple is giving something extra and not just using some cheap off-the-shelf NAND + flash controller in their expensive devices.

Apple uses their own flash controller in the SoC, they aren't buying some off the shelf third party controller. How many times does this have to be mentioned before people get it?

Are you saying you are buying a laptop with 1 TB when you need only 256 GB because you are worried about flash lifetime? You're the same person who wants a laptop with 256 GB of RAM because your browser is leaking memory!

I have serious doubts about your buying acumen but I'm sure sales people love it when they see you coming. Do you buy Y speed rated tires even though you never exceed 90 mph because you're worried someday you might be driving down a mountain with a strong tailwind?
 
  • Like
Reactions: Tlh97 and Mopetar

eek2121

Golden Member
Aug 2, 2005
1,800
1,982
136
All good points, but the price of an 8GB/256GB MacBook Air would get me an x86 laptop with a 1 TB SSD. The endurance of the Apple SSD has to be at least two to three times better for me to consider it (M1 is already a desirable CPU). So it's good to know that Apple is giving something extra and not just using some cheap off-the-shelf NAND + flash controller in their expensive devices.
It won’t get you the kind of quality you will get out of the new MacBooks.

None of the PC OEMs come close to current Apple quality.

Shoot, given performance, battery life, power, noise, build quality, MagSafe, weight, and screen, I will say not a single PC vendor comes close.
 

Mopetar

Diamond Member
Jan 31, 2011
6,677
3,720
136
I do prefer to spend a little extra for a bit more longevity rather than buy and discard on a frequent basis.
I recall some time ago, when people were worried about this, that someone calculated it would take about 3 years to exhaust the write cycles in the MacBook of that time.

However, that was 3 years if it were being written to constantly, all day, every day. It's not something that people really need to worry about.

You'll have issues with not having enough space before you ever have to worry about wearing it out.
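That kind of estimate is just rated endurance divided by write rate. A sketch with illustrative numbers (the 600 TBW rating and 6 MB/s sustained rate below are assumptions, not the specs of any particular MacBook SSD):

```python
# Rough SSD lifetime at a sustained write rate:
# years = rated endurance (TBW) / terabytes written per year.
# Numbers used below are illustrative assumptions only.
def years_of_endurance(tbw_terabytes, write_mb_per_sec):
    seconds_per_year = 365 * 24 * 3600
    tb_written_per_year = write_mb_per_sec * seconds_per_year / 1e6
    return tbw_terabytes / tb_written_per_year

# e.g. a 600 TBW drive written at a constant 6 MB/s around the clock:
print(round(years_of_endurance(600, 6), 1))  # -> 3.2 years
```

Real workloads are nowhere near a constant 6 MB/s around the clock, which is why the "about 3 years" worst case translates into decades of normal use.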
 

gdansk

Senior member
Feb 8, 2011
918
611
136
After the confusing TLB introduction it seems to boil down to "TBDR is different". But even if most MacOS applications are using OpenGL -> Metal or MoltenVK it's hard to blame application developers when (nearly?) every application tested is unable to reach the max TDP.

People are used to parts being thermally constrained. But big M1 GPUs are limited by some other criteria. I'd wager memory latency of some sort.

But I don't think the number of TLB entries is indicative of an 'oversight' as the # of TLB entries seems higher than competitors. Like 2x Skylake. And 1.5x Zen 3. Maybe it is lacking large pages? But that hasn't been a problem with Intel Macs where the page size was 4KiB rather than 16KiB in ARM Macs.
 
Last edited:

Doug S

Golden Member
Feb 8, 2020
1,110
1,655
106
After the confusing TLB introduction it seems to boil down to "TBDR is different". But even if most MacOS applications are using OpenGL -> Metal or MoltenVK it's hard to blame application developers when (nearly?) every application tested is unable to reach the max TDP.

People are used to parts being thermally constrained. But big M1 GPUs are limited by some other criteria. I'd wager memory latency of some sort.

But I don't think the number of TLB entries is indicative of an 'oversight' as the # of TLB entries seems really high assuming 4KiB pages. Like 5x Skylake. Maybe it is lacking large pages? But that hasn't been a problem with Intel Macs where the feature went unused.

Apple switched to a 16K page size back when they went to 64 bits. AFAIK they do not support an additional larger size page.

The issue with TBDR has been well known for a long time, and is what makes most comparisons between Apple and PC GPUs difficult - we know Apple is dragging an anchor behind them in almost every cross platform benchmark. But that's fair, because that's the anchor they're dragging behind them in the actual applications. The GPU may be capable of much better performance for something written for its rendering model, but the PC market is much larger for cross platform applications, so developers will write/optimize for PC and take the easiest path to making it run on Mac - which does not include a rewrite for a different rendering model. If the Mac had the big market share the issue would be reversed and Nvidia would be the one dragging the anchor in benchmarks (or perhaps they would have chosen to go along with TBDR instead of fighting the entrenched system like Apple).

Increasing TLB coverage will help but I have to think there is a better solution, like supporting larger pages (even if they are only used in memory shared between the CPU and GPU) or perhaps something even more clever in hardware or software will alleviate the TLB pressure. I don't have any suggestions as to what that might be, it requires someone more clever than I :)
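The appeal of larger pages is easy to see as arithmetic: TLB "reach" is just entries times page size. The entry counts below are assumptions for illustration (Firestorm's L2 TLB is commonly cited at around 3072 entries, Skylake's at around 1536); the point is how page size scales coverage:

```python
# TLB reach = number of entries * page size. Entry counts are illustrative
# assumptions, not confirmed figures for any specific chip.
def tlb_reach_mib(entries, page_kib):
    return entries * page_kib / 1024

print(tlb_reach_mib(3072, 16))    # M1-style 16 KiB pages   -> 48.0 MiB
print(tlb_reach_mib(1536, 4))     # Skylake-style 4 KiB pages -> 6.0 MiB
print(tlb_reach_mib(3072, 2048))  # hypothetical 2 MiB pages -> 6144.0 MiB
```

Even a modest large-page mode for CPU/GPU shared buffers would multiply reach by orders of magnitude without touching the TLB itself.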

Like I keep saying, M1 is the version 1.0 of Apple Silicon. There is no way they could have anticipated every weak point and glass jaw in the design. Especially something like this which never showed up when they had Macs running on an A8X or A10X in some secret Apple lab. Even if they hacked up some 2- or 4-A12X SoC Frankenstein system using a less well developed version of Ultra Fusion to validate it.

If they didn't become aware of this issue until they started running stuff on M1 Max, I'm not sure we'll see a fix on M2. Best case, first silicon for M1 Max arrived about two years ago, but that's only if it taped out when A14/M1 did. It may well have been months later. So even if they figured it out immediately after getting a running M1 Max system it may have been too late for any fix to be introduced into the M2 core design which would have taped out last fall assuming it uses the A16 CPU/GPU cores and ships this fall.
 

gdansk

Senior member
Feb 8, 2011
918
611
136
Well traditional GPUs solved it by having their own TLBs. Perhaps, with 'unified memory', Apple was hoisted by their own petard.
 

moinmoin

Diamond Member
Jun 1, 2017
3,303
4,537
136
Seems like a non-issue to me. Software can be optimized, so those (few) who actually care enough can do just that. Hardware can be improved in future gens, Apple will very likely do just that and advertise the huge improvement accordingly.
 
