> You don't need to do anything special. If you issue a 64 bit load across four 16 bit channels, you operate them independently to point to the desired bytes. Why do you think you need to "share" then? If you want to load a 64 bit value on an M1, how do you think that works? You don't really believe that 8 consecutive bytes will be stored on a single LPDDR5 channel, do you? They will be interleaved 16 bits at a time across multiple channels - at the very least 64 bits across 4 LPDDR5 channels, but if I had to bet I'd say 128 bits across 8 channels is far more likely given the layout of the M1.

I was referring to electrical loading, not as in load/store. The DRAM dies all have to be connected electrically to the memory controller in order to function.
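For concreteness, here is a toy model of the two interleave granularities being argued about: a 2 B granule (the "16 bits at a time" claim) versus a burst-sized 64 B granule. The granule values, the 8-channel count, and the simple modulo mapping are assumptions for illustration, not Apple's documented behavior.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy address-to-channel decode. The real M1 mapping is not public:
 * a 2 B granule models "interleaved 16 bits at a time"; a 64 B granule
 * models burst-sized interleave. Both are illustrative assumptions. */
static unsigned channel_of(uint64_t paddr, unsigned n_channels,
                           unsigned granule_bytes)
{
    return (unsigned)((paddr / granule_bytes) % n_channels);
}

int main(void)
{
    const unsigned chans = 8;          /* 8 x16 channels, as speculated */
    const uint64_t base = 0x1000;      /* start of an aligned 8-byte load */
    const unsigned granules[] = { 2, 64 };

    for (int g = 0; g < 2; g++) {
        printf("granule %2u B: bytes 0x%llx..+7 hit channels:",
               granules[g], (unsigned long long)base);
        for (uint64_t b = base; b < base + 8; b++)
            printf(" %u", channel_of(b, chans, granules[g]));
        printf("\n");
    }
    return 0;
}
```

With a 2 B granule the 8-byte load spans four channels; with a 64 B granule it stays on one, which is exactly the point of contention in the next post.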
The cache line size for the M1 is 128 B, and the minimum burst size for an LPDDR5 channel is 32 beats, which works out to 64 B for a 16-bit channel or 128 B for a 32-bit channel. So 8 consecutive bytes absolutely will be read from / written to a single channel, because it would be inefficient to do otherwise. This also suggests that Apple probably isn't using a channel width greater than 32 bits, because the minimum burst would exceed the cache line size.
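The arithmetic behind that claim, as a quick sketch (the BL32 minimum and the 128 B line are from the post above; the widths are just the cases under discussion):

```c
#include <stdio.h>

/* Bytes moved by one minimum-length burst: beats * (width / 8). */
int main(void)
{
    const unsigned beats = 32;     /* minimum burst length (BL32) */
    const unsigned line  = 128;    /* M1 cache line in bytes      */

    for (unsigned width = 16; width <= 64; width *= 2) {
        unsigned bytes = beats * width / 8;
        printf("x%-2u channel: BL%u = %3u B -> %s\n", width, beats, bytes,
               bytes <  line ? "half a line" :
               bytes == line ? "exactly one line" : "overflows the line");
    }
    return 0;
}
```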
With in-line ECC you have to issue a separate overhead read or write command for the ECC code, but if you've done things correctly, that code is being read from or written to a page that is already open. If you used a separate channel for ECC, how would that possibly work? If you had multiple independent channels that all had to read or write ECC code to a separate shared channel in order to complete a transfer, it would be a disaster. Even if you could cleverly sort out the addressing, you'd still be in a near constant state of page-miss and write/read transition. The latency would be appalling. ECC has to be implemented at the physical channel level.
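To illustrate the "already open page" point: a controller doing in-line ECC can reserve a slice of every DRAM row for the ECC words protecting the rest of that row, so the extra ECC command is just another column access to the open row. The layout below (1 KB rows, one ECC byte per eight data bytes) is invented for the sketch and doesn't describe any shipping controller.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy in-line ECC placement: the last 1/8th of each 1 KB row holds the
 * ECC bytes for the other 7/8ths, so data and ECC share a row. */
#define ROW_BYTES  1024
#define DATA_BYTES (ROW_BYTES * 7 / 8)          /* 896 B of data per row */

typedef struct { uint32_t row, col; } dram_addr_t;

static dram_addr_t data_addr(uint64_t byte)
{
    dram_addr_t a = { (uint32_t)(byte / DATA_BYTES),
                      (uint32_t)(byte % DATA_BYTES) };
    return a;
}

static dram_addr_t ecc_addr(uint64_t byte)      /* 1 ECC byte per 8 data */
{
    dram_addr_t a = data_addr(byte);
    a.col = DATA_BYTES + a.col / 8;   /* same row: page is already open */
    return a;
}

int main(void)
{
    dram_addr_t d = data_addr(4096), e = ecc_addr(4096);
    printf("data: row %u col %u\n", d.row, d.col);
    printf("ecc:  row %u col %u\n", e.row, e.col);
    return 0;
}
```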
> Apple does not have to do ECC calculations on 16 bit wide channels just because that's LPDDR5's basic unit. They could choose to treat a gang of 4 or even 8 16 bit channels as a single unit for ECC calculations. Who is going to tell them they can't? They would still have JEDEC compliant LPDDR5 memory controllers and use JEDEC compliant LPDDR5; there's no one to enforce how they must calculate ECC on the result.

Isn't the ECC usually baked into the hardware for the RAM? The odds of a one-off error occurring during transmission are vanishingly small.
It doesn't make sense to do error checking or correction on the chip itself because it adds the extra overhead of transmitting the parity bits. If you're using non-ECC memory you need to develop software to simulate it, which isn't going to be nearly as effective as just using ECC RAM.
ECC RAM usually has an extra memory chip on the DIMMs over non-ECC memory. The extra chip is what stores the parity bits needed to detect or correct the errors. That's why I'm assuming Charlie was saying to count the chips.
> I don't understand your objection electrical loading wise. The M1 has one LPDDR5 package for every 128 bits of width, i.e. one LPDDR5 package is connected to eight LPDDR5 channels. Most likely, the package has 8 x16 LPDDR5 chips, so each controller is connected to exactly one chip. What I had described added a 9th controller, and a 9th chip in the package. How is that going to create problems with electrical loading? Everything is still connected one to one.

If you run each x16 interface independently, there is no issue with loading. You were talking about how Apple could treat several x16 interfaces as a single wider interface. Electrically, that means connecting the data lines in parallel and the CA lines in series. I believe you were thinking that Apple could somehow do that logically while keeping the channels independent electrically, but that wouldn't work in practice for the reasons I pointed out. So instead I went down the path of considering what might actually be possible as far as running LPDDR dies in parallel to do side-band ECC.
> Thanks for the pointer about the burst length in LPDDR5. I was trying to ignore cache lines, hoping to simplify things by talking about a single load at a time rather than a whole cache line. But I didn't realize the burst length for LPDDR was so long - I assumed it was shorter to save power, but I suppose it makes sense to have longer bursts to allow for filling a 64 byte cache line from a single x16 device with one command.

The minimum burst doesn't need to fill the cache line, but it would be pretty inefficient if it were to overflow it. The reason for increasing prefetch/burst length is to keep the data bus full. The DRAM cores are only operating at 100-266 MHz and need to fill a 6400 Mbit/s bus. A 32n prefetch will get you there, but you can also use multiple bank groups to keep the bus full and not have the minimum burst exceed the cache line size. LPDDR5X apparently ditched 8 Bank mode in favor of Bank Group mode, so I'm assuming that's actually the preferred way of doing things. But if you drop down to BL16, you're relying on interleave between the bank groups to keep the bus full, which makes the timings a bit trickier.
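A quick sketch of the prefetch arithmetic in that post (the 6400 Mbit/s pin rate and the 100-266 MHz core range are the figures quoted above):

```c
#include <stdio.h>

/* Core clock the DRAM array must sustain for a given prefetch depth to
 * keep a 6400 Mbit/s pin busy: core rate = pin rate / prefetch. */
int main(void)
{
    const double pin_mbps = 6400.0;
    const unsigned prefetch[] = { 16, 32 };

    for (int i = 0; i < 2; i++)
        printf("%2un prefetch -> core must run %.0f MHz\n",
               prefetch[i], pin_mbps / prefetch[i]);
    /* 32n lands at 200 MHz, inside the stated core range; 16n would
     * need 400 MHz, hence BL16 leaning on bank-group interleave. */
    return 0;
}
```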
So after looking into it, according to Synopsys, at speeds above 3200 Mbps you can either use bank group mode with burst lengths of 16 or 32, or eight bank mode with a burst length of 32. So you'd have a choice: either feed a 128 byte cache line out of four parallel devices (64 bits) using bank group mode and BL16, or out of two parallel devices (32 bits) using BL32. The former would seem to be the desirable choice, since you'd reduce the time required to operate on a full cache line, unless there is some gotcha with bank group mode? So what I had described is feasible using bank group mode and BL16.
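Putting numbers on those two options (the device counts and burst lengths come from the post above; an x16 device moves 2 B per beat; none of this is confirmed for the M1):

```c
#include <stdio.h>

/* Two ways to move a 128 B cache line: x16 devices * burst * 2 B/beat. */
int main(void)
{
    struct { const char *mode; unsigned devices, bl; } opt[] = {
        { "bank group mode, 64-bit gang", 4, 16 },
        { "BL32, 32-bit gang           ", 2, 32 },
    };
    for (int i = 0; i < 2; i++)
        printf("%s: %u devices * BL%u * 2 B = %u B\n",
               opt[i].mode, opt[i].devices, opt[i].bl,
               opt[i].devices * opt[i].bl * 2);
    return 0;
}
```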
> But you are 100% correct that carrying the ECC on a separate channel would be a problem if done as I describe.

Yes. You can either do side-band ECC or in-line ECC. Side-band is not practical for LPDDR; in-line is the preferred solution.
> Isn't the ECC usually baked into the hardware for the RAM? The odds of a one-off error occurring during transmission are vanishingly small.

ECC is traditionally implemented as a side-band solution for DDR SDRAM. You add additional DRAM devices in parallel to store the Hamming code used for ECC. For a 64-bit DDR channel you add an x8 DRAM, which widens the channel to 72 bits. Every time the memory controller performs a read or write transaction, it also reads or writes the corresponding ECC code word, which it can use upon read completion to correct any single-bit error or detect double-bit errors (SECDED). The ECC calculations are performed by the memory controller, but the ECC code words are stored in a separate DRAM. Side-band ECC provides end-to-end protection against single-bit errors.
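To make the SECDED mechanics concrete, here is a toy Hamming encoder/checker over 8 data bits. A real side-band controller runs the same construction over 64 data bits with 8 check bits (a 72-bit codeword, matching the x8 DRAM above); this is an illustrative sketch, not any particular controller's implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy SECDED Hamming code: 8 data bits -> 13-bit codeword.  Check bits
 * sit at positions 1, 2, 4, 8; data fills positions 3,5,6,7,9,10,11,12;
 * bit 0 holds overall parity, which upgrades SEC to SECDED. */
static uint16_t secded_encode(uint8_t data)
{
    uint16_t cw = 0;
    /* scatter data bits into the non-power-of-two positions */
    for (int pos = 3, bit = 0; pos <= 12; pos++) {
        if ((pos & (pos - 1)) == 0) continue;        /* skip 4 and 8 */
        if (data >> bit++ & 1) cw |= 1u << pos;
    }
    /* check bit p covers every position whose index has bit p set */
    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 3; pos <= 12; pos++)
            if ((pos & p) && (cw >> pos & 1)) parity ^= 1;
        cw |= (uint16_t)parity << p;
    }
    /* overall parity over positions 1..12 goes into bit 0 */
    int par = 0;
    for (int pos = 1; pos <= 12; pos++) par ^= cw >> pos & 1;
    return cw | par;
}

/* returns 0 = clean, 1 = single-bit error corrected, -1 = double error */
static int secded_check(uint16_t *cw)
{
    int syn = 0, par = 0;   /* syndrome = XOR of positions of set bits */
    for (int pos = 0; pos <= 12; pos++)
        if (*cw >> pos & 1) { syn ^= pos; par ^= 1; }
    if (!syn && !par) return 0;
    if (par) { *cw ^= 1u << syn; return 1; }  /* syn==0: parity bit hit */
    return -1;     /* nonzero syndrome with even parity: uncorrectable */
}

int main(void)
{
    uint16_t cw = secded_encode(0xA5);
    cw ^= 1u << 6;                            /* inject one bit error  */
    printf("single error -> %d (1 = corrected)\n", secded_check(&cw));
    cw ^= (1u << 3) | (1u << 9);              /* inject two bit errors */
    printf("double error -> %d (-1 = detected)\n", secded_check(&cw));
    return 0;
}
```

Flipping any one codeword bit is corrected via the syndrome; flipping two is flagged as uncorrectable, which is the SECDED guarantee described above.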
The "Mac store" is not used much, almost all applications are sold and installed the regular way, so Apple doesn't care about that.Let's hope Apple doesn't decide they are getting unwanted attention because every Apple device not connected to their store is not in line with their base strategy of making mountains of cash. They may not be able to stop current owners but they could further harden future CPU's to alternative OS booting.
> Reviewers like Max keep claiming that Mac Studio is software constrained, but the reasoning is rather dubious.

Well, I think the only concrete example he shows is Apple's own Final Cut Pro - so it does stand to reason that some other software packages may need some tweaks to take advantage of the Max/Ultra's increased resources. Other software that's lagging either hasn't been re-compiled for Apple's native ARM uarch, or doesn't support Metal, at least from what I've seen. It's kind of like the PPC -> x86 transition; it takes time. I haven't looked it up, but if Xcode supported cross compiling, that would make the transition much easier.
One difference though is that everyone says the Mac Studio is always quiet, whereas the MacBook Pro in some reviews will ramp up the fans under continued full load.
> Well, I think the only concrete example he shows is Apple's own Final Cut Pro - so it does stand to reason that some other software packages may need some tweaks to take advantage of the Max/Ultra's increased resources. Other software that's lagging either hasn't been re-compiled for Apple's native ARM uarch, or doesn't support Metal, at least from what I've seen. It's kind of like the PPC -> x86 transition; it takes time. I haven't looked it up, but if Xcode supported cross compiling, that would make the transition much easier.

Sorry, I wasn't clear. What I mean is that, from my understanding of his reviews, he's implying that there could be a software/OS limiter on the SoC, preventing it from maxing out performance. Sort of pre-emptive throttling, if you will. For example, he sees the clock speed listed at 3.0x GHz in one test, and suggests there is a limitation there because it's not 3.2 GHz, perhaps to maintain low temps.
> Sorry, I wasn't clear. What I mean is that, from my understanding of his reviews, he's implying that there could be a software/OS limiter on the SoC, preventing it from maxing out performance. Sort of pre-emptive throttling, if you will. For example, he sees the clock speed listed at 3.0x GHz in one test, and suggests there is a limitation there because it's not 3.2 GHz, perhaps to maintain low temps.

I understand, thanks. If anything, Apple could be holding things back a bit in firmware, but for what reason I don't know. Apple may just be playing it 'safe' with M1 Max voltages to max out yields or, as Doug suggested, maybe Apple just wanted the Studio to be super quiet. Whatever the case, my thinking is that Apple won't make any changes till the M2s come along. I don't think there is any incentive for Apple to bump up performance by 10% with some later firmware release. Oh well - this has been fun.
Personally, I think he's just jumping to conclusions. If you look at other comparative reviews elsewhere, you'll see that the M1 Max in the MacBook Pro performs pretty much exactly the same as the M1 Max in the Mac Studio. However, the difference is that the Mac Studio is silent, whereas the MacBook Pro sometimes is not. Not loud, but not silent.
You'd think that if there were truly such throttling going on, it'd kick in earlier on the MacBook Pro to keep it quiet.
> Whatever the case, my thinking is that Apple won't make any changes till the M2s come along.

Agreed.
> Agreed.

Hmm, I thought Apple was done with M1 product releases. I would think the next Mac Mini would have an M2 chip - maybe I just don't understand Apple's comments. Problem for me with the Mini is the 16GB limitation. I would want to be able to spool up a VM with Win10 if I needed. Also, the graphics performance, even on an M1 Pro, is less than I'd like for future gaming (not that I game a ton, but I do still enjoy gaming).
In any case, personally I just want a Mac mini with more ports, but something which isn't as chonky and pricey as the Mac Studio. I don't think we'll get it with the M2, but there's still hope for a mythical M1 Pro or M2 Pro Mac mini.
> Hmm, I thought Apple was done with M1 product releases. I would think the next Mac Mini would have an M2 chip - maybe I just don't understand Apple's comments. Problem for me with the Mini is the 16GB limitation. I would want to be able to spool up a VM with Win10 if I needed. Also, the graphics performance, even on an M1 Pro, is less than I'd like for future gaming (not that I game a ton, but I do still enjoy gaming).

Apple said it was done with new M1 "chips", M1-Ultra being the last. So there is no M1-Extreme coming for the Mac Pro.
> Apple said it was done with new M1 "chips", M1-Ultra being the last. So there is no M1-Extreme coming for the Mac Pro.

Also, while they did say the iMac 27" is dead, some pundits are still trying to convince us (or convince themselves?) a new "iMac Pro" is coming eventually, along with a new high end Mac mini. And Ming-Chi Kuo is claiming the new updated MacBook Air will use M1 again, although Mark Gurman says it will be M2.
Nothing stopping them from releasing more models based on M1.
> If they were going to make the M1 Pro an option on the Mini, they would have done so by now.

Probably, so more likely they will fill that gap with an M2 Mini. The lowest-end M2 chip will likely perform slightly better than the M1 Mini, improve the port situation a bit, and allow for 32 GB of RAM.
Ticking most of the M1-Pro boxes people want in an M1-Pro Mini, but with a smaller, less expensive to produce chip than an M1-Pro.
M1 Mini stays in production for the low end, M2 Mini is midrange, and Studio is high end.
> If it was a software bug we would expect it to be fixed fairly quickly, but I suppose it is possible such a bug could be present in the M1 itself.

Apple did have a lot of turnover in its processor design teams around the time the M series was under development. Anyway, we won't know unless somebody pries the info out of Apple's cold dead hands.
> Assuming the Qualcomm exclusive ends soon and Microsoft can officially support Macs running Windows/ARM in a VM, that worry disappears.

Oh, forgot about that. Was wondering why Parallels was still in beta and VMware Fusion was AWOL for ARM Macs.
> Sorry, I wasn't clear. What I mean is that, from my understanding of his reviews, he's implying that there could be a software/OS limiter on the SoC, preventing it from maxing out performance. Sort of pre-emptive throttling, if you will. For example, he sees the clock speed listed at 3.0x GHz in one test, and suggests there is a limitation there because it's not 3.2 GHz, perhaps to maintain low temps.

Is this a single-threaded benchmark or something? Because as Andrei pointed out in his original review, the Firestorm performance cores in the M1 Max top out at 3036 MHz when you have all four cores in the same cluster active. 3228 MHz is the max for a single core per cluster, and 3132 MHz for two cores. I realize "Apple" + "throttling" is a perpetually trending combination, but this would seem to be the expected behavior, no?