Oh, right. I totally forgot about LPDDR5X.
One thing I noticed about Andrei's testing was that a single core could pull 102.36 GB/s across the system fabric from main memory. The theoretical bandwidth to a single memory package with a 128-bit LPDDR5-6400 interface is 102.4 GB/s. That may not be entirely coincidental.
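The match is easy to sanity-check, since peak DRAM bandwidth is just transfer rate times bus width. A quick sketch of that arithmetic (my own, not from Andrei's article; function name is mine):

```python
# Back-of-the-envelope theoretical bandwidth: transfers/s x bytes per transfer.
def peak_bandwidth_gbs(mt_per_s: int, bus_width_bits: int) -> float:
    """Peak DRAM bandwidth in decimal GB/s, as bandwidth is usually quoted."""
    return mt_per_s * (bus_width_bits / 8) / 1000

# 128-bit LPDDR5-6400: 6400 MT/s x 16 bytes = 102.4 GB/s
print(peak_bandwidth_gbs(6400, 128))  # -> 102.4
```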
That number is so good I'm still a bit suspicious there may have been some issues with the test showing what it is supposed to show. I've never seen a system able to basically deliver 100% of theoretical memory bandwidth, but maybe the more recent DDR standards have improved upon the areas where inefficiencies used to show up? What's the best number observed in Intel or AMD systems?
Regardless of whether that number is exact or slightly overstated, it is clear Apple's memory subsystem is extremely efficient, so the fact that it can't get much past 200 GB/sec even with all the cores of an M1 Max is not due to inefficiencies or overhead but to limits in the design itself. Some paths will need to be wider and/or faster just to fully exploit LPDDR5, and another 33% beyond that to handle the fastest currently available LPDDR5X.
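The 33% figure falls straight out of the transfer rates, assuming LPDDR5X-8533 is the fastest available grade (my assumption, versus the LPDDR5-6400 in current parts):

```python
# Extra headroom needed for LPDDR5X-8533 relative to LPDDR5-6400
print(f"{8533 / 6400 - 1:.1%}")  # -> 33.3%
```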
I am really curious to see how they handle the fabric for high end Mac Pros with four M* Max. Will there be enough bandwidth between SoCs to carry all 400 GB/sec (or 533 GB/sec if LPDDR5X is used) from each SoC's DRAM? I'm assuming it will be fully connected since there are only four SoCs. That's a lot of very high speed wires!
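To put a number on "a lot of wires": a fully connected topology over four SoCs needs a direct link per pair. A small sketch of that count (the topology is my assumption from the post, not anything Apple has disclosed):

```python
from math import comb

# Fully connected mesh of four SoCs: one point-to-point link per pair.
socs = 4
links = comb(socs, 2)
print(links)  # -> 6

# In the worst case a single link may have to carry one SoC's entire DRAM
# stream, i.e. up to 400 GB/s (LPDDR5 case) each way, on each of those links.
```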
Since we now know that LPDDR5X will be available in up to 64 GB modules, I can at least stop worrying about whether Apple will need to support DIMMs for larger configurations.
