There is a way to scale performance with just one chip design:
- MacBook Air .... low power / low clocks
- MacBook Pro ... higher clocks
- iMac ................. 2xCPU + 2x mem channels (not perfect due to NUMA obstacles, although easy to build)
- MacPro ............ 4xCPU (AMD EPYC1/Naples style) 4x mem channel
4x8 high-performance cores is 32c total, which sounds pretty reasonable for an iMac Pro as a replacement for the 18c Xeon.
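The ladder above can be sketched as arithmetic. The 8-core chiplet and the per-model chiplet counts are assumptions from this post, not known figures:

```python
# Hypothetical sketch of the one-chiplet scaling ladder described above.
# CORES_PER_CHIPLET and the chiplet/channel counts are assumed, not measured.
CORES_PER_CHIPLET = 8

ladder = [
    # (model, chiplets, memory channel multiplier)
    ("MacBook Air/Pro", 1, 1),
    ("iMac", 2, 2),
    ("Mac Pro / iMac Pro", 4, 4),
]
for model, chiplets, channels in ladder:
    print(f"{model}: {chiplets * CORES_PER_CHIPLET} cores, {channels}x mem channels")
```

The point of keeping the memory channel count tied to the chiplet count is that bandwidth scales in step with cores, at the cost of NUMA effects once you have more than one chiplet.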
Exactly how Apple will handle scaling from iPhones to Mac Pros remains THE fascinating question, as opposed to the sillier analyses you see on the web.
The choice of chiplets rather than multiple bespoke die sizes (or a single large die that's fused off) seems more or less obvious. But even once you accept that, a variety of choices remain:
- how much of the IO and memory control do you put in a separate hub (or hubs) vs. on the main SoC? A separate hub means you can use a cheaper process, but a memory controller on the SoC means memory bandwidth scales with CPU count in a nice way.
- do you use something like an A14X as your baseline chiplet? That means only one die, but it also means a fair fraction (15-25%?) of the die is things like security, ISP, and media encode/decode that don't need multiple copies on iMacs and Mac Pros.
Or do you have a third Z SoC that's a stripped-down X SoC? Remove all that one-off stuff, and add chiplet communication channels.
- what to do about the GPU? If you do the math, the A12X GPU, as far as GeekBench 5 Compute results are concerned, is about 1/6 of the top result for an iMac Pro. So assume a 50% boost for the A13X GPU (the iPhone GPU saw a 50% boost), and assume the (possibly very dodgy...) hypothesis that GB5 Compute is a good representation of everything a GPU needs to do: that means you need 4 A13X chiplets to match the iMac Pro. It's within the bounds of plausibility, but it's not ideal -- sync between the different GPUs will be much more expensive than on a monolithic GPU.
The second issue is bandwidth. Apple's System cache works extremely well, as does tiling, but you still want bandwidth for some desktop GPU tasks...
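Spelling out that back-of-envelope GPU arithmetic (the 1/6 ratio and the 50% uplift are this post's assumed figures, not measurements):

```python
# Assumed inputs from the argument above (hypothetical, not measured):
a12x_fraction = 1 / 6  # A12X GB5 Compute score vs. top iMac Pro result
gen_uplift = 1.5       # assumed ~50% generational boost for an A13X-class GPU

a13x_fraction = a12x_fraction * gen_uplift  # fraction of iMac Pro per chiplet
chiplets_needed = round(1 / a13x_fraction)  # chiplets to match iMac Pro
print(chiplets_needed)
```

Any real answer would of course depend on how well the workload parallelizes across GPU tiles, which is exactly the sync cost flagged above.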
So four alternatives present themselves:
+ give up on the GPU, at least this time round. Maybe have one iPad-class GPU somewhere (in the IO hub?) for low-power work, plus use a standard nVidia or AMD external GPU on PCIe.
+ design an Apple GPU based on what Apple already has, but scaled up much larger, taking over an entire die. Then package that with HBM, and connect it either via PCIe or via some Apple internal bus to the rest of the system.
+ use a GPU that's distributed across the chiplets, one piece on each of the 1, 2, or 4 chiplets used in the different models, plus HBM somewhere on the same interposer.
+ finally, like the above, but with no HBM, just relying on LP-DDR5 (run fast and wide).
You can then run through the same analysis for the NPU as for the GPU...
So what WILL Apple choose? I don't think we can usefully go beyond listing possibilities.
The exact choice will depend on both performance factors and cost factors -- and we outsiders don't have a clue as to either, certainly not enough even to make a reasonable guess.