If you kept the hottest section of each layer - the area housing the CPU cores - in one corner of the die, each layer could be rotated so that the CPU sections never stack directly above one another.
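As a purely geometric toy illustration of that rotation idea (nothing AMD-specific; the corner numbering is just for the sketch):

```python
# Toy illustration: place the hot block in one corner of a square die
# and rotate each successive layer by 90 degrees; the hot corners then
# never stack. Corners are numbered 0-3 going around the die.
def hot_corner(layer: int) -> int:
    """Corner occupied by the hot block on the given layer."""
    return layer % 4  # each layer is rotated 90 degrees vs the one below

corners = [hot_corner(layer) for layer in range(4)]
print(corners)  # [0, 1, 2, 3] - four layers, four different corners
assert len(set(corners)) == 4  # no two layers share a hot corner
```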
I was thinking about this last night and made a few posts that probably don't reflect my current conclusions. Moore's Law Is Dead had a video about next-gen AMD GPUs. They describe these as having a base die with a compute die on top. Two of these, together with two HBM stacks, appear to form a single unit with around 150 W of power consumption. The base die was described as 6 nm with a 5 nm graphics die on top. They showed up to 4 of these units combined, for 8 compute die and 8 stacks of HBM.
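If those figures hold, the larger configurations just multiply out; a quick napkin-math sketch (only the ~150 W per unit and the 2-die + 2-HBM composition come from the video, the rest is arithmetic):

```python
# Napkin math on the rumored building block: one "unit" is 2 base die
# (each with a compute die on top) plus 2 HBM stacks at roughly 150 W.
# Only those figures come from the video; the rest just multiplies out.
UNIT_W = 150       # rumored power per unit
COMPUTE_PER_UNIT = 2
HBM_PER_UNIT = 2

for units in (1, 2, 4):
    print(f"{units} unit(s): {units * COMPUTE_PER_UNIT} compute die, "
          f"{units * HBM_PER_UNIT} HBM stacks, ~{units * UNIT_W} W")
# 4 units -> 8 compute die, 8 HBM stacks, ~600 W
```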
I was thinking of the bridge die acting as cache die, but with 512 MB of Infinity Cache rumored for a desktop GPU, that is a lot of die area - in fact likely larger than the graphics die itself. At 150 W for the whole 2-base-die + 2-HBM unit, the graphics die itself must be rather low power, so getting power up the stack is perhaps not an issue. If 64 MB of SRAM takes around 40 mm² at 7 nm, then a base die holding 256 MB must be over 150 mm²; the cache will probably not shrink much going from 7 nm to 6 nm.

The base die would probably have EFB links to the other base die and possibly EFB links to the HBM. It may also have a couple of Infinity Fabric links for PCI Express, plus memory controllers to support DDR5. HBM interfaces take up very little space, but the DDR5 and PCI Express interfaces might take up a lot, if required. The CDNA GPUs have many IF links that are unneeded on the desktop, and the DDR5 memory controllers would also not be needed on purely GPU products. Perhaps that is an opportunity to put some separate IO chips in there for different products. The two base die in a unit look like they would be connected with EFB; it is unclear how multiple units are connected, though. For more flexibility of placement, those might be IFOP-style Infinity Fabric links instead of EFB, and they could be very wide links.
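A quick sanity check on that area estimate (the ~40 mm² per 64 MB at 7 nm figure is the assumption; the scaling is first-order and ignores any density gain at 6 nm):

```python
# First-order SRAM area estimate for the base die. Assumptions: about
# 40 mm^2 per 64 MB at 7 nm, and negligible SRAM shrink at 6 nm.
MM2_PER_64MB = 40.0

def cache_area_mm2(cache_mb: float) -> float:
    """Linearly scale the 64 MB reference block to the target size."""
    return cache_mb / 64.0 * MM2_PER_64MB

print(cache_area_mm2(256))  # one base die: 256 MB -> ~160 mm^2
print(cache_area_mm2(512))  # full 512 MB rumor -> ~320 mm^2 of SRAM
```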
This could be the new AMD modular architecture. The base die would likely be the same across all products, and they could stack RDNA graphics die, CDNA compute die, CPU die, FPGA die, etc. on top. It is unclear whether this would be SoIC or some micro-bump BEOL tech. SoIC allows much greater connectivity and better thermals, but everything must be designed together and probably all made at TSMC. SoIC would also allow the base die cache to act as a massive L3. I would lean towards it being an SoIC-based stack. The HBM would be micro-bump, probably EFB-connected rather than sitting on large interposers. There may be some other silicon in there, perhaps GlobalFoundries-made, to support different IO for different products.
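To make the modularity concrete, here is a toy sketch of how one common base die could anchor different product stacks (the pairings and counts are my guesses, not anything from the video):

```python
# Toy model of the modular stacking idea: one common base die (cache
# plus links) with different die stacked on top. All pairings and
# counts below are illustrative guesses, not confirmed configurations.
BASE_DIE = {"node": "6 nm", "cache_mb": 256, "links": ("EFB", "IF")}

PRODUCTS = {
    "desktop GPU":  {"top_die": "RDNA graphics", "base_die": 2},
    "compute GPU":  {"top_die": "CDNA compute",  "base_die": 8},
    "dense server": {"top_die": "Zen 4c CPU",    "base_die": 4},
    "FPGA hybrid":  {"top_die": "FPGA",          "base_die": 1},
}

for name, cfg in PRODUCTS.items():
    cache = cfg["base_die"] * BASE_DIE["cache_mb"]
    print(f"{name}: {cfg['base_die']}x base die + {cfg['top_die']} "
          f"on top, {cache} MB Infinity Cache")
```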
Bergamo might be the first CPU product to use this; it may be the default for Zen 5, although getting power up the stack may still be an issue. Zen 4c cores are specifically very low power; Genoa, without stacking, covers the high-power devices. If this is the architecture, it seems like it would be 2 CPU die per base die given their small size, so only 4 base die to reach 128 cores (assuming 16 cores per Zen 4c die). That would be 1 GB of Infinity Cache. This seems like it would allow the possibility of HBM on a CPU product, although I don't know how necessary that would be with such large SRAM caches. It would also allow a combined CPU / GPU product in an Epyc socket.
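Spelling the core and cache math out (16-core Zen 4c die and 256 MB per base die are the assumptions, carried over from above):

```python
# Core and cache totals for the rumored CPU stack. Assumptions:
# 16 cores per Zen 4c die, 2 CPU die per base die, and 256 MB of
# cache per base die (carried over from the GPU estimate above).
CORES_PER_DIE = 16
CPU_DIE_PER_BASE = 2
CACHE_MB_PER_BASE = 256

for base_die in (1, 2, 4):
    cores = base_die * CPU_DIE_PER_BASE * CORES_PER_DIE
    cache_mb = base_die * CACHE_MB_PER_BASE
    print(f"{base_die} base die: {cores} cores, {cache_mb} MB Infinity Cache")
# 4 base die -> 128 cores and 1024 MB (1 GB) of Infinity Cache
```

The 1- and 2-base-die rows also cover the smaller configurations discussed below.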
That is all very interesting, if true, but I don't know if they would use it for desktop parts other than a possible Threadripper replacement. The minimum configuration would be something like 16 cores with 256 MB of cache (presumably a single base die with just one CPU die stacked). It would be great if they introduced an in-between socket for workstation and Threadripper that is half of Epyc; given the modularity this would have, it doesn't seem like it would be difficult. These would all be rather high-end products, though. I suspect most of the mainstream market will be APUs - they can fit a lot on a single die at 5 nm. I hope they make a product with many channels of DDR5 mounted on the package, or just an APU with an HBM stack for high-end mobile devices.
And I wrote too long a post again.