L3 <-> L3 transfer, perhaps?
Compared to IFOP, you can now do that through a wider, faster and lower-latency interface to the IOD.
Or daisy-chaining of CCDs.
But yes, they tripled RAM bandwidth to around 1.6 TByte/s for the top end. With 8 CCDs, each CCD interconnect would need to provide at least 200 GByte/s to saturate this, and that assumes each CCD demands an equal share. Current GMI-Wide delivers 128 GByte/s (read), IIRC.
So 256 GByte/s per CCD or even more doesn't seem like overkill to me.
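A quick back-of-the-envelope check of the numbers above, as a minimal sketch. The inputs (1.6 TByte/s, 8 CCDs, 128 GByte/s for GMI-Wide) are the post's own estimates, not official AMD specifications, and the helper name `per_ccd_share` is hypothetical.

```python
# Back-of-the-envelope check of the per-CCD bandwidth estimate.
# All inputs are the post's estimates, not official AMD specifications.

def per_ccd_share(total_gbps: float, num_ccds: int) -> float:
    """Bandwidth each CCD link must carry if all CCDs draw an equal share."""
    return total_gbps / num_ccds

dram_bw = 1600.0       # ~1.6 TByte/s top-end memory bandwidth, in GByte/s
ccds = 8               # CCD count assumed in the post
gmi_wide_read = 128.0  # current GMI-Wide read bandwidth (IIRC figure)

need = per_ccd_share(dram_bw, ccds)
print(f"required per CCD: {need:.0f} GByte/s")                # 200 GByte/s
print(f"GMI-Wide covers {gmi_wide_read / need:.0%} of that")  # 64%
print(f"256 GByte/s link headroom: {256.0 / need:.2f}x")      # 1.28x
```

Under these assumptions, today's GMI-Wide covers only about two thirds of an equal per-CCD share, while a 256 GByte/s link leaves roughly 28% headroom.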
That is a very interesting idea indeed. For Zen 6 I do not expect something like that to happen, and for Zen 7 probably not either (16/33C CCDs, a bigger L3$ and simply faster cores are already a decent enough update). But Zen 7 could still introduce it (core-count mania). It would be sick to see a 512C Zen 7 SKU.
As the beachfront of the IOD is limited, daisy-chaining makes a lot of sense in the mid to long term. With 2 CCDs in series, the link only needs to carry a few hundred GByte/s. Such a concept opens the door to very large core-count scaling without incurring the cost of the alternatives (much bigger CCDs, much more IOD area, ...).
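To illustrate where "a few hundred GByte/s" comes from, here is a minimal sketch of link loads in a daisy chain. It assumes all traffic flows between the IOD and the CCDs (no CCD-to-CCD traffic) and that every CCD demands the same 200 GByte/s estimated earlier; the functions `hop_loads` and `iod_ports_needed` are hypothetical illustrations, not an AMD design.

```python
# Sketch: link load in a hypothetical CCD daisy chain.
# Assumptions: all traffic is IOD <-> CCD (no CCD-to-CCD traffic),
# and every CCD demands the same bandwidth.

def hop_loads(ccds_in_chain: int, per_ccd_gbps: float) -> list[float]:
    """Load on each hop, starting at the IOD: hop i serves every CCD behind it."""
    return [(ccds_in_chain - i) * per_ccd_gbps for i in range(ccds_in_chain)]

def iod_ports_needed(total_ccds: int, ccds_in_chain: int) -> int:
    """IOD beachfront ports needed when CCDs are grouped into chains."""
    return -(-total_ccds // ccds_in_chain)  # ceiling division

per_ccd = 200.0  # GByte/s, from the 1.6 TByte/s / 8 CCD estimate
total = 8        # CCD count assumed in the post
for chain in (1, 2):
    print(chain, hop_loads(chain, per_ccd), iod_ports_needed(total, chain))
# 1 [200.0] 8
# 2 [400.0, 200.0] 4
```

With two CCDs in series, the hop next to the IOD carries 400 GByte/s, but the IOD needs only half as many beachfront ports, which is the trade-off the argument relies on.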
- Even at 512 GByte/s this is not an issue: the power draw would still be much lower than that of an existing IFOP interface at 128 GByte/s (roughly 10x less energy per bit).
- RDNA3 MCDs already delivered ~900 GByte/s per chiplet.
- Zen 7 will probably move the L3$ onto a 3D-stacked base die underneath the cores. Since this base die would be manufactured on an older node such as N4, adding 2x IF-PHY on two sides of it (for daisy-chaining) would not hurt much in terms of cost.
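The power claim in the first point can be checked with simple arithmetic, since link power is roughly bandwidth times energy per bit. This sketch assumes "~10x less power" means about one tenth of IFOP's energy per bit; the absolute IFOP value cancels out, so it is normalized to 1.0, and `relative_power` is a hypothetical helper.

```python
# Relative link power: power ~ bandwidth * energy_per_bit.
# Assumption: the new link needs ~1/10 of IFOP's energy per bit
# (the post's "~10x less power"); IFOP's absolute value is normalized to 1.0.

def relative_power(bandwidth_gbps: float, energy_per_bit: float) -> float:
    """Link power in arbitrary units: bandwidth times relative energy per bit."""
    return bandwidth_gbps * energy_per_bit

ifop = relative_power(128.0, 1.0)      # existing IFOP at 128 GByte/s
new_link = relative_power(512.0, 0.1)  # hypothetical link at 512 GByte/s
print(f"new link draws {new_link / ifop:.0%} of IFOP's power")  # 40%
```

So even at 4x the bandwidth, such a link would draw only around 40% of today's IFOP power under this assumption.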