Yeah, cache-coherent interconnects for 32+ high-perf cores are routinely and cheaply done. For example, ThunderX2 and Falkor use a ring. I have no info on X-Gene3/Ampere. The latency problem does creep back in (sometimes scarily so). Again, this is not black magic or anything new. Ampere's just-released ARM server processor sports a very similar architecture, except that it's monolithic (https://amperecomputing.com/wp-content/uploads/2018/02/ampere-product-brief.pdf). AMD certainly does not lack expertise in this.
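To put a rough number on the latency point, here is a back-of-the-envelope sketch of how the average hop count grows on a bidirectional ring. It is a toy model I made up for illustration, not how any of these chips is actually laid out: it assumes one ring stop per core, shortest-direction routing, and uniform traffic, and `avg_ring_hops` is just a name I picked.

```python
def avg_ring_hops(n: int) -> float:
    """Average hops from one ring stop to every other stop.

    Assumes a bidirectional ring with n stops where each message
    takes the shorter of the two directions (illustrative model only).
    """
    if n < 2:
        raise ValueError("need at least two ring stops")
    # Distance to the stop d positions away is the shorter way around.
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


if __name__ == "__main__":
    for stops in (8, 16, 32, 64):
        print(f"{stops:>2} stops: {avg_ring_hops(stops):.2f} hops on average")
```

In this model the average hop count roughly doubles every time the number of stops doubles, which is why bigger designs tend to split into multiple rings or move to a mesh.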
Btw, since people are juggling numerous different dies, chiplets, etc.: what about a monolithic 64c die? That would be a truly unexpected and performant SKU.
The 7nm-based Fujitsu A64FX packs 48c, each with two huge 512b SIMD units, + 4 IO cores. No L3, but still, why not? Check out their core/CMG/chip/interconnect architecture.