On the question of silicon interposers/bridges for the "Zen 4" generation: the paper by Sam Naffziger et al. goes into detail about AMD's considerations for the "Zen 3" generation, explaining why they ended up using IFOP (Infinity Fabric On-Package) links in the organic package substrate rather than a silicon interposer. It came down primarily to high cost and limited reach. Relying on IFOP was quite a feat in itself, though, and it comes with its own drawbacks. In particular, there is not much room in the package for links, so they had to get creative, including co-design in the chiplets themselves (compromises and innovation in power delivery). IFOP also has higher latency, its energy per bit is over an order of magnitude greater than that of links on silicon, and 14% of the CCD die area is devoted to IFOP because of the large interconnect bump pitch. Surprisingly, IFOP isn't a bandwidth limiter, though: it has ample bandwidth, at least for the "Zen 3" generation.
"3) Packaging Technology Decisions: AMD was among the first companies to commercially introduce silicon interposer technologies starting with the AMD Radeon™ R9 “Fury”
GPUs with high-bandwidth memory (HBM) in 2015 [16]. A natural question for our chiplet-based products is why we chose to use package substrate routing rather than the higher density interconnects enabled by silicon interposers. There are several factors that drove the decision to not use silicon interposers for our chiplet-based processors. The first is the communication requirements of our chiplets. With eight CCDs and eight memory channels, on average each chiplet’s IFOP only needs to handle approximately one DDR4 channel’s worth of bandwidth. Using DDR4-2933 as an example, a single channel would correspond to ~23.5 GB/s of peak bandwidth. Even accounting for some load imbalance across the CCDs, a single CCD’s IFOP would still be expected to observe no more than a few tens of GB/s of traffic, and in fact each link can support approximately 55GB/s of effective bandwidth. Point-to-point links in the package substrate routing layers are more than sufficient to handle this modest level of bandwidth. In contrast, a single HBM stack can deliver hundreds of GB/s of memory bandwidth, which far exceeds the capabilities of the organic package substrate, and this is why HBM-enabled GPU products need a higher-bandwidth solution such as silicon interposers [2][16][17]. The second factor against silicon interposers for our chiplet-based processors is the reach of the interposer-based interconnects. While interposers can provide great signal density for very high bandwidths, the lengths of the signals are limited and as such constrain the connections to edge-to-edge links. The reach of interposer-based interconnects can in principle be extended using wider metal routes and greater spacing between routes, but this would decrease the effective bandwidth per interface because fewer total routes could be supported for a fixed width of routing tracks. This argument also applies to silicon bridge technologies [12]. The next subsection describes the challenges of providing sufficient IFOP bandwidth across the package substrate. Figure 10 illustrates a hypothetical interposer-based processor design. The edge connectivity constraint would limit the architecture to only four CCDs, which would render the product concept to be far less compelling. Even if interconnect reach was not a limiting factor, the IOD and the eight CCDs would require so much area that the underlying interposer would greatly exceed the reticle limit (while a passive interposer does not contain any transistors, the metal layers are still lithographically created and therefore must stay within the same reticle field constraints). Figure 10 shows the placement where an additional CCD would have to be, which is both outside the boundary of a maximum-sized interposer and too far for the unbuffered interposer routes to reach while supporting required bandwidths. Recent advancements in silicon interposer manufacturing have enabled reticle stitching to create very large interposers [11], but such an approach would have been cost prohibitive for this market segment. Last, the silicon interposer itself adds more cost to the overall solution. A CCD with the twice the core count could have been used, but that would have resulted in lower yield and decreased configurability. For all these reasons, routing IFOP directly across the package substrate was chosen for this product family. The total area consumed by multiple chiplets is typically greater than a monolithic chip with equivalent functionality. 
While this could theoretically cause a corresponding increase in the overall package size, the size of the SP3 processor package used by AMD EPYC™ processors is primarily determined by the large number of package pins required to support the eight DDR memory channels, 128 lanes of PCIe plus other miscellaneous I/O, and all the power and ground connections."
Pioneering Chiplet Technology and Design for the AMD EPYC™ and Ryzen™ Processor Families: Industrial Product (computer.org)
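As a sanity check on the bandwidth figures in the quote, here is a quick back-of-envelope sketch (Python). The bus widths are my own assumptions, not from the paper: a 64-bit (8-byte) DDR4 channel and a 1024-bit HBM2 stack at 2.0 Gbps per pin; the ~55 GB/s per-IFOP-link figure is quoted from the paper, not derived here.

```python
# Back-of-envelope check of the bandwidth figures quoted above.
# Assumptions (mine, not the paper's): a DDR4 channel is 64 bits (8 bytes) wide,
# and an HBM2 stack is 1024 bits (128 bytes) wide at 2.0 Gbps per pin.

def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bytes: int) -> float:
    """Peak bandwidth in GB/s = transfers per second * bytes per transfer."""
    return transfer_rate_mts * 1e6 * bus_width_bytes / 1e9

ddr4_2933_channel = peak_bandwidth_gbs(2933, 8)    # ~23.5 GB/s, matches the paper
hbm2_stack = peak_bandwidth_gbs(2000, 128)         # ~256 GB/s for one stack

print(f"DDR4-2933 channel: {ddr4_2933_channel:.1f} GB/s")
print(f"HBM2 stack:        {hbm2_stack:.0f} GB/s")

# With 8 CCDs sharing 8 memory channels, the average per-CCD IFOP traffic is
# about one channel's worth (~23.5 GB/s), comfortably under the ~55 GB/s of
# effective bandwidth the paper cites for a single IFOP link.
```

So each CCD's average traffic fits easily within one substrate link, whereas even a single HBM stack needs roughly an order of magnitude more bandwidth per interface, which is why the HBM GPUs went to interposers while EPYC did not.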
I think the reasoning behind their decision strongly hints that silicon interposers/bridges will come as soon as reach and cost allow. However, there may be low-risk, low-cost reasons to extend the current scheme if they find it is sufficient for their performance targets. If they can elongate the package a little, and there is room underneath the CCDs to route another IFOP link (or two), maybe they can fit another four (or eight) CCDs on the package, in a manner compatible with the ugly mock-ups shared by leakers. But the routing is already pretty cramped, as can be seen in this figure from the paper:
[Attached figure from the paper: IFOP routing across the package substrate]
PS. Could we perhaps see 96-core "Genoa" using IFOP, and a 128-core follow-up using silicon interposer/bridges? Could they do that in the same socket?