Yeah, rising power draw is no joke. Before Zen 2, server CPUs used 150-180W, and we're heading into a 300+W world. In the accelerator world, server PCIe slots tapped out at 300W, so OAM3 supports up to 500W with CDNA2 and SXM is now up to 700W with Hopper. CPU sockets sit in the same systems, so they can use more too. To be slightly on topic, stacked cache has shown increased power draw anyway.
So in a situation where CCDs with large local stacked caches actually reduce the required interconnect bandwidth, the chances of seeing bridge chips go down.
No one's going to pass on what are probably the most efficient and most performant CPUs on the market for at least one more generation just because the IO die uses an extra 50W. That's simply the trade-off for reduced cost: less silicon spent on bridge chips and less advanced packaging.
The IO die does actually hurt in the low-end server segment that Xeon Bronze competes in, but there are rumours of a smaller Zen 4 Epyc socket to address that.
Zen 5 is rumoured for 2023, and it's a new design family, which opens the book for new layouts, so maybe active bridges/interposers show up then.
What is using silicon bridges today is devices that need large aggregate bandwidth, namely GPUs and APUs: Nvidia's XX100 parts all use HBM, CDNA2 uses fan-out for HBM and interconnect, Sapphire Rapids uses EMIB for shared L3 and shared HBM bandwidth (Apple's APU is similar), and Ponte Vecchio uses everything you can imagine.
Grace CPUs look more traditional. The images we've been shown look like discrete packages, and all the material says NVLink, NVLink, NVLink, which currently has no problem running over copper.
There was a good presentation at last Hot Chips on 3D-stacking chips and the design considerations around it that make reusable silicon, especially something intrinsic like a cache bridge chip, really unlikely. Even with chiplets we haven't seen anything reused across products other than the CCD, and the desktop and server parts it goes into hardly differ in intended use. We didn't even get a tiny GPU chiplet that could be used in a low-end discrete GPU or attached to an IO die, or even a different IO+GPU die; nothing fun at all.
There was a leak around 2016 of a huge AMD 16- or 32-core APU with HBM and all sorts of stacking that hasn't seen the light of day yet. Ponte Vecchio looks most similar to that minus the CPU chiplets, and the M1 Max is doing the big-APU thing but not in an especially interesting design.
I think that the Apple M1 Ultra (with a silicon bridge connecting two M1 Max chips) is an interesting design. I don't think we know specifically if it is the same EFB that AMD would use, but it seems to be a silicon bridge of some kind, so likely the same TSMC tech will be used for Infinity Cache GPUs. We don't know how the Grace CPUs are connected together either. The images are almost certainly just a rendering. If the CPUs are directly adjacent, it would make a lot more sense to use a silicon bridge for the kind of bandwidth they are talking about. The GPU would then be connected by NVLink. It isn't going to be available for a long time, which makes me wonder if Nvidia just wanted to talk about it first so that they look like they are leading the technology. If AMD announces a similar device later, it looks a little more like they are the follower, even if their device ends up being available first.
For Bergamo, with 128 cores, a lot of bandwidth will be required. Zen 4 may have significantly increased floating-point compute over Zen 3, and Bergamo may not have any of that cut out. AMD has dealt with bandwidth requirements in their GPUs by using Infinity Cache, and I believe AMD has talked about using Infinity Cache across multiple products. It would be a good way to reduce interconnect power, increase bandwidth, and add a lot of cache back. Bergamo does not seem to be using V-Cache, but that might not be necessary if it has Infinity Cache.
Charlie at SemiAccurate has called Bergamo a monster. It doesn't sound like one if it is just two 8-core CCXs with 16 MB of L3 per die connected by SerDes. It is a lot of cores, and they should be more power efficient, meaning they may actually be able to sustain good clock speeds, so perhaps the performance could be massive even without extra cache. Extra cache can help a lot though, as Milan-X has demonstrated.
The memory interface for current Epyc is very wide, at 512-bit. That is, in fact, wider than most current GPUs. That is GDDR6 vs DDR4, but with DDR5 it goes up to 768-bit (64 x 12, or perhaps more accurately 32 x 24) for SP5, which is near 500 GB/s per Epyc socket. That will be close to the bandwidth of an Nvidia P100 GPU, and current Nvidia A10 GPUs are only 600 GB/s. Saying that Bergamo isn't a "high aggregate bandwidth" device doesn't seem correct. We also may get a significant DDR5 speed increase before Bergamo comes out next year. I assume the 460 GB/s number I have seen is actually with rather low-clocked DDR5; I will need to look that up.
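A quick back-of-the-envelope check on those numbers (just a sketch; the DDR5-4800 and DDR5-5200 speed grades are my assumption for where the 460 GB/s and "near 500 GB/s" figures come from):

```python
# Peak-bandwidth arithmetic for a 12-channel (768-bit) DDR5 interface like SP5.
# Assumed speed grades: DDR5-4800 and DDR5-5200.

def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: transfer rate (MT/s) times bus width in bytes."""
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9

sp5_bus_bits = 12 * 64  # 768-bit total, i.e. 24 x 32-bit DDR5 subchannels

ddr5_4800 = peak_bandwidth_gbs(4800, sp5_bus_bits)  # ~460.8 GB/s
ddr5_5200 = peak_bandwidth_gbs(5200, sp5_bus_bits)  # ~499.2 GB/s

print(f"DDR5-4800, 768-bit: {ddr5_4800:.1f} GB/s")
print(f"DDR5-5200, 768-bit: {ddr5_5200:.1f} GB/s")
print(f"Per core at 128 cores (DDR5-4800): {ddr5_4800 / 128:.2f} GB/s")
```

So the 460 GB/s figure lines up with DDR5-4800, and it only takes a modest speed bump to get near 500 GB/s per socket; spread over 128 cores that is still only a few GB/s per core, which is why extra cache is attractive.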
There could be some other secret sauce to Bergamo or some Genoa derivative. I don't know how well it would compete with CPUs using integrated HBM. Infinity Cache would be one way to compete without going full (and expensive) HBM, just like AMD did with their GPUs; it would just be one or two embedded dies rather than something like 4 or 8 stacks of HBM. AMD integrating HBM on a CPU, unless it is included alongside a GPU, seems unlikely. HBM can obviously supply huge bandwidth, but it still has DRAM latency. It is plausible that AMD will start making parts with multiple levels of V-Cache rather than using Infinity Cache chips.
I am just speculating here. We very well might not see an EFB part until Zen 5, but that is a long way off, and Grace/Hopper could be out in a similar time frame as a Zen 5 part using EFB. I could see AMD using Bergamo as a bit of a test bed for Zen 5, though; that is, the Bergamo IO die might be the same one used with Zen 5. Using Infinity Cache doesn't fit with some of the rumors, but it makes a lot of sense to me. It also could be the case that it is just a bunch of SerDes-connected chips with small caches. Rather boring, but entirely possible.