Very tangible synchronisation and memory access overhead for anything graphics that could easily kill perf.
It kind of hurts compute too, particularly training (hence why we use xboxhueg dies like V100/Spring Crest and weird scale-up setups like DGX-2/whatever Nervana is doing), but it's manageable there.
Compute suffers just as much from synchronisation and memory access overhead. (Love how you added "Very tangible" to your statement.) This is all discussed in the paper, and as I have mentioned at least twice before, there are some simple optimizations presented in the paper to help overcome those issues. You don't need to entirely eliminate the overhead, just reduce it enough that you net more performance from the greater overall GPU resources you have available. The paper presents it very well -- they talk about the performance of the largest possible GPU you could physically build, and then ways to get more performance than that.
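To put rough numbers on that, here's a back-of-envelope sketch. The numbers are completely made up (they are not from the paper), it's just to show the shape of the argument: the multi-die overhead only has to drop below a break-even point, not to zero.

```python
# Toy model with made-up numbers, only to illustrate the break-even argument.

def net_perf(dies, perf_per_die, overhead):
    # Aggregate performance after paying the sync / remote-memory-access tax.
    return dies * perf_per_die * (1.0 - overhead)

reticle_limit_gpu = net_perf(1, 1.5, 0.0)   # "largest GPU you could physically build"
naive_4_die_mcm   = net_perf(4, 1.0, 0.70)  # huge overhead: 1.2 -> loses
tuned_4_die_mcm   = net_perf(4, 1.0, 0.20)  # locality optimizations applied: 3.2 -> wins

# In this toy model the 4-die part only has to keep overhead below
# 1 - 1.5/4 = 62.5% to beat the biggest monolithic die, even though every
# hop across a die boundary is still slower than staying on-die.
```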
If you are trying to come up with a good argument as to why compute is easy to do on multi-chip but graphics is not, you need to talk about problems that are unique to (or at least harder on) graphics. You keep saying this is "impossible for graphics", but you haven't said anything specific about how the additional parts of the graphics pipeline (geometry, rasterization, texturing, etc.) cannot scale across multiple dies. Rasterization is already done in tiles on a lot of architectures, and each tile can be small and worked on independently -- see the sketch below. In fact, that approach was originally conceived to save on memory accesses and improve data locality, which is exactly what we need here. Geometry and texturing can similarly be handled by splitting the scene up into chunks. I do agree that graphics is more difficult, but I do not agree that it is impossible.
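Here's what I mean by tiling mapping naturally onto multiple dies. This is a toy illustration, not any vendor's actual scheme, and all the names and the tile size are made up:

```python
# Toy illustration of tiled rasterization split across dies: the screen is cut
# into small tiles, each tile is rasterized independently, so tiles can be
# farmed out across dies just as well as across shader cores on one die.

from collections import defaultdict

TILE = 32  # pixels per tile edge, arbitrary

def assign_tiles(screen_w, screen_h, num_dies):
    """Round-robin screen tiles across dies; each die owns a disjoint set."""
    owner = {}
    i = 0
    for ty in range(0, screen_h, TILE):
        for tx in range(0, screen_w, TILE):
            owner[(tx, ty)] = i % num_dies
            i += 1
    return owner

def bin_triangles(triangles, owner):
    """Bin each triangle (by its screen-space bounding box) to the tiles, and
    thus the dies, it touches. A die only ever needs the geometry and texture
    data that lands in its tiles -- the data-locality win tiling was made for."""
    work = defaultdict(list)
    for tri in triangles:  # tri = ((x0, y0), (x1, y1), (x2, y2)) in screen space
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        for ty in range(int(min(ys)) // TILE * TILE, int(max(ys)) + 1, TILE):
            for tx in range(int(min(xs)) // TILE * TILE, int(max(xs)) + 1, TILE):
                if (tx, ty) in owner:
                    work[owner[(tx, ty)]].append(tri)
    return work  # die index -> list of triangles to rasterize locally
```

Triangles crossing a tile boundary get binned to more than one die, which is exactly the kind of duplicated work/traffic the paper's locality optimizations are about keeping small.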
Not sure why you mention DGX as an example of avoiding synchronisation and memory access overhead, as that is a multi-GPU setup over NVLink and has much worse synchronisation and memory access overhead than even a hypothetical multi-chip GPU using on-package interconnects.
Also, I just want to add that the whole discussion about what is or isn't MCM is pointless. The point is whether a GPU manufacturer has chosen to invest in solutions to the hurdles that building a GPU out of multiple dies presents. For the purposes of this discussion it doesn't really matter if the dies sit on an organic substrate, a ceramic one, a silicon interposer, or something more exotic. Sure, the more exotic options will probably allow better interconnects, but that's not the point here; remember, your argument is that multi-die graphics GPUs are essentially impossible and that it will be 5-7+ years before we see any multi-chip GPU solution.