The only thing I see "wrong" with the Rome layout is that they have moved the memory controller away from the CCX. That effectively prevents them from using the same dice in a Matisse product, unless they intend to go with a "chiplet" design in the client CPUs as well.
Right now AMD has two dice: the CPU die you get in everything except their APUs, and the APU die. They had to make a separate APU die just to include Vega. I do not think AMD was seriously entertaining the notion of something along the lines of Kaby Lake-G for their own products.
Anyway, setting aside the APUs, all AMD products are nothing more than constant repetition of the same CPU die. Want more cores? Then add more dice. It allows them to keep the CPUs relatively simple in terms of packaging. The 2990WX is sort of an outlier since it is basically an EPYC with two of the dice not linked to DIMM slots on the board (yay product differentiation). But it's still just four Zen+ dice, regardless.
If we are to believe the diagram from the OP, now you have a situation where every CPU based on Zen2 will have a minimum of two dice, assuming AMD wants to stick with the "interchangeable parts" strategy. For example, they can ill afford to produce one Zen2 die for Matisse that is one "chiplet" plus a dumbed-down version of the central die from the diagram (one without L4, no SERDES support, and a memory controller with two channels instead of eight). The cost appeal of Zen from the beginning has been, again, repetition of the same die design, over and over again. Rome itself would have nine dice (eight CCX dice and the central L4/IMC die), none of which they could use in client products.
AMD would need to use common CCX dice while altering the central "control" die based on the application. So for example, we get the heavy I/O and major memory bandwidth of the Rome die, but the Matisse die would be smaller and more pedestrian. Then they would link it (Matisse "central"/SoC die) to a single CCX die via IF, meaning a minimum of two dice for any Zen2 product. Moving all the SoC functions to a separate die connected by IF introduces the potential for higher memory latency and other "fun" latency effects. We could end up with the kind of memory latency we currently only see on the 2990WX when attempting to access main memory from a thread pegged to one of the dice with a crippled DDR4 interface.
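To put rough numbers on the die counts above, here is a quick sketch. It assumes the OP's diagram is accurate and that Matisse would pair one CCX die with a smaller central die. The function name and the table are just my own illustration, not anything from AMD.

```python
# Rough die-count sketch. Product names come from the post; the function,
# the table layout, and the Matisse entry are illustrative assumptions.

def dice_per_package(compute_dice: int, central_die: bool) -> int:
    """Total dice on the package: compute dice plus an optional central die."""
    return compute_dice + (1 if central_die else 0)

# (number of compute dice, whether a separate central/SoC die is needed)
layouts = {
    "2990WX (Zen+)": (4, False),                 # four Zen+ dice, no central die
    "Rome (per OP diagram)": (8, True),          # eight CCX dice + central L4/IMC die
    "Matisse (speculative minimum)": (1, True),  # one CCX die + smaller central die
}

counts = {name: dice_per_package(*layout) for name, layout in layouts.items()}

for name, total in counts.items():
    print(f"{name}: {total} dice")
```

The point is just that once the memory controller lives on its own die, the floor for any product is two dice, where a single-die Zen/Zen+ part needs only one.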