It is unclear how large of a chunk is interleaved between caches. Byte interleave seems too small. Perhaps a 32 byte cache line is interleaved across the 4 slices. They have shown slides with 32 bytes a cycle. That would be 8 bytes/64-bits from each cache slice. It is also unclear what you think a “link” is.
That is not how caches work. x86 standardized on 64-byte cache lines, and the common-sense strategy is to use the low-order address bits (the ones just above the offset within the line) to select the cache slice that is supposed to hold a given line. There is no "interleaving" going on: the unit tracked for coherency and so on is the cache line, so a slice either has the full cache line for that address or it does not (and the request goes to memory). You then transfer that cache line to where it is needed in as many cycles as your link width requires.
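To make that concrete, here is a rough sketch of slice selection and transfer time. The slice count, the plain modulo on the line number, and the 32 bytes/cycle link width are all assumptions for illustration; real hardware may well hash more address bits to pick the slice.

```python
LINE_BYTES = 64             # x86 cache line size
NUM_SLICES = 4              # assumed number of L3 slices (illustrative)
LINK_BYTES_PER_CYCLE = 32   # assumed link width, taken from the 32 bytes/cycle figure

# Bits 0..5 address a byte inside the 64-byte line, so they cannot pick a slice.
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 6


def slice_for_address(addr: int) -> int:
    """Pick the slice that owns the whole cache line containing addr.

    Assumption: simple modulo on the line number; actual designs may use a hash.
    """
    line_number = addr >> OFFSET_BITS
    return line_number % NUM_SLICES


def transfer_cycles(link_bytes_per_cycle: int = LINK_BYTES_PER_CYCLE) -> int:
    """Cycles needed to move one full cache line over a link of the given width."""
    return -(-LINE_BYTES // link_bytes_per_cycle)  # ceiling division


if __name__ == "__main__":
    for addr in (0x0, 0x3F, 0x40, 0x80, 0xC0, 0x100):
        print(hex(addr), "-> slice", slice_for_address(addr))
    print("cycles per line:", transfer_cycles())
```

Every byte of a given line maps to the same slice, and consecutive lines land on consecutive slices. Nothing smaller than a line is ever split across slices; the link width only decides how many cycles the transfer takes.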
With Zen, the "coherency" domain is a 4-core CCX consisting of the cores with their L1 and L2, plus the L3 and tags (but not data) that record which cache lines are held inside the cores.
Zen 3 expands this coherency domain to 8 cores, but nothing forces AMD to keep the same arrangement of one L3 slice per core. They could, for example, use 4 slices while keeping the tag part of the coherency domain next to each core.
Latency is going to rise anyway due to physical constraints, but 32+ MB is what is needed for 8 cores as potent as Zen 2.