I expect memory latency to be one of MTL's great weaknesses. AFAIK, one of their main architects in that area left the company midway through MTL's development. And that's before even counting any die-to-die penalty.
This requires a deeper discussion. Things have moved beyond raw memory latency nowadays, and I prefer the term "memory subsystem": a larger L2 shielding a weak L3, L3 capacity, X3D stacking and L4 caching schemes all feed into the "average" memory latency a core sees during a given workload.
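To make "average" latency concrete, here is a rough sketch of the classic serial-lookup AMAT (average memory access time) model. All hit rates and latencies below are made-up illustrative numbers, not measured MTL/RPL figures:

```python
# Serial-lookup AMAT sketch: every access that reaches a level pays that
# level's latency, and only misses continue deeper. Numbers are invented.
def amat(levels):
    """levels: list of (hit_rate, latency_ns), innermost cache first.
    The last entry models DRAM with hit_rate 1.0."""
    total, reach = 0.0, 1.0  # reach = fraction of accesses that get this far
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns   # everyone reaching this level pays its latency
        reach *= 1.0 - hit_rate       # only the misses go one level deeper
    return total

# Illustrative hierarchy: L1, L2, L3, DRAM
print(round(amat([(0.95, 1.0), (0.80, 4.0), (0.60, 12.0), (1.0, 80.0)]), 2))  # → 1.64
```

The point of the model: a bigger L2 or a faster L3 shows up as a smaller `reach * latency_ns` term, which is exactly why points 1-2 below matter even when the raw DRAM latency is unchanged.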
So, going from the lower levels up:
1) L2 stays at 2MB. This is a strong point of their memory subsystem: a lot of requests that would spill out of L2 on Zen 4 stay near the core instead, saving power and improving performance.
2) L3 => ADL had a disaster-level L3, and RPL mostly fixed it. I find it hard to imagine* that Intel managed to degrade the L3 in the compute die; that would make little sense to me. In fact, the clock-target rumours, the fact that IO is no longer on the same die, and the smaller number of stops on the interconnect inside the compute die might mean:
The L3 slice clock is now synchronous with the cores, which would improve both bandwidth and latency big time. AMD has an excellent L3 for its 8C CCD and there is no reason why Intel cannot have an even better one.
So 1-2 are huge positives and would improve performance and power efficiency quite a bit. There could be an L3 latency improvement of some 5-8ns.
3) This is going to be Intel's Zen 2-style multi-chip design, so I would not expect good performance here. "Infinity Fabric"-type problems with clocking, latency and bandwidth are practically a certainty.
I think 2 and 3 have a chance to cancel each other out entirely; Intel's current L3 is that bad, even in its RPL incarnation.
4) Now we reach the SoC tile, where the IMC and L4 cache reside. With L4 disabled, I think the chip could have memory subsystem performance similar to RPL, especially if supported DDR5 speeds go up and the IMC team does better than average.
It's with L4 enabled that things get hazy. Checking tags for 512MB of L4 is not a free operation and would add quite some latency to every miss that has to go all the way to memory. That's where I expect memory miss latency to degrade in "absolute" ns terms.
Then again, having 128-512MB of L4 would obviously have "scaling" benefits for a lot of workloads.
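A quick back-of-envelope on that trade-off: if every DRAM-bound miss first pays an L4 tag check, the L4 only helps once its hit rate clears a break-even point. All three latency numbers below are my own guesses, purely for illustration:

```python
# When does an L4 pay off, if every miss to DRAM first pays a tag-check
# penalty? All latencies are illustrative assumptions, not real MTL numbers.
DRAM_NS      = 90.0   # assumed full memory-miss latency
L4_HIT_NS    = 45.0   # assumed L4 hit latency
TAG_CHECK_NS = 5.0    # assumed cost of probing tags for 512MB of L4

def effective_ns(l4_hit_rate):
    # Hits pay L4 latency; misses pay the tag check *plus* DRAM.
    return l4_hit_rate * L4_HIT_NS + (1 - l4_hit_rate) * (TAG_CHECK_NS + DRAM_NS)

# Break-even hit rate: where L4-enabled equals plain DRAM access
break_even = TAG_CHECK_NS / (TAG_CHECK_NS + DRAM_NS - L4_HIT_NS)
print(round(break_even, 3))  # → 0.1
```

With these guesses even a 10% L4 hit rate is enough to break even on average latency, while workloads that miss L4 almost entirely eat the tag-check penalty on every access; that's the "absolute ns degrade vs scaling benefits" tension in one formula.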
One thing to consider about L4 caches -> the 64MB of eDRAM in the Broadwell era required ~2MB of L3 to hold its tags.
Meanwhile another vendor is throwing silicon around in chunks of 64MB of L3 that clock at 5GHz in X3D chips.
So the SRAM for tag checking is not that big a problem anymore. It's a bean-counter problem, not a technical "if we spend silicon on L4 tags we will destroy our L3 transistor budget" problem.
* I also had trouble imagining that anyone could design an L3 with DDR4-level latency, yet Intel "achieved" exactly that this year in a server chip.