That L0-L1-L2-L3 change is really strange, I don't quite get it.
The old (Nehalem to Skylake) scheme was the following:
Small but fast 256KB L2, with its contents also kept in the (inclusive) L3.
An L3 with varied performance: low inter-core latency, but it suffered from cross-core pollution. The Skylake-X server variant had already switched to a non-inclusive L3 (with disastrous L3 performance: anemic in size, horrible latency/bandwidth).
Cloud guys loved the new server L3 scheme, because workloads were mostly contained in L2 and cross-core L3 pollution was no longer a consideration for predictable performance. The same cloud guys then moved on to love AMD's Zen 3 chiplets even more, since those had an actually functional L3.
Nowadays we have the following:
So for client, Intel came up with the idea of enlarging the L2 (at a noticeable latency cost) and making it no longer inclusive in L3. The L3 is still a very weak point on client (too small, high latency, low bandwidth). On server it is a full-on disaster, reaching 50ns latency; clearly a design by idiots supervised by morons.
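If you want to sanity-check latency claims like that on your own machine, a pointer-chasing loop is the usual trick: each load depends on the previous one, so the average time per load approximates the load-to-use latency of whatever level the working set lands in. A minimal sketch, assuming Linux/gcc; the sizes and iteration counts are illustrative, not taken from any Intel documentation:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Walk a randomly permuted cyclic pointer chain of `bytes` size and
 * return the average nanoseconds per dependent load. */
static double chase(size_t bytes, size_t iters) {
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));

    /* Fisher-Yates shuffle, then link consecutive permutation entries
     * so the chain is one random cycle through the whole buffer. */
    for (size_t i = 0; i < n; i++) idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        buf[idx[i]] = &buf[idx[(i + 1) % n]];

    void **p = &buf[idx[0]];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        p = (void **)*p;                 /* serialized dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Use p so the compiler cannot drop the loop. */
    void * volatile sink = (void *)p; (void)sink;
    free(buf); free(idx);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}

int main(void) {
    /* Working sets chosen to land roughly in L1, L2, L3 and DRAM. */
    size_t sizes[] = { 32 << 10, 512 << 10, 16 << 20, 128 << 20 };
    for (int i = 0; i < 4; i++)
        printf("%8zu KB: %.1f ns/load\n", sizes[i] >> 10,
               chase(sizes[i], 20u * 1000 * 1000));
    return 0;
}
```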
So instead of fixing the L3 cache, they are throwing in an additional level of caching? It probably does not take a rocket scientist, but I feel like the "new L2" is like AMD's L3: shared by a core complex of 4 or maybe even 8 cores. And then there is "the new L3" that serves the whole SoC.
It could work; Apple's SoCs are built something like that.
So it would look like this:
L1 => 48KB of data cache
L2 => 256KB or 512KB, inclusive in L3
L3 => some 4-8MB per core; Apple has 16MB of L2 shared by the perf-core complex
L4 => most likely no longer "sliced" but rather served by the SoC as a system-level cache, maybe even on a different chip, 3D stacked, why not
That would make an Apple/AMD-like chip. If an 8-core perf complex with 32MB of L3 sounds familiar, well, that's because quite a few of us are likely reading this on one today.
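Whichever naming Intel settles on, you can check how a given part actually arranges things (which level is private, which is shared by a complex, which is SoC-wide) from the Linux sysfs cache topology files. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpu0/cache layout:

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs file into `out`, stripping the newline. */
static int read_line(const char *path, char *out, size_t len) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    if (!fgets(out, (int)len, f)) { fclose(f); return -1; }
    out[strcspn(out, "\n")] = '\0';
    fclose(f);
    return 0;
}

static void read_attr(int idx, const char *attr, char *out, size_t len) {
    char path[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu0/cache/index%d/%s", idx, attr);
    if (read_line(path, out, len) != 0) strcpy(out, "?");
}

int main(void) {
    char level[32], size[32], type[32], shared[256], probe[32];
    /* Iterate over indexN entries until one is missing. */
    for (int idx = 0; ; idx++) {
        char path[256];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        if (read_line(path, probe, sizeof probe) != 0)
            break;
        strcpy(level, probe);
        read_attr(idx, "size", size, sizeof size);
        read_attr(idx, "type", type, sizeof type);
        read_attr(idx, "shared_cpu_list", shared, sizeof shared);
        printf("L%s %-12s %8s  shared by CPUs %s\n", level, type, size, shared);
    }
    return 0;
}
```

On a Zen 3 part, for instance, the L3 entry should report itself as shared by the whole 8-core CCX, which is exactly the 32MB-per-complex arrangement described above.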
What I don't quite understand is how this would fix Intel's server pains, unless they are going the full chiplet route. Or stop designing server CPUs. Though considering their recent "performance", I would not be surprised by another abomination where the L3 caches actually hurt performance most of the time.
P.S. The more I think about it, this whole "L0 naming scheme" smells of Intel's byzantine internal corporate politics and marketing idiots, who, after realizing that it's basically a carbon copy of AMD's chiplets, went on to obfuscate things with an "L0 level" of caching.