Intel was already running into a wall with rings during the Haswell/Broadwell generation. The largest server CPUs had two rings of 12 cores each, and each ring had up to 5 additional stops (2 for inter-ring links, 1 for I/O, 1 for QPI, 1 for IMC/HA).
It's a long story, but two fundamental things about the Nehalem->SKL (client) architectures worked to wreck this setup:
1) The L3 cache was inclusive, meaning everything in L2 had a duplicate cache line in L3, so any read into L2 from memory, and various kinds of writes, also hit L3.
2) The L3 was organized as slices placed on the ring next to the cores, and the slice that houses a given cache line is selected by the address bits. Basically, if a core does a read or write, the request goes to a particular slice somewhere on the ring.
You can already see how that goes very wrong when two rings are involved: a core's cache lines end up all over both rings with varying levels of latency, and the IMCs are spread across the two rings too, so read/write requests touch both rings.
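To make the slice-selection point concrete, here's a toy sketch. Intel's real slice hash is undocumented and far more involved; the slice count and the XOR folding below are purely illustrative assumptions. The only point is that the address, not the requesting core, decides which slice (and therefore which ring stop) services a line:

```
#include <stdint.h>
#include <stdio.h>

#define NUM_SLICES 12   /* assumed: one LLC slice per core on one ring */

/* Toy stand-in for Intel's (undocumented) slice hash: fold some
 * physical-address bits above the 64B line offset together.  Which
 * slice -- and therefore which ring stop -- services a request is
 * decided by the address alone. */
static unsigned slice_for_address(uint64_t phys_addr)
{
    uint64_t line = phys_addr >> 6;                  /* drop 64B offset */
    uint64_t h = line ^ (line >> 7) ^ (line >> 13);  /* illustrative fold */
    return (unsigned)(h % NUM_SLICES);
}

int main(void)
{
    /* Two lines used by the same core can land on opposite ends of the
     * die purely because of their addresses. */
    printf("0x100040 -> slice %u\n", slice_for_address(0x100040));
    printf("0x9f3c80 -> slice %u\n", slice_for_address(0x9f3c80));
    return 0;
}
```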
What is more, a few cores with intensive memory access patterns could kill L3 caching for the whole chip (by evicting other cores' L3 lines) and saturate the ring with requests. And these chips all had only 256KB of L2, so they were heavily dependent on the LLC to carry the day.
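A hedged illustration of that noisy-neighbor effect: one thread streaming over a buffer much larger than the shared LLC keeps evicting everyone else's lines. The LLC size and buffer multiple below are just assumptions for a BDW-EP-class part:

```
#include <stdint.h>
#include <stdlib.h>

/* Assumed LLC size for a BDW-EP-class part; the exact figure doesn't
 * matter, only that the working set is several times larger than it. */
#define LLC_BYTES (60ull * 1024 * 1024)
#define BUF_BYTES (8 * LLC_BYTES)

/* One "rogue" thread doing this on an inclusive-L3 chip keeps pulling
 * fresh lines into the shared LLC, evicting other cores' lines (and,
 * because L3 is inclusive, invalidating their L2 copies too), while
 * flooding the ring with miss traffic. */
int main(void)
{
    uint8_t *buf = calloc(BUF_BYTES, 1);
    if (!buf) return 1;
    uint64_t sum = 0;
    for (int pass = 0; pass < 100; pass++)
        for (size_t i = 0; i < BUF_BYTES; i += 64)   /* one touch per line */
            sum += buf[i];
    free(buf);
    return (int)(sum & 1);
}
```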
Intel's mitigation was "Cluster on Die": basically splitting the two rings into two NUMA nodes, and going even further by splitting the LLC into cache domains so that a core's lines are contained in slices on its "home" ring only. On top of that they had a truckload of monitoring tools, LLC allocation "schemes" for VMs, etc. Band-aids at best, and actively counterproductive in some cases.
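Part of why CoD was a band-aid is that the software has to play along: with it enabled, each ring shows up as its own NUMA node, and anything latency-sensitive had to pin its threads and memory to one node, roughly like this libnuma sketch (the node number and buffer size are arbitrary choices for illustration):

```
#include <numa.h>      /* link with -lnuma */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support\n");
        return 1;
    }

    /* With Cluster on Die enabled, each ring is exposed as a NUMA node.
     * Keep both the threads and the memory of this process on node 0 so
     * cores, LLC slices and IMC all sit on the same ring. */
    int node = 0;                       /* arbitrary choice for the sketch */
    numa_run_on_node(node);             /* restrict threads to that ring   */

    size_t sz = 64 * 1024 * 1024;
    void *buf = numa_alloc_onnode(sz, node);  /* memory from the same ring */
    if (!buf) return 1;

    /* ... latency-sensitive work here ... */

    numa_free(buf, sz);
    return 0;
}
```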
And during the HSW/BDW era the cloud was taking off; no one wanted a system where a "rogue" client with a 1-2 thread allocation, running say a JVM type of load, could kill performance for the whole 48T chip.
So Intel set out to solve this problem in two ways:
1) Skylake server chips did away with the 256KB L2 and went to 1MB of L2 cache per core.
2) The L3 cache was no longer inclusive, and the die area was eaten by the 4x larger L2, the massive AVX-512 FMA units, and the supporting 512-bit register files.
So the dependency on L3 and the inter-core bandwidth requirement during normal operation were cut big time by these two factors, and the cloud guys (including a server chimp like me) were happy. Of course, Intel had to equip that chip with the most anemic mesh and LLC scheme the world has ever seen (it might have been beaten since by those ARM 80-128 core monsters with kilobytes of LLC per core). Performance scaling with more L3 was non-existent, total L3 bandwidth was horrible, and the resulting memory latency was okayish, but only because the chip was basically relying on the larger L2 to carry it.
I don't have any problems with the ring on Alder Lake, nor will 2 more stops on Raptor Lake "ruin" it. The ring has become bidirectional and wider, and the cores have 1.25-2MB of private L2, mitigating the Achilles heel of the Skylake-era stuff.
What is a problem is the horrible L3 latency, and that is where Intel can make improvements. And it seems they are doing that in Raptor Lake.
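For anyone who wants to see that L3 latency for themselves, here's a minimal pointer-chasing sketch. The 16MB working set is an assumption meant to spill out of the 1.25-2MB L2 but stay mostly inside a Raptor Lake-sized L3; real tools like Intel MLC do this far more carefully:

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Assumed working set: big enough to miss the private L2, small enough
 * to mostly live in a ~30MB+ L3. */
#define WS_BYTES (16ull * 1024 * 1024)
#define STRIDE   64                        /* one pointer per cache line */
#define N_NODES  (WS_BYTES / STRIDE)
#define CHASES   (50ull * 1000 * 1000)

int main(void)
{
    char *buf = aligned_alloc(64, WS_BYTES);
    size_t *order = malloc(N_NODES * sizeof *order);
    if (!buf || !order) return 1;

    /* Build one big random cycle of pointers so the hardware
     * prefetchers can't guess the next line. */
    for (size_t i = 0; i < N_NODES; i++) order[i] = i;
    for (size_t i = N_NODES - 1; i > 0; i--) {        /* Fisher-Yates */
        size_t j = (size_t)rand() % (i + 1);          /* biased, fine for a sketch */
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < N_NODES; i++) {
        void **from = (void **)(buf + order[i] * STRIDE);
        void **to   = (void **)(buf + order[(i + 1) % N_NODES] * STRIDE);
        *from = (void *)to;
    }

    /* Chase the chain; each dereference depends on the previous one, so
     * the average time per hop approximates load-to-use latency, mostly
     * out of L3. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    void **p = (void **)buf;
    for (uint64_t i = 0; i < CHASES; i++)
        p = (void **)*p;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-use: %.1f ns (%p)\n", ns / CHASES, (void *)p);
    free(order);
    free(buf);
    return 0;
}
```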