SANDRA confirms it.
If we work under the assumption that the world's best server processor maker is competent, we should be trying to understand why Intel made the choices that they did.
Coming from you, that is amazingly funny. I was deliberately stirring, as it is known that Intel is moving away from the full dual-lane ring bus. But even if it's still a ring bus and there's 1.25 MB per core, the L3 still has a reason to exist: it keeps cache coherency "simple".
KNL doesn't have an L3. It has to keep all the L2s coherent, and it shares each L2 between two cores. KNL also has three modes the interconnect operates in, because I'm assuming all-to-all has quite bad latency. I can't find latency numbers for any of the modes; I'm guessing they're not that great (they don't need to be for KNL, which just needs massive bandwidth). On top of this, the L2 can only do one read and one write per cycle (with two cores attached).
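Back-of-envelope sketch of why that sharing hurts: with one read and one write port per cycle on the shared L2, two cores contending for the same ports each see at most half the issue rate a private L2 would give. The numbers below are assumed worst-case arithmetic, not measurements.

```python
def l2_ports_per_core_per_cycle(read_ports, write_ports, cores_sharing):
    """Peak L2 read/write issue rate available to each attached core,
    assuming both cores contend every cycle (worst case)."""
    return (read_ports / cores_sharing, write_ports / cores_sharing)

# KNL-style tile: 1 read + 1 write port, two cores per L2
print(l2_ports_per_core_per_cycle(1, 1, 2))  # (0.5, 0.5)

# Hypothetical one-core-per-tile layout keeps the full rate
print(l2_ports_per_core_per_cycle(1, 1, 1))  # (1.0, 1.0)
```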
That doesn't sound very good for a latency-sensitive core. If the fabric logic is the same for Skylake Xeon, then you would expect each tile to serve only one core. Now the questions are:
Is the L3 a victim cache or write-back?
What mode does the fabric run in?
1. If the L3 is a victim cache and the fabric is in quadrant mode, then you have something very similar to a Zen CCX + GMI, but it would probably have more inter-quadrant/CCX bandwidth.
2. If the L3 is write-back, then it's not there for extra storage but to keep the L2 fast for the local core; maybe they could run that in all-to-all mode. Hell, EX cut the L2 in half to 1 MB per core and very few workloads saw big performance losses, but I think the first option looks better. The crazy choice would be making both factors configurable.
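To make the victim-cache option concrete, here's a toy simulation (my own simplification, nothing to do with Intel's actual implementation): a victim L3 is filled only by lines evicted from the L2, so a line you recently used but lost from L2 is still an L3 hit, and L2+L3 together hold mostly distinct lines. The `Level` class and the tiny capacities are made up for illustration.

```python
from collections import OrderedDict

class Level:
    """A tiny fully-associative cache level with LRU eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data

    def insert(self, addr, data):
        """Insert a line; return the evicted (addr, data) pair, if any."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            return self.lines.popitem(last=False)  # evict LRU
        return None

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return self.lines[addr]
        return None

class VictimHierarchy:
    """L3 holds only lines evicted from L2 (exclusive of L2)."""
    def __init__(self):
        self.l2, self.l3 = Level(2), Level(4)

    def access(self, addr):
        if self.l2.lookup(addr) is not None:
            return "L2 hit"
        if self.l3.lookup(addr) is not None:
            # promote back into L2; the displaced L2 line drops into L3
            del self.l3.lines[addr]
            victim = self.l2.insert(addr, addr)
            if victim:
                self.l3.insert(*victim)
            return "L3 hit"
        # miss: fill L2 from memory; L2's victim drops into L3
        victim = self.l2.insert(addr, addr)
        if victim:
            self.l3.insert(*victim)
        return "miss"

h = VictimHierarchy()
h.access(1); h.access(2); h.access(3)   # 3 evicts 1 from L2 into the L3
print(h.access(1))                      # "L3 hit": caught by the victim L3
print(h.access(1))                      # "L2 hit": promoted back into L2
```

The point of the sketch: the victim L3 doesn't add much effective capacity per core, but it catches recent L2 castouts without a trip across the fabric, which is exactly the "keep the L2 fast for the local core" role described above.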