Question: How does the Intel CPU "Ring" work?

Timur Born · Jan 4, 2023

Hello.

Both Anandtech's 13900K and 12900K reviews and my own core-to-core latency tests show that, instead of ring- or mesh-like latencies, there is a star-like concentration of lower latencies around one physical core. Transfers to and from that core and its directly surrounding cores are faster, as if all traffic goes through that core first and then on to the destination core. And while one direction is slightly faster than the other, the faster core-to-core connections are still faster in the reverse direction.

For Anandtech's 13900K sample this was physical core 2; on their 12900K sample it was core 5; on my own 13900K it is either core 6 or maybe 7 (they are very close). The 12900K review states "In our core-to-core latency test which showcases the physical topology of the chip", but unfortunately gives no further details.
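One way to put a number on this "nexus" pattern is to average each core's latency to and from every other core and pick the lowest. A minimal Python sketch, assuming the results have been collected into a square matrix where `lat[i][j]` is the latency in ns from core i to core j; `nexus_core` is a hypothetical helper name and the toy matrix is illustrative, not measured data:

```python
from statistics import mean


def nexus_core(lat):
    """Return (core, avg_ns) for the core with the lowest average latency
    to and from all other cores.

    lat[i][j] is the measured latency in ns from core i to core j;
    the diagonal (a core talking to itself) is ignored.
    """
    n = len(lat)
    best = None
    for c in range(n):
        # Use both directions, since one direction can be slightly faster.
        outgoing = [lat[c][j] for j in range(n) if j != c]
        incoming = [lat[i][c] for i in range(n) if i != c]
        avg = mean(outgoing + incoming)
        if best is None or avg < best[1]:
            best = (c, avg)
    return best


# Toy 4-core matrix (illustrative numbers, not measurements):
# every path touching core 2 is faster than the rest.
toy = [
    [0, 30, 20, 30],
    [30, 0, 20, 30],
    [20, 20, 0, 20],
    [30, 30, 20, 0],
]
print(nexus_core(toy))  # (2, 20)
```

Averaging both directions keeps a slight directional asymmetry from hiding the overall pattern.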

On a ring I would expect neighboring cores to always have lower latencies than cores further away. On a mesh I would expect interconnections between specific cores (like cores opposite each other on the ring) to be faster. But here we see one specific core being the nexus of all transfers. Can someone explain this further?
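A quick way to see why this pattern does not fit an idealized ring: if every hop costs the same, each stop has the same distance profile, so no single stop can be closer to everyone. A minimal Python sketch, assuming a bidirectional ring with uniform per-hop cost and cores numbered in ring order (both are simplifications; the ring also reaches the System Agent and integrated graphics, and core numbering need not follow physical ring order):

```python
def ring_hops(n_stops: int, a: int, b: int) -> int:
    """Shortest hop count between stops a and b on a bidirectional ring."""
    d = abs(a - b) % n_stops
    return min(d, n_stops - d)


def total_hops(n_stops: int, stop: int) -> int:
    """Sum of hop counts from one stop to every other stop."""
    return sum(ring_hops(n_stops, stop, other) for other in range(n_stops))


if __name__ == "__main__":
    n = 8  # assumption: 8 stops, one per P-core, numbered in ring order
    print([ring_hops(n, 0, b) for b in range(n)])  # [0, 1, 2, 3, 4, 3, 2, 1]
    print([total_hops(n, s) for s in range(n)])    # [16, 16, 16, 16, 16, 16, 16, 16]
```

Because every stop's total is identical in this model, a star-shaped latency pattern would have to come from something the model leaves out, such as non-core ring stops or per-stop differences.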

This has practical implications. My own CPU uses cores 2+3 as its highest-clocking (58x) cores at stock. But going by core-to-core latencies, and given that these cores are equal in stability and temps, it seems better to use cores 6+7 as the highest-clocking cores instead.

lopri · Jan 4, 2023

Wouldn’t that latency be decided by the clock speed? Higher frequency cores will likely have lower latencies, and vice versa. And which core clocks higher is a matter of silicon quality.

In any case, you’re splitting hairs.

Timur Born · Jan 4, 2023

No, frequency does not explain the latencies I measured. My cores 4+5 are the lowest-clocking ones at 55x, yet they still get lower latencies to cores 6/7 than cores 2+3 do, even though those run at higher frequencies.

Timur Born · Jan 4, 2023

And when I measured my latencies, my core 3 was the highest-clocking one at 60x, with 2+3+6 at 59x and 7 at 58x. I only switched clocks around now because of the latencies measured earlier.

IntelUser2000 · Jan 5, 2023

lopri said:
Wouldn’t that latency be decided by the clock speed? Higher frequency cores will likely have lower latencies, and vice versa. And which core clocks higher is a matter of silicon quality.

Well, this is ring latency, so it'll be determined by the frequency of the RING, not the cores.

Also, since the ring latency stays the same, the relative latency of the ring, counted in core cycles, will be worse for a higher-clocked core than for a lower-clocked core. Not that it'll matter, since a higher-clocked core outweighs slightly worse ring latency.

We don't know the exact details of the latency spread. Since the ring also goes to the System Agent and the integrated graphics, it could very well be faster from certain cores.
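The "relative latency" point comes down to units. A minimal Python sketch of the conversion, assuming a core-to-core latency that stays fixed in nanoseconds because the ring clock does not change; the 30 ns figure is hypothetical, and the multipliers assume the usual 100 MHz base clock:

```python
def ns_to_core_cycles(latency_ns: float, core_ghz: float) -> float:
    """Express a fixed latency in ns as a number of core clock cycles.

    At f GHz a core completes f cycles per ns, so cycles = ns * GHz.
    """
    return latency_ns * core_ghz


# Hypothetical 30 ns core-to-core latency, fixed by the ring clock.
RING_LATENCY_NS = 30.0
for multiplier in (55, 58, 60):
    ghz = multiplier / 10  # assumption: 100 MHz base clock, so 58x = 5.8 GHz
    cycles = ns_to_core_cycles(RING_LATENCY_NS, ghz)
    print(f"{multiplier}x ({ghz:.1f} GHz): {cycles:.0f} core cycles")
```

The nanosecond figure is identical in every row; only the number of core cycles spent waiting changes, which is why the ring looks "slower" from a faster core.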

lopri · Jan 5, 2023

Are you talking about clock cycles? I am not sure what ‘relative latency’ means.

I thought the chart was showing latency in ns.

TheELF · Jan 5, 2023

Timur Born said:
This has practical implications. My own CPU uses cores 2+3 as its highest-clocking (58x) cores at stock. But going by core-to-core latencies, and given that these cores are equal in stability and temps, it seems better to use cores 6+7 as the highest-clocking cores instead.

It doesn't matter; software threads that can use high clocks are not going to be cross-talking with other threads constantly.
All threads have sync points where, every once in a while, they send data to a "policing" thread that makes sure everything runs well, so cross-core latency doesn't really matter.

There might be extremely niche server workloads where cross-core latency plays a role in actual performance.

Timur Born · Jan 5, 2023

So why does each of the tested CPUs (2x Anandtech + 1x mine) show latencies to and from one particular core and its neighbors to be faster than for other cores? How does that fit the "ring" concept?

IntelUser2000 · Jan 7, 2023

Timur Born said:
So why does each of the tested CPUs (2x Anandtech + 1x mine) show latencies to and from one particular core and its neighbors to be faster than for other cores? How does that fit the "ring" concept?

Read my post above.

Concept and theory is one thing, real world is another. Only the engineers know such fine details.

@lopri Yes, same thing. Your CPU clocks faster while the ring stays the same, so relatively the ring is slower. Because they are decoupled, it does not matter which core you clock higher.

Topweasel · Jan 7, 2023

The ring, at least at this level, wouldn't be an actual ring. Just looking at the layout, it is more of a hybrid setup, with dies across from each other as directly connected as the ones to the left and right (and maybe even more so). So cores in the center would have more direct connections and lower latency out: not like a star, more like a wave with latency increasing as it moves outward. So a die kitty-corner across, which on a circle would seem to have the worst latency, would now be tied for second best. Edge cores on the opposite side and row would still have the worst latency, but it isn't going to be as straightforward as core 5 having the worst latency to core 1. Now the worst to core 1 would be core 7.
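The "wave" idea can be checked with a toy graph: add links between cores across from each other and count hops. A minimal Python sketch; the 2x4 layout, the core numbering and the `CROSS` links are all hypothetical, not Intel's documented topology, and hop count is only a stand-in for latency:

```python
from collections import deque


def hop_distances(edges, n, src):
    """Breadth-first search: fewest hops from src to every stop (0..n-1)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:  # not visited yet
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


N = 8
# Hypothetical layout: two rows, 0 1 2 3 on top and 7 6 5 4 below,
# with the ring running around the outside.
RING = [(i, (i + 1) % N) for i in range(N)]
# Hypothetical "across" links between the middle cores of each row.
CROSS = [(1, 6), (2, 5)]

print(hop_distances(RING, N, 1))          # [1, 0, 1, 2, 3, 4, 3, 2]
print(hop_distances(RING + CROSS, N, 1))  # [1, 0, 1, 2, 3, 2, 1, 2]
print([sum(hop_distances(RING + CROSS, N, s)) for s in range(N)])
# [16, 12, 12, 16, 16, 12, 12, 16]
```

In this toy layout the middle cores of each row get the lowest total hop count, and core 5 (kitty-corner from core 1) drops from the worst destination to tied for second best. It does not by itself produce a single "nexus" core.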

lopri · Jan 7, 2023

IntelUser2000 said:
@lopri Yes, same thing. Your CPU clocks faster while the ring stays the same, so relatively the ring is slower. Because they are decoupled, it does not matter which core you clock higher.

I know the difference between latency and clock cycles (they're not the same). OP apparently observes a consistent ring latency disparity between theoretically equal cores. I was wondering whether the ring "visits" the faster core first on its route (first among equals?).

In any case, it is theorycrafting, as you implied above.

Timur Born · Jan 7, 2023

IntelUser2000 said:
Concept and theory is one thing, real world is another. Only the engineers know such fine details.

"Real world" demonstrates that core-to-core latencies revolve around a single core having faster lines to all other cores. Communicating between cores 0 and 1 is slower than communicating between cores 0 and 6. Now we try to make sense of this "real world" observation. You can try to contribute to that or not, but telling me off for discussing a "real world" test result seems rather odd.

I wish Anandtech hadn't teased us with "In our core-to-core latency test which showcases the physical topology of the chip" only to *not* tell us about said topology.

Timur Born · Jan 7, 2023

Topweasel said:
The ring, at least at this level, wouldn't be an actual ring. Just looking at the layout, it is more of a hybrid setup, with dies across from each other as directly connected as the ones to the left and right (and maybe even more so).

If dies across from each other were connected (as I hoped they would be), then I would expect latencies between cores 0 and 1 or 2 and 3 to be lower than latencies between cores 0 and 7 or 2 and 7. This was the original reason I even conducted the tests: to find out whether pairs of opposite cores have direct connections.
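The test described here, whether presumed opposite pairs are faster than everything else, can be written as a simple comparison on the latency matrix. A minimal Python sketch; `pair_vs_rest` is a hypothetical helper, the pairs are whatever you want to test, and the toy matrix is illustrative, not a measurement:

```python
from statistics import median


def pair_vs_rest(lat, pairs):
    """Median latency of candidate linked pairs vs. all other core pairs.

    lat[i][j] is latency in ns from core i to core j; pairs is a list of
    (a, b) tuples to test, checked in both directions.
    """
    n = len(lat)
    candidate = set()
    for a, b in pairs:
        candidate.add((a, b))
        candidate.add((b, a))
    paired = [lat[i][j] for i, j in candidate]
    others = [lat[i][j] for i in range(n) for j in range(n)
              if i != j and (i, j) not in candidate]
    return median(paired), median(others)


toy = [  # illustrative numbers only: 0<->1 and 2<->3 are fast
    [0, 20, 30, 30],
    [20, 0, 30, 30],
    [30, 30, 0, 20],
    [30, 30, 20, 0],
]
p, rest = pair_vs_rest(toy, [(0, 1), (2, 3)])
print(p < rest, p, rest)  # True 20.0 30.0
```

If the candidate pairs are not clearly faster than the rest on real measurements, the data give no support for direct links between those pairs.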

Topweasel said:
Edge cores on the opposite side and row would still have the worst latency, but it isn't going to be as straightforward as core 5 having the worst latency to core 1. Now the worst to core 1 would be core 7.

But this is not what we see here. In particular, on my own CPU *all* connections to and from cores 6/7 are the fastest, while Anandtech's 13900K review sample showed the same for core 2 and their 12900K sample for core 5.

Seeing different samples of the same CPU behave differently makes me at least suspect that Intel Memory Latency Checker might mislead us. But Anandtech writes: "we feel ours is the most accurate to how quick an access between two cores can happen".

MadRat · Jan 7, 2023

Could be that the lowest-latency core is simply the master, and all the others act as slaves for IO.

I'm curious, though, how do you filter out byproducts of the OS scheduler?

Timur Born · Jan 7, 2023

Intel Memory Latency Checker works with fixed core affinities: you define a source and a destination core on the command line. It also makes sense to stop various background processes so that nothing interferes too much with the measurement. But even different per-core frequencies don't seem to affect it much (my core 2 was the highest-clocking one at the time of the measurement, and it still "lost" to cores 6/7).

Khanan · Jan 7, 2023

Isn’t it a double ring bus? One for the P-cores and one for the E-cores, which then intersect.
