How come you are so sure that the IFoP is even part of the ring? I could also imagine that it is attached directly to the L3 on a separate path. IMHO all memory traffic goes through that path, which is why the latencies of all cache stages as well as the memory latency add up.
As to the calculations:
I know how to calculate the average hops. The thing is that the formula is different for each topology. And for more complex topologies like a bisected ring or the ladder (which sounds like a 2x4 grid/mesh to me) it is a bit of a pain in the a.. to simply count them. That is why I am surprised that seemingly no one in the world has made an online calculator. Maybe this is my next hobby project 🤔
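For what it's worth, counting by hand can be avoided by treating the topology as a graph and running a breadth-first search from every node. Below is a minimal sketch in Python. The helper names (`ring`, `bisected_ring`, `mesh`, `average_hops`) are made up for illustration, and reading "bisected ring" as a ring with one extra link between two opposite stops is my assumption, not something confirmed in this thread.

```python
from collections import deque


def ring(n):
    """Bidirectional ring: stop i is linked to i-1 and i+1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def bisected_ring(n):
    """Assumed meaning: a ring plus one extra link between stops 0 and n//2."""
    graph = ring(n)
    graph[0].add(n // 2)
    graph[n // 2].add(0)
    return graph


def mesh(rows, cols):
    """rows x cols grid without wrap-around; mesh(2, 4) is the 'ladder'."""
    graph = {r * cols + c: set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:  # link to the right-hand neighbour
                graph[node].add(node + 1)
                graph[node + 1].add(node)
            if r + 1 < rows:  # link to the neighbour below
                graph[node].add(node + cols)
                graph[node + cols].add(node)
    return graph


def hops_from(graph, src):
    """Breadth-first search: shortest hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_hops(graph):
    """Mean shortest-path length over all unordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for a in graph:
        dist = hops_from(graph, a)
        for b in graph:
            if b > a:
                total += dist[b]
                pairs += 1
    return total / pairs


if __name__ == "__main__":
    for name, g in [("ring(8)", ring(8)),
                    ("bisected_ring(8)", bisected_ring(8)),
                    ("mesh(2, 4)", mesh(2, 4))]:
        print(f"{name}: {average_hops(g):.3f} average hops")
```

This only counts hops between stops. It says nothing about per-hop latency, link width, or contention, which would need a separate model.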

I don't think the ring interconnect would be attached to IF. L3 is also not directly attached to IFOP but rather to SDF/SCF. IFOP/IFIS is at least a level below.
Core-to-core communication across several CCDs is handled by L3 coherency, but on the same CCD that is not needed.
L2$ is inclusive of L1$, and L3$ has shadow tags for the core-private L2$. So if one core needs data from another core, the shadow tags are used to find out which core has the data. The data is not routed via L3$, however, so I assumed there is some interconnect here; L3$ only contains data evicted from any of the L2$. Other than this I don't know of any other primitives for barrier synchronization, message passing, etc. between cores. But for obvious reasons there will never be any public info about this anyway.
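To make the shadow-tag idea concrete, here is a toy model in Python. It is only a sketch of the lookup order described above (own L2, then a peer L2 found via shadow tags, then the victim L3, then memory). The class `ToyCCX` and everything inside it are hypothetical, and it makes no claim about how AMD's hardware actually implements this.

```python
class ToyCCX:
    """Toy model: private L2 per core, a victim L3, and shadow tags in the L3."""

    def __init__(self, cores):
        self.l2 = [set() for _ in range(cores)]  # lines held in each private L2
        self.shadow = {}  # line address -> set of cores whose L2 holds the line
        self.l3 = set()   # lines that were evicted from some L2

    def _fill(self, core, addr):
        """Install a line in a core's L2 and record it in the shadow tags."""
        self.l2[core].add(addr)
        self.shadow.setdefault(addr, set()).add(core)

    def evict(self, core, addr):
        """Evict a line from a core's L2; the victim lands in the L3."""
        if addr not in self.l2[core]:
            return
        self.l2[core].discard(addr)
        owners = self.shadow[addr]
        owners.discard(core)
        if not owners:
            del self.shadow[addr]
        self.l3.add(addr)

    def load(self, core, addr):
        """Return where the line was found, then install it in the requester's L2."""
        if addr in self.l2[core]:
            return "own L2"
        owners = self.shadow.get(addr)
        if owners:
            # Shadow tags name a peer core; the data comes from that core's L2.
            source = f"peer L2 (core {min(owners)})"
        elif addr in self.l3:
            # Assumption: this toy leaves the L3 copy in place on a hit.
            source = "L3"
        else:
            source = "memory"
        self._fill(core, addr)
        return source


if __name__ == "__main__":
    ccx = ToyCCX(cores=2)
    print(ccx.load(0, 0x40))  # first touch
    print(ccx.load(1, 0x40))  # found through the shadow tags
    ccx.evict(0, 0x40)
    ccx.evict(1, 0x40)
    print(ccx.load(0, 0x40))  # served from the victim L3
```

Coherency states, capacity limits, and the actual data path between cores are deliberately left out, since none of that is public.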
Isn't that a bit nitpicky? 😉
But yeah, that is what I have been saying all along: the "interconnect to the IOD", to be as general as possible, is not part of the ring.
Data to and from the IOD needs exactly the same routing as other L3 traffic. Why would AMD build a duplicate interconnect network for IOD traffic only? Intel designs have the memory controller as part of the ring, as do AMD GPUs.
Maybe you misunderstood me:
The L3 on Zen is exclusive to each CCD (unlike SPR by default). So there is absolutely zero L3 traffic via IFoP/IOD, except for the L3 coherency, where AMD uses some kind of MOESI. And that is exactly the way the cores talk to each other when on separate CCDs; otherwise they would have horrible latency from going to the RAM. And that is the beauty: although IFoP bandwidth is very limited, there is, to my knowledge, no common workload where this is detrimental.
Every bit of data in and out of the CCD goes through that IFOP link. And the IFOP link is one of the Zen 3 ring stops. It is just like Intel chips with a ring bus, where the memory controller sits at one ring stop.
Care to share a source that it is a ring stop? At least I can't see why this should be a given.
Why would you have 16 connections to a port that has so little bandwidth relative to the number of connections?
Exactly because it is such a low-bandwidth connection: it costs relatively few transistors, nets you uniform RAM latency for each cache slice, and doesn't introduce cross talk to the ring.
AMD's Zen3 presentation (slide images not preserved):
Every core and every L3 slice needs a connection to the other cores and to IO. A ring bus is one widely used interconnect for that.
I am still failing to see proof in this. All I see is 8 cores connected to an L3 block which, as we already knew from another source, has its slices connected via some form of bidirectional ring. And then we have another connection from that block to the outside, but we have no idea how it is implemented.
AMD Announces Ryzen 7 5800X3D, World's Fastest Gaming Processor (www.screenhacker.com)
AMD today announced its Spring 2022 update for the company’s Ryzen desktop processors, with as many as seven new processor models in the retail channel. The lineup is led by …
Well, yes, it would be through the SDF. I/O and memory need to be connected to the ring somehow. That's the point of a ring. It wouldn't make any sense to add a mesh or P2P interconnect for data under the ring and use the ring only for cache snooping and L3$-to-L3$ data transfers.
IFOP is not a low-bandwidth connection. The ring also doubles as a request queue and load balancer; with a direct connection from each L3 slice to IFOP, those would have to be implemented some other way, basically as a duplicated second interconnect network.
The L3 is unified across the CCD, so there is A LOT of traffic from L3 accesses alone, more than enough to justify a ring solely for this.
DUH. Let's just say I hit my head this morning.
Maybe we have a different understanding of "large bandwidth". The IFoP has 64/32 GByte/s, while the L3 has almost 1.5 TByte/s, see https://chipsandcheese.com/2023/04/23/amds-7950x3d-zen-4-gets-vcache/
That is around 20x more, in case you missed it. At this point I am not so sure you have your facts straight, so your statements seem less and less trustworthy.
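As a quick sanity check of the "around 20x" figure, here is the arithmetic in Python with the numbers quoted above. Reading "64/32" as read/write bandwidth and rounding "almost 1.5 TByte/s" up to 1500 GB/s are my assumptions, so treat the results as rough upper bounds.

```python
# Figures quoted in the post; the read/write split is an assumed interpretation.
L3_GB_PER_S = 1500.0        # "almost 1.5 TByte/s", rounded up
IFOP_READ_GB_PER_S = 64.0   # assumed to be the read direction
IFOP_WRITE_GB_PER_S = 32.0  # assumed to be the write direction

read_ratio = L3_GB_PER_S / IFOP_READ_GB_PER_S
write_ratio = L3_GB_PER_S / IFOP_WRITE_GB_PER_S

print(f"L3 vs IFoP read:  {read_ratio:.1f}x")
print(f"L3 vs IFoP write: {write_ratio:.1f}x")
```

So "around 20x" holds for the 64 GByte/s direction, and the gap is roughly twice that for the 32 GByte/s direction.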
That one is fake, just an edit of an already existing shot.
How blurry the shot was should have been a dead giveaway: it doesn't have the noisy fine grain that a slipped shot from a partner testing an ES has. What's with the yellow animal?
It's stretched like the CPU shot.
