You must have had a bump on your head earlier today, because Paul stopped being CEO a decade ago and hasn't been among the living for half that! But I agree with what you said. Jensen, Lisa and Pat need to get together for a CEOs' luncheon, have a lot of drinks and some oregano in between bites of food, and take turns prank calling that Leno-jawed bonehead whose videos are worth less than dung.
I guess it's Pat now.
How come you are so sure that the IFoP is even a part of the ring? I could also imagine that it is directly attached to the L3 on a separate path. IMHO all memory traffic goes through that path, which is why the latencies of all cache stages as well as the memory latency add up.
A split ring (bisected) just means two rings, with one 'stop' (switch/router) from each ring to the other. Two actually, for clockwise and counterclockwise, but still just one stop for computing latency. You also need a 'node' for the IF connection to the ring (again, two of them). There is a math formula for it, but I don't recall it at the moment. So, just go from node n=1 to n=10, sum the latency from each node to the others, then divide by 10. I'm sure the routing uses SPF (shortest path first). AMD could probably just make the IF node another route point, but I don't know if they do. Intel had separate points for memory, I/O and ring-to-ring connections.
Anyway, pretty sure this is close to correct for napkin math - without knowing implementation details. Anyone who cares to can correct me.
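For reference, the formula for a plain bidirectional ring with shortest-path routing can be written down. This is only a sketch under stated assumptions: $n$ stops, one hop per link, uniform latency per hop, and no extra stops for the IF node or a ring-to-ring switch (none of which is confirmed for any actual implementation). The sum of hop counts from one stop to all the others is

$$S(n) = \left\lfloor \frac{n^2}{4} \right\rfloor$$

so the average over the other $n-1$ stops, and the average when dividing by $n$ as in the napkin math above, are

$$\bar{h}_{\text{others}} = \frac{S(n)}{n-1}, \qquad \bar{h}_{\text{all}} = \frac{S(n)}{n}$$

For $n = 10$ the hop counts are $1,1,2,2,3,3,4,4,5$, so $S(10) = 25$, giving $25/9 \approx 2.78$ hops or $25/10 = 2.5$ hops. Multiply by the per-hop latency to get an average latency.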
You must have had a bump on your head earlier today, because Paul stopped being CEO a decade ago and hasn't been among the living for half that!
That one would be a double p.
I'm just going to blame it on the fact that they both have names starting with P. That and I think my brain wants to forget that Brian Krzanich ever happened to Intel.
I don't think the ring interconnect would be attached to IF. The L3 is also not directly attached to the IFoP but rather to the SDF/SCF. IFOP/IFIS is at least a level below.
How come you are so sure that the IFoP is even a part of the ring? I could also imagine that it is directly attached to the L3 on a separate path. IMHO all memory traffic goes through that path, which is why the latencies of all cache stages as well as the memory latency add up.
As to the calculations:
I know how to calculate the average hops. The thing is that the formula is different for each topology. And for more complex topologies like a bisected ring or the ladder (which sounds like a 2x4 grid/mesh to me) it is a bit of a pain in the a.. to simply count them. That is why I am surprised that seemingly no one in the world has made an online calculator. Maybe this is my next hobby project 🤔
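Counting by hand can be avoided with a breadth-first search over the topology. The following is a minimal sketch, not a model of any real chip: the helper names (`average_hops`, `ring`, `two_rings`, `grid`) are made up for illustration, every link counts as one hop, and the "bisected ring" is assumed to be two equal rings joined by a single link, which is only one possible reading of the description above.

```python
from collections import deque


def average_hops(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for src in adj:
        # Plain BFS from src; every link is assumed to cost one hop.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring(n):
    """Bidirectional ring with n stops."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def two_rings(n):
    """Assumed 'bisected ring': two n-stop rings joined by one link."""
    adj = {}
    for side in ("a", "b"):
        for i in range(n):
            adj[(side, i)] = [(side, (i - 1) % n), (side, (i + 1) % n)]
    adj[("a", 0)].append(("b", 0))
    adj[("b", 0)].append(("a", 0))
    return adj


def grid(rows, cols):
    """rows x cols mesh without wrap-around links (the 'ladder' for 2 x 4)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            if r > 0:
                nbrs.append((r - 1, c))
            if r < rows - 1:
                nbrs.append((r + 1, c))
            if c > 0:
                nbrs.append((r, c - 1))
            if c < cols - 1:
                nbrs.append((r, c + 1))
            adj[(r, c)] = nbrs
    return adj


if __name__ == "__main__":
    for name, topo in [("ring(8)", ring(8)),
                       ("two_rings(4)", two_rings(4)),
                       ("grid(2, 4)", grid(2, 4))]:
        print(f"{name}: {average_hops(topo):.3f} average hops")
```

The average here is taken over the other nodes only (it divides by n-1 per source, not n), and it says nothing about real latency, which also depends on per-hop cost, queuing and where memory and I/O attach.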
Isn't that a bit nitpicky? 😉
I don't think the ring interconnect would be attached to IF. The L3 is also not directly attached to the IFoP but rather to the SDF/SCF. IFOP/IFIS is at least a level below.
L2$ is inclusive of L1$, and L3$ has shadow tags for the core-private L2$. So if one core needs data from another core, the shadow tags are used to find out which core has the data. The data is not routed via L3$, however, so I assumed there is some interconnect here; L3$ only contains data evicted from any of the L2$. Other than this I don't know of any other primitives for barrier synchronization, message passing etc. between cores. But for obvious reasons there will never be any public info around this anyway.
Core-to-core communication across several CCDs is handled by L3 coherency, but on the same CCD that is not needed.
Data to and from the IOD needs exactly the same routing as other L3 traffic. Why would AMD make a duplicate interconnect network for IOD traffic only? Intel designs have the memory controller as part of the ring, as do AMD GPUs.
Isn't that a bit nitpicky? 😉
But yeah, that is what I have been saying all along: the "interconnect to the IOD", to be as general as possible, is not part of the ring.
Data to and from the IOD needs exactly the same routing as other L3 traffic. Why would AMD make a duplicate interconnect network for IOD traffic only? Intel designs have the memory controller as part of the ring, as do AMD GPUs.
Maybe you misunderstood me:
The L3 on Zen is exclusive to each CCD (unlike SPR by default). So there is absolutely zero L3 traffic via the IFoP/IOD, except for the L3 coherency, where AMD uses some kind of MOESI. And that is exactly how the cores talk to each other when on separate CCDs; otherwise they would have horrible latency from going to RAM. And that is the beauty: although IFoP bandwidth is very limited, there is no common workload to my knowledge where this is detrimental.
Care to share a source that it is a ring stop? At least I can't see why this should be a given.
Every bit of data in and out of the CCD goes through that IFoP link. And the IFoP link is one of the Zen 3 ring stops. Just like Intel chips with a ring bus, where the memory controller is at one ring stop.
Why would you have 16 connections to a port that has so little bandwidth relative to the number of connections?
Care to share a source that it is a ring stop? At least I can't see why this should be a given.
Exactly because it is such a low-bandwidth connection, which costs relatively few transistors, nets you uniform RAM latency for each cache slice and doesn't introduce crosstalk to the ring.
Why would you have 16 connections to a port that has so little bandwidth relative to the number of connections?
I am still failing to see proof in this. All I see is 8 cores connected to an L3 block which, as we already knew from another source, has its slices connected via some form of bidirectional ring. And then we have another connection from that block to the outside, but we have no idea how this is implemented.
AMD's Zen3 presentation:
Every core and every L3 slice needs a connection to the other cores and to I/O. A ring bus is one widely used interconnect for that.
AMD Announces Ryzen 7 5800X3D, World's Fastest Gaming Processor (www.screenhacker.com)
AMD today announced its Spring 2022 update for the company’s Ryzen desktop processors, with as many as seven new processor models in the retail channel. The lineup is led by …
Well, yes, it would be through the SDF. I/O and memory need to be connected to the ring somehow. That’s the point of a ring. It wouldn’t make any sense to add a mesh or P2P interconnect for data under the ring and use the ring only for cache snooping and L3$ to L3$ data transfers.
I don't think the ring interconnect would be attached to IF. The L3 is also not directly attached to the IFoP but rather to the SDF/SCF. IFOP/IFIS is at least a level below.
The IFoP is not a low-bandwidth connection. The ring also doubles as a request queue and load balancer; with a direct connection from each L3 slice to the IFoP, those would have to be implemented some other way, basically as a duplicated second interconnect network.
Exactly because it is such a low-bandwidth connection, which costs relatively few transistors, nets you uniform RAM latency for each cache slice and doesn't introduce crosstalk to the ring.
The L3 is unified across the CCD, so there is A LOT of traffic going on from L3 accesses alone, more than enough to justify a ring solely for this.
Well, yes, it would be through the SDF. I/O and memory need to be connected to the ring somehow. That’s the point of a ring. It wouldn’t make any sense to add a mesh or P2P interconnect for data under the ring and use the ring only for cache snooping and L3$ to L3$ data transfers.
Maybe we have a different understanding of "large bandwidth". The IFoP has 64/32 GByte/s, while the L3 has almost 1.5 TByte/s, see https://chipsandcheese.com/2023/04/23/amds-7950x3d-zen-4-gets-vcache/
The IFoP is not a low-bandwidth connection. The ring also doubles as a request queue and load balancer; with a direct connection from each L3 slice to the IFoP, those would have to be implemented some other way, basically as a duplicated second interconnect network.
DUH. Let’s just say I hit my head this morning.
The L3 is unified across the CCD, so there is A LOT of traffic going on from L3 accesses alone, more than enough to justify a ring solely for this.
Maybe we have a different understanding of "large bandwidth". The IFoP has 64/32 GByte/s, while the L3 has almost 1.5 TByte/s, see https://chipsandcheese.com/2023/04/23/amds-7950x3d-zen-4-gets-vcache/
That is around 20x more, in case you missed that. At this point I am not so sure you have your facts straight, so your statements seem less and less trustworthy.