Intel "Mesh" vs Intel "Ring"

csbin

Senior member
Feb 4, 2013
838
351
136
:sweat:

TzJMY.jpg

You need to put a description of the image in the post. This is an official warning
Markfw
AT Moderator
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Sacrifice latency for more bandwidth. Probably more important in the server level tasks these chips were designed for.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
You're forgetting something that a ring bus necessitates: a linear latency increase with each added node. If you tried to scale the ring up to those core counts, the ring latency would be even greater, so no, they haven't "sacrificed latency for more bandwidth". Latency is actually lower than what a ring would have had at that size.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,013
136
You're looking at too low a core count. The ring bus was killing them in the 24-core die last time; they had to use two rings tied together with buffered queues:

intel-xeon-e5-v4-hcc-1.png


The mesh is designed to scale much better to those core counts. The mesh latency should scale as sqrt(n), instead of linearly.

I'm curious to see what the next Xeon D uses. My guess is that it will stick with the "consumer" cores with no AVX512 and use a ringbus.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
The change from ring to mesh is absolutely a necessity as the number of cores increases. What's concerning is the huge increase in die size that comes along with it. Or is that down to AVX512?
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
That's the weird thing about the mesh. I was thinking the latency would be lower, not higher, since it could get to any other core in fewer hops.

This is a mess.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
What is said die size?
Was thinking of this post:
https://forums.anandtech.com/thread...-19-9-00-am-et.2428363/page-496#post-38943496
BTW, the die size has also grown quite a bit.

The Broadwell-E 10C die is about 240 mm2; the Skylake-X 10C die is around 325 mm2.

By my totally reliable estimation from looking at the die picture, the core size (including L3) is about 17 mm2. Kaby Lake is, by the same estimation, 12.2 mm2.

It has very significantly grown to 17.0 mm2 indeed.

Ryzen_vs_Skylake_core_detail_small.jpg
 

wildhorse2k

Member
May 12, 2017
180
83
71
Actually the Ryzen latencies make more sense in the long run. You want very low latencies for low thread count applications and applications needing more threads need to optimize. Giving bad latency to all cores is a bad idea.

The problem with Ryzen is that perhaps the cross CCX latencies are way too high. We will see how it goes for Threadripper, but cross die latencies will probably be even higher than 140ns.
 

mtcn77

Member
Feb 25, 2017
105
22
91
Actually the Ryzen latencies make more sense in the long run. You want very low latencies for low thread count applications and applications needing more threads need to optimize. Giving bad latency to all cores is a bad idea.

The problem with Ryzen is that perhaps the cross CCX latencies are way too high. We will see how it goes for Threadripper, but cross die latencies will probably be even higher than 140ns.
You purchase low latency ram - problem solved.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
You purchase low latency ram - problem solved.
On that matter for Intel, we know that the ring bus didn't improve as much with higher speed/lower latency ram. Do we already have numbers on how the mesh behaves with higher speed/lower latency ram?
 

wildhorse2k

Member
May 12, 2017
180
83
71
On that matter for Intel, we know that the ring bus didn't improve much with higher speed/lower latency ram. Do we already have numbers on how the mesh behaves with higher speed/lower latency ram?

PCPer's review shows it. With DDR4-2400 you get about 105 ns latency. With DDR4-2800 and a 2800 MHz uncore (stock uncore is 2.4 GHz) you get about 95 ns. I saw a comment from der8auer on YouTube that you can run the uncore up to 3-3.2 GHz. According to him the highest DDR4 frequency is 3600-4000. With that you could perhaps get down to 85-90 ns (absolute best case), but TDP will very likely increase.

It would be unfair to only point out the negatives. According to http://www.tomshardware.com/reviews/intel-core-i9-7900x-skylake-x,5092-3.html, the cache has superior multi-threaded throughput.

On Ryzen, using DDR4-3200 reportedly drops the inter-CCX latency to about 105 ns, which isn't too bad (not shown in the review).
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
On that matter for Intel, we know that the ring bus didn't improve as much with higher speed/lower latency ram. Do we already have numbers on how the mesh behaves with higher speed/lower latency ram?
Uncore is not really tied to memory clock on Intel CPUs.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
The clowns were the ones ignoring or dismissing the issue.

You'd think that now that the two main x86 design houses have decided to trade off latency for improved bandwidth or scalability, folks would accept that maybe latency isn't as important as they think.

Obviously not.

Well - I guess we'll know when benchmarks come out of SKL-X on games (since a high proportion of clowns use these exclusively as reflective of "performance") as to how important latency is when the software, compiler and scheduler are optimised to keep threads on the same cores where possible and prefetch data where possible.
Oh actually no. What will happen is the same clowns will pick out inappropriate results from inappropriate code and use it to justify their pre-conceived idiocy.
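On the point about keeping threads on the same cores: software can also ask for this explicitly rather than leaving it to the scheduler. A minimal sketch, assuming Linux (Python only exposes os.sched_setaffinity on some Unix platforms) and using a made-up helper name; it simply picks the lowest-numbered allowed CPU and makes no attempt to choose cores that sit close together on a mesh or inside one CCX.

```python
import os

def pin_to_one_cpu():
    """Pin the calling process to a single CPU it is already allowed to run on.

    Returns the new affinity set, or None where the call is unavailable
    (os.sched_setaffinity is not present on every platform).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
    target = min(allowed)               # arbitrary choice: lowest-numbered allowed CPU
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("affinity after pinning:", pin_to_one_cpu())
```

Whether pinning helps depends on the workload; it only removes migrations between cores, not the cost of touching data that lives in another core's cache.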
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
Actually the Ryzen latencies make more sense in the long run. You want very low latencies for low thread count applications and applications needing more threads need to optimize. Giving bad latency to all cores is a bad idea.

The problem with Ryzen is that perhaps the cross CCX latencies are way too high. We will see how it goes for Threadripper, but cross die latencies will probably be even higher than 140ns.
I think it'll be about 160ns to go inter-die on their test. The big part of the latency number will be cache coherency, which shouldn't grow compared to a single Zeppelin.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
There is a paper that I have a copy of (I don't know where I got it) that has Bulldozer's NUMA latencies. I'm basing my guesses off that. For BD, going intra-package was 41ns. Going one socket hop was only an extra 7ns on top of that. But its interconnects were much slower and its internal system topology is significantly worse.

But even in Ryzen, when going inter-CCX the cache directory will be checked to see whether the destination memory address is in the other CCX or the fetch goes to main memory. So that time shouldn't grow much: it's just the physical transmit latency added when querying a remote cache directory/memory controller, plus the extra physical latency for the returned data/result.

Thus my guess of an extra 20ns.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Well - I guess we'll know when benchmarks come out of SKL-X on games (since a high proportion of clowns use these exclusively as reflective of "performance") as to how important latency is when the software, compiler and scheduler are optimised to keep threads on the same cores where possible and prefetch data where possible.
Oh actually no. What will happen is the same clowns will pick out inappropriate results from inappropriate code and use it to justify their pre-conceived idiocy.

We will see how latency and mesh vs ring bus affect gaming when the Skylake-X and Coffee Lake gaming reviews are here. The 6-core Coffee Lake can then be directly compared to the 7800X, and my bet is Coffee Lake will be the clear winner (even at the same clocks).
 